phi-3-mini: The 3.8B Powerhouse Reshaping LLM Performance on Your Phone | HackerNoon
Briefly

Phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens. It performs comparably to much larger rivals such as Mixtral 8x7B and GPT-3.5, achieving 69% on MMLU and 8.38 on MT-bench. Its capabilities stem from a training dataset of heavily filtered public web data and synthetic data. Scaled-up siblings, phi-3-small and phi-3-medium with 7B and 14B parameters respectively, achieve even stronger benchmark results, and the phi-3-vision model demonstrates strong reasoning over both textual and visual inputs.
Phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens, demonstrating performance competitive with models like Mixtral 8x7B and GPT-3.5.
The innovations in phi-3-mini stem from a newly scaled-up training dataset composed of heavily filtered publicly available web data and synthetic data.
Initial parameter-scaling results show that the phi-3-small and phi-3-medium models improve significantly over phi-3-mini, achieving 75% and 78% on MMLU, respectively.
Phi-3-vision, based on phi-3-mini, features strong reasoning capabilities and operates well with both image and text prompts.
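For readers who want to try the model on-device, below is a minimal sketch of loading phi-3-mini through the Hugging Face transformers library. The checkpoint name microsoft/Phi-3-mini-4k-instruct, the chat-template prompt, and the generation settings are assumptions for illustration, not details taken from the article.

```python
# Minimal sketch: load phi-3-mini and generate a short reply.
# Assumes the "microsoft/Phi-3-mini-4k-instruct" checkpoint is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 3.8B model small enough for edge hardware
    device_map="auto",
)

# Format a single-turn chat prompt and generate a short answer.
messages = [{"role": "user", "content": "Summarize why a 3.8B model can run on a phone."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

In practice, phone or edge deployment would typically use a quantized build (for example 4-bit) rather than float16, but the loading and generation flow is the same.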