Meta has unveiled the next generation of its open-source large language model, Meta Llama 3. This release features pretrained and instruction-fine-tuned language models with 8B and 70B parameters, capable of supporting a wide range of use cases.
Llama 3 models will soon be accessible on various platforms, including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake. Additionally, hardware platforms from AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm will provide support for Llama 3. This breadth of availability makes Llama 3 reachable from virtually any major cloud, model hub, or hardware stack developers already use.
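As a sketch of what that access looks like in practice, the snippet below loads the instruction-tuned 8B model through the Hugging Face transformers library. The hub ID shown is the one Meta listed on Hugging Face at release and is gated behind license acceptance and an access token; the dtype and generation settings are illustrative choices, not recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID as listed on Hugging Face at release; access requires accepting
# the Llama 3 license and logging in with a Hugging Face token.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single large GPU
    device_map="auto",
)

# The instruct models ship with a chat template that inserts Llama 3's
# special tokens around each conversation turn.
messages = [{"role": "user", "content": "Explain rejection sampling briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```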
The release includes updated trust and safety tools, such as Llama Guard 2, Code Shield, and CyberSec Eval 2. These tools aim to mitigate potential risks associated with large language models, such as generating insecure code or producing problematic responses.
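To illustrate how one of these tools slots into an application, the sketch below runs a conversation through Llama Guard 2 as a safety classifier, following the usage pattern on its Hugging Face model card. The hub ID and decoding settings are assumptions based on that card; a real deployment would parse the returned verdict ("safe", or "unsafe" plus a hazard-category code) before deciding whether to serve the response.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID as listed on the Llama Guard 2 model card (gated, like Llama 3).
model_id = "meta-llama/Meta-Llama-Guard-2-8B"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=device
)

def moderate(chat):
    """Classify a conversation; returns 'safe' or 'unsafe' plus a category."""
    # Llama Guard 2's chat template wraps the turns in its moderation prompt.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

verdict = moderate([
    {"role": "user", "content": "How do I tie a bowline knot?"},
    {"role": "assistant", "content": "Make a small loop, pass the end up through it..."},
])
print(verdict)  # e.g. "safe"
```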
Meta plans to introduce additional capabilities to Llama 3, including multimodality, multilingual conversation, and longer context windows. The company will also release the Llama 3 research paper, providing insights into the development process and the model's underlying architecture.
Llama 3 technology has been integrated into Meta AI, which users can access through Facebook, Instagram, WhatsApp, Messenger, and the web. This puts AI at the fingertips of an enormous number of people.
Meta's design philosophy for Llama 3 focused on four key ingredients: model architecture, pretraining data, scaling up pretraining, and instruction fine-tuning.
The pretraining data for Llama 3 is seven times larger than that used for Llama 2, with over 15T tokens collected from publicly available sources. Meta developed sophisticated data-filtering pipelines, incorporating techniques like heuristic filters, NSFW filters, semantic deduplication, and text classifiers.
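Meta has not published the pipeline itself, so the following is only an illustrative sketch of how those four stages might compose. Every threshold, blocklist entry, and helper function here is a hypothetical stand-in, and production systems would use scalable approximations such as MinHash/LSH for deduplication and learned classifiers rather than hand rules.

```python
# Hypothetical sketch of the four filtering stages named above; none of the
# thresholds or helpers come from Meta.
import re

def passes_heuristics(doc: str) -> bool:
    """Heuristic filters: cheap rule-based checks on length and composition."""
    words = doc.split()
    if not 50 <= len(words) <= 100_000:           # drop fragments and giants
        return False
    alpha_ratio = sum(ch.isalpha() for ch in doc) / max(len(doc), 1)
    return alpha_ratio > 0.6                      # drop markup-heavy boilerplate

# NSFW filter: a keyword blocklist standing in for a learned safety filter.
NSFW_BLOCKLIST = {"nsfw_term_a", "nsfw_term_b"}   # placeholder terms

def passes_nsfw_filter(doc: str) -> bool:
    tokens = set(re.findall(r"[a-z']+", doc.lower()))
    return not tokens & NSFW_BLOCKLIST

def shingle_set(doc: str, k: int = 5) -> frozenset:
    """Word 5-grams as a crude fingerprint for near-duplicate detection."""
    toks = doc.lower().split()
    return frozenset(" ".join(toks[i:i + k]) for i in range(len(toks) - k + 1))

def is_near_duplicate(sh: frozenset, seen: list, threshold: float = 0.8) -> bool:
    """Semantic deduplication via Jaccard overlap (exhaustive here; MinHash/LSH
    would replace this pairwise scan at corpus scale)."""
    return any(len(sh & s) / len(sh | s) >= threshold for s in seen if sh | s)

def quality_score(doc: str) -> float:
    """Stand-in for a learned text-quality classifier: here, simply the
    fraction of lines that end in sentence-final punctuation."""
    lines = [ln for ln in doc.splitlines() if ln.strip()]
    ended = sum(ln.rstrip().endswith((".", "!", "?")) for ln in lines)
    return ended / max(len(lines), 1)

def filter_corpus(docs):
    """Compose the stages: heuristics -> NSFW -> dedup -> quality classifier."""
    seen, kept = [], []
    for doc in docs:
        if not (passes_heuristics(doc) and passes_nsfw_filter(doc)):
            continue
        sh = shingle_set(doc)
        if is_near_duplicate(sh, seen):
            continue
        if quality_score(doc) < 0.5:
            continue
        seen.append(sh)
        kept.append(doc)
    return kept
```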
Scaling up pretraining involved the development of detailed scaling laws for downstream benchmark evaluations, enabling optimal data-mix selection and informed decisions on training-compute allocation. Meta's most efficient implementation achieved a compute utilization of over 400 TFLOPS per GPU when training on 16K GPUs simultaneously, resulting in a three-fold increase in training efficiency compared to Llama 2.
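A quick back-of-the-envelope check puts that throughput figure in context. Assuming H100-class GPUs (Meta has described 24K-GPU H100 clusters, though the hardware behind this specific number is an assumption here), 400 TFLOPS per GPU corresponds to roughly 40% of NVIDIA's published dense BF16 peak:

```python
# Rough utilization check; the peak figure is NVIDIA's published dense BF16
# number for the H100 SXM and is an assumption about the hardware used.
achieved_tflops = 400.0
peak_bf16_tflops = 989.0

mfu = achieved_tflops / peak_bf16_tflops
print(f"Model FLOPs utilization: {mfu:.1%}")          # ~40.4%

# Aggregate throughput across the 16K-GPU job, in exaFLOPS.
aggregate_eflops = achieved_tflops * 16_000 / 1e6
print(f"Aggregate throughput: {aggregate_eflops:.1f} EFLOPS")  # 6.4 EFLOPS
```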
Instruction fine-tuning played a crucial role in unlocking the potential of Llama 3 in chat use cases. The combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO) greatly improved the model's performance on reasoning and coding tasks.
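Of these techniques, DPO is compact enough to show in a few lines. The sketch below implements the standard DPO objective from Rafailov et al. (2023) in PyTorch; it is not Meta's training code. The inputs are summed per-response log-probabilities of each preference pair under the policy being tuned and under a frozen reference model (typically the SFT checkpoint), and beta is an illustrative hyperparameter.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs."""
    # Log-ratios of policy vs. reference for preferred and dispreferred answers.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the margin between the two ratios up, scaled by beta; minimizing
    # the negative log-sigmoid makes the chosen response relatively likelier.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Example with dummy log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -11.0, -8.0]),
                torch.tensor([-14.0, -9.0, -13.5, -10.0]),
                torch.tensor([-12.5, -9.4, -11.2, -8.3]),
                torch.tensor([-13.0, -9.2, -13.0, -9.5]))
print(loss)
```

One appeal of DPO, and a plausible reason it appears alongside PPO here, is that it optimizes preferences directly from the pairwise data with no separate reward model or sampling loop during training.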