Meta has announced the release of Llama 3.2, expanding its family of large language models with new capabilities in vision and edge computing. The announcement, made on September 25, introduces models designed to bring image understanding to the Llama family and to run AI applications directly on mobile and edge devices.

Llama 3.2 includes two categories of models:

1. Vision Language Models (VLMs): 11B and 90B parameter models capable of image understanding and reasoning.

2. Lightweight Text Models: 1B and 3B parameter models optimised for edge and mobile devices.

The new vision models are designed to be drop-in replacements for their text-only counterparts, offering capabilities such as document-level understanding of charts and graphs, image captioning, and visual grounding tasks. Meta claims these models are competitive with leading closed-source models like Claude 3 Haiku on image recognition and visual understanding tasks.
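For readers who want to try the vision models, the following sketch shows what an image-question call might look like through the Hugging Face transformers integration. The repository name, the Mllama classes, and the input image are illustrative assumptions based on the published model cards, not details from Meta's announcement.

```python
# Sketch: asking the 11B vision model about a chart via Hugging Face
# transformers. The repo name and Mllama classes are assumed from the
# model card; requires a recent transformers release with Mllama support.
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed HF repo name
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Any local chart or photo will do; replace with your own file.
image = Image.open("chart.png")

# The chat template interleaves an image placeholder with the text prompt.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image, prompt, add_special_tokens=False, return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```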

A key feature of the lightweight models is their 128K-token context length, which Meta says makes them state-of-the-art in their class for on-device use cases such as summarisation, instruction following, and rewriting. The models are optimised for Arm processors, with Qualcomm and MediaTek hardware enabled from day one.
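As a rough illustration of the lightweight tier, the sketch below runs a rewriting prompt through the 1B instruct model with the transformers pipeline. The repository name is assumed from the Hugging Face model card, and a true on-device deployment would use a mobile runtime rather than this server-side path.

```python
# Sketch: a rewriting task on the 1B instruct model using the Hugging Face
# text-generation pipeline. The repo name is assumed from the model card;
# on-device deployments would instead go through a mobile runtime.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed HF repo name
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Rewrite this politely: 'Send me the report now.'"},
]
result = pipe(messages, max_new_tokens=64)

# The pipeline returns the conversation with the assistant reply appended.
print(result[0]["generated_text"][-1]["content"])
```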

The company is making Llama 3.2 models available for download on llama.com and Hugging Face, as well as for immediate development on a broad ecosystem of partner platforms. This release is supported by over 25 companies, including major tech players like AMD, AWS, Google Cloud, Microsoft Azure, and NVIDIA.

In addition to the new models, Meta is introducing the Llama Stack, a set of standardised interfaces and tools designed to simplify working with Llama models across various environments. This includes a command-line interface, client code in multiple programming languages, and Docker containers for distribution.

Meta has also addressed safety concerns by releasing Llama Guard 3 11B Vision, designed to filter combined text-and-image input prompts as well as text output responses, and Llama Guard 3 1B, a significantly reduced-size model for efficient deployment in constrained environments.
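The sketch below illustrates how such a guard model might screen a prompt before it reaches the main model. The repository name and the "safe"/"unsafe" output convention are assumptions taken from the published model card rather than from the announcement.

```python
# Sketch: screening a user prompt with Llama Guard 3 1B before passing it
# to the main model. Repo name and output convention ("safe", or "unsafe"
# plus a hazard-category code) are assumed from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"  # assumed HF repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user",
     "content": [{"type": "text",
                  "text": "How do I pick a strong password?"}]},
]
input_ids = tokenizer.apply_chat_template(
    conversation, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=20)
verdict = tokenizer.decode(
    output[0][input_ids.shape[-1]:], skip_special_tokens=True
)
print(verdict)  # e.g. "safe", or "unsafe" followed by a category like "S6"
```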

The company continues to emphasise its commitment to open-source AI development. "We continue to share our work because we believe openness drives innovation and is good for developers, Meta, and the world," the announcement states.


