
Since the viral success of OpenAI’s conversational chatbot, the generative AI landscape has erupted into a full-blown arms race.
Once a relatively quiet field that generated buzz mostly within its own academic and technical circles, AI became a mainstream internet sensation overnight when OpenAI introduced ChatGPT.
In response, Meta launched its own contender, LLaMA, followed by an improved iteration, Llama 2, and later a worthy successor, Llama 3.
This time, Meta has unveiled Llama 4, the latest addition to its family of AI models, which now powers the Meta AI assistant across the web, WhatsApp, Messenger, and Instagram.
The release includes two new models: Llama 4 Scout, a compact model designed to “fit in a single Nvidia H100 GPU,” and Llama 4 Maverick, a more powerful model positioned as a competitor to GPT-4o and Gemini 2.0 Flash.
Both models are available for download via Meta or Hugging Face.
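For readers who want to try the models, one typical route is Hugging Face’s transformers library. The sketch below is a hedged starting point rather than a recipe: the repository ID is an assumption based on Meta’s naming, the checkpoints are gated behind Meta’s license, and the full Scout model needs serious hardware (Meta cites a single H100).

```python
# Hypothetical example of running Llama 4 Scout via Hugging Face transformers.
# The repo ID below is an assumption; access is gated behind Meta's license.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed repo ID
    device_map="auto",  # spread the model across available accelerators
)
print(pipe("Summarize the Llama 4 release in one sentence.", max_new_tokens=64))
```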
Today is the start of a new era of natively multimodal AI innovation.
Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.
Llama 4 Scout
• 17B-active-parameter model…

— AI at Meta (@AIatMeta) April 5, 2025
According to Meta, Llama 4 Scout features a 10-million-token context window—a measure of how much input the model can consider at once—and surpasses Google’s Gemma 3, Gemini 2.0 Flash-Lite, and the open-source Mistral 3.1 “across a broad range of widely reported benchmarks,” while still “fitting in a single Nvidia H100 GPU.”
Meta also highlights Maverick’s competitive edge over GPT-4o and Gemini 2.0 Flash, noting it performs comparably to DeepSeek-V3 in coding and reasoning tasks using "less than half the active parameters."
As for Llama 4 Behemoth, the model boasts 288 billion active parameters out of a total of 2 trillion. Though still in training, Meta claims Behemoth can outperform GPT-4.5 and Claude 3.7 Sonnet “on several STEM benchmarks.”
The Llama 4 series adopts a mixture-of-experts (MoE) architecture, which improves efficiency by activating only the parts of the model needed for each task. Meta is expected to reveal more about its AI roadmap at its LlamaCon conference on April 29th.
While Meta brands the Llama 4 collection as “open-source,” it continues to face criticism over the license’s limitations.
Notably, the license mandates that any commercial user with over 700 million monthly active users must seek Meta’s approval—an approach that, as the Open Source Initiative pointed out in 2023, takes it “out of the category of ‘Open Source.’”
Llama 4 Scout and Llama 4 Maverick’s industry-leading performance is in large part thanks to distillation from Llama 4 Behemoth, our most powerful model yet.
Be on the lookout for more details on Llama 4 Behemoth at a future date!

— AI at Meta (@AIatMeta) April 5, 2025
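The distillation Meta mentions is a standard technique in which a smaller “student” model is trained to imitate a larger “teacher.” Meta has not published its exact recipe, but a common textbook formulation blends a soft-target loss against the teacher’s temperature-scaled outputs with ordinary cross-entropy on ground-truth labels, as in this illustrative sketch:

```python
# Illustrative knowledge-distillation loss (a textbook recipe, not Meta's
# disclosed method): the student matches the teacher's softened distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10),
                         torch.randint(0, 10, (4,)))
print(loss.item())
```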
Meta hasn’t shied away from bold claims, asserting that its latest Llama 4 models “outperform” the competition.
According to Meta, Llama 4 introduces two major architectural advancements: early-fusion multimodality and a sparse mixture-of-experts (MoE) design.
With early fusion, the model treats text, images, and video frames as a single sequence of tokens from the start.
This enables seamless understanding and generation across multiple media types—perfect for tasks like summarizing documents with visuals or analyzing video content with transcripts.
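To make early fusion concrete, here is a minimal, hypothetical sketch: text tokens and projected image-patch tokens are mapped into the same embedding space and concatenated into one sequence before reaching a shared transformer backbone. All names and sizes are illustrative, not Meta’s actual code.

```python
# Toy early-fusion sketch: text and image patches become one token sequence.
import torch
import torch.nn as nn

D_MODEL = 64

text_embed = nn.Embedding(1000, D_MODEL)      # toy text vocabulary
patch_proj = nn.Linear(16 * 16 * 3, D_MODEL)  # flattened 16x16 RGB patch -> token

def fuse(text_ids: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
    """Concatenate text tokens and image-patch tokens into one sequence."""
    text_tokens = text_embed(text_ids)   # (T, D_MODEL)
    image_tokens = patch_proj(patches)   # (P, D_MODEL)
    # Early fusion: a single combined sequence feeds one shared backbone,
    # rather than processing each modality with a separate encoder.
    return torch.cat([text_tokens, image_tokens], dim=0)  # (T+P, D_MODEL)

seq = fuse(torch.randint(0, 1000, (8,)), torch.rand(4, 16 * 16 * 3))
print(seq.shape)  # torch.Size([12, 64])
```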
Meanwhile, the sparse MoE architecture boosts efficiency by routing each input token through only a few expert sub-models rather than the whole network.
This allows Llama 4 to scale total capacity without a matching increase in compute per token, making it well suited to enterprise use where speed, scale, and cost matter.
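As a rough illustration of how sparse MoE keeps “active” parameters far below total parameters, here is a toy top-2 routing layer. It is a sketch under simplified assumptions, not Llama 4’s implementation:

```python
# Toy sparse mixture-of-experts layer with top-2 routing per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only each token's top-k experts run; the rest stay idle, which is
        # why active parameters are a small fraction of total parameters.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

print(SparseMoE()(torch.rand(10, 64)).shape)  # torch.Size([10, 64])
```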
We can’t wait to see the rich experiences people build in the new Llama ecosystem!
Even more details on the Llama 4 herd in the model card https://t.co/iLGc3Bmi6z

— AI at Meta (@AIatMeta) April 5, 2025