The Llama 4 herd
Meta has finally released the Llama 4 family of models that Zuckerberg hyped up so much. The Llama 4 models are open-source, multi-modal, mixture-of-experts models. First impression: these models are massive. None of them will run on the average computer with a decent GPU, or on any single Mac Mini. This is what we have:
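To make the mixture-of-experts idea concrete, here is a toy sketch of the routing step: a router scores every expert for each token and only the top-k experts actually run, so only a fraction of the total parameters is active per token. The expert count matches Scout's 16; everything else (the top-k value, the fake scores) is made up for illustration and is not Meta's implementation.

```python
# Toy mixture-of-experts routing sketch (illustrative only, not Meta's code).
# Only the top-k scoring experts process a given token, which is how a
# 109B-parameter model can do far less compute per token than a dense one.
import random

random.seed(0)

NUM_EXPERTS = 16  # Llama 4 Scout has 16 experts (per Meta's announcement)
TOP_K = 1         # hypothetical value, chosen just for this sketch

def route(token_scores, top_k=TOP_K):
    """Return the indices of the top-k experts for one token."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    return ranked[:top_k]

# Fake router scores for one token, one score per expert.
scores = [random.random() for _ in range(NUM_EXPERTS)]
active = route(scores)
print(f"active experts for this token: {active} ({TOP_K} of {NUM_EXPERTS})")
```

In a real model the router is a learned layer and the selected experts are full feed-forward blocks, but the selection logic is the same shape as above.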
Llama 4 Scout
The small model in the family: a mixture-of-experts with 16 experts, totaling 109B parameters. According to Meta, after int4 quantization it fits on a single H100 GPU (80GB of VRAM). It also officially has the largest context window of any released model: 10M tokens. However, a large context window takes a big toll on the already high VRAM requirements, so you may want to keep it contained. As Meta themselves write in their new cookbook example notebook for Llama 4:
Scout supports up to 10M context. On 8xH100, in bf16 you can get upto 1.4M tokens.
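The "fits on an H100" claim is easy to sanity-check with back-of-envelope arithmetic: int4 means half a byte per parameter, so the weights alone come to roughly 54.5GB, under the H100's 80GB. The parameter count and GPU size come from the announcement; the rest is just arithmetic, and it ignores the KV cache and activations, which is exactly why long contexts eat into the remaining headroom.

```python
# Back-of-envelope check of Meta's claim that int4-quantized Scout
# fits on a single 80 GB H100. Weights only; KV cache and activations
# consume the remaining headroom, growing with context length.
SCOUT_PARAMS = 109e9        # 109B total parameters (from Meta)
BYTES_PER_PARAM_INT4 = 0.5  # int4 = 4 bits = half a byte per parameter
H100_VRAM_GB = 80

weights_gb = SCOUT_PARAMS * BYTES_PER_PARAM_INT4 / 1e9
print(f"Scout int4 weights: ~{weights_gb:.1f} GB of {H100_VRAM_GB} GB")
assert weights_gb < H100_VRAM_GB  # fits, with ~25 GB left for everything else
```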
Llama 4 Maverick
The mid-sized model. This one has 128 experts, totaling 400B parameters, and "only" features a 1M context window due to its larger size. As of today, Maverick has reached second place on LMArena with an ELO of 1417, surpassed only by Gemini 2.5 Pro. That's scary, considering it isn't even the best model in the family.
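The same back-of-envelope arithmetic shows why Maverick is out of reach for single-GPU setups: at 400B parameters, even the int4 weights alone are around 200GB, more than two H100s' worth of VRAM before counting the KV cache. Only the parameter count comes from Meta; the GPU math is just arithmetic.

```python
# Rough single-GPU feasibility check for Maverick: int4 weights alone
# exceed one 80 GB H100 by a wide margin, so multi-GPU serving is needed.
import math

MAVERICK_PARAMS = 400e9     # 400B total parameters (from Meta)
BYTES_PER_PARAM_INT4 = 0.5  # int4 = half a byte per parameter
H100_VRAM_GB = 80

weights_gb = MAVERICK_PARAMS * BYTES_PER_PARAM_INT4 / 1e9  # 200 GB
gpus_needed = math.ceil(weights_gb / H100_VRAM_GB)         # weights only
print(f"Maverick int4 weights: {weights_gb:.0f} GB -> at least {gpus_needed}x H100")
```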
Llama 4 Behemoth
The big brother of the family: 16 experts and 2 TRILLION parameters. It easily surpasses Llama 3.1 405B, which was the largest Llama model until today. This model has not been released yet; according to Meta, it is still training, so we don't know anything about its capabilities.
Llama 4 Reasoning
We have no details on what it will be, just the announcement that it's coming soon.
Overall, these look like very capable frontier models that can compete with OpenAI, Anthropic, and Google while remaining open-source, which is a huge win. Check out Meta's post on the models' architecture and benchmarks, and also check out the models on Hugging Face.