
Meta’s Latest AI Model: Llama 3

Meta, the technology giant behind social media platforms like Facebook, Instagram, and WhatsApp, has unveiled its latest artificial intelligence (AI) model, Llama 3. This open-weights large language model (LLM) is designed to power text composition, code generation, and chatbots, and can be used for a range of applications.

Llama 3 is available in two parameter sizes: 8 billion (8B) and 70 billion (70B), both free to download from Meta’s website after a sign-up. Each size comes in two versions, pre-trained and instruction-tuned, with an 8,192-token context limit. Meta trained the models using two custom-built, 24,000-GPU clusters, and the 70B model was trained on roughly 15 trillion tokens of data.
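To put that training run in perspective, the widely used 6·N·D heuristic for transformer training compute (an approximation, not a figure Meta published) gives a rough total for the 70B model:

```python
# Back-of-envelope training-compute estimate using the common
# "FLOPs ~= 6 * parameters * tokens" heuristic for transformers.
# The 70B parameter count and 15T token count come from Meta's
# announcement; the 6*N*D rule itself is only an approximation.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs as 6 * N * D."""
    return 6 * params * tokens

flops_70b = training_flops(params=70e9, tokens=15e12)
print(f"Llama 3 70B: ~{flops_70b:.1e} FLOPs")  # ~6.3e+24 FLOPs
```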

In a podcast interview, Meta’s CEO Mark Zuckerberg discussed the development of Llama 3, stating that the model did not reach “saturation” during training and might have improved further with even more training data.

Zuckerberg also mentioned that Meta added more coding-related training data to Llama 3, since it appears to make LLMs better at reasoning. In our experience, Llama 3 8B does seem better at overall reasoning: summaries capture key points more reliably, and when the model is unsure about something, it answers “I don’t know” instead of hallucinating. This is anecdotal, though, and not backed by hard data.

|            | Meta Llama 3 | Gemma It | Mistral Instruct | Meta Llama 3 | Gemini Pro 1.5 | Claude 3 Sonnet |
|------------|--------------|----------|------------------|--------------|----------------|-----------------|
| Parameters | 8B           | 7B       | 7B               | 70B          | ?              | ?               |
| Source     | Measured     | Measured | Published        | Published    |                |                 |
| MMLU       | 68.4         | 53.3     | 58.4             | 82.0         | 81.9           | 79.0            |
| GPQA       | 34.2         | 21.4     | 26.3             | 39.5         | 41.5           | 38.5            |
| HumanEval  | 62.2         | 30.5     | 36.6             | 81.7         | 71.9           | 73.0            |
| GSM-8K     | 79.6         | 30.6     | 39.9             | 93.0         | 91.7           | 92.3            |
| MATH       | 30.0         | 12.2     | 11.0             | 50.4         | 58.5           | 40.5            |

How Llama 3 compares to other instruction-tuned LLMs, per figures provided by Meta.

For comparison, Zuckerberg said the 8B version is nearly as powerful as the biggest version of Llama 2 that Meta released.

Llama 3 also brings a larger context window: 8K tokens versus Llama 2’s 4K. This makes it easier to provide more context for complex tasks. The context window can be thought of as a model’s short-term memory: arguably, a larger short-term memory lets you hold a more fluid understanding of a problem space. For an LLM, it means you can supply more data with a request without resorting to training, which is expensive.
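The “short-term memory” idea can be sketched as a sliding-window chat history: once the conversation exceeds the model’s context limit, the oldest turns are dropped. This is a minimal illustration, not Meta’s implementation; token counts are approximated by whitespace splitting, whereas a real deployment would use the model’s tokenizer.

```python
# Sliding-window chat history: keep only as many recent messages as fit
# within the model's context limit. Token counts are crudely approximated
# by whitespace splitting; a real system would use the model's tokenizer.

def fit_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Return the most recent messages whose total token count fits."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):   # walk newest-first
        n = len(msg.split())         # crude token estimate
        if total + n > max_tokens:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))      # restore chronological order

history = ["first question", "a fairly long answer from the model",
           "follow-up question", "final answer"]
print(fit_to_context(history, max_tokens=8))
# -> ['follow-up question', 'final answer']
```

With an 8K window instead of 4K, roughly twice as much recent conversation (or reference material) survives this trimming before anything has to be discarded.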

Beyond the 8B and 70B versions, Meta also announced that it is currently training a 400B-parameter version of Llama 3, which is expected to perform similarly to GPT-4 Turbo, Claude 3 Opus, and Gemini Ultra on benchmark tests. According to the podcast, the 400B model is expected to be released sometime this year.