Google’s Gemini 1.5 Finds Needles in Haystacks

Summary

Google's Gemini Pro 1.5 is a new AI model with a 10M token context window, which shows promising results in handling large amounts of data with high efficiency and accuracy, including near-perfect recall of specific details within extensive text.

Abstract

Google has released an updated version of its AI model, Gemini Pro 1.5, which boasts a significant increase in the token context window to 10 million, with 1 million tokens currently in production. This model is designed to handle complex tasks such as querying large code repositories, processing full-length videos, and managing large datasets with improved efficiency. Gemini 1.5 utilizes a Mixture-of-Experts architecture to achieve these advancements. The model excels in maintaining high recall rates for specific information, termed "needle" recall, even within vast contexts equivalent to 70 books' worth of data. This capability is crucial for the long-term adoption of generative AI, as it addresses the challenge of models forgetting or hallucinating information in large contexts. The success of Gemini 1.5 in remembering critical details in extensive datasets could potentially revolutionize the utility of AI in various high-value applications.

Opinions

The Gemini Pro 1.5 model is considered a significant advancement, with evaluation scores comparable to Ultra 1.0 and improved computational efficiency, leading to shorter response times.
The model's ability to remember specific details, such as a pass code within a large prompt, is seen as a critical improvement for the adoption of generative AI in more complex and valuable tasks.
There is an acknowledgment that previous large language models (LLMs) struggled with remembering details in large contexts, often leading to incorrect information or "hallucinations."
Based on the Gemini 1.5 paper, the author sees the model's performance as a potential game changer, while waiting for Google's announcement on inference costs and pricing to fully understand its economic impact.

Gemini Pro 1.5 has arrived: a 10M-token context window (1M in production for now), evals comparable (so far) to Ultra 1.0, and considerably better compute efficiency, which means shorter response times.

It uses a Mixture-of-Experts (MoE) architecture to improve efficiency.
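Google has not published the exact routing used inside Gemini 1.5, so the sketch below only illustrates the generic Mixture-of-Experts idea from published sparse-MoE work. A small router scores every expert for each token, but only the top-k experts actually run, so the compute per token grows with k rather than with the total number of experts. The function names, the dot-product router and the toy "experts" are all assumptions made for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x: Vector, gate_weights: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    """Generic sparse top-k Mixture-of-Experts layer for a single token vector `x` (illustrative)."""
    # 1. Router: one score (logit) per expert, here a simple dot product with a gate vector.
    logits = [sum(w * xi for w, xi in zip(gate, x)) for gate in gate_weights]
    # 2. Keep only the k best-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # 3. Renormalise the routing weights over the chosen experts only.
    weights = softmax([logits[i] for i in top])
    # 4. Output = weighted sum of the k selected experts' outputs.
    out = [0.0] * len(x)
    for wt, i in zip(weights, top):
        y = experts[i](x)
        out = [o + wt * yj for o, yj in zip(out, y)]
    return out

if __name__ == "__main__":
    calls = []

    def make_expert(idx: int, scale: float) -> Expert:
        # Toy expert: scales its input; records that it was actually run.
        def expert(v: Vector) -> Vector:
            calls.append(idx)
            return [scale * a for a in v]
        return expert

    experts = [make_expert(i, float(i + 1)) for i in range(4)]
    gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
    print(moe_layer([2.0, 1.0], gates, experts, k=2))
    print(sorted(calls))  # [0, 3]: only 2 of the 4 experts were evaluated
```

In a real model each expert is a full feed-forward network, so skipping most of them per token is where the efficiency gain comes from.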

This model will be available in Google AI Studio and supports use cases like:

- Upload multiple large files (<= 1 million tokens) and ask questions about them
- Query an entire code repository (<= 300K LOC)
- Add a full-length video (<= 10 hours long)
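Before uploading a pile of files, a rough budget check against the 1-million-token window can save a failed request. The sketch below is hypothetical: it estimates tokens with an assumed heuristic of about 4 characters per token for English text (Gemini's actual tokenizer will give different counts), and `fits_in_context` and its `reserve` allowance for the question and answer are illustrative names, not part of any Google API.

```python
import math
from typing import Dict, Tuple

CONTEXT_LIMIT = 1_000_000  # production context window quoted in the post
CHARS_PER_TOKEN = 4        # ASSUMPTION: rough English-text heuristic, not Gemini's real tokenizer

def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (assumed heuristic)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(docs: Dict[str, str],
                    limit: int = CONTEXT_LIMIT,
                    reserve: int = 8_000) -> Tuple[bool, int]:
    """Return (fits, estimated_total) for a set of files sent in one prompt.

    `reserve` is a hypothetical allowance for the question and the model's answer.
    """
    total = sum(estimate_tokens(text) for text in docs.values())
    return total + reserve <= limit, total

if __name__ == "__main__":
    files = {"report.txt": "x" * 400_000, "notes.md": "y" * 40_000}
    ok, total = fits_in_context(files)
    print(ok, total)  # True 110000
```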

From the Gemini 1.5 paper:

The model achieves near-perfect “needle” recall (>99.7%). This is big.

LLMs often misremember and hallucinate even with a sophisticated #RAG chain.

The more they have to remember [for example, the larger the prompt chain’s context], the more they forget.

Hide a short line of text such as “the pass code is 135790” in an average-size prompt [~1,000 tokens], then ask the model for the pass code, and you’ll get the right answer.

But bury that same line in a 10,000-word prompt and the model gets confused and will very likely make stuff up.
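The pass-code experiment is the “needle in a haystack” test. A minimal harness for it might look like the sketch below, assuming a hypothetical `ask(prompt) -> str` callable that wraps whichever model you are testing. For simplicity, haystack size is measured in filler sentences rather than tokens, and the two stand-in “models” are plain Python functions that only show how recall is scored; they are not LLMs.

```python
import re
from typing import Callable, Iterable

FILLER = "The grass is green and the sky is blue."  # hypothetical filler sentence

def build_haystack(needle: str, n_sentences: int, depth: float) -> str:
    """Bury `needle` at a relative depth (0.0 = start, 1.0 = end) inside filler text."""
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)

def needle_recall(ask: Callable[[str], str], secret: str,
                  sizes: Iterable[int], depths: Iterable[float]) -> float:
    """Fraction of (haystack size, needle depth) trials whose answer contains `secret`."""
    depths = list(depths)  # reused for every size, so materialise it once
    needle = f"The pass code is {secret}."
    question = "\n\nWhat is the pass code? Answer with the number only."
    hits = trials = 0
    for n in sizes:
        for d in depths:
            prompt = build_haystack(needle, n, d) + question
            if secret in ask(prompt):
                hits += 1
            trials += 1
    return hits / trials

def perfect_model(prompt: str) -> str:
    # Stand-in "model" that always finds the needle (a regex, not an LLM).
    m = re.search(r"pass code is (\d+)", prompt)
    return m.group(1) if m else "I don't know"

def forgetful_model(prompt: str) -> str:
    # Stand-in that only "sees" the first 2,000 characters, mimicking lost long-context detail.
    return perfect_model(prompt[:2000])

if __name__ == "__main__":
    sizes, depths = [10, 1000], [0.0, 0.5, 1.0]
    print(needle_recall(perfect_model, "135790", sizes, depths))    # 1.0
    print(needle_recall(forgetful_model, "135790", sizes, depths))  # 4 of 6 trials -> 0.666...
```

Sweeping both haystack size and needle depth matters: a model that only handles the start of a long prompt still scores well on short prompts, which is exactly the failure described above.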

Solving this is critical to the long-term adoption of genAI, and of #LLMs specifically.

If an LLM can’t remember a small but important detail in a long financial report, or the PSA level on a blood test, next-gen LLMs will continue to occupy low-value niches.

Gemini 1.5 finds the pass code 99.7% of the time in a prompt of 10 million tokens, equivalent to remembering it after being fed 70 books’ worth of information.
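The “70 books” comparison is back-of-the-envelope arithmetic. Assuming roughly 0.7 English words per token and about 100,000 words per book (both rough assumptions, not figures from the post), it works out as:

```latex
10^{7}\ \text{tokens} \times 0.7\ \tfrac{\text{words}}{\text{token}} = 7 \times 10^{6}\ \text{words},
\qquad
\frac{7 \times 10^{6}\ \text{words}}{10^{5}\ \text{words/book}} = 70\ \text{books}
```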

This could be a game changer. Waiting to hear from Google on inference costs and a pricing model.

#Gemini #Google
