avatarJim Clyde Monge

Summary

Google has released Gemini 1.5, a significant update to its AI language model, featuring a 1 million token context window, faster response times, improved information retrieval, and enhanced coding capabilities, positioning it as a competitor to OpenAI's GPT-4.

Abstract

Google's AI team has introduced Gemini 1.5, an advanced version of their language model, aiming to rival OpenAI's GPT-4. This update includes a substantial increase in the context window to 1 million tokens, which is unprecedented in the industry. Gemini 1.5 also adopts a Mixture of Experts (MoE) architecture to speed up responses and improve efficiency. The model shows remarkable progress in retrieving specific details from extensive text, audio, and video data, and it is particularly adept at analyzing large codebases. The new features are designed to make Gemini more appealing to developers, with easier tuning options, new developer surfaces, and more affordable pricing plans. Despite the promising enhancements, there is skepticism about Google's AI product launches due to past missteps, and the actual performance of Gemini 1.5 is yet to be proven through real-world benchmarks.

Opinions

  • The author has tried the new Gemini Advanced and found the initial experience to be "embarrassingly bad."
  • There is excitement about the new features of Gemini 1.5, especially the 1 million token context window, which is seen as a game-changer for analyzing complex data like movies, books, and codebases.
  • The author expresses skepticism about Google's ability to deliver on the promises of Gemini 1.5, given the botched launch of Bard and the underwhelming performance of Gemini Ultra compared to GPT-4.
  • The author suggests that Google's pattern of rushed launches and internal struggles to keep pace with competitors like OpenAI might be affecting the quality of their AI product releases.
  • Despite the concerns, the author acknowledges that Google is making a strong comeback in the AI race with Gemini 1.5 and its potential to process vast amounts of data efficiently.
  • The author recommends that real-world benchmarks are still needed to fully assess Gemini Pro's capabilities and reliability.

Google Releases Gemini 1.5 With 1M Context Window

Image by Jim Clyde Monge

Google’s AI team has been under intense pressure to keep pace with OpenAI’s groundbreaking GPT-4 language model. I have been trying out the recently launched Gemini and even upgraded to the $20 per month Gemini Advanced—so far, the experience is embarrassingly bad.

Today, Google dropped a bombshell—Gemini 1.5—a dramatically improved version of their flagship language model.

What’s new in Gemini 1.5?

Gemini 1.5 presents major enhancements designed to address the initial version’s shortcomings:

  1. 1,000,000 token context window: This is currently the largest context window of any large-scale foundation model. OpenAI’s GPT-4 has a 128K context window.
  2. It will have a faster response: Google is adopting the Mixture of Experts MoE architecture that likely powers GPT-4. This enables the model to break down a prompt into subtasks and route them to specialized “experts,” dramatically boosting efficiency and performance.
  3. Fast information retrieval: The new model demonstrates a significantly improved ability to pinpoint specific details within an enormous volume of text, video, or audio data.
  4. Better at coding: The large context window enables a deep analysis of an entire codebase, helping Gemini models grasp complex relationships, patterns, and understandings of code.

The 1M token Context Window

Perhaps most shocking is the upgrade in context window size. While most current large language models (LLMs) max out around 128,000 or so tokens, Gemini 1.5 Pro’s experimental build can process a jaw-dropping 1 million tokens.

Google

This capacity translates to:

  • 1 hour video
  • 11 hours audio
  • More than 30K lines of codes
  • More than 700,000 words

This is an absolute game-changer—imagine feeding the LLM an entire feature-length movie script, thousands of lines of complex code, or an extensive book. It offers enough context to analyze nuanced interactions, track character development, or find code errors on a massive scale.

Google

Think of it as the difference between asking a chatbot to analyze a 30-second conversation versus dissecting character motivations across the entire Lord of the Rings trilogy.

Developers, Rejoice!

As a developer, the most impressive feature is likely the ability to upload entire code repositories and ask Gemini to build entire modules in minutes. How cool is that?

Google

In addition to bringing the latest model innovations, Google is also making it easier for you to build with Gemini.

  • Easy tuning: There will be a set of examples that you can customize Gemini for your specific needs in minutes from inside Google AI Studio.
  • New developer surfaces: Integrate the Gemini API to build new AI-powered features today with new Firebase Extensions, across your development workspace in Project IDX, or with our newly released Google AI Dart SDK.
  • Cheaper Gemini 1.0 Pro: Today’s stable version is priced 50% less for text inputs and 25% less for outputs than previously announced. The upcoming pay-as-you-go plans for AI Studio are coming soon.

Gemini 1.5 in action

Google’s whitepaper showcases impressive real-world use cases for Gemini 1.5:

In the example below, they fed the 45-minute Buster Keaton movie “Sherlock Jr.” (1924) (2,674 frames at 1FPS, 684k tokens). Gemini 1.5 Pro retrieves and extracts textual information from a specific frame and provides the corresponding timestamp.

Google

Another example is that with the entire text of Les Misérables in the prompt, Gemini 1.5 Pro identified and located a famous scene based on a hand-drawn sketch.

Google

Google also showcased Gemini Pro 1.5’s ability to process across 100,000 lines of code and a series of multimodal prompts.

Google

If they can manage to actually pull this off, this is going to be wild!

Is Gemini now worth the upgrade?

On paper, Gemini 1.5 is definitely worth upgrading.

However, Google’s recent track record of AI product launches raises valid concerns.

  • Google’s first launch of Bard was botched.
  • Gemini’s “launch” video, which was fundamentally a marketing edit, did not show the real product and was heavily criticized by many.
  • Gemini Ultra was supposed to be really good, even better than GPT-4, but my initial tests proved that it’s still far from GPT-4.

Should we be excited by ambitious upgrades announced mere weeks after previous releases stumbled? It’s understandable to wonder if this pattern reflects rushed launches or an internal struggle to keep pace.

Right now, I don’t trust anything coming out of Google that isn’t an instantly testable input form.

Things to keep in mind

  • Gemini 1.5 Pro is supposed to be on par with Gemini Ultra in terms of performance.
  • Starting today, developers and enterprise customers can access a limited preview of 1.5 Pro via AI Studio and Vertex AI.
  • In case you’re confused with the name, like me, here’s a summary:
Image by Jim Clyde Monge

Final Thoughts

Google surprised me. The context window size—if it really works as advertised—is pretty ground-breaking.

While real-world benchmarks are still needed, there’s no denying that Google is back in the game and smelling blood. The pressure is on OpenAI to raise the bar again.

There’s no news yet when Gemini Pro will be released to consumers. Gemini Ultra 1.5 is already in the pipeline, and it seems to be going to be highly capable. 1.5 Pro is already very, very capable.

This story is published on Generative AI. Connect with us on LinkedIn and follow Zeniteq to stay in the loop with the latest AI stories. Let’s shape the future of AI together!

Technology
Artificial Intelligence
Google
Gemini
Llm
Recommended from ReadMedium