News / AI

Google Unveils Gemini 1.5 Pro: LLM Giant with a Million Token Memory

Google Unveils Gemini 1.5 Pro: LLM Giant with a Million Token Memory

By Admin on February 20, 2024

Google Unveils Gemini 1.5 Pro: LLM Giant with a Million Token Memory image

Remember Google Bard? It's gone, replaced by the ever-expanding Gemini family. Now, Gemini 1.5 Pro enters the scene, surpassing even the efficiency of Google's previous flagship, Gemini Ultra. While benchmarks show slight edges over Ultra, we await a comprehensive comparison.


The MoE advantage and a million-token memory

Gemini 1.5 Pro leverages a new Mixture-of-Experts (MoE) architecture, outperforming its predecessor in 87% of benchmarks. It's available through Google One AI Premium, usurping Gemini Pro (now confusingly named Gemini 1.0 Pro) despite its recent upgrade.

But what sets it apart? Aside from improved efficiency and specific performance gains, the headline feature is its 128,000 token context window, expandable to a mind-boggling 1 million. This dwarfs GPT-4 Turbo's 128,000 tokens and Claude 2.1's 200,000.

Imagine processing an entire book, 11 hours of audio, or an hour of video in one go. That's the power of 1 million tokens. Google emphasizes, however, that Gemini 1.5 Pro remains a "mid-size" model focused on scalability and versatility.


Is it a GPT-4 killer?

Not in raw power, but for tasks requiring vast information, it might have the edge. Google showcased this by making 1.5 Pro understand details from the 402-page Apollo 11 mission transcripts and identify scenes in "Sherlock Jr." from descriptions and sketches.


From Kalamang to code: Pushing the boundaries

Another impressive feat: translating English to the complex Guinean language, Kalamang, even though it wasn't part of the training data. Armed with instructional materials in its context window, 1.5 Pro learned on the fly. It also tackled analyzing and solving problems within 100,000 lines of code, showcasing its potential for real-world applications.


A research powerhouse: Unlocking multimodality

Google's accompanying research paper, aptly titled "Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context," sheds light on the model's capabilities. It achieves near-perfect recall on long-context retrieval tasks across modalities, setting new standards in long-document QA, long-video QA, and long-context ASR.


Is it enough to sway users away from ChatGPT?

For most users, the benefits might be minimal unless they're dealing with massive datasets. But for researchers and specific use cases, the extended context window offers undeniable advantages.




Limited access and a crowded family: Google's gamble

Currently, Gemini 1.5 Pro is under wraps, available only to a select few. Pricing and accessibility remain questions with whispers of tiered pricing based on context window size. This exclusivity fuels speculation about the hefty investment required. Some argue that by the time it's widely available, competitors will have evolved, raising concerns about Google alienating potential users.


From Bard to 1.5 Pro: A confusing landscape

The rapid-fire release of Gemini models, coupled with Bard's short-lived existence and rebranding of 1.0 Pro, has created a convoluted product landscape. OpenAI's strategic approach with the ChatGPT brand and limited access seems, in contrast, more user-friendly.


Nuclear ambition, but can it avoid meltdown?

Google's aggressive push into generative AI with Gemini is undeniable. However, the increasingly complex product family and limited access might hinder its full potential. Only time will tell if Gemini 1.5 Pro, with its million-token memory, can truly challenge the dominance of ChatGPT or if Google's AI strategy risks self-inflicted confusion.