Remember Google Bard? It's gone, replaced by the ever-expanding Gemini family. Now Gemini 1.5 Pro enters the scene, delivering quality comparable to Google's previous flagship, Gemini Ultra, with far greater efficiency. Early benchmarks show it edging out Ultra in places, but a comprehensive comparison is still pending.
The MoE advantage and a million-token memory
Gemini 1.5 Pro leverages a new Mixture-of-Experts (MoE) architecture, outperforming its predecessor on 87% of benchmarks. It's available through Google One AI Premium, supplanting Gemini Pro (now confusingly renamed Gemini 1.0 Pro) despite that model's recent upgrade.
But what sets it apart? Beyond improved efficiency and specific performance gains, the headline feature is its 128,000-token context window, expandable to a mind-boggling 1 million. That 1-million-token capacity dwarfs GPT-4 Turbo's 128,000 tokens and Claude 2.1's 200,000.
Imagine processing an entire book, 11 hours of audio, or an hour of video in one go. That's the power of 1 million tokens. Google emphasizes, however, that Gemini 1.5 Pro remains a "mid-size" model focused on scalability and versatility.
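For a rough sense of that scale, a common heuristic is about four characters of English text per token; the constants and helper below are illustrative assumptions, not official Gemini tokenizer figures:

```python
# Back-of-the-envelope scale of a 1-million-token context window.
# Assumes ~4 characters per token for English text -- a common rough
# heuristic, not an official Gemini tokenizer figure.
CHARS_PER_TOKEN = 4
AVG_WORD_CHARS = 5  # ~4 letters plus a trailing space


def tokens_for_words(word_count: int) -> int:
    """Estimate the token count for a given number of English words."""
    return word_count * AVG_WORD_CHARS // CHARS_PER_TOKEN


# A long novel of ~200,000 words fits with plenty of room to spare:
novel_tokens = tokens_for_words(200_000)
print(novel_tokens)             # ~250,000 tokens
print(novel_tokens < 1_000_000) # True -- well under the 1M window
```

By this estimate, even a 700,000-word manuscript stays under the 1-million-token ceiling, which is why Google's book, audio, and video examples are plausible in a single prompt.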
Is it a GPT-4 killer?
Not in raw power, but for tasks requiring vast amounts of information, it may have the edge. Google showcased this by having 1.5 Pro pull details from the 402-page Apollo 11 mission transcript and identify scenes in the silent film "Sherlock Jr." from text descriptions and sketches.
From Kalamang to code: Pushing the boundaries
Another impressive feat: translating English into Kalamang, a language of western New Guinea with fewer than 200 speakers, even though it wasn't part of the training data. Given instructional materials in its context window, 1.5 Pro learned the language on the fly. It also analyzed and solved problems within 100,000 lines of code, showcasing its potential for real-world applications.
A research powerhouse: Unlocking multimodality
Google's accompanying research paper, aptly titled "Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context," sheds light on the model's capabilities. It achieves near-perfect recall on long-context retrieval tasks across modalities, setting new standards in long-document QA, long-video QA, and long-context ASR.
Is it enough to sway users away from ChatGPT?
For most users, the benefits might be minimal unless they're dealing with massive datasets. But for researchers and specific use cases, the extended context window offers undeniable advantages.