Google DeepMind’s Gemini Has Surpassed GPT-4


A short while ago, Google and Google DeepMind unveiled their long-anticipated AI model, Gemini. While hands-on insights into its performance are still limited, the claims are noteworthy: Google says it outperforms GPT-4 across a broad spectrum of benchmarks.

Gemini specifications, dimensions (Ultra, Pro, Nano), and accessibility
Gemini is a family of models offered in three sizes: Ultra, Pro, and Nano. The technical report summarizes the sizes and their intended uses.

Gemini Ultra is the flagship, claiming state-of-the-art (SOTA) results and surpassing GPT-4 across various benchmarks (detailed below). It is built for data centers; running it on personal computers is impractical. Currently undergoing red-teaming safety evaluations, it will debut in early 2024 on Google's revamped chatbot, Bard Advanced.

Gemini Pro is comparable to GPT-3.5 (though not consistently superior) and prioritizes "cost and latency optimization." Opting for Pro makes sense when top-tier performance is unnecessary, much as ChatGPT with GPT-3.5 suffices for most tasks without paying $20/month for GPT-4. Gemini Pro is already live on Bard (the chatbot's most significant upgrade yet) in 170 countries (excluding the EU/UK), in English, with broader availability planned.

Gemini Nano targets on-device use. While Ultra's and Pro's parameter counts remain undisclosed, Nano comes in two tiers, Nano 1 (1.8B) and Nano 2 (3.25B), for low- and high-memory devices respectively. Nano already ships on Google's Pixel 8 Pro, bringing on-device AI features to the smartphone. Gemini will also extend to products and services like Search, Ads, Chrome, and Duet AI, with specifics pending.

All models share a 32K-token context window, notably smaller than Claude 2.1's (200K) and GPT-4 Turbo's (128K). The optimal context window size remains an open question, given reports that models tend to lose track of information buried in the middle of very long contexts. Gemini models purportedly "effectively utilize their context length," alluding to progress on such retrieval challenges.
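A common way to probe whether a model "effectively utilizes" its context length is a needle-in-a-haystack test: bury a unique fact at varying depths inside filler text and ask the model to retrieve it. Below is a minimal sketch of the prompt construction in Python; the commented-out `query_model` call is a hypothetical stand-in for whichever LLM API you would actually use.

```python
# Needle-in-a-haystack probe: can a model retrieve a fact
# buried at an arbitrary depth in a long context?

FILLER = "The sky was clear and the market was quiet that day."
NEEDLE = "The secret passphrase is 'blue-harvest-42'."
QUESTION = "What is the secret passphrase?"

def build_prompt(total_sentences: int, depth: float) -> str:
    """Insert NEEDLE at a relative depth (0.0 = start, 1.0 = end)
    among `total_sentences` filler sentences."""
    haystack = [FILLER] * total_sentences
    position = int(depth * total_sentences)
    haystack.insert(position, NEEDLE)
    return " ".join(haystack) + "\n\nQuestion: " + QUESTION

# Probe several depths; in a real test you would send each prompt
# to the model and check whether the passphrase appears in its answer.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(total_sentences=2000, depth=depth)
    # answer = query_model(prompt)              # hypothetical LLM call
    # success = "blue-harvest-42" in answer
```

Sweeping both the depth and the total length, and plotting retrieval success, is how such evaluations typically visualize whether accuracy degrades mid-context.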

Details about the training and fine-tuning datasets and the model architectures are absent (beyond being "built on top of Transformer decoders" and "enhanced with improvements"). Meta's next model may shed more light. AlphaCode 2, released by Google DeepMind alongside Gemini and aimed at competitive programming, outperformed its predecessor, solving 1.7× more problems and beating an estimated 85% of human competitors.

Gemini Ultra outshines GPT-4

Scientifically and commercially, this is a pivotal moment: after nearly a year, an AI model has surpassed GPT-4. Gemini Ultra exceeds prior state-of-the-art results on 30 of 32 widely used academic benchmarks, notably scoring 90.0% on MMLU, the first model to surpass human-expert performance there. It also scores 59.4% on the new MMMU benchmark, which tests deliberate reasoning across multimodal tasks.

Gemini Ultra outperforms GPT-4 on 17 of 18 benchmarks, including MMLU (90.0% vs. 86.4%) and the new multimodality benchmark MMMU (59.4% vs. 56.8%). The narrow margins underscore how hard these systems are to improve, and perhaps how difficult it is to leapfrog OpenAI. The technical report breaks the comparison down benchmark by benchmark.

For a deeper sense of Gemini's real-world capabilities, including reasoning and problem-solving, watch the videos in Google DeepMind's interactive blog post and Google CEO Sundar Pichai's comprehensive demo.

Despite impressive capabilities, limitations persist: LLM "hallucinations" remain an open research problem, and high-level reasoning tasks are still challenging. Google highlights Gemini's natively multimodal design, spanning text, code, images, audio, and video, as contributing to a richer understanding of the world.

That natively multimodal architecture is Gemini's most distinctive trait. Traditional multimodal models, assembled from separately trained components, often struggle with complex reasoning. Gemini, unlike GPT-4, is pre-trained and fine-tuned on multimodal data from the start rather than having separate modality modules bolted on.

The next frontiers involve planning and robotics, with Gemini and OpenAI’s Q* representing strides in that direction. Google DeepMind’s plans for 2024 hint at agent-based systems and planning integration with Gemini.

Initial impressions from available information
Google delivers on its implicit promises, with Gemini edging out GPT-4 across benchmarks. The margin is modest, but it is the first time in years that any model has outpaced OpenAI's best. Head-to-head testing against GPT-4 Turbo in early 2024 will give a clearer picture. Questions remain about how much room Gemini has to improve relative to GPT over time.

The benchmark numbers show only slight superiority over GPT-4, suggesting how hard it is to advance these models with current approaches. Google DeepMind's turn toward closedness is also notable, prioritizing the business product over open scientific disclosure. Planning, agents, and robotics are the next challenges, where progress will likely be slower than it was for language modeling.

Gemini's release signals a shift in the AI landscape, and perhaps, eventually, a move beyond today's transformer-based paradigm. Future developments will reveal how lasting the change is.
