'Google's Gemini': Largest AI model to redefine text, image, video, audio and coding

Published : Dec 7, 2023, 10:10 AM IST

Updated : Dec 7, 2023, 10:16 AM IST

Google Wednesday introduced Gemini, its highly successful and multiple formats like text, code, audio, image, and video, with modern overall performance averaging many main benchmarks, in 3 iterations- Ultra, Pro and Nano. The latest model's primary advantage is its ability to operate across different formats.

Google has launched its largest AI model Gemini which will be available in 3 iterations- Ultra, Pro and Nano. The latest model is set to operate across different information types like text, code, audio, image, and video.

Representative picture of Google's Gemini

Hyderabad: In the never-ending race of the tech giants in the realm of Artificial Intelligence, Google has entered with their latest AI model, Gemini, designed to excel in processing text, images, audio, video, and code.

Google CEO Sundar Pichai highlighted Gemini's "state-of-the-art performance across many leading benchmarks" and introduced three versions: Ultra (for highly complex tasks), Pro (for scaling across a wide range of tasks) and Nano (on-device tasks).

Introducing Gemini, Google’s largest and most capable AI model. 🧵 #GeminiAI https://t.co/T0tIw9HQyO
— Google (@Google) December 6, 2023 " class="align-text-top noRightClick twitterSection" data="
Introducing Gemini, Google’s largest and most capable AI model. 🧵 #GeminiAI https://t.co/T0tIw9HQyO
— Google (@Google) December 6, 2023
">
Introducing Gemini, Google’s largest and most capable AI model. 🧵 #GeminiAI https://t.co/T0tIw9HQyO
— Google (@Google) December 6, 2023

These models aim to generalise and seamlessly operate across different information types like text, code, audio, image, and video. Gemini's capabilities were showcased in demos, emphasising its ability to perceive like a human eye, evaluate real-time data, and suggest actions.

Gemini Ultra, the largest model, targets highly complex tasks, while Gemini Pro excels at scaling across various tasks. Gemini Nano is tailored for on-device functions, and it has already been integrated into Pixel 8 Pro for features like Summarise in the Recorder app and Smart Reply via Gboard on WhatsApp.

Google plans to expand Gemini's integration into other products and services, such as Search, Ads, Chrome, and Duet AI. "These are the first models of the Gemini era and the first realisation of the vision we had when we formed Google DeepMind earlier this year," said Alphabet and Google CEO Sundar Pichai.

According to Demis Hassabis, CEO of Google DeepMind, Gemini Ultra outperforms existing large language models on 30 out of 32 widely used academic benchmarks. It boasts to surpass human experts on the massive multitask language understanding benchmark, which assesses knowledge and problem-solving skills across 57 subjects. Gemini Pro has also demonstrated superiority over GPT-3.5 in various benchmarks.

The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2 pic.twitter.com/pKyBxXwdYw
— Demis Hassabis (@demishassabis) December 6, 2023 " class="align-text-top noRightClick twitterSection" data="
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2 pic.twitter.com/pKyBxXwdYw
— Demis Hassabis (@demishassabis) December 6, 2023
">
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2 pic.twitter.com/pKyBxXwdYw
— Demis Hassabis (@demishassabis) December 6, 2023

"With a score of 90 per cent, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as maths, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities," Google said.

Gemini's flexibility is emphasised as it can efficiently run on both data centres and mobile devices. This versatility enhances developers' ability to build and scale AI applications. Hassabis envisions Gemini as a step closer to creating AI models that feel like expert helpers or assistants, providing useful and intuitive interactions.

The model's multimodal reasoning capabilities enable it to comprehend complex written and visual information, extracting insights from extensive datasets.

The first version of Gemini exhibits proficiency in understanding nuanced information, answering complex questions in subjects like maths and physics, and generating high-quality code in popular programming languages. Hassabis acknowledges ongoing efforts to enhance Gemini's capabilities, including advancements in planning and memory, and expanding the context window for improved responses.

Google plans to roll out Gemini incrementally, starting with its integration into the chatbot Bard for English language settings. It will be available in over 170 countries and territories, with developers gaining access through Google Cloud's API from December 13.

Gemini's compact version will power suggested messaging replies on Pixel 8 smartphones, with further integration into Google products anticipated in the coming months. The most powerful version of Gemini is expected to debut in 2024, subject to extensive trust and safety checks.

To address potential risks, Alphabet, Google's parent company, is implementing new protections in alignment with safety policies and AI principles.

More on Artificial Intelligence-
Explained: ChatGPT maker OpenAI CEO Sam Altman's rollercoaster ride from departure to return, amidst leadership shakeup
Artificial Intelligence: World is finally starting to regulate artificial intelligence, what to expect from US, EU and China's new laws
ChatGTP topped list of most used chatbots, beats Google Bard and Microsoft's Bing
Grok vs GPT-4 Turbo: Decoding rivalry between tech giants' AI chatbots
Elon Musk calls AI "one of most disruptive forces in history", discusses risks with Rishi Sunak

Introducing Gemini, Google’s largest and most capable AI model. 🧵 #GeminiAI https://t.co/T0tIw9HQyO
— Google (@Google) December 6, 2023 " class="align-text-top noRightClick twitterSection" data="
Introducing Gemini, Google’s largest and most capable AI model. 🧵 #GeminiAI https://t.co/T0tIw9HQyO
— Google (@Google) December 6, 2023
">
Introducing Gemini, Google’s largest and most capable AI model. 🧵 #GeminiAI https://t.co/T0tIw9HQyO
— Google (@Google) December 6, 2023

The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2 pic.twitter.com/pKyBxXwdYw
— Demis Hassabis (@demishassabis) December 6, 2023 " class="align-text-top noRightClick twitterSection" data="
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2 pic.twitter.com/pKyBxXwdYw
— Demis Hassabis (@demishassabis) December 6, 2023
">
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses https://t.co/VUu1277bC2 pic.twitter.com/pKyBxXwdYw
— Demis Hassabis (@demishassabis) December 6, 2023

The model's multimodal reasoning capabilities enable it to comprehend complex written and visual information, extracting insights from extensive datasets.

To address potential risks, Alphabet, Google's parent company, is implementing new protections in alignment with safety policies and AI principles.

More on Artificial Intelligence-
Explained: ChatGPT maker OpenAI CEO Sam Altman's rollercoaster ride from departure to return, amidst leadership shakeup
Artificial Intelligence: World is finally starting to regulate artificial intelligence, what to expect from US, EU and China's new laws
ChatGTP topped list of most used chatbots, beats Google Bard and Microsoft's Bing
Grok vs GPT-4 Turbo: Decoding rivalry between tech giants' AI chatbots
Elon Musk calls AI "one of most disruptive forces in history", discusses risks with Rishi Sunak

Last Updated : Dec 7, 2023, 10:16 AM IST

TAGGED:

Artificial Intelligence

'Google's Gemini': Largest AI model to redefine text, image, video, audio and coding

Quick Links / Policies

Get Us On

Featured

Exclusive: Vijay Sethupathi Spills about Maharaja, Says 'I Knew This Had to Be My 50th Film'

Modi 3.0: Brand Modi Faces Turbulence As Coalition Partners Resort to Massive Bargaining

Hunt Begins For Next BJP president: Vinod Tawde, Sunil Bansal & Who Else Are Frontrunners?

Walking Helps Lose Weight. How Far Should One Walk to Lose 1kg Bodyweight?