Introducing Google Gemini: A comprehensive guide to the new generative AI platform

Google’s endeavoring to make a splash with Gemini, its flagship family of generative AI models, apps, and services. However, while Gemini seems promising in some respects, it falls short in others, as our informal assessment revealed.
So what exactly is Gemini? How is it used? And how does it compare to the competition?
To make staying up to date with the latest Gemini developments more manageable, we’ve compiled this convenient guide, which we’ll keep current as new Gemini models and features are introduced.
What exactly is Gemini?
Gemini is Google’s long-awaited, next-generation GenAI model family, created by Google’s AI research labs DeepMind and Google Research. It comes in three variations:

Gemini Ultra, the primary Gemini model.
Gemini Pro, a “lite” Gemini model.
Gemini Nano, a smaller “distilled” model designed for use on mobile devices like the Pixel 8 Pro.
All Gemini models were trained to be “natively multimodal” — in other words, able to operate with and utilize more than just words. They were pre-trained and fine-tuned on a variety of audio, images, videos, a vast collection of codebases, and text in multiple languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was exclusively trained on text data. LaMDA cannot comprehend or generate anything other than text (e.g., essays, email drafts), but that’s not the case with Gemini models.
What’s the difference between the Gemini apps and the Gemini models?
Google, once again demonstrating its weak branding instincts, did not initially make clear that Gemini is distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface for accessing certain Gemini models — think of them as a client for Google’s GenAI.
Additionally, the Gemini apps and models are entirely separate from Imagen 2, Google’s text-to-image model available in some of the company’s development tools and environments. Don’t worry — you’re not the only one confused by this.
What are the capabilities of Gemini?
Because the Gemini models are multimodal, in theory, they can perform a variety of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. A few of these capabilities have not yet reached the product stage (more on that later), but Google promises all of them — and more — in the not-too-distant future.
Of course, it’s somewhat challenging to take the company at its word.
Google fell short with the original Bard launch, and more recently, it upset people with a video supposedly exhibiting Gemini’s capabilities that was heavily doctored and was mostly aspirational.
However, assuming Google is being somewhat truthful with its claims, here’s what the different tiers of Gemini will be capable of once they realize their full potential:
Gemini Ultra
Google claims that Gemini Ultra — thanks to its multimodality — can be utilized to assist with tasks such as physics homework, solving problems step-by-step on a worksheet, and identifying potential errors in already filled-in answers.
Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a specific problem, extracting information from those papers, and “updating” a chart by generating the formulas needed to recreate it with more recent data.
Gemini Ultra technically supports image generation, as mentioned earlier. However, that capability has not yet made its way into the productized version of the model, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Instead of feeding prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini generates images “natively,” without an intermediary step.
Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps — but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month.
The AI Premium Plan also connects Gemini to your wider Google Workspace account — think emails in Gmail, documents in Docs, presentations in Slides, and Google Meet recordings. That’s useful for, say, summarizing emails or having Gemini capture notes during a video call.
Gemini Pro
Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning, and understanding capabilities.
An independent study by Carnegie Mellon and BerriAI researchers found that Gemini Pro is indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. However, the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving several digits, and users have found plenty of examples of bad reasoning and mistakes.
Google has promised improvements, though — and the first arrived in the form of Gemini 1.5 Pro.
Designed to be a drop-in replacement, Gemini 1.5 Pro (currently in preview) is improved in several areas compared to its predecessor, perhaps most significantly in the amount of data that it can process. Gemini 1.5 Pro can (in limited private preview) take in ~700,000 words, or ~30,000 lines of code — 35x the amount Gemini 1.0 Pro can handle. And — the model being multimodal — it’s not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in various languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).
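As a rough sanity check on those figures, the 35x multiplier implies Gemini 1.0 Pro tops out around 20,000 words — a back-of-the-envelope calculation, assuming Google’s ~700,000-word figure for 1.5 Pro is accurate:

```python
# Back-of-the-envelope check on the context-window figures quoted above.
GEMINI_15_PRO_WORDS = 700_000  # approximate 1.5 Pro capacity in limited private preview
MULTIPLIER = 35                # Google's stated improvement over Gemini 1.0 Pro

gemini_10_pro_words = GEMINI_15_PRO_WORDS // MULTIPLIER
print(gemini_10_pro_words)  # 20000
```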
Gemini Pro is also available via API in Vertex AI to accept text as input and generate text as output. An additional endpoint, Gemini Pro Vision, can process text and imagery — including photos and video — and output text along the lines of OpenAI’s GPT-4 with Vision model.
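A request to a multimodal endpoint like Gemini Pro Vision pairs a text part with an image part in the same prompt. The sketch below builds the JSON body in the shape the public Gemini `generateContent` REST API accepts; the image bytes are a placeholder stand-in, not a real image:

```python
import base64
import json

# Sketch of a generateContent request body mixing text and image input.
# In practice you would read real JPEG/PNG bytes from a file before encoding.
fake_image_bytes = b"\x89PNG..."  # placeholder, not a valid image

payload = {
    "contents": [{
        "parts": [
            {"text": "Describe what is happening in this image."},
            {"inline_data": {
                "mime_type": "image/png",
                "data": base64.b64encode(fake_image_bytes).decode("ascii"),
            }},
        ]
    }]
}

print(json.dumps(payload)[:40])  # serialized request body (truncated)
```

The text-only Gemini Pro endpoint uses the same `contents`/`parts` shape, just without the `inline_data` part.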
Using Gemini Pro in Vertex AI. Image Credits: Google

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.
In AI Studio, there are workflows for creating structured chat prompts using Gemini Pro. Developers have access to both Gemini Pro and the Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output’s creative range and provide examples to give tone and style instructions — and also tune the safety settings.
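Those knobs map directly onto fields in the request body. A minimal sketch, again in the `generateContent` format — the prompt is illustrative, and the category/threshold strings follow the values documented for the public Gemini API:

```python
# Sketch of a text-only generateContent request with sampling and safety knobs.
payload = {
    "contents": [{"parts": [{"text": "Write a product tagline for a note-taking app."}]}],
    "generationConfig": {
        "temperature": 0.9,       # higher values widen the output's creative range
        "maxOutputTokens": 256,   # cap on the length of the generated reply
    },
    "safetySettings": [{
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE",
    }],
}
```

Lowering `temperature` toward 0 makes responses more deterministic, which suits extraction-style tasks better than creative ones.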
Gemini Nano
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far it powers two features on the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which allows users to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other snippets. Users receive these summaries even without a signal or Wi-Fi connection available — and in a nod to privacy, no data leaves their phone in the process.
Gemini Nano is also in Gboard, Google’s keyboard app, as a developer preview. There, it powers a feature called Smart Reply, which helps suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially only works with WhatsApp but will come to more apps in 2024, Google says.
Is Gemini superior to OpenAI’s GPT-4?
Google has boasted several times about Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that Gemini Pro, meanwhile, is more capable at tasks like summarizing content, brainstorming, and writing than GPT-3.5.
But setting aside the question of whether benchmarks genuinely indicate a better model, the scores Google cites seem to be only marginally better than OpenAI’s corresponding models. And — as mentioned earlier — some early impressions haven’t been great, with users and academics pointing out that Gemini Pro tends to get basic facts wrong, struggles with translations, and gives poor coding suggestions.
How much will Gemini cost?
Gemini Pro is free to use in the Gemini apps and, for now, AI Studio and Vertex AI.
Once Gemini Pro exits preview in Vertex, however, input will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).
Assuming a 500-word article contains 2,000 characters, summarizing that article with Gemini Pro would cost $5, while generating an article of a similar length would cost $0.10.
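The arithmetic behind those two figures is easy to make explicit. A small helper using the per-character rates quoted above (the function itself is just illustrative, not part of any Google SDK):

```python
INPUT_RATE_PER_CHAR = 0.0025    # $ per input character, per the quoted pricing
OUTPUT_RATE_PER_CHAR = 0.00005  # $ per output character

def gemini_pro_cost(input_chars: int, output_chars: int) -> float:
    """Estimated cost in dollars for a single Gemini Pro call."""
    return input_chars * INPUT_RATE_PER_CHAR + output_chars * OUTPUT_RATE_PER_CHAR

# Summarizing a ~2,000-character article: the cost is dominated by input.
print(round(gemini_pro_cost(2000, 0), 2))   # 5.0
# Generating a similar-length article: the cost is dominated by output.
print(round(gemini_pro_cost(0, 2000), 2))   # 0.1
```

The 50x gap between the input and output rates is why summarization (input-heavy) prices so differently from generation (output-heavy) at the same text length.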
Ultra pricing has yet to be announced.
Where can you try Gemini?
Gemini Pro
The easiest place to experience Gemini Pro is in the Gemini apps, where Pro and Ultra answer queries in a range of languages.
Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports certain regions, including Europe, as well as features like chat functionality and filtering.
Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps — or export the code to a more fully featured IDE.
Duet AI for Developers, Google’s suite of AI-powered assistance tools for code completion and generation, now uses Gemini models. And Google has brought Gemini models to its dev tools for Chrome and to Firebase, its mobile development platform.
Gemini Nano
Gemini Nano is on the Pixel 8 Pro — and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.
