Google Gemini’s viral video showcasing AI powers was not done in real-time or using voice commands

Google’s viral video showcasing the capabilities of its new AI model, Gemini, was neither recorded in real time nor driven by voice commands. A Google spokesperson clarified the process behind the video in a Bloomberg report. The video’s description also carries a disclaimer: ‘For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.’

Behind the scenes of the viral video

Contrary to how the interaction appeared in the video, the demonstration was not a smooth voice conversation in which Gemini responded in real time to its surroundings. Instead, the video was made by taking still image frames from the footage and prompting Gemini via text. In other words, Gemini was shown still images and responded to human-written text prompts.

Google Gemini launch

Google unveiled Gemini earlier this week. The company claims the new model outperforms existing AI models at understanding, summarising, reasoning, coding, and planning tasks.

Gemini comes in three versions:

  • Gemini Pro: Currently available, this is the baseline version offering the core capabilities of the model. It is also being integrated with the Bard chatbot.
  • Gemini Ultra: Expected early next year, this advanced version will offer enhanced performance and access to additional features.
  • Gemini Nano: This lightweight version will be released at a later date, making the technology accessible to a broader audience with limited computing resources.

Google claims Gemini is the most powerful and versatile AI model ever developed. Here are some of the strengths Google highlighted: 

Multimodal Integration: Unlike other models focused on text-based interactions, Gemini is built from the ground up to work across multiple modalities, including text, audio, and video. This allows for more natural, human-like interaction. However, these capabilities have not yet been launched to the general public.

Advanced Reasoning and Planning: Gemini’s ability to understand, analyse, and reason about complex situations sets it apart. This enables it to plan and solve problems more effectively than other models.

Code Generation: The first version of Gemini can understand, explain, and generate high-quality code in popular programming languages like Python, Java, C++, and Go. Google claims that this makes it one of the best AI models for coding in the world.


Danny Cyril Dcruze
