July 18, 2024

Google DeepMind’s Chatbot-Powered Robot: Part of a Larger Revolution

BakingAI

In a bustling open-plan office in Mountain View, California, a sleek, wheeled robot has taken on the role of tour guide and office assistant after an upgrade to Google’s Gemini large language model, as revealed by Google DeepMind. The robot can understand natural-language commands and navigate the office to carry them out.

For example, when a person says, “Find me somewhere to write,” the robot promptly leads them to a nearby whiteboard. Gemini’s advanced capabilities in processing both video and text, along with its ability to learn from recorded office tours, enable this “Google helper” to understand its surroundings and make informed decisions based on commonsense reasoning. By integrating Gemini with an action-generating algorithm, the robot can respond accurately to commands and its visual inputs, ensuring it performs tasks efficiently.
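The two-stage idea described above (a model picks a destination from a recorded tour, then a separate algorithm generates the moves) can be illustrated with a toy sketch. This is not DeepMind’s implementation: the tour captions, the `HINTS` table, the `GRAPH` layout, and every function name are hypothetical, and a simple keyword lookup stands in for Gemini’s reasoning over video and text.

```python
from collections import deque

# Hypothetical tour data: one short caption per frame of a recorded office
# tour. The real system reasons over the video itself; captions are a stand-in.
TOUR_FRAMES = {
    0: "entrance lobby with reception desk",
    1: "hallway beside kitchen with coffee machine",
    2: "meeting area with whiteboard and markers",
    3: "desk cluster with monitors",
}

# Hypothetical graph of which tour frames are directly reachable from which.
GRAPH = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

# Toy substitute for commonsense reasoning: words an instruction implies.
HINTS = {"write": ["whiteboard", "markers"], "drink": ["coffee", "kitchen"]}


def pick_goal_frame(instruction):
    """Return the tour frame whose caption best matches the instruction."""
    words = instruction.lower().replace(",", " ").replace(".", " ").split()
    keywords = set(words)
    for word in words:
        keywords.update(HINTS.get(word, []))

    def score(frame_id):
        return len(keywords & set(TOUR_FRAMES[frame_id].split()))

    return max(TOUR_FRAMES, key=score)


def plan_path(start, goal):
    """Breadth-first search for the shortest frame-to-frame route."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in GRAPH[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal not reachable from start


def navigate(instruction, start):
    """Pick a goal frame for the instruction, then plan a route to it."""
    return plan_path(start, pick_goal_frame(instruction))


if __name__ == "__main__":
    print(navigate("Find me somewhere to write", start=0))
```

Splitting the work this way keeps the language-heavy step (deciding where to go) separate from the geometric step (deciding how to get there), so either part could be swapped out without touching the other.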

At Baking AI, we leverage cutting-edge AI technology to transform your business operations. Google DeepMind, the AI research company, has developed a robot that uses the company’s latest large language model, Gemini, to enable more natural human-robot interactions.

A new project from Google DeepMind’s robotics team used Gemini 1.5 Pro to teach a robot to perform different tasks around a 9,000-square-foot office space.

Key Findings

Gemini-Powered Robot: A tall, wheeled robot in Google’s Mountain View office has been demonstrating its abilities as a tour guide and assistant, thanks to the integration of DeepMind’s advanced Gemini language model.
Multimodal Capabilities: Gemini’s ability to process both video and text data allows the robot to comprehend its environment and follow commands that require commonsense reasoning, such as navigating to a whiteboard when told, “Find me somewhere to write.”
Improved Human-Robot Interaction: Combining Gemini with an algorithm that generates specific actions in response to visual and verbal cues has made human-robot interaction feel more natural, and the robot more usable and intuitive.
Expanding Robotic Capabilities: The introduction of Gemini in December 2023 marked a significant step for robotics, because its multimodal capabilities let language models extend their reach beyond digital platforms and into the physical world.
Industry Interest and Investment: Academic and industry experts are actively exploring the potential of integrating large language models like Gemini into robotics, with a focus on enhancing robots’ problem-solving skills. Startups like Physical Intelligence and Skild AI are at the forefront of this innovation, attracting significant investment.
Evolving Navigation and Perception: Large language models, combined with vision-language models, are changing the way robots interact with their environment. Where robots once needed predefined maps and carefully chosen commands to navigate, they can now interpret visual cues and verbal instructions directly.
Future Potential: Researchers envision Gemini-powered robots handling more complex tasks, such as identifying a user’s preferred drink based on visual cues, showcasing the model’s potential for nuanced interactions in real-world scenarios.

The Bottom Line

Google DeepMind’s Gemini-powered robot is a significant step forward in the integration of large language models with physical robots, enabling more natural and intuitive human-robot interactions. This development is part of a larger revolution in the field of robotics, where the merging of advanced AI technologies is transforming the way robots perceive and interact with their surroundings.
