Google's PaLM-E (AI Robot) Can See and Understand Language by @whatsai

by Louis Bouchard, March 24th, 2023

Too Long; Didn't Read

PaLM-E is an embodied multimodal language model: it can interpret and understand multiple types of data, images encoded by a ViT model and text processed by a PaLM model, and convert this information into actions performed by a robotic arm. Learn more in the video…

Recent AI models such as ChatGPT and Midjourney have showcased impressive capabilities in generating text and images.


However, there are also models that specialize in understanding these inputs, such as the Vision Transformer (ViT) for images and the Pathways Language Model (PaLM) for text. These models can interpret and comprehend the meaning of images and sentences.


Combining text and image models results in an AI that can understand multiple forms of data and therefore comprehend a far wider range of inputs than either model alone.


However, the capabilities of such a model may seem limited at first glance: it can understand things, but it cannot act on them. So what if this model were integrated with a robotic system that can move in the physical world? This is where PaLM-E comes in.


What is The PaLM-E AI Model by Google?

Google's latest publication, PaLM-E, is an embodied multimodal language model.


This means it is a model that can interpret and understand various types of data, images encoded by the ViT model and text processed by the PaLM model, and convert this information into actions carried out by a robotic arm.
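To make the idea concrete, here is a minimal sketch of the fusion step described above: image embeddings from a vision encoder are linearly projected into the language model's embedding space and interleaved with text token embeddings, so the language model consumes one mixed sequence. All dimensions, names, and the random data below are illustrative assumptions, not PaLM-E's actual sizes or weights.

```python
import numpy as np

# Hypothetical dimensions for illustration only (not PaLM-E's real sizes).
VIT_DIM = 768   # dimension of a ViT image-patch embedding
LM_DIM = 1024   # dimension of the language model's token embeddings

rng = np.random.default_rng(0)

# Stand-in for a learned linear projection that maps ViT embeddings
# into the language model's embedding space.
projection = rng.normal(size=(VIT_DIM, LM_DIM))

def project_image_tokens(patch_embeddings: np.ndarray) -> np.ndarray:
    """Map ViT patch embeddings of shape (n_patches, VIT_DIM)
    into LM space, shape (n_patches, LM_DIM)."""
    return patch_embeddings @ projection

def build_multimodal_sequence(text_embeddings: np.ndarray,
                              patch_embeddings: np.ndarray) -> np.ndarray:
    """Prepend projected image tokens to the text token embeddings,
    forming one sequence the language model can process end to end."""
    image_tokens = project_image_tokens(patch_embeddings)
    return np.concatenate([image_tokens, text_embeddings], axis=0)

# Example: 16 image patches plus a 5-token text instruction.
patches = rng.normal(size=(16, VIT_DIM))
instruction = rng.normal(size=(5, LM_DIM))
sequence = build_multimodal_sequence(instruction, patches)
print(sequence.shape)  # (21, 1024): 16 image tokens + 5 text tokens
```

In the real system, the language model's output would then be decoded into robot control commands; this sketch only shows how two modalities can share one token sequence.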


Learn more in the video…