DeepMind May Have Just Created the World's First General AI by @whatsai

May 16th 2022 5,248 reads
Gato from DeepMind was just published! It is a single transformer that can play Atari games, caption images, chat with people, control a real robotic arm, and more! Indeed, it is trained once and uses the same weights to achieve all those tasks. Gato is a multi-modal agent, meaning that it can create captions for images or answer questions as a chatbot. It understands words, images, and even physics... Learn more in the video transcript below.

Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.


Gato from DeepMind was just published! It is a single transformer that can play Atari games, caption images, chat with people, control a real robotic arm, and more! Indeed, it is trained once and uses the same weights to achieve all those tasks. And as per DeepMind, this is not only a transformer but also an agent. This is what happens when you mix Transformers with progress on multi-task reinforcement learning agents.

As we said, Gato is a multi-modal agent, meaning that it can create captions for images or answer questions as a chatbot. You’d say that GPT-3 can already do that, but Gato can do more. The multi-modality comes from the fact that Gato can also play Atari games at the human level or even do real-world tasks like controlling robotic arms to move objects precisely. It understands words, images, and even physics...

Learn more in the video

References

►Read the full article: https://www.louisbouchard.ai/deepmind-gato/
►DeepMind's blog post: https://www.deepmind.com/publications/a-generalist-agent
►Paper: Reed S. et al., 2022, DeepMind: Gato, https://storage.googleapis.com/deepmind-media/A%20Generalist%20Agent/Generalist%20Agent.pdf
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video transcript

0:00

Gato from DeepMind was just published!

0:02

It's a single transformer that can play

0:04

Atari games, caption images, chat with

0:07

people, control a real robotic arm, and

0:09

more! Indeed, it is trained once and uses the

0:12

same weights to achieve all those tasks.

0:15

And as per DeepMind, this is not only a

0:17

transformer but also an agent. This is

0:20

what happens when you mix transformers

0:22

with progress on multi-task

0:23

reinforcement learning agents. As we said,

0:26

Gato is a multi-modal agent, meaning that

0:29

it can create captions for images or

0:31

answer questions as a chatbot. You'd say

0:34

that GPT-3 can already do that, but Gato

0:36

can do more. The multi-modality comes from

0:39

the fact that Gato can also play Atari

0:41

games at the human level or even do

0:44

real-world tasks like controlling robotic

0:46

arms to move objects precisely. It

0:48

understands words, images, and even

0:51

physics. Gato is the first generalist

0:54

model that performs so well on so many

0:56

different tasks, and it's extremely

0:58

promising for the field. It was trained

1:00

on 604 distinct tasks with varying

1:03

modalities, observations, and action

1:06

specifications, making it the perfect

1:08

generalist. And as I said, it does all

1:11

that with the same network and weights.

1:13

And before you ask, it only needs 1.2

1:15

billion parameters, compared to GPT-3, which

1:18

requires

1:19

175 billion of them. It's not a trap

1:22

where you have to retrain or fine-tune it

1:24

to all tasks. You can send both an image

1:27

and text, and it will work. You can even

1:29

add in a few movements from a robot arm.

1:32

The model can decide which type of

1:34

output to provide based on its context,

1:36

ranging from text to discrete actions in

1:38

an environment. If you enjoyed the video,

1:41

please consider subscribing, and let me

1:43

know if you like this kind of news video;

1:46

I will definitely do more. This is possible

1:48

because of their tokenization process.

1:50

Tokenization is when you prepare your

1:52

inputs for the model, as models do not

1:55

understand text or images by themselves.

1:57

Language models and Gato take the

1:59

total number of subwords, for example 32,000,

2:02

and each word has a number assigned

2:05

to it. For images, they follow the ViT

2:08

patch embedding using a widely used

2:10

ResNet block, as we covered in a previous

2:12

video. They also tokenize the button

2:14

presses as integer numbers for Atari

2:16

games or discrete values. Finally, for

2:19

continuous values, like the proprioceptive

2:21

inputs we talked about with the robotic

2:23

arms, they encode the different track

2:25

metrics into float numbers and add them

2:27

after the text tokens. Using all those

2:30

different inputs, the agent adapts to the

2:32

current task to generate appropriate

2:34

outputs. During training, they use prompt

2:36

conditioning, as in GPT-3, with previously

2:39

sampled actions and observations. The

2:42

progress in generalist RL agents in the

2:44

last years has been incredible and came

2:47

mainly from DeepMind. One could see that

2:49

they are moving the needle closer to

2:51

general AI, or human-level intelligence,

2:55

if we can finally define it. I love how

2:57

many details they gave in their paper,

2:59

and I'm excited to see what they will do,

3:01

or what other people will do, using this

3:03

model's architecture. The link to the

3:06

paper for more information about the

3:07

model is in the description. I hope you

3:09

enjoyed this short video. I just saw this

3:12

news when I woke up, and I couldn't do

3:13

anything else than make this video

3:15

before starting my day. It's just too

3:17

exciting! I will see you next week with

another amazing paper.
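
To make the tokenization step from the transcript more concrete, here is a minimal Python sketch of how text, button presses, and continuous robot readings could all end up as integers in one sequence. It is an illustration under assumptions, not DeepMind's implementation. The 32,000 subword vocabulary comes from the transcript; the mu-law squashing, its constants, the 1,024 bins, the toy vocabulary, and every function name are assumptions made for this example and should be checked against the paper. Image patches are left out to keep it short.

```python
import math

TEXT_VOCAB_SIZE = 32_000  # subword vocabulary size quoted in the transcript
NUM_BINS = 1_024          # assumed number of bins for continuous values


def mu_law(x, mu=100.0, m=256.0):
    """Squash a float toward [-1, 1]; mu and m are assumed constants."""
    scaled = math.log(abs(x) * mu + 1.0) / math.log(m * mu + 1.0)
    return math.copysign(scaled, x)


def tokenize_text(subwords, vocab):
    """Look up each subword's integer id in a (hypothetical) vocabulary."""
    return [vocab[s] for s in subwords]


def tokenize_discrete(actions):
    """Discrete actions such as button presses are already integers."""
    return [int(a) for a in actions]


def tokenize_continuous(values):
    """Map floats to integer ids placed after the text vocabulary."""
    tokens = []
    for v in values:
        squashed = max(-1.0, min(1.0, mu_law(v)))  # clamp to [-1, 1]
        bin_index = min(NUM_BINS - 1, int((squashed + 1.0) / 2.0 * NUM_BINS))
        tokens.append(TEXT_VOCAB_SIZE + bin_index)
    return tokens


# Toy example: a text instruction, two joint readings, and one button press.
toy_vocab = {"move": 11, "the": 4, "block": 257}  # hypothetical subword ids
sequence = (
    tokenize_text(["move", "the", "block"], toy_vocab)
    + tokenize_continuous([0.0, -0.35])
    + tokenize_discrete([3])
)
print(sequence)
```

The point of the sketch is that once every modality is a list of integers, a single transformer can read them all as one flat sequence, which is what lets the same weights handle text, games, and robot control.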



