Meta's New Model OPT is an Open-Source GPT-3

Written by whatsai | Published 2022/05/06
Tech Story Tags: artificial-intelligence | meta | facebook | machine-learning | data-science | natural-language-processing | openai | hackernoon-top-story | web-monetization | hackernoon-es | hackernoon-hi | hackernoon-zh | hackernoon-vi | hackernoon-fr | hackernoon-pt | hackernoon-ja

TL;DR: GPT-3 is OpenAI's 175-billion-parameter language model, accessible only through a paid API, with no access to the model itself. This week, Meta released OPT, a model that is just as powerful, if not more, and completely open-sourced it. Learn more in the video...

We’ve all heard about GPT-3 and have a reasonably clear idea of its capabilities. You’ve most certainly seen some applications born strictly due to this model, some of which I covered in a previous video about the model. GPT-3 is a model developed by OpenAI that you can access through a paid API, but you have no access to the model itself.
What makes GPT-3 so strong is both its architecture and size: it has 175 billion parameters, roughly twice the number of neurons in a human brain!
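To put that size in perspective, the parameter count of a GPT-style transformer can be estimated from its depth and hidden width. The sketch below uses the configuration reported for GPT-3-scale models (96 layers, hidden size 12288, GPT-2 BPE vocabulary); these figures are illustrative assumptions, not something stated in this article:

```python
# Rough parameter estimate for a GPT-3-scale decoder-only transformer.
# Per layer: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for the MLP (two d x 4d matrices) => ~12*d^2 total.
def estimate_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    per_layer = 12 * d_model ** 2      # attention + feed-forward weights
    embeddings = vocab_size * d_model  # token embedding matrix
    return n_layers * per_layer + embeddings

# Depth/width reported for GPT-3; vocab size is the GPT-2 BPE figure.
total = estimate_params(n_layers=96, d_model=12288, vocab_size=50257)
print(f"{total / 1e9:.0f}B parameters")  # prints "175B parameters"
```

The estimate deliberately ignores biases, layer norms, and positional embeddings, which contribute well under a percent at this scale.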
This immense network was pretty much trained on the whole internet to understand how we write, exchange, and understand text. This week, Meta took a big step forward for the community: they released a model that is just as powerful, if not more, and completely open-sourced it. How cool is that? Learn more in the video...
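Meta's own release lives in the metaseq repository linked in the references; as one possible way to try an OPT checkpoint, here is a sketch using the Hugging Face `transformers` port (the `facebook/opt-125m` model name and the API calls are assumptions about that ecosystem, not part of this article):

```python
# Sketch: greedy text generation with a small OPT checkpoint via the
# Hugging Face transformers library (downloads weights on first run).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)  # the prompt continued by the model
```

The 125M-parameter checkpoint is the smallest of the released sizes; the same code applies to the larger ones, hardware permitting.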

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/opt-meta/
►Zhang, Susan et al. “OPT: Open Pre-trained Transformer Language Models.” https://arxiv.org/abs/2205.01068
►My video on GPT-3 and large language models: https://youtu.be/gDDnTZchKec
►Meta's post: https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/
►Code: https://github.com/facebookresearch/metaseq
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
►Join Our Discord channel, Learn AI Together: https://discord.gg/learnaitogether

Video Transcript

We've all heard about GPT-3 and have a reasonably clear idea of its capabilities. You've most certainly seen some applications born strictly due to this model, some of which I covered in a previous video. GPT-3 is a model developed by OpenAI that you can access through a paid API, but you have no access to the model itself. What makes GPT-3 so strong is both its architecture and size: it has 175 billion parameters, roughly twice the number of neurons we have in our brains. This immense network was pretty much trained on the whole internet to understand how we write, exchange, and understand text.

This week, Meta took a big step forward for the community. They just released a model that is just as powerful, if not more, and completely open-sourced it. How cool is that? We can now access a GPT-like model and play with it directly, without going through an API with limited access. Meta's most recent model, OPT, which stands for Open Pre-trained Transformer, is available in multiple sizes with pre-trained weights to play with or use for any research work, one of which is comparable to GPT-3 and has the best results. That's super cool news for the field, and especially for us academic researchers. Just like GPT-3, this new model can generate text from user inputs for a lot of different tasks.

One day it will even be able to summarize weeks' worth of work for you in clear reports, but until then, you still need to write them yourself. At least you can get some help to make this reporting process much more efficient using great tools like this episode's sponsor, Weights & Biases. Weights & Biases allows you to easily keep track of all your experiments with only a handful of lines added to your code. More specifically, it's really cool how they facilitate the creation of amazing-looking interactive reports like this one, clearly showing your team, or your future self, your runs' metrics, hyperparameters, and data configurations, alongside any notes you or your team had at the time. Reports are easily written following templates generated from your runs' metrics, and you just have to add your comments. It's a powerful feature for either adding quick comments on an experiment or creating polished analysis pieces. Capturing and sharing your work is essential if you want to advance your professional career, so I recommend using tools that improve communication in your team, like Weights & Biases. Try it with the first link below and start sharing your work like a pro.
OPT, or more precisely OPT-175B, is very similar to GPT-3, so I strongly recommend watching my video to better understand how large language models work. GPT-3 and OPT can, at the least, summarize your emails or write a quick essay based on a subject. They can also solve basic math problems, answer questions, and more. The main difference with GPT-3 is that this one is open source, which means you have access to its code and even pre-trained models to play with directly. Another significant fun fact is that OPT's training used only a seventh of the carbon footprint of GPT-3's, which is another step in the right direction.

You can see that this new model is very similar to GPT-3, but open source: a language model using transformers, which I covered in videos before, that was trained on many different datasets, one could say on the whole internet, to process text and generate more text. To better understand how they work, I'd again refer you to the video I made covering GPT-3, as they are very similar models. Here, what I really wanted to cover is Meta's effort to make this kind of model accessible to everyone while putting a lot of effort into sharing its limitations, biases, and risks. For instance, they saw that OPT tends to be repetitive and get stuck in a loop, which rarely happens for us, otherwise no one would talk to you. Since it was trained on the internet, they also found that OPT has a high propensity to generate toxic language and reinforce harmful stereotypes, basically replicating our general behaviors and biases. It can also produce factually incorrect statements, which is undesirable if you want people to take you seriously. These limitations are some of the most significant reasons these models won't replace humans anytime soon for important decision-making jobs, or even be used safely in commercial products. I invite you to read their paper for an in-depth analysis of the model's capacity, and to better understand their efforts in making this model more environmentally friendly and safe to use.
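The repetition failure mode mentioned above is easy to spot mechanically. Here is a toy check, not from the OPT codebase, that flags when a generated token sequence starts looping on a repeated n-gram:

```python
# Toy detector for the "stuck in a loop" failure mode: flags any
# n-gram that appears more than `max_repeats` times in the output.
from collections import Counter

def has_repetition_loop(tokens, n=3, max_repeats=2):
    ngrams = Counter(tuple(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1))
    return any(count > max_repeats for count in ngrams.values())

looping = "the model is great the model is great the model is great".split()
fluent = "opt is an open pre-trained transformer released by meta".split()
print(has_repetition_loop(looping))  # True
print(has_repetition_loop(fluent))   # False
```

Decoding strategies in real systems use the same idea in reverse: an n-gram blocking or repetition penalty discourages the model from emitting a loop in the first place.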
You can also read more about their training process and try it yourself with their publicly available code. All the links are in the description. Such open-source contributions, with new models, documentation, and code available, are really important for the research community to advance science, and I'm glad a big company like Meta does that. Thanks to them, researchers from around the world will be able to experiment with state-of-the-art language models instead of smaller versions. I'm excited to see all the upcoming advancements it will create, and I'd love to see what you guys do with it. Feel free to comment under the video or join our community on Discord and share your projects there. It's called Learn AI Together, and you can find a link below. I hope you enjoyed this week's video, which was a bit different than usual, covering this exciting news and essential efforts to share publicly available research. I will see you next week with another amazing paper!




Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2022/05/06