Meta's New Segment Anything Model (SAM) Is a Game Changer

11,683 reads

by Louis Bouchard

Louis Bouchard

@whatsai

I explain Artificial Intelligence terms and news to non-experts.

2 min read · 2023/04/11

Too Long; Didn't Read

Segmentation is the photo world's version of a detective game. With this superpower, you can identify anything and everything in an image, from objects to people, with pixel-perfect precision. It is a game changer for all kinds of applications, such as autonomous vehicles that need to know what is going on around them.

Segmentation is the photo world's version of a detective game. With this superpower, you can identify anything and everything in an image, from objects to people, with pixel-perfect precision.


It is a game changer for all kinds of applications, such as autonomous vehicles that need to know what is around them, like a car or a pedestrian.


By now you are surely familiar with prompts, too. But have you ever heard of promptable segmentation? It is the newest kid on the block, and it is really cool. With this new trick up your sleeve, you can prompt your AI model to segment anything you want, and I mean anything! Thanks to Meta's incredible new SAM (Segment Anything Model), the possibilities are endless.


If you are curious about how promptable segmentation and the SAM model work their magic, don't miss my video. In it, you will learn all about how this amazing new technology is changing image segmentation. So sit back, relax, and let me take you on a journey into the world of promptable segmentation with SAM. Trust me, you won't regret it!


Segment Anything: Meta's Amazing New AI


Segmentation is the ability to take an image and identify the objects, people, or anything else of interest. It is done by identifying which image pixels belong to which object, and it is super useful for tons of applications where you need to know what is going on, like a self-driving car on the road identifying other cars and pedestrians.

We also know that prompting is a new skill for communicating with AIs. So what about promptable segmentation? Promptable segmentation is a new task that was just introduced with an amazing new AI model by Meta: SAM. SAM stands for Segment Anything Model, and it is able to segment anything following a prompt. How cool is that? In one click, you can segment any object from any photo or video.

It is the first foundation model for this task, trained to generate masks for almost any existing object. It is just like ChatGPT for segmenting images: a very general model trained on pretty much every type of image and video, with a good understanding of every object. Similarly, it has adaptation capabilities for more complicated objects, like a very specific tool or machine. This means you can help it segment unknown objects through prompts without retraining the model, which is called zero-shot transfer: zero-shot as in it has never seen that object during training.

SAM is super exciting for all segmentation-related tasks, with incredible capabilities, and it is open source, which is super promising for the research community, including myself, and it has tons of applications. You have seen the results, and you can see even more using the demo linked below if you'd like.
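If you want to try it for yourself, the open-source repository linked in the references exposes a small Python API. Below is a minimal sketch based on that repository's README for the "segment everything" use case, where the model is prompted with a grid of points and returns a mask for every object it finds; the checkpoint and image paths are placeholders to adapt to your setup.

import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a pretrained SAM checkpoint (the ViT-H variant here); the weights file
# must be downloaded from the segment-anything repository first.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")  # optional; CPU works too, just slower

# The automatic mask generator prompts SAM with a regular grid of points
# and keeps the resulting masks, which approximates "segment everything".
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)  # SAM expects RGB
masks = mask_generator.generate(image)

# Each entry is a dict with a binary mask plus metadata such as its area,
# bounding box, and a predicted quality score.
print(len(masks), "masks found")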

We have also had a quick overview of what it is, but how does it work, and why is it so good?

To answer the second question of why it is that good, we must go back to the root of all current AI systems: data. It is that good because it was trained with a new dataset which, as the authors state, is the largest segmentation dataset ever. Indeed, the dataset, called Segment Anything 1-Billion (SA-1B), was built specifically for this task and is composed of 1.1 billion high-quality segmentation masks from 11 million images, which works out to roughly 100 masks per image. That represents approximately 400 times more masks than any existing segmentation dataset to date. It is enormous and of super high quality, with really high-definition images, and that is the recipe for success: always more data and good curation, on top of the data most models use anyway.

Now let's see how the model works and how it implements prompting into segmentation tasks, because this is all related. Indeed, the dataset was built using the model itself, iteratively: as illustrated in the paper, they use the model to annotate data, further train the model, and repeat. This is because we cannot simply find images on the internet that already come with masks around every object. Instead, they start by training the model with human help to correct the predicted masks, then repeat with less and less human involvement, primarily for the objects the model had not seen before. But where is prompting used?

It is used to say what we want to segment in the image. As we discussed in my recent podcast episode with Sander Schulhoff, founder of Learn Prompting, which I think you should listen to, a prompt can be anything. In this case, it is either text or spatial information, like a rough box or just a point on the image: basically asking for what you want, or showing it.

Then we use an image encoder, as with all segmentation tasks, and a prompt encoder. The image encoder will be similar to most I have already covered on the channel, where we take the image and basically extract the most valuable information from it using a neural network. Here, the novelty is the prompt encoder. Having this prompt encoder separated from the image encoder is what makes the approach so fast and responsive, since we can simply process the image once and then iterate over prompts to segment multiple objects, as you can see for yourself in their online demo.
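You can see this split directly in the released predictor: the expensive image encoder runs once inside set_image, while every prompt afterwards only touches the lightweight prompt encoder and mask decoder. A rough timing sketch (the paths are placeholders, a GPU is assumed, and the numbers will depend entirely on your hardware):

import time
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

t0 = time.perf_counter()
predictor.set_image(image)  # heavy step: the full ViT image encoder
t1 = time.perf_counter()

# Reuse the same image embedding for several different point prompts.
for x, y in [(200, 300), (450, 120), (640, 480)]:
    predictor.predict(point_coords=np.array([[x, y]]),
                      point_labels=np.array([1]),
                      multimask_output=True)
t2 = time.perf_counter()

print(f"image encoding: {t1 - t0:.2f}s, three prompts: {t2 - t1:.2f}s")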

The image encoder is another Vision Transformer, or ViT, which you can learn more about in my Vision Transformer video if you'd like. It will produce our image embeddings, which are our extracted information. Then we use this information, along with our prompts, to generate a segmentation.
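If you are curious what those embeddings look like, the public predictor lets you pull them out after set_image. A minimal sketch; treat the exact shape as something to verify against the code, but for the default 1024-pixel input it should be a small spatial grid of 256-dimensional features.

import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB))

# The cached image embedding produced by the ViT encoder; every prompt
# is decoded against this same tensor, so it only has to be computed once.
embedding = predictor.get_image_embedding()
print(embedding.shape)  # roughly (1, 256, 64, 64) for the default input resolution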

But how can we combine our text and spatial prompts with this image embedding? We represent the spatial prompts through the use of positional encodings, basically giving the spatial information as is.
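To give a flavor of what "positional encodings" means for a spatial prompt, the sketch below maps a 2D click position to a higher-dimensional vector using random Fourier features, a common way to feed coordinates to a Transformer. It illustrates the general idea only and is not Meta's exact implementation.

import numpy as np

def fourier_point_encoding(xy, image_size, num_feats=128, seed=0):
    """Encode an (x, y) pixel coordinate as a random Fourier feature vector."""
    rng = np.random.default_rng(seed)
    # Fixed random projection so every point is encoded consistently.
    B = rng.normal(size=(2, num_feats))
    # Normalize the coordinate to [0, 1] so the encoding ignores resolution.
    coords = np.asarray(xy, dtype=np.float64) / np.asarray(image_size, dtype=np.float64)
    proj = 2.0 * np.pi * coords @ B
    return np.concatenate([np.sin(proj), np.cos(proj)])  # length 2 * num_feats

vec = fourier_point_encoding((500, 375), image_size=(1024, 768))
print(vec.shape)  # (256,)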

Then, for the text, it is simple: we use CLIP, as always, a model able to encode text similarly to how images are encoded. CLIP is amazing for this application since it was trained with tons of image-caption pairs to encode both similarly, so when it gets a clear text prompt, it acts as a bridge for comparing text and images.

And finally, we need to produce a good segmentation from all that information. This can be done using a decoder, which is, simply put, the reverse network of the image encoder, taking condensed information and recreating an image. Here, though, we only want to create masks that we put back over the initial image, so it is much easier than generating a completely new image as DALL·E or Midjourney do. Such models use diffusion models, but in this case they decided to go for a similar architecture to the image encoder: a Vision Transformer-based decoder, and it works really well.
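One practical consequence of that lightweight decoder, at least in the released code, is that a single prompt can be decoded into several candidate masks, each with a predicted quality score, so an ambiguous click (the shirt versus the whole person, say) still gives you something to pick from. A minimal sketch:

import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB))

# An ambiguous single click: the decoder proposes several masks at different scales.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)

best = masks[np.argmax(scores)]  # keep the mask the model itself rates highest
print(masks.shape, scores)       # (3, H, W) boolean masks and their quality scores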

And voilà! That was a simple overview of how the new SAM model by Meta works. Of course, it is not perfect and has limitations, like missing fine structures or sometimes hallucinating small disconnected components. Still, it is extremely powerful and a huge step forward, introducing a new, interesting, and highly applicable task. I invite you to read Meta's great blog post and paper to learn more about the model, or try it directly with their code or demo; all the links are in the description below. I hope you've enjoyed this episode, and I will see you next time with another amazing paper!



References

►Read the full article: https://www.louisbouchard.ai/meta-sam/

►Paper: Kirillov et al., Meta (2023), Segment Anything, https://ai.facebook.com/research/publications/segment-anything/

►Demo: https://segment-anything.com/demo

►Code: https://github.com/facebookresearch/segment-anything

►Dataset: https://segment-anything.com/dataset/index.html

►My newsletter (a new AI application explained weekly in your emails!): https://www.louisbouchard.ai/newsletter/

►Support me on Patreon: https://www.patreon.com/whatsai

►Support me by wearing merch: https://whatsai.myshopify.com/

►Join our AI Discord: https://discord.gg/learnaitogether
