Segmentation is the photography world's version of a detective game. With this superpower, you can identify anything and everything in an image, from objects to people, with pixel-perfect precision.

It's a crucial capability for all kinds of applications, such as autonomous vehicles that need to know what's around them, like a car or a pedestrian.

By now, you're surely familiar with prompts, too. But have you ever heard of promptable segmentation? It's the newest kid on the block, and it's really cool. With this new trick up its sleeve, you can prompt your AI model to segment anything you want, and I mean anything! Thanks to Meta's incredible new SAM (Segment Anything Model), the possibilities are endless.

If you're curious how promptable segmentation and the SAM model work their magic, don't miss my video. In it, you'll learn all about how this amazing new technology is changing image segmentation. So sit back, relax, and let me take you on a journey into the world of promptable segmentation with SAM. Trust me, you won't regret it!
Segmentation is the ability to take an image and identify the objects, people, or anything else of interest in it. It's done by identifying which image pixels belong to which object, and it's super useful for tons of applications where you need to know what's going on, like a self-driving car on the road identifying other cars and pedestrians.

We also know that prompting is a new skill for communicating with AIs. What about promptable segmentation? Promptable segmentation is a new task that was just introduced with an amazing new AI model by Meta: SAM. SAM stands for Segment Anything Model, and it can segment anything following a prompt. How cool is that? In one click, you can segment any object from any photo or video.

It's the first foundation model for this task, trained to generate masks for almost any existing object. It's just like ChatGPT for segmenting images: a very general model, pretty much trained with every type of image and video, with a good understanding of every object. And, similarly, it has adaptation capabilities for more complicated objects, like a very specific tool or machine. This means you can help it segment unknown objects through prompts without retraining the model, which is called zero-shot transfer. Zero-shot, as in it has never seen that object during training.
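To make that concrete, here is a minimal sketch of a single-click prompt using the official segment-anything package linked below. The checkpoint filename and the click coordinates are placeholders you would swap for your own image and click.

```python
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM checkpoint (ViT-H variant; the file path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an RGB image as an HxWx3 uint8 array.
image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)

# "One click": a single foreground point (x, y) in pixel coordinates.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # placeholder click location
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,                # return a few candidate masks
)
best_mask = masks[np.argmax(scores)]      # pick the highest-scoring candidate
```

The same package also ships a SamAutomaticMaskGenerator class that prompts the model with a grid of points to produce masks for everything in the image at once, which is roughly what the "segment everything" mode of the demo does.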
SAM is super exciting for all segmentation-related tasks, it has incredible capabilities, and it is open source. It's super promising for the research community, including myself, and has tons of applications. You've seen the results, and you can see even more using the demo linked below if you'd like. We've also had a quick overview of what it is, but how does it work, and why is it so good?

To answer the second question, why it's that good, we must go back to the root of all current AI systems: data. It's that good because it was trained on a new dataset which is, and I quote, the largest ever segmentation dataset. Indeed, the dataset, called Segment Anything 1-Billion (SA-1B), was built specifically for this task and is composed of 1.1 billion high-quality segmentation masks from 11 million images. That represents approximately 400 times more masks than any existing segmentation dataset to date. This is enormous and of super high quality, with really high-definition images, and that's the recipe for success: always more data and good curation.

Other than the data, which most models rely on anyway, let's see how the model works and how it brings prompting into segmentation tasks, because this is all related. Indeed, the dataset was built using the model itself, iteratively. As you can see here on the right, they use the model to annotate the data, further train the model, and repeat. This is because we cannot simply find images with masks around objects on the internet. Instead, we start by training the model with human help to correct the predicted masks. We then repeat with less and less human involvement, primarily for the objects that the model hasn't seen before.

But where is prompting used? It's used to say what we want to segment from the image. As we discussed in my recent podcast episode with Sander Schulhoff, founder of Learn Prompting, which I think you should listen to, a prompt can be anything. In this case, it's either text or spatial information like a rough box or just a point on the image: basically asking for what you want, or showing it.

Then we use an image encoder, as with all segmentation tasks, and a prompt encoder. The image encoder is similar to most I've already covered on the channel: we take the image and basically extract the most valuable information from it using a neural network. Here, the novelty is the prompt encoder. Having this prompt encoder separated from the image encoder is what makes the approach so fast and responsive, since we can simply process the image once and then iterate over prompts to segment multiple objects, as you can see for yourself in their online demo.
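That one-image-many-prompts workflow is exactly what the predictor interface exposes. Continuing the sketch from above (same predictor and image; the box and point coordinates are again placeholders), the expensive encoding has already happened in set_image, so each new prompt only runs the lightweight prompt encoder and mask decoder:

```python
import numpy as np

# The costly image encoding already happened in predictor.set_image(image) above.
# Each call below reuses that cached embedding, so prompting feels instant.

# Prompt 1: a rough bounding box around one object, given as (x0, y0, x1, y1).
box_masks, box_scores, _ = predictor.predict(
    box=np.array([100, 150, 420, 500]),   # placeholder box
    multimask_output=False,
)

# Prompt 2: two clicks on a different object in the same image.
point_masks, point_scores, _ = predictor.predict(
    point_coords=np.array([[640, 220], [655, 260]]),  # placeholder clicks
    point_labels=np.array([1, 1]),                    # both foreground
    multimask_output=True,
)
```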
The image encoder is another Vision Transformer, or ViT, which you can learn more about in my Vision Transformer video if you'd like. It produces our image embeddings, which are our extracted information. Then we use this information, along with our prompts, to generate a segmentation. But how can we combine our text and spatial prompts with this image embedding? We represent the spatial prompts through the use of positional encodings, basically giving the spatial information as is.
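To give a feel for what a positional encoding does to a click, here is a generic sinusoidal encoding of a 2D point into a feature vector. Treat it purely as an illustration of the idea; SAM's prompt encoder uses a random-Fourier-feature style of encoding combined with learned embeddings, not exactly this.

```python
import numpy as np

def encode_point(x, y, image_size=1024, num_freqs=4):
    """Generic sinusoidal encoding of a 2D point into a feature vector.

    Illustration only: not SAM's exact scheme.
    """
    # Normalize pixel coordinates to [0, 1].
    coords = np.array([x, y], dtype=np.float64) / image_size
    freqs = 2.0 ** np.arange(num_freqs)            # 1, 2, 4, 8, ...
    angles = 2 * np.pi * coords[:, None] * freqs   # shape (2, num_freqs)
    # Flatten sines and cosines into a vector of length 2 * 2 * num_freqs.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=None)

feature = encode_point(500, 375)   # a click at (500, 375) becomes a 16-d vector
```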
Then, for the text, it's simple: we use CLIP, as always, a model able to encode text similarly to how images are encoded. CLIP is amazing for this application since it was trained on tons of image-caption pairs to encode both similarly, so when it gets a clear text prompt, it acts as a bridge for comparing text and images.
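Here is a minimal sketch of what encoding a text prompt with CLIP looks like, using the Hugging Face CLIP implementation as a stand-in. The checkpoint choice is a placeholder, and the SAM paper describes text prompting as an exploratory capability, so this only illustrates the mechanism rather than SAM's shipped code.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Any public CLIP checkpoint works for the illustration; this one is a placeholder choice.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Encode a text prompt into CLIP's shared text-image embedding space.
inputs = processor(text=["a cat sitting on a couch"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_embedding = model.get_text_features(**inputs)   # shape: (1, 512)

# Because CLIP was trained on image-caption pairs, this vector lives in the same
# space as CLIP's image embeddings, which is what lets text act as a prompt.
```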
Finally, we need to produce a good segmentation from all this information. This can be done using a decoder, which is, simply put, the reverse network of the image encoder: it takes condensed information and recreates an image. Though here we only want to create masks that we lay back over the initial image, so it's much easier than generating a completely new image the way DALL·E or Midjourney does. Such models use diffusion models, but in this case they decided to go for an architecture similar to the image encoder: a Vision-Transformer-based decoder that works really well.
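Those three pieces map directly onto the open-source implementation, where the model is composed of an image encoder, a prompt encoder, and a mask decoder. A quick way to see it for yourself (the checkpoint path is again a placeholder; grab the file from the repo linked below):

```python
from segment_anything import sam_model_registry

# Build the smaller ViT-B variant of SAM (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# The three components discussed above, visible as submodules of the model:
print(type(sam.image_encoder).__name__)   # ViT backbone: image -> image embedding, run once per image
print(type(sam.prompt_encoder).__name__)  # turns points / boxes / masks into prompt embeddings
print(type(sam.mask_decoder).__name__)    # lightweight decoder: embeddings -> segmentation masks
```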
And voilà! That was a simple overview of how the new SAM model by Meta works. Of course, it's not perfect and has limitations, like missing fine structures or sometimes hallucinating small disconnected components. Still, it's extremely powerful and a huge step forward, introducing a new, interesting, and highly applicable task.

I invite you to read Meta's great blog post and paper to learn more about the model, or try it directly with their code or demo. All the links are in the description below. I hope you've enjoyed this episode, and I will see you next time with another amazing paper!
►Read the full article: https://www.louisbouchard.ai/meta-sam/
►Paper: Kirillov et al., Meta, (2023): Segment Anything, https://ai.facebook.com/research/publications/segment-anything/
►Demo: https://segment-anything.com/demo
►Code: https://github.com/facebookresearch/segment-anything
►Dataset: https://segment-anything.com/dataset/index.html
►My newsletter (A new AI application explained weekly in your inbox!): https://www.louisbouchard.ai/newsletter/
►Support me on Patreon: https://www.patreon.com/whatsai
►Support me by wearing merch: https://whatsai.myshopify.com/
►Join our AI Discord: https://discord.gg/learnaitogether
Meta's New Segment Anything Model (SAM) Is a Game Changer | HackerNoon