This AI Removes Unwanted Objects From Your Images! by@whatsai


This task of removing part of an image and replacing it with what should appear behind has been tackled by many AI researchers for a long time. It is called image inpainting, and it’s extremely challenging. Learn more in the video!
Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

Learn how this algorithm can understand images and automatically remove the undesired object or person and save your future Instagram post!

You’ve most certainly experienced this situation: you take a great picture with your friend, and someone is photobombing behind you, ruining your future Instagram post. Well, that’s no longer an issue. Whether it’s a person or a trash can you forgot to remove before taking your selfie, this AI will automatically remove the undesired object or person in the image and save your post. It’s just like having a professional Photoshop designer in your pocket, one click away!


Watch the video

References

► Complete article: https://www.louisbouchard.ai/lama/
► Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A.,
Silvestrov, A., Kong, N., Goka, H., Park, K. and Lempitsky, V., 2022.
Resolution-robust Large Mask Inpainting with Fourier Convolutions. In
Proceedings of the IEEE/CVF Winter Conference on Applications of
Computer Vision (pp. 2149-2159).
► Code: https://github.com/saic-mdal/lama
► Colab Demo: https://colab.research.google.com/github/saic-mdal/lama/blob/master/colab/LaMa_inpainting.ipynb
► Product using LaMa: https://cleanup.pictures/
► The Fourier domain explained by the great @3Blue1Brown
► Great in-depth explanation of LaMa with the authors by @Yannic Kilcher
► My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

You’ve most certainly experienced this situation once: you take a great picture with your friend, and someone is photobombing behind you, ruining your future Instagram post. Well, that’s no longer an issue. Whether it’s a person or a trash can you forgot to remove before taking your selfie, this AI will automatically remove the undesired object or person in the image and save your post. It’s just like a professional Photoshop designer in your pocket, with a simple click. This task of removing part of an image and replacing it with what should appear behind it has been tackled by many AI researchers for a long time. It’s called image inpainting, and it’s extremely challenging. As you will see, the paper I want to show you achieves it with incredible results, and can do it easily in high definition, unlike previous approaches you may have heard of. You definitely want to stay until the end of the video: you won’t believe how great and realistic the results look for something produced in a split second by an algorithm.

As I said, image inpainting is basically removing unwanted objects from your images. You should be doing the same in your work life and remove any friction. Your next step as an AI professional or student should be to do like me and try the sponsor of today’s episode: Weights & Biases. If you run a lot of experiments, you should be using Weights & Biases. It will remove all painful steps, from hyperparameter tuning to results analysis, with a handful of lines of code added, and it’s entirely free for personal use. It takes less than five minutes to set up, and there is nothing else you ever have to do. Talking about removing friction points, I don’t think you can do better than that. Weights & Biases has everything you need for your code to be reproducible, without you even trying. For your own well-being, do like me and give Weights & Biases a try for free with the first link below.

To remove an object from an image, the machine needs to understand what should appear behind the subject. Doing this properly would require a three-dimensional understanding of the world, as humans have, but the machine doesn’t have that: it only has access to a few pixels in an image, which is why the task is so complicated, whereas it looks quite simple to us, who can imagine the depth and guess that the rest of the wall should be here, the window there, etc. We basically need to teach the machine what the world typically looks like. We do that using a lot of examples of real-world images, so that it can get an idea of what our world looks like in the two-dimensional picture world, which is not a perfect approach, but it does the job. Then another problem comes with the computational cost of using real-world images, which have way too many pixels. To fix that, most current approaches work with low-quality images, a downsized version of the image that is manageable for our computers, and upscale the inpainted part at the end to place it back in the original image. This makes the final results look worse than they could be, or at least not great enough to be shared on Instagram and get all the likes you deserve. You can’t really feed it high-quality images directly, as that would take way too much time to process and train. Or can you?
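The downscale-inpaint-upscale pipeline described here can be sketched in a few lines of NumPy. This is a toy illustration, not any paper's code: the "inpainting" is just a mean fill, and nearest-neighbor resizing stands in for whatever interpolation a real pipeline would use.

```python
import numpy as np

def nearest_resize(img, new_h, new_w):
    """Nearest-neighbor resize of a 2D grayscale image."""
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows[:, None], cols]

def naive_inpaint_pipeline(img, mask, work_size=64):
    """Downscale, 'inpaint' (here: mean fill), then upscale the result.

    img:  2D float array (H, W), the photo.
    mask: 2D bool array (H, W), True where the object to remove is.
    """
    h, w = img.shape
    small_img = nearest_resize(img, work_size, work_size)
    small_mask = nearest_resize(mask.astype(float), work_size, work_size) > 0.5
    # Toy "inpainting": fill the hole with the mean of the known pixels.
    small_img[small_mask] = small_img[~small_mask].mean()
    # Upscale the low-resolution fill and paste it back into the original.
    restored = nearest_resize(small_img, h, w)
    out = img.copy()
    out[mask] = restored[mask]
    return out

img = np.random.rand(256, 256)
mask = np.zeros((256, 256), dtype=bool)
mask[100:150, 100:150] = True
result = naive_inpaint_pipeline(img, mask)
```

The quality loss mentioned in the video comes from the final upscale: the filled region only ever existed at 64x64 resolution, so any fine detail inside the hole is gone.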

Well, these are the main problems the researchers attacked in this paper, and here’s how: Roman Suvorov et al. from Samsung Research introduced a new network called LaMa that is quite particular. As you can see, in image inpainting you typically send the initial image as well as an indication of what you’d like to remove from it. This is called a mask; it covers part of the image, and the network won’t have access to that information anymore, as those are the pixels it needs to fill in. It then has to understand the image and fill in those pixels with what it thinks fits best.
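The image-plus-mask input can be sketched as follows. Many inpainting networks receive the image with the masked region zeroed out, stacked with the mask itself as an extra channel; the exact preprocessing below is a simplified sketch, not the repository’s code.

```python
import numpy as np

def build_inpainting_input(img, mask):
    """Stack a masked RGB image with its binary mask.

    img:  (H, W, 3) float array in [0, 1].
    mask: (H, W) array, 1 where pixels must be removed and regenerated.
    Returns a (H, W, 4) array: masked image + mask channel.
    """
    mask = mask.astype(img.dtype)
    masked_img = img * (1.0 - mask[..., None])  # hide the unwanted object
    return np.concatenate([masked_img, mask[..., None]], axis=-1)

img = np.random.rand(8, 8, 3)
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1  # region covering the photobomber
net_input = build_inpainting_input(img, mask)
```

Zeroing the masked pixels guarantees the network cannot peek at the object it is supposed to remove, while the mask channel tells it exactly which pixels to regenerate.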

In this case, they start like any other network and downscale the image, but don’t worry: their technique allows them to keep the same quality as a high-resolution image. This is because, in the processing of the image, they use something a bit different than usual. Typically, we see different networks here in the middle, mostly convolutional neural networks. Such networks are often used on images due to how convolutions work, which I explained in other videos, like the one appearing on the top right of your screen if you are interested in how they work. In short, the network works in two steps. First, it compresses the image, trying to keep only the relevant information; the network ends up conserving mostly general information about the image, like its colors, overall style, or the general objects appearing, but not precise details. Then, it tries to reconstruct the image using the same principles, but backward. We use tricks like skip connections, which save information from the first few layers of the network and pass it along to the second step so that it can orient the reconstruction towards the right objects. In short, the network easily knows that there is a tower with a blue sky and trees, what we call global information, but it needs the skip connections to know that it’s the Eiffel Tower in the middle of the screen, that there are clouds here and there, that the trees have these colors, etc.: all the fine-grained details, which we call local information.
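The compress-then-reconstruct structure with a skip connection can be sketched at the shape level in NumPy. This is only an illustration of the idea: average pooling stands in for the learned compression, nearest-neighbor upsampling for the decoder, and concatenation is the skip connection carrying local detail across.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool2(x):
    """Halve spatial resolution by 2x2 average pooling (compress step)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    """Double spatial resolution by nearest-neighbor repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encoder_decoder_with_skip(img):
    early = img                      # fine-grained "local" features
    bottleneck = avg_pool2(img)      # coarse "global" information
    upsampled = upsample2(bottleneck)
    # Skip connection: concatenate early features with the decoded ones,
    # so the decoder sees both global context and local detail.
    return np.concatenate([upsampled, early], axis=-1)

img = rng.random((16, 16, 3))
out = encoder_decoder_with_skip(img)
```

Without the concatenated `early` channels, the decoder would only see the blurred, pooled version of the image, which is exactly the loss of fine detail the transcript describes.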

Following a long training with many examples, we expect the network to reconstruct the image, or at least a very similar image that contains the same kinds of objects, very similar if not identical to the initial image. But remember: in this case, we are working with low-quality images that we need to upscale, which hurts the quality of the results. The particularity here is that, instead of using convolutions as in regular convolutional networks, with skip connections to keep local knowledge, LaMa uses what we call the fast Fourier convolution, or FFC. This means that the network works in both the spatial and frequency domains, and doesn’t need to get back to the early layers to understand the context of the image. Each layer works with convolutions in the spatial domain to process local features, and uses Fourier convolutions in the frequency domain to analyze global features.

The frequency domain is a bit special, and I linked a great video covering it in the description below if you are curious. It basically transforms your image into all its possible frequencies, just like sound waves, and tells you how much of each frequency the image contains. Each pixel of this newly created image represents a frequency covering the whole spatial image, and how strongly it is present, instead of colors. The frequencies here are just repeated patterns at different scales. For example, one of these frequency pixels could be highly activated by vertical lines at a specific distance from each other; in this case, it could be the same distance as the length of a brick, so it would be highly activated if there is a brick wall in the image. From this, you’d understand that there is probably a brick wall, with a confidence proportional to how strongly the pixel is activated. You can repeat this for all pixels activated by similar patterns, giving you good hints about the overall aspect of the image, but nothing about the objects themselves or the colors; the spatial domain takes charge of that.
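The brick-wall intuition can be checked directly with NumPy’s 2D FFT: an image made of vertical stripes puts almost all of its (non-DC) energy into the single frequency pixel matching the stripe spacing. This is a plain FFT demonstration, independent of the paper’s code.

```python
import numpy as np

# A 64x64 image of vertical stripes that repeat 8 times across the width,
# like evenly spaced bricks seen edge-on.
n, cycles = 64, 8
x = np.arange(n)
stripes = np.cos(2 * np.pi * cycles * x / n)  # varies along the width
img = np.tile(stripes, (n, 1))                # constant along the height

spectrum = np.abs(np.fft.fft2(img))
spectrum[0, 0] = 0.0  # ignore the DC (mean brightness) component

# The strongest remaining "frequency pixel" sits at row 0, column `cycles`
# (with a mirror copy), i.e. exactly the stripe pattern's repetition rate.
peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
```

Because the frequency axis is expressed relative to the image size, the same physical pattern maps to the same relative frequency whatever the resolution, which hints at why a frequency-domain view generalizes across image sizes.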

Doing convolutions on this Fourier-transformed image allows the network to work with the whole image at each step of the convolution process, so it has access to a much better global understanding of the image, even at early layers, without much computational cost, which is impossible to achieve with regular convolutions in the spatial domain. Then, both global and local results are saved and sent to the next layer, which repeats these steps. You end up with the final image, which you can upscale back. The use of the Fourier domain is what makes the method scalable to bigger images: the image resolution doesn’t affect the Fourier domain, since it works with frequencies over the whole image instead of colors, and the repeated patterns it’s looking for stay the same whatever the size of the image. This means that even after training the network on small images, you can feed it much larger images afterward and get amazing results.
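The frequency branch described above can be sketched as: transform the features with a real FFT, apply a pointwise linear map to the spectrum, and transform back. Every output pixel then depends on every input pixel, giving a global receptive field in a single layer. The weights and layout below are illustrative, not LaMa’s actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_transform(x, w_real, w_imag):
    """Toy Fourier convolution on a single-channel 2D feature map.

    A pointwise multiplication in the frequency domain mixes information
    from the entire spatial extent of x, unlike a small spatial kernel.
    """
    spec = np.fft.rfft2(x)
    # Pointwise map on the spectrum (stand-in for a learned 1x1 conv).
    spec = spec * (w_real + 1j * w_imag)
    return np.fft.irfft2(spec, s=x.shape)

h, w = 32, 32
x = rng.random((h, w))
w_real = rng.random((h, w // 2 + 1))
w_imag = rng.random((h, w // 2 + 1))
y = spectral_transform(x, w_real, w_imag)

# Changing one input pixel changes (essentially) every output pixel,
# demonstrating the global receptive field of the frequency branch.
x2 = x.copy()
x2[0, 0] += 1.0
y2 = spectral_transform(x2, w_real, w_imag)
changed = np.mean(np.abs(y2 - y) > 1e-12)
```

A 3x3 spatial convolution would instead change only a 3x3 neighborhood of the output, which is why stacks of many layers are normally needed to build up global context.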

As you can see, the results are not perfect, but they are quite impressive, and I’m excited to see what they will do next to improve them. Of course, this was just a simple overview of this new model; you can find more details about the implementation in the paper linked in the description below. You can also implement it yourself with the code linked below as well. I hope you enjoyed the video, and if so, please take a second to share it with a friend who might find this interesting. It would mean a lot and help this channel grow. Thank you for watching!

[Music]
