Have you ever had an image you really liked and could only manage to find a small version of it that looked like this image above on the left? How cool would it be if you could take this image and make it look twice as good? It would be great, but what if you could make it even four or eight times more high definition? Now we're talking, just look at that.
Here we enhanced the resolution of the image by a factor of four, meaning that we have four times more height and width pixels for more details, making it look a lot smoother. The best thing is that this is done within a few seconds, completely automatically, and works with pretty much any image. Oh, and you can even use it yourself with a demo they made available... Watch more results and learn about how it works in the video!
►Read the full article: https://www.louisbouchard.ai/swinir/
►Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L. and Timofte, R., 2021. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1833-1844).
►Code: https://github.com/JingyunLiang/SwinIR
►Demo: https://replicate.ai/jingyunliang/swinir
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Have you ever had an image you really liked but couldn't manage to find a better version than this? How cool would it be if you could take this image and make it look twice as good? It would be great, but what if I could make it even four or eight times more high definition? Now we are talking, just look at that! Here, we enhanced the resolution of the image by a factor of four, meaning that we have four times more height and width pixels for more details, making it look a lot smoother. The best thing is that this is done within a few seconds, completely automatically, and works with pretty much any image. Oh, and you can even use it yourself with a demo they made available, as we will see during the video.
Speaking of enhancing resolution, I'm always looking to enhance different aspects of how I work and share what I make. If you are working on machine learning problems, there is no better way to enhance your workflows than with this episode's sponsor, Weights & Biases. Weights & Biases is an MLOps platform where you can keep track of your machine learning experiments, insights, and ideas. A feature I especially love is how you can quickly create and share amazing-looking interactive reports like this one, clearly showing your team or future self your runs' metrics, hyperparameters, and data configurations alongside any notes you had at the time. Capturing and sharing your work is essential if you want to grow as an ML practitioner, which is why I highly recommend using tools that improve your work, like Weights & Biases. Just try it with the first link below, and I will owe you an apology if you haven't been promoted within a year.
Before getting into this amazing model, we first have to introduce the concept of photo upsampling, or image super-resolution. The goal here is to construct a high-resolution image from a corresponding low-resolution input image, which is a face in this case, but it can be any object, animal, or landscape. The low-resolution image will be something like 512 pixels wide or smaller: not that blurry, but clearly not high definition when you have it full screen. Just take a second to put the video on full screen and you'll see the artifacts. While we are at it, you should also take a few more seconds to like the video and send it to a friend or two; I'm convinced they will love it and will thank you for it. Anyway, we take this low-definition image and transform it into a high-definition image with a much clearer face, in this case a 2048-pixel square image, which is four times more HD. To achieve that, we usually have a typical U-Net-like architecture with convolutional neural networks, which I covered in many videos before, like the one appearing in the top-right corner of your screen, if you'd like to learn more about how they work.
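To make the task concrete, here is what the naive baseline looks like: plain bicubic interpolation, which just stretches the pixels we already have and produces exactly the blurry result super-resolution models try to beat. A minimal sketch in Python with Pillow; the file names are hypothetical.

```python
from PIL import Image

# Load a low-resolution image (hypothetical file name).
low_res = Image.open("face_512.png")          # e.g. 512 x 512 pixels

# Naive x4 upscale: bicubic interpolation invents no new detail,
# it only smooths between existing pixels, hence the blur.
width, height = low_res.size
upscaled = low_res.resize((width * 4, height * 4), Image.BICUBIC)

upscaled.save("face_2048_bicubic.png")        # 2048 x 2048, but soft and blurry
```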
The main downside is that CNNs have difficulty adapting to extremely broad datasets, since they use the same kernels for all images, which makes them great for local results and generalization but less powerful for the overall result when we want the best output for each individual image. On the other hand, transformers are a promising architecture due to the self-attention mechanism capturing global interactions between contexts for each image, but they come with heavy computations that are not suitable for images.
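To get a feel for why full self-attention is so heavy on images, just count the pairwise comparisons: every pixel attends to every other pixel, so the cost grows with the square of the number of pixels. A quick back-of-the-envelope calculation (the 8 x 8 window is an assumption for illustration, in the spirit of Swin's windowed attention):

```python
# Full self-attention compares every token with every other token:
# for n = H * W pixel tokens, that is n**2 attention scores per layer.
h = w = 512
n = h * w                        # 262,144 tokens for a 512 x 512 image
full_attention = n ** 2          # ~6.9e10 scores -- far too heavy

# Windowed attention only attends within small windows (8 x 8 here),
# so the cost grows linearly with image size instead of quadratically.
window = 8
tokens_per_window = window * window
num_windows = n // tokens_per_window
windowed_attention = num_windows * tokens_per_window ** 2   # ~1.7e7 scores

print(f"{full_attention:,} vs {windowed_attention:,} attention scores")
```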
Here, instead of using CNNs or transformers, they created the same U-Net-like architecture with both convolution and attention mechanisms, or, more precisely, using the Swin Transformer architecture. The Swin Transformer is amazing since it has both the advantages of CNNs, to process images of larger sizes and prepare them for the attention mechanisms, and these attention mechanisms then create long-range connections so that the model understands the overall image much better and can recreate it in a better way. I won't go into the details of the Swin Transformer, as I already covered this architecture a few months ago and explained its differences with CNNs and the classical transformer architectures used in natural language processing. If you'd like to learn more about it and how the researchers applied transformers to vision, check out that video and come back for the explanation of the upsampling model.
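For a rough idea of what that hybrid means in practice, here is a minimal sketch of the windowing trick at the heart of the Swin Transformer: the feature map is cut into small non-overlapping windows, and self-attention runs inside each window independently. This is only the partitioning step; the real model also shifts the windows between layers so information can cross window borders, which is omitted here.

```python
import torch

def window_partition(x: torch.Tensor, window: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into (num_windows*B, window*window, C)
    groups of tokens; self-attention is then applied inside each group."""
    b, h, w, c = x.shape
    x = x.view(b, h // window, window, w // window, window, c)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, window * window, c)

# A 64 x 64 feature map with 96 channels, cut into 8 x 8 windows.
feats = torch.randn(1, 64, 64, 96)
windows = window_partition(feats, window=8)
print(windows.shape)  # torch.Size([64, 64, 96]): 64 windows of 64 tokens each
```

Attention over each window's 64 tokens is cheap, while the windows together still cover the whole image.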
The model is called SwinIR and can do many tasks, including image upsampling. As I said, it uses convolutions to allow for bigger images. More precisely, they use a convolutional layer to reduce the size of the image, which you can see here. This reduced image is then sent into the model and also passed directly to the reconstruction module, to give the model general information about the image. As we will see in a few seconds, this representation basically looks like many weird, blurry versions of the image, giving valuable information to the upscaling module about how the overall image should look.
Then, we see the Swin Transformer layers coupled with convolutions. This is to compress the image further and extract ever more valuable and precise information about both the style and the details, while forgetting about the overall image. This is why we then add the convolved image with a skip connection, to supply the overall information we lack. All of this is finally sent into a reconstruction module called sub-pixel, which looks like this and uses both the larger general features and the smaller detailed features we just created to reconstruct a higher-definition image. You can see this as a convolutional neural network in reverse, or simply a decoder, taking the condensed features we have and reconstructing a bigger image from them.
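The "sub-pixel" name comes from sub-pixel convolution, available in PyTorch as PixelShuffle: a convolution first expands the channel dimension by the square of the scale factor, and the shuffle then rearranges those extra channels into spatial resolution. A minimal sketch of an x4 upscaling head, with hypothetical channel counts:

```python
import torch
import torch.nn as nn

scale, channels = 4, 64

# Sub-pixel upsampling: the conv produces scale**2 output maps per RGB
# channel, and PixelShuffle rearranges them into a (scale x scale) grid
# of pixels, trading channel depth for spatial resolution.
upsampler = nn.Sequential(
    nn.Conv2d(channels, 3 * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),  # (B, 3*16, H, W) -> (B, 3, 4H, 4W)
)

feats = torch.randn(1, channels, 512, 512)   # condensed features
out = upsampler(feats)
print(out.shape)  # torch.Size([1, 3, 2048, 2048])
```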
Again, if you'd like to learn more about CNNs and decoders, you should check out some of the videos I made covering them. So, you basically send your image into a convolutional layer and save this new representation for later, while also sending it through the Swin Transformer architecture to condense the information further and learn the most important features to reconstruct. Then, you take these new features together with the saved ones and use a decoder to reconstruct the high-definition version. And voilà! Now you only need enough data, and you will have results like this.
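Putting that recap into code, here is a heavily simplified sketch of the pipeline. This is a toy stand-in, not the actual SwinIR implementation: where the real model uses residual Swin Transformer blocks, plain convolutions stand in below, and all sizes are illustrative. See the official repo linked above for the real thing.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Toy x4 super-resolution net following the same recipe as SwinIR:
    shallow conv -> deep features -> skip connection -> sub-pixel decoder.
    The conv stack below is a stand-in for the Swin Transformer layers."""

    def __init__(self, channels: int = 64, scale: int = 4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        self.deep = nn.Sequential(                 # stand-in for Swin blocks
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.reconstruct = nn.Sequential(          # sub-pixel decoder
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        saved = self.shallow(x)             # save the general representation
        refined = self.deep(saved)          # condense and extract detail features
        return self.reconstruct(saved + refined)   # skip connection, then decode

model = TinySR()
low_res = torch.randn(1, 3, 128, 128)
print(model(low_res).shape)  # torch.Size([1, 3, 512, 512])
```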
Of course, as with all research, there are some limitations. In this case, probably due to the initial convolutional layer, it doesn't work really well with very small images: under 200 pixels wide, you may see artifacts and weird results like this one appear. It also seems like you can remove wrinkles using the bigger upscalers, which can be a useful artifact if you are looking to do that. Other than that, the results are pretty crazy, and having played with it a lot in the past few days, the four-times upscaling is incredible. You can play with it too: they made the GitHub repo available for everyone, with pre-trained models and even a demo you can try right away, without any code.
Of course, this was just an overview of this amazing new model, and I strongly invite you to read their paper for a deeper technical understanding. Everything is linked in the description. Let me know what you think, and I hope you've enjoyed this video. Thank you once again, Weights & Biases, for sponsoring this video, and to anyone still watching: see you next week with another exciting paper!