I’m sure you’ve all clicked on a video thumbnail from the Slow Mo Guys to see water floating in the air when popping a water balloon, or other super cool-looking “slow-mos” made with extremely expensive cameras. Now, we are lucky enough to be able to do something not really comparable but still quite cool with our phones. What if you could reach the same quality without such an expensive setup?
Well, that’s exactly what Time Lens, a new model published by Tulyakov et al., can do with extreme precision.
Just look at the video: the results are amazing! It generated slow-motion videos of over 900 FPS out of videos of only 50 FPS!
This is possible by guessing what the frames in-between the real frames could look like, and it is an incredibly challenging task.
Learn more in the video and check out the crazy results.
The full article: https://www.louisbouchard.ai/timelens/
Official code: https://github.com/uzh-rpg/rpg_timelens
Stepan Tulyakov*, Daniel Gehrig*, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza, TimeLens: Event-based Video Frame Interpolation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 2021, http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf
I'm sure you've all clicked on a video thumbnail from the Slow Mo Guys to see the water floating in the air when popping a water balloon, or any other super cool-looking slow-mo they made with extremely expensive cameras. Now, we are lucky enough to be able to do something not really comparable but still quite cool with our phones. What if you could reach the same quality without such an expensive setup? Well, that's exactly what Time Lens, a new model published by Tulyakov et al., can do with extreme precision. Just look at that: it generated slow-motion videos of over 900 frames per second out of videos of only 50 frames per second. This is possible by guessing what the frames in between the real frames could look like, and it's an incredibly challenging task.

Instead of attacking it with the classical idea of using the optical flow of the videos to guess the movement of the particles, they used a simple setup with two cameras, and one of them is very particular.
By the way, if you work in the AI field and want to have your models online, running on web apps, I'm sure you will love the sponsor of this video, UbiOps. Stick until the end to learn more about them and how they can be quite handy for you.
Let's get back to the paper. The first camera is a basic camera recording the RGB frames as you know them. The second one, on the other hand, is an event camera. This kind of camera uses novel sensors that only report the pixel intensity changes, instead of the current pixel intensities as a regular camera does, and it looks just like this. This camera provides information in between the regular frames thanks to the compressed representation of the information it reports compared to regular images. This is because the camera reports only information regarding the pixels that changed, and at a lower resolution, making it much easier to record at a higher rate. That makes it a high temporal resolution but low definition camera. You can see this as sacrificing the quality of the images it captures in exchange for more images. Fortunately, this lack of image quality is fixed by using the other, frame-based camera, as we will see in a few seconds.
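To make "reports only the changes" more concrete, here is a minimal sketch of how you could fake an event image from two consecutive grayscale frames by thresholding the change in log brightness. Real event cameras fire asynchronous per-pixel events with microsecond timestamps rather than comparing whole frames, so treat this purely as an illustration; the function name and threshold value are made up for this example.

```python
import numpy as np

def events_from_frames(prev_frame, next_frame, threshold=0.2):
    """Toy event generator: fire an event wherever the log brightness
    changes by more than `threshold` between two grayscale frames in [0, 1]."""
    eps = 1e-3                                   # avoid log(0)
    diff = np.log(next_frame + eps) - np.log(prev_frame + eps)
    events = np.zeros_like(diff, dtype=np.int8)
    events[diff > threshold] = 1                 # brightness went up (positive polarity)
    events[diff < -threshold] = -1               # brightness went down (negative polarity)
    return events                                # most pixels stay 0: nothing changed there
```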
Time Lens leverages these two types of cameras, the frame camera and the event camera, using machine learning to get the most out of both types of information and better reconstruct what actually happened between those frames, something that even our eyes cannot see. In fact, it achieved results that our smartphones and no other models could reach before. Here's how they achieve that.

As you know, we start with the typical frames, which come from the regular camera at something between 20 and 60 frames per second. This cannot do much, as you need many more frames per second to achieve a slow-motion effect like this one. More precisely, to look interesting you need at least 300 frames per second, which means that we have 300 images for only one second of video footage. But how can we go from 20 or so frames to 300? We cannot simply create the missing frames; this is just too little information to interpolate from.
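To put numbers on that gap, here is the quick arithmetic; the 25 and 300 frames per second below are just representative values from the ranges mentioned above.

```python
source_fps = 25      # a typical recording rate for a regular camera
target_fps = 300     # roughly where slow motion starts to look interesting

factor = target_fps // source_fps     # 12: we need 12x more frames
per_gap = factor - 1                  # 11: frames to invent between every pair of real frames
print(factor, per_gap)                # prints: 12 11
```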
Well, we use the event-based camera, which contains much more time-wise information than the frames. As you can see here, it basically contains incomplete frames in between the real frames, but they are just informative enough to help us understand the movement of the particles and still grasp the overall image using the real frames around them. The events and the frame information are both sent into two modules to train and interpolate the in-between frames: the warping-based interpolation module and the interpolation-by-synthesis module.
This warping module is the main tool to estimate the motion from the events, instead of from the frames like the synthesis module does. It takes the frames and events and translates them into an optical flow representation using a classic U-Net shaped network. This network simply takes images as inputs, encodes them, and then decodes them into a new representation. This is possible because the model is trained to achieve this task on huge datasets. As you may know, I already covered similar architectures numerous times on my channel, which you can find with various applications for more details. But in short, you can see it as an image-to-image translation tool that just changes the style of the image, which in this case takes the events and finds an optimal optical flow representation from which to create a new frame for each event. It basically translates an event image into a real frame by trying to understand what's happening in the image with the optical flow. If you are not familiar with optical flow, I'd strongly recommend watching my video covering a great paper about it that was published at the same conference a year ago.
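If you want a feel for the warping step itself, here is a minimal PyTorch sketch, assuming some network (flow_net below, a placeholder name, not the paper's actual module) has already turned the frames and events into a dense optical flow field: each pixel of the new frame is simply sampled from the nearby real frame at the location the flow points to.

```python
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp a real frame (B,3,H,W) towards a latent timestamp using a dense
    optical flow field (B,2,H,W): flow[:, 0] = x shift, flow[:, 1] = y shift."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    # Shift the base pixel grid by the flow, then normalize to [-1, 1] for grid_sample.
    grid_x = 2.0 * (xs.unsqueeze(0) + flow[:, 0]) / (w - 1) - 1.0
    grid_y = 2.0 * (ys.unsqueeze(0) + flow[:, 1]) / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)       # (B,H,W,2), last dim is (x, y)
    return F.grid_sample(frame, grid, align_corners=True)

# flow = flow_net(frame, events)          # hypothetical events-to-flow network
# warped_candidate = backward_warp(frame, flow)
```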
The interpolation-by-synthesis module is quite straightforward. It is used because it can handle new objects appearing between frames and changes in lighting, like the water reflection shown here. This is due to the fact that it uses a similar U-Net shaped network to understand the frames together with the events and generate a new, fictional frame. In this case, the U-Net takes the events in between two frames and directly generates a new possible frame for each event, instead of going through the optical flow. The main drawback here is that noise may appear due to the lack of information regarding the movement in the image, which is where the other module helps.
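As a rough sketch of that direct route, you can picture the synthesis branch as an encoder-decoder that simply eats the two boundary frames plus an event tensor and spits out the latent frame, with no optical flow in between. The layer sizes and the five-channel event encoding below are assumptions made for illustration; the paper uses a full U-Net, not this toy network.

```python
import torch
import torch.nn as nn

class TinySynthesis(nn.Module):
    """Stand-in for interpolation by synthesis: predict the in-between frame
    directly from (left frame, right frame, events), without optical flow."""

    def __init__(self, event_channels=5):
        super().__init__()
        in_ch = 3 + 3 + event_channels                 # two RGB frames + an event tensor
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, left, right, events):
        x = torch.cat([left, right, events], dim=1)    # stack everything channel-wise
        return self.decoder(self.encoder(x))           # synthesized candidate frame
```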
Then, the first module is refined using even more information from the interpolation by synthesis I just covered. It basically extracts the most valuable information from these two generated frames of the same event to refine the warped representation and generate a third version of each event, using a U-Net network once again.
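One simple way to picture this refinement step is as a small network that looks at the warped candidate and the synthesized candidate side by side and predicts a correction for the warped one. Here is a minimal residual-style sketch of that idea, not the paper's actual refinement U-Net.

```python
import torch
import torch.nn as nn

class WarpRefiner(nn.Module):
    """Look at the warped and synthesized candidates together and predict
    a correction to apply to the warped one, giving a third candidate frame."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),   # two RGB candidates stacked: 6 channels
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, warped, synthesized):
        correction = self.net(torch.cat([warped, synthesized], dim=1))
        return warped + correction                       # refined warped frame
```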
Finally, these three frame candidates are sent into an attention-based averaging module. This last module simply takes these three newly generated frames and combines them into a final frame that keeps only the best parts of all three possible representations, which is also learned by training the network to achieve that. If you are not familiar with the concept of attention, I'd strongly recommend watching the video I made covering how it works with images.
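Attention-based averaging here boils down to predicting one weight map per candidate, normalizing the three maps so they sum to one at every pixel, and blending the candidates with those weights. Below is a minimal sketch of that idea; the single convolution producing the weights is an assumption for brevity, the real module is deeper.

```python
import torch
import torch.nn as nn

class AttentionAveraging(nn.Module):
    """Blend three candidate frames with learned, per-pixel attention weights."""

    def __init__(self):
        super().__init__()
        self.to_weights = nn.Conv2d(9, 3, 3, padding=1)   # 3 RGB candidates in, 3 weight maps out

    def forward(self, warped, refined, synthesized):
        stacked = torch.stack([warped, refined, synthesized], dim=1)       # (B,3,3,H,W)
        logits = self.to_weights(torch.cat([warped, refined, synthesized], dim=1))
        alpha = torch.softmax(logits, dim=1).unsqueeze(2)                  # weights sum to 1 per pixel
        return (alpha * stacked).sum(dim=1)                                # final fused frame
```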
You now have a high-definition frame for the first event in between your frames, and you just need to repeat this process for all the events given by your event camera, and voila! This is how you can create amazing-looking and realistic slow-motion videos using artificial intelligence.
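Putting the pieces together, the whole per-gap loop could look roughly like the glue code below. Every name here (events_between, flow_net, and the little modules sketched above) is a placeholder standing in for the paper's components, not the official TimeLens API.

```python
def interpolate_gap(left, right, event_stream, timestamps,
                    flow_net, synthesis, refiner, fusion):
    """Generate the in-between frames for one pair of real frames."""
    slow_mo = [left]
    for t in timestamps:                               # e.g. 11 latent timestamps for 25 -> 300 fps
        events = events_between(event_stream, t)       # events from the left frame up to time t
        flow = flow_net(left, events)                  # warping branch: events -> optical flow
        warped = backward_warp(left, flow)             # warp the real frame to time t
        synthesized = synthesis(left, right, events)   # synthesis branch: direct prediction
        refined = refiner(warped, synthesized)         # refined warping branch
        slow_mo.append(fusion(warped, refined, synthesized))
    return slow_mo
```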
If you watched until now and enjoyed this paper overview, I'm sure you are more than interested in this field, and you may have developed a machine learning model for yourself or for work. At some point, you most probably wanted to deploy your models, run them live in the cloud, and make them available for others to use or call from other applications. You most certainly know that setting up a serving infrastructure to do this can be a very challenging task, especially when you'd like to focus on research, as I do. Luckily, my friends at UbiOps, the sponsors of this video, built a solution for us. It's a fully managed, free serving and hosting platform that helps you deploy your code as a web service with an API super easily.

The UbiOps platform is very user friendly. It helps you turn your scripts and models into live web services within minutes. You can also create more complex data pipelines combining different services together, do version control on your models, and much more. You can use it for yourself or as a data science team; there's a lot of cool functionality to explore. You can try it for yourself by visiting ubiops.com and creating an account for free. Their free tier already has a lot of monthly compute budget and allows you to use all the functionality, so there's literally no reason not to check it out. You can find a wide range of examples for working with tools like scikit-learn, TensorFlow, or other familiar frameworks, and all the information you need, on their docs and GitHub. Plus, their team is there to help you, with a Slack server available for anyone to join and ask questions. Click this card to sign up for a free account, or see the first link in the description; you will be impressed with their toolkit and how easy it is to use.

As always, if you are curious about this model, the links to the code and paper are in the description below. Thank you again, UbiOps, for sponsoring this video, and many thanks to you for watching it until the end!