I’m sure you’ve all clicked on a video thumbnail from the Slow Mo Guys to see water floating in the air when popping a water balloon, or other super cool-looking “slow-mos” made with extremely expensive cameras. Now, we are lucky enough to be able to do something not really comparable, but still quite cool, with our phones. What if you could reach the same quality without such an expensive setup? Well, that’s exactly what TimeLens, a new model published by Tulyakov et al., can do with extreme precision. Just look at that video, the results are amazing! It generated slow-motion videos at over 900 frames per second out of videos of only 50 FPS! This is possible by guessing what the frames in between the real frames could look like, and it is an incredibly challenging task. Learn more in the video and check out the crazy results.

Watch the video

References

The full article: https://www.louisbouchard.ai/timelens/
Official code: https://github.com/uzh-rpg/rpg_timelens
Paper: Stepan Tulyakov*, Daniel Gehrig*, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza, "TimeLens: Event-based Video Frame Interpolation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 2021, http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf

Video transcript

I’m sure you’ve all clicked on a video thumbnail from the Slow Mo Guys to see the water floating in the air when popping a water balloon, or any other super cool-looking slow-mo they made with extremely expensive cameras. Now we are lucky enough to be able to do something not really comparable, but still quite cool, with our phones. What if you could reach the same quality without such an expensive setup? Well, that’s exactly what TimeLens, a new model published by Tulyakov et al., can do with extreme precision. Just look at that: it generated slow-motion videos of over 900 frames per second out of videos of only 50 frames per second. This is possible by guessing what the frames in between the real frames could look like, and it’s an incredibly challenging task. Instead of attacking it with the classical idea of using the optical flow of the videos to guess the movement of the particles, they used a simple setup with two cameras, and one of them is very particular. By the way, if you work in the AI field and want to have your models online running on web apps, I’m sure you will love the sponsor of this video, UbiOps. Stick until the end to learn more about them and how they can be quite handy for you.

Let’s get back to the paper. The first camera is a basic camera recording the RGB frames as you know them. The second one, on the other hand, is an event camera. This kind of camera uses novel sensors that only report the pixel intensity changes, instead of the current pixel intensities that a regular camera reports, and it looks just like this. This camera provides information in between the regular frames thanks to the compressed representation of the information it reports compared to regular images. Because the camera only reports information about the pixels that changed, and at a lower resolution, it is much easier to record at a higher rate, making it a high-temporal-resolution but low-definition camera. You can see this as sacrificing the quality of the images it captures in exchange for more images.
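To make this "intensity changes only" idea more concrete, here is a minimal sketch of how a raw event stream of (x, y, timestamp, polarity) tuples could be accumulated into a single event frame between two RGB frames. This is my own illustration, not the authors' code, and the helper name `accumulate_events` is hypothetical; the actual model feeds its networks a richer representation than this simple signed count.

```python
import numpy as np

def accumulate_events(events, height, width, t_start, t_end):
    """Accumulate an event stream into a single 2D 'event frame'.

    Illustrative only; not the representation used in the paper.
    `events` is assumed to be rows of (x, y, t, polarity), where polarity is
    +1 for a brightness increase and -1 for a decrease. Only events falling
    between the two reference frame timestamps are kept.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t_start <= t < t_end:
            frame[int(y), int(x)] += p  # signed count of intensity changes
    return frame

# Toy example: three events between two RGB frames captured at t=0.00s and t=0.02s.
events = np.array([
    [10, 5, 0.005, +1],
    [10, 6, 0.012, -1],
    [11, 5, 0.015, +1],
])
event_frame = accumulate_events(events, height=32, width=32, t_start=0.0, t_end=0.02)
print(event_frame[5, 10], event_frame[6, 10])  # 1.0 -1.0
```

The key property this conveys is sparsity: only moving edges produce events, which is why the sensor can report them at a much higher rate than full frames.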
Fortunately, this lack of image quality is fixed by using the other, frame-based camera, as we will see in a few seconds. TimeLens leverages these two types of cameras, the frame camera and the event camera, using machine learning to get the most out of both types of information and better reconstruct what actually happened between those frames, something that even our eyes cannot see. In fact, it achieved results that neither our smartphones nor any other model could reach before. Here’s how they achieve that.

As you know, we start with the typical frames, which come from the regular camera at something between 20 and 60 frames per second. This cannot do much, as you need many more frames per second to achieve a slow-motion effect like this one. More precisely, to look interesting you need at least 300 frames per second, which means 300 images for only one second of video footage. But how can we go from 20 or so frames to 300? We cannot simply invent the missing frames; there is just too little information to interpolate from. Well, we use the event-based camera, which contains much more time-wise information than the frames. As you can see here, it basically contains incomplete frames in between the real frames, but they are just informative enough to help us understand the movement of the particles and still grasp the overall image using the real frames around them.

The events and frame information are both sent into two modules to train and interpolate the in-between frames we need: the warping-based interpolation module and the interpolation-by-synthesis module. The warping module is the main tool to estimate the motion, and it does so from the events instead of the frames, which is what the synthesis module uses. It takes the frames and events and translates them into an optical flow representation using a classic U-Net-shaped network. This network simply takes images as inputs, encodes them, and then decodes them into a new representation. This is possible because the model is trained to achieve this task on huge datasets. As you may know, I have already covered similar architectures numerous times on my channel, which you can check out for more details on various applications. In short, you can see it as an image-to-image translation tool that just changes the style of the image: in this case, it takes the events and finds an optimal optical flow representation to create a new frame for each event. It basically translates an event image into a real frame by trying to understand what’s happening in the image with the optical flow. If you are not familiar with optical flow, I’d strongly recommend watching my video covering a great paper about it that was published at the same conference a year ago.
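As a rough illustration of what "warping a frame with optical flow" means, here is a generic backward-warping sketch in PyTorch, under my own assumptions and not the authors' implementation. The U-Net that actually predicts the flow is not shown; `flow` stands in for its per-pixel motion estimate, and `backward_warp` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp `frame` (B, C, H, W) with a dense optical flow field (B, 2, H, W).

    Generic backward-warping sketch; not the authors' implementation.
    Each output pixel (x, y) is sampled from (x + flow_x, y + flow_y) in the
    input frame, the usual backward-warping formulation.
    """
    b, _, h, w = frame.shape
    # Build a base pixel grid, then displace it by the flow.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(b, -1, -1, -1)
    coords = grid + flow
    # grid_sample expects coordinates normalized to [-1, 1].
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, norm_grid, align_corners=True)

# Toy usage: warp a random "frame" with a flow that moves everything 2 pixels in x.
frame = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)
flow[:, 0] = 2.0
warped = backward_warp(frame, flow)
```

In this view, the warping module's job is to produce a flow field good enough that warping the neighboring real frames lands them on the in-between moment described by the events.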
The interpolation-by-synthesis module is quite straightforward. It is used because it can handle new objects appearing between frames and changes in lighting, like the water reflection shown here. This is because it uses a similar U-Net-shaped network to understand the frames together with the events and generate a new, fictional frame. In this case, the U-Net takes the events in between two frames and directly generates a new possible frame for each event, instead of going through the optical flow. The main drawback here is that noise may appear due to the lack of information regarding the movement in the image, which is where the other module helps.

Then, the first module is refined using even more information from the interpolation by synthesis I just covered. It basically extracts the most valuable information from these two generated frames of the same event to refine the warped representation and generate a third version of each event, using a U-Net network again. Finally, these three frame candidates are sent into an attention-based averaging module. This last module simply takes the three newly generated frames and combines them into a final frame that keeps only the best parts of all three possible representations, a selection that is also learned by training the network. If you are not familiar with the concept of attention, I’d strongly recommend watching the video I made covering how it works with images.

You now have a high-definition frame for the first event in between your frames, and you just need to repeat this process for all the events given by your event camera. And voilà, this is how you can create amazing-looking and realistic slow-motion videos using artificial intelligence.
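To give a feel for what attention-based averaging of the three candidate frames could look like, here is a simplified sketch under my own assumptions, not the paper's exact architecture. The module name `AttentionAveraging` and the tiny convolutional weight network are hypothetical; the idea shown is simply a learned, per-pixel softmax-weighted blend of the candidates.

```python
import torch
import torch.nn as nn

class AttentionAveraging(nn.Module):
    """Blend several candidate frames with learned per-pixel attention weights.

    Hypothetical fusion module for illustration; not the paper's exact design.
    A small convolutional network looks at all candidates at once and predicts
    one weight map per candidate; a softmax makes the weights sum to 1 at every
    pixel, so the output is a per-pixel weighted average of the candidates.
    """

    def __init__(self, num_candidates=3, channels=3):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(num_candidates * channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, num_candidates, kernel_size=3, padding=1),
        )

    def forward(self, candidates):
        # candidates: (B, N, C, H, W) -- the N interpolated frame candidates
        b, n, c, h, w = candidates.shape
        weights = self.weight_net(candidates.reshape(b, n * c, h, w))
        weights = torch.softmax(weights, dim=1).unsqueeze(2)  # (B, N, 1, H, W)
        return (weights * candidates).sum(dim=1)              # (B, C, H, W)

# Toy usage: fuse three 64x64 RGB candidates into one final frame.
fusion = AttentionAveraging(num_candidates=3, channels=3)
candidates = torch.rand(2, 3, 3, 64, 64)
final_frame = fusion(candidates)  # shape (2, 3, 64, 64)
```

The appeal of this kind of fusion is that the network can prefer the warped candidate where motion is well estimated and the synthesized candidate where new objects or lighting changes appear.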
If you watched until now and enjoyed this paper overview, I’m sure you are more than interested in this field, and you may have developed a machine learning model for yourself or for work. At some point, you most probably wanted to deploy your models, run them live in the cloud, and make them available for others to use or call from other applications. You most certainly know that setting up a serving infrastructure to do this can be a very challenging task, especially when you’d like to focus on research, as I do. Luckily, my friends at UbiOps, the sponsor of this video, built a solution for us. It’s a fully managed, free serving and hosting platform that helps you deploy your code as a web service with an API super easily. The UbiOps platform is very user-friendly: it helps you turn your scripts and models into live web services within minutes. You can also create more complex data pipelines combining different services together, do version control on your models, and much more. You can use it for yourself or as a data science team; there is a lot of cool functionality to explore. You can try it for yourself by visiting ubiops.com and creating an account for free. Their free tier already has a lot of monthly compute budget and allows you to use all the functionality, so there’s literally no reason not to check it out. You can find a wide range of examples for working with tools like scikit-learn, TensorFlow, or other familiar frameworks, plus all the information you need, on their docs and GitHub. Their team is also there to help you, with a Slack server available for anyone to join and ask questions. Click this card to sign up for a free account, or see the first link in the description; you will be impressed with their toolkit and how easy it is to use.

As always, if you are curious about this model, the links to the code and paper are in the description below. Thank you again, UbiOps, for sponsoring this video, and many thanks to you for watching it until the end.