This model takes a picture, understands which particles are supposed to be moving, and realistically animates them in an infinite loop while keeping the rest of the picture entirely still. The end result is amazingly realistic videos like this one, generated from nothing but still pictures.

References
►Read the full article: https://www.louisbouchard.ai/animate-pictures/
►Paper: Holynski, Aleksander, et al. "Animating Pictures with Eulerian Motion Fields." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. https://arxiv.org/abs/2011.15128
►Project link (code coming soon): https://eulerian.cs.washington.edu/

Video Transcript

00:00 Have you ever taken a beautiful landscape picture and later noticed that it didn't look quite as good as when you were there? It may be because you just cannot freeze such a real-life landscape and expect it to look as good. In that case, what about having this picture animated, where the normally moving particles are in constant movement, just like at the moment you took the photo? Imagine observing the water flow or seeing the smoke disperse in the air.

00:25 Well, this is what a new algorithm from Facebook and the University of Washington does. It takes a picture, understands which particles are supposed to be moving, and realistically animates them in an infinite loop while keeping the rest of the picture entirely still, creating amazing-looking videos like this one.

00:44 Sincerely, I don't know why, but I LOVE how it looks and wanted to share their work. What do you think about these results, and how would you use them? Personally, once the code is released, I am using these as desktop backgrounds. Now that we've seen what it can achieve, I hope you are as excited as I was when discovering this paper. Let's get into the even more interesting part.

01:06 How can they take a single picture and create a realistic animated looping video out of it? This is done in three important steps.

01:15 The first step is to separate what needs to be animated from what needs to stay still. In other words, find the water, smoke, or clouds to animate. Of course, detecting these moving particles is extremely easy for humans, as we can imagine the real scene and how it actually was, but how can a computer that sees only a picture and doesn't know the world do this? Well, the answer lies within the question: we need to teach it a bit more about the world and how it works, or in this case, how it moves.

01:46 This is done by training an artificial intelligence model on videos of real landscape scenes instead of pictures. This way, it can learn how water, smoke, and clouds typically behave, in the form of a flow field. This flow field is a version of the input image where each pixel value approximates the direction and speed of the motion at that pixel at a frozen moment in time. It is called an Eulerian flow field. Eulerian flow fields describe how a fluid moves by focusing on fixed locations instead of following the particles of the fluid. You can see this as sitting in front of a waterfall and watching the exact same positions, observing how the water changes there, instead of following the water down the waterfall. And this is exactly what we need in this case, as the image represents precisely that: flowing water seen from a still position.

02:35 So, using many landscape videos, they started by identifying these fields for each video. This is done quite easily, as the fluid actually moves during the videos, and we can use widely known techniques to identify the moving particles in each frame. They then use the flow identified for each frame as the reference target to train their algorithm.
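
To make the idea of a static, per-pixel flow field more concrete, here is a minimal NumPy sketch. It is not the authors' code: the toy field, the function names, the nearest-pixel lookup, and the simple step-by-step accumulation are all assumptions made for illustration. It shows how one frozen field of velocities can be read again and again at fixed locations to find out how far each starting pixel has travelled after a few frames.

```python
import numpy as np

def make_toy_flow(height, width):
    """Hypothetical static flow: the right half moves down 1 px per frame, the left half is still."""
    flow = np.zeros((height, width, 2), dtype=np.float64)  # last axis holds (dx, dy)
    flow[:, width // 2:, 1] = 1.0
    return flow

def integrate_displacement(flow, num_steps):
    """Accumulate each pixel's displacement by repeatedly sampling the same static field."""
    height, width, _ = flow.shape
    ys, xs = np.mgrid[0:height, 0:width]
    disp = np.zeros_like(flow)
    for _ in range(num_steps):
        # Current position of every starting pixel, rounded and clamped to the image.
        cur_x = np.clip(np.rint(xs + disp[..., 0]), 0, width - 1).astype(int)
        cur_y = np.clip(np.rint(ys + disp[..., 1]), 0, height - 1).astype(int)
        # Eulerian lookup: read the velocity stored at that fixed location.
        disp += flow[cur_y, cur_x]
    return disp

flow = make_toy_flow(6, 4)
disp = integrate_displacement(flow, num_steps=3)
```

The field itself never changes over time; only the position at which it is sampled does, which is the "sitting in front of the waterfall" view described above.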

02:55 The training starts with an image-to-image translation network using video frames as inputs. The identified flow fields serve as the targets that the outputs are compared against, teaching the network in a supervised way what we want to achieve. This is done by iteratively correcting and improving the network based on the difference between the generated flow field and our known flow field. After such training, the network can generate this flow field without any external help for any landscape image it receives.

03:23 This works just like other image-to-image GAN architectures: an encoder coupled with a decoder. It first encodes the input frame, the landscape image, and then decodes it to generate a new version of the same image, conserving the spatial features and changing the image's style. In this case, the "style" that changes is the meaning of the pixel values, which now encode a motion field instead of the actual colors of the image.

03:49 The second step is to animate these sections of the image and do it realistically. For this, we only need two things: the input image and the Eulerian, or static, flow estimation we just found for the image. Using this information, we know where the pixels are supposed to go next based on their speed and direction. But directly applying this will cause some issues, as some pixels may not have any values after the translation, resulting in black holes starting where the motion begins in the picture. This is because (1) the predicted motion field isn't perfect and (2) some pixels will go to the same resulting pixel after their displacement, which means that it will get worse over time and produce something like this.

04:30 So how can we make this more intelligent? Again, it is done using an encoder and a decoder, with one more step in between the two.
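
The hole problem from the second step can be reproduced with a few lines of NumPy. This is only a sketch under stated assumptions: the `forward_warp` helper, the toy image, and the uniform 2-pixel downward displacement are hypothetical, and the warp simply pushes each source pixel to its rounded target position. It illustrates why moving pixels directly leaves empty regions where the motion begins.

```python
import numpy as np

def forward_warp(image, disp):
    """Naive forward warp: push every source pixel to its rounded target location."""
    height, width = image.shape
    out = np.zeros_like(image)                      # unfilled pixels stay 0 ("black")
    filled = np.zeros((height, width), dtype=bool)  # tracks which targets received a value
    ys, xs = np.mgrid[0:height, 0:width]
    tx = np.rint(xs + disp[..., 0]).astype(int)
    ty = np.rint(ys + disp[..., 1]).astype(int)
    # Drop pixels that would land outside the image.
    inside = (tx >= 0) & (tx < width) & (ty >= 0) & (ty < height)
    out[ty[inside], tx[inside]] = image[ys[inside], xs[inside]]
    filled[ty[inside], tx[inside]] = True
    return out, filled

# Toy 6x4 image with non-zero values; the right half moves down by 2 pixels.
image = np.arange(1, 25, dtype=np.float64).reshape(6, 4)
disp = np.zeros((6, 4, 2))
disp[:, 2:, 1] = 2.0

warped, filled = forward_warp(image, disp)
holes = ~filled  # True where no source pixel landed
```

In this toy case the top two rows of the moving region receive no pixel at all, and the gap grows with every additional frame of displacement.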

04:39 So they encode the input frame a second time using a different encoder trained on this specific task, producing what they call deep features. These deep features are the encoding of the input image, meaning a condensed version of the information about the picture that matters for this task. What counts as "important information" here is determined by what the model was optimized to do during training. Using these deep features, guided by the displacement fields that indicate what the next frame should look like, they use a decoder trained to generate the next frame from this condensed information about the frame and the flow field we give it. Note that during training, they used two different frames, the first and last frames, to learn the real-looking flow of the fluids and try to prevent such black holes from appearing.

05:25 Now comes the third and last step: the looping part. Using the same frame as the starting frame, they generate the animation in two directions, a forward movement and a backward movement, until they reach the second frame. This enables them to produce the looping effect by merging the two videos, since one starts where the other ends and they meet in the center. Then, at inference time, or in other words, when you actually use the model, it does the same thing with only a starting frame, which is the image you give the model.

05:56 And voilà, you have your animated image! I hope you enjoyed this video as much as I enjoyed discovering this technique. If so, I invite you to read their paper too for more technical details about this super cool model. It is extremely well done! Thank you for watching!
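
As a closing illustration of the looping step described in the transcript, here is a minimal sketch of merging a forward and a backward animation. Several things here are assumptions: `shift_down` is a hypothetical stand-in for the learned animation (it just shifts rows cyclically), and the linear cross-fade weights are one simple way to merge the two clips; the transcript only says that the two videos are merged.

```python
import numpy as np

def shift_down(image, pixels):
    """Stand-in for the learned animation: cyclically shift rows (hypothetical toy motion)."""
    return np.roll(image, pixels, axis=0)

def make_loop(image, num_frames):
    """Cross-fade a forward and a backward animation so the clip starts and ends on the input."""
    frames = []
    for t in range(num_frames + 1):
        forward = shift_down(image, t)                # equals the input when t == 0
        backward = shift_down(image, t - num_frames)  # equals the input when t == num_frames
        alpha = t / num_frames                        # 0 at the start, 1 at the end
        frames.append((1.0 - alpha) * forward + alpha * backward)
    return frames

image = np.arange(24, dtype=np.float64).reshape(6, 4)
frames = make_loop(image, num_frames=4)
```

Because the first and last frames are both identical to the input image, the clip can be repeated without a visible jump at the seam.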


Watch more on YouTube: https://www.youtube.com/c/WhatsAI


This AI Creates Realistic Animated Looping Videos from Static Images
