FlowVid: Taming Imperfect Optical Flows: Generation: Edit the First Frame Then Propagate

by @kinetograph



Too Long; Didn't Read

This paper proposes a consistent V2V synthesis framework by jointly leveraging spatial conditions and temporal optical flow clues within the source video.

(1) Feng Liang, The University of Texas at Austin; work partially done during an internship at Meta GenAI (Email: [email protected]);

(2) Bichen Wu, Meta GenAI (corresponding author);

(3) Jialiang Wang, Meta GenAI;

(4) Licheng Yu, Meta GenAI;

(5) Kunpeng Li, Meta GenAI;

(6) Yinan Zhao, Meta GenAI;

(7) Ishan Misra, Meta GenAI;

(8) Jia-Bin Huang, Meta GenAI;

(9) Peizhao Zhang, Meta GenAI (Email: [email protected]);

(10) Peter Vajda, Meta GenAI (Email: [email protected]);

(11) Diana Marculescu, The University of Texas at Austin (Email: [email protected]).

4.3. Generation: edit the first frame then propagate


Another advantageous strategy we discovered is the integration of self-attention features from DDIM inversion, a technique also employed in works such as FateZero [35] and TokenFlow [13]. This integration helps preserve the structure and motion of the input video. Concretely, we use DDIM inversion to invert the input video with the original prompt and save the intermediate self-attention maps, typically at 20 timesteps. Then, during generation guided by the target prompt, we replace the keys and values in the self-attention modules with the corresponding saved maps.
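To make the record-then-inject pattern concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the SelfAttention, TinyUNet, and run names are hypothetical, the model is a toy stand-in for a pretrained UNet, and the per-step latent update is a placeholder rather than a real DDIM step. It only illustrates caching keys/values per timestep during inversion and substituting them during generation.

```python
# Minimal sketch (assumed names, not the FlowVid codebase) of caching
# self-attention keys/values during DDIM inversion and re-injecting them
# while generating with the target prompt.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Single-head self-attention that can record or override its K/V."""

    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.cache = {}        # timestep -> (k, v) saved during inversion
        self.mode = "record"   # "record" during inversion, "inject" during generation

    def forward(self, x, t):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        if self.mode == "record":
            self.cache[t] = (k.detach(), v.detach())
        elif self.mode == "inject" and t in self.cache:
            k, v = self.cache[t]  # substitute the pre-stored keys and values
        attn = F.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v


class TinyUNet(nn.Module):
    """Toy denoiser with one attention block (real UNets have many)."""

    def __init__(self, dim=64):
        super().__init__()
        self.attn = SelfAttention(dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, t):
        return self.out(self.attn(x, t))


@torch.no_grad()
def run(model, x, timesteps, mode):
    """Sweep over timesteps; a real loop would apply the actual DDIM update."""
    model.attn.mode = mode
    for t in timesteps:
        eps = model(x, t)   # predicted noise; here only the K/V caching matters
        x = x - 0.01 * eps  # placeholder update standing in for a DDIM step
    return x


model = TinyUNet()
latent = torch.randn(1, 16, 64)  # (batch, tokens, dim) toy latent
steps = list(range(20))          # e.g. 20 saved inversion timesteps

# 1) Inversion pass with the original prompt: record K/V at every timestep.
_ = run(model, latent, steps, mode="record")

# 2) Generation pass with the target prompt: reuse the stored K/V so the
#    output keeps the source video's structure and motion.
edited = run(model, latent, list(reversed(steps)), mode="inject")
```

Note that only the keys and values are swapped; the queries still come from the current latent, which is what lets the target prompt change appearance while the injected K/V anchor structure and motion.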


This paper is available on arXiv under a CC 4.0 license.