We presented a new framework for text-driven video editing using an image diffusion model. We study the internal representation of a video in the diffusion feature space and demonstrate that consistent video editing can be achieved through a consistent diffusion feature representation during generation. Our method outperforms existing baselines, with a significant improvement in temporal consistency. As for limitations, our method is tailored to preserve the motion of the original video and therefore cannot handle edits that require structural changes (Fig. 7). Moreover, our method is built upon a diffusion-based image editing technique that preserves the structure of the original frames. When the image-editing technique fails to preserve the structure, our method enforces correspondences that are meaningless in the edited frames, resulting in visual artifacts. Lastly, the LDM decoder introduces some high-frequency flickering (Blattmann et al., 2023). A possible solution is to combine our framework with an improved decoder (e.g., Blattmann et al. (2023), Zhu et al. (2023)). We note that this minor flickering can be easily eliminated with existing post-process deflickering (see SM). Our work sheds new light on the internal representation of natural videos in the space of diffusion models (e.g., temporal redundancies) and on how it can be leveraged to enhance video synthesis. We believe it can inspire future research on harnessing image models for video tasks and on the design of text-to-video models.
This paper is available on arXiv under the CC BY 4.0 DEED license.
Authors:
(1) Michal Geyer, Weizmann Institute of Science (equal contribution);
(2) Omer Bar-Tal, Weizmann Institute of Science (equal contribution);
(3) Shai Bagon, Weizmann Institute of Science;
(4) Tali Dekel, Weizmann Institute of Science.