Authors:

(1) Yuwei Guo, The Chinese University of Hong Kong;
(2) Ceyuan Yang, Shanghai Artificial Intelligence Laboratory (Corresponding Author);
(3) Anyi Rao, Stanford University;
(4) Zhengyang Liang, Shanghai Artificial Intelligence Laboratory;
(5) Yaohui Wang, Shanghai Artificial Intelligence Laboratory;
(6) Yu Qiao, Shanghai Artificial Intelligence Laboratory;
(7) Maneesh Agrawala, Stanford University;
(8) Dahua Lin, Shanghai Artificial Intelligence Laboratory;
(9) Bo Dai, The Chinese University of Hong Kong.

Table of Links

Abstract and 1 Introduction
2 Related Work
3 Preliminary
4 AnimateDiff
4.1 Alleviate Negative Effects from Training Data with Domain Adapter
4.2 Learn Motion Priors with Motion Module
4.3 Adapt to New Motion Patterns with MotionLoRA
4.4 AnimateDiff in Practice
5 Experiments and 5.1 Qualitative Results
5.2 Quantitative Comparison
5.3 Ablative Study
5.4 Controllable Generation
6 Conclusion
7 Ethics Statement
8 Reproducibility Statement, Acknowledgement and References

6 CONCLUSION

In this paper, we present AnimateDiff, a practical pipeline that directly turns personalized text-to-image (T2I) models into animation generators once and for all, without compromising quality or losing pre-learned domain knowledge. To accomplish this, we design three component modules in AnimateDiff that learn meaningful motion priors while alleviating visual quality degradation, and we enable motion personalization with a lightweight fine-tuning technique named MotionLoRA. Once trained, our motion module can be integrated into other personalized T2I models to generate animations with natural and coherent motion while remaining faithful to the personalized domain. Extensive evaluation with various personalized T2I models validates the effectiveness and generalizability of both AnimateDiff and MotionLoRA. Furthermore, we demonstrate that our method is compatible with existing content-control approaches, enabling controllable generation without incurring additional training costs. Overall, AnimateDiff provides an effective baseline for personalized animation and holds significant potential for a wide range of applications.
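As a concrete illustration of the plug-and-play design summarized above, the sketch below shows the two core ideas in minimal PyTorch. It is illustrative only, not the paper's exact architecture: the hypothetical `TemporalAttention` stands in for the motion module (self-attention along the frame axis, zero-initialized so that inserting it into a frozen T2I backbone is initially an identity mapping), and the hypothetical `LoRALinear` shows the low-rank residual update behind MotionLoRA. All class names and hyperparameters here are assumptions for exposition.

```python
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Illustrative motion module: self-attention along the frame axis.

    The frozen T2I layers keep handling per-frame appearance; this layer
    only exchanges information across frames.
    """

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Zero-init the output projection so the module acts as an identity
        # mapping at insertion time; training starts from the frozen T2I
        # model's behavior instead of disrupting it.
        nn.init.zeros_(self.attn.out_proj.weight)
        nn.init.zeros_(self.attn.out_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width) feature maps.
        b, f, c, h, w = x.shape
        # Treat each spatial location as an independent sequence of frames.
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        normed = self.norm(seq)
        seq = seq + self.attn(normed, normed, normed)[0]  # residual update
        return seq.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)


class LoRALinear(nn.Module):
    """Illustrative MotionLoRA-style layer: a frozen base projection plus a
    trainable rank-r residual, so motion personalization touches only a
    small number of parameters."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # pre-trained weight stays fixed
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


# Shape check: 2 clips, 16 frames, 320-channel 8x8 feature maps.
feats = torch.randn(2, 16, 320, 8, 8)
print(TemporalAttention(channels=320)(feats).shape)  # (2, 16, 320, 8, 8)
```

Because both modules start out as identity mappings, they can be dropped into a pre-trained T2I model without disturbing its domain knowledge, and only the newly added parameters are trained; this separation is what lets one trained motion module (or MotionLoRA) transfer across different personalized models.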
This paper is available on arXiv under a CC BY 4.0 DEED license.