Authors:
(1) Yuwei Guo, The Chinese University of Hong Kong;
(2) Ceyuan Yang, Shanghai Artificial Intelligence Laboratory (Corresponding Author);
(3) Anyi Rao, Stanford University;
(4) Zhengyang Liang, Shanghai Artificial Intelligence Laboratory;
(5) Yaohui Wang, Shanghai Artificial Intelligence Laboratory;
(6) Yu Qiao, Shanghai Artificial Intelligence Laboratory;
(7) Maneesh Agrawala, Stanford University;
(8) Dahua Lin, Shanghai Artificial Intelligence Laboratory;
(9) Bo Dai, The Chinese University of Hong Kong.
4.1 Alleviate Negative Effects from Training Data with Domain Adapter
4.2 Learn Motion Priors with Motion Module
4.3 Adapt to New Motion Patterns with MotionLoRA
5 Experiments and 5.1 Qualitative Results
8 Reproducibility Statement, Acknowledgement and References
We conduct a quantitative comparison through a user study and CLIP metrics, focusing on three key aspects: text alignment, domain similarity, and motion smoothness. The results are shown in Table 1; detailed implementations can be found in the supplementary materials.
User study. In the user study, we generate animations with all three methods based on the same personalized T2I models. Participants are then asked to rank the results on each of the three aspects above. We use the Average User Ranking (AUR) as the preference metric, where a higher score indicates superior performance. Note that for the text alignment and domain similarity evaluations, the corresponding prompts and reference images are provided to participants.
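The following is a minimal sketch of how per-participant rankings can be aggregated into an AUR score. The paper does not give its exact aggregation code; here we assume rank 1 is best and invert the ranks so that, per the paper's convention, a higher score indicates superior performance. The `rankings` array is hypothetical example data, not results from the study.

```python
import numpy as np

# Hypothetical example data: rankings[p][m] is the rank (1 = best)
# that participant p assigned to method m for one evaluation aspect.
rankings = np.array([
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
])

# Average User Ranking (AUR): invert ranks so higher is better,
# then average over participants for each method.
num_methods = rankings.shape[1]
aur = (num_methods + 1 - rankings).mean(axis=0)
print(aur)  # e.g. [2.67 2.   1.33] -> method 0 is preferred
```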
CLIP metric. We also employ the CLIP (Radford et al., 2021) metric, following previous studies (Wu et al., 2023; Khachatryan et al., 2023). Note that for domain similarity, the CLIP score is computed between the animation frames and reference images generated by the personalized T2I models.
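Below is a minimal sketch of such a frame-to-reference CLIP similarity, assuming the HuggingFace `transformers` CLIP implementation (the paper does not specify its exact setup); the model checkpoint, frame count, and file names are illustrative placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a standard CLIP checkpoint (placeholder choice).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical inputs: the animation frames and one reference image
# generated by the same personalized T2I model.
frames = [Image.open(f"frame_{i:02d}.png") for i in range(16)]
reference = Image.open("reference.png")

with torch.no_grad():
    frame_feats = model.get_image_features(
        **processor(images=frames, return_tensors="pt"))
    ref_feat = model.get_image_features(
        **processor(images=[reference], return_tensors="pt"))

# Normalize embeddings and average the frame-to-reference
# cosine similarities to obtain the domain-similarity score.
frame_feats = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
ref_feat = ref_feat / ref_feat.norm(dim=-1, keepdim=True)
domain_similarity = (frame_feats @ ref_feat.T).mean().item()
print(f"CLIP domain similarity: {domain_similarity:.4f}")
```

Text alignment can be scored analogously by replacing the reference-image embedding with the CLIP text embedding of the prompt.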
This paper is available on arxiv under CC BY 4.0 DEED license.