Authors:
(1) Xian Liu, Snap Inc. and CUHK (work done during an internship at Snap Inc.);
(2) Jian Ren, Snap Inc. (corresponding author: [email protected]);
(3) Aliaksandr Siarohin, Snap Inc.;
(4) Ivan Skorokhodov, Snap Inc.;
(5) Yanyu Li, Snap Inc.;
(6) Dahua Lin, CUHK;
(7) Xihui Liu, HKU;
(8) Ziwei Liu, NTU;
(9) Sergey Tulyakov, Snap Inc.
Image Datasets:
• LAION-5B[2] (Schuhmann et al., 2022): Creative Commons CC-BY 4.0 License.
• COYO-700M[3] (Byeon et al., 2022): Creative Commons CC-BY 4.0 License.
• MS-COCO[4] (Lin et al., 2014): Creative Commons Attribution 4.0 License.
Pretrained Models and Off-the-Shelf Annotation Tools:
• diffusers[5] (von Platen et al., 2022): Apache 2.0 License.
• CLIP[6] (Radford et al., 2021): MIT License.
• Stable Diffusion[7] (Rombach et al., 2022): CreativeML Open RAIL++-M License.
• YOLOS-Tiny[8] (Fang et al., 2021): Apache 2.0 License.
• BLIP2[9] (Guo et al., 2023): MIT License.
• MMPose[10] (Contributors, 2020): Apache 2.0 License.
• ViTPose[11] (Xu et al., 2022): Apache 2.0 License.
• Omnidata[12] (Eftekhar et al., 2021): OMNIDATA STARTER DATASET License.
• MiDaS[13] (Ranftl et al., 2022): MIT License.
• clean-fid[14] (Parmar et al., 2022): MIT License.
• SDv2-inpainting[15] (Rombach et al., 2022): CreativeML Open RAIL++-M License.
• SDXL-base-v1.0[16] (Podell et al., 2023): CreativeML Open RAIL++-M License.
• Improved Aesthetic Predictor[17]: Apache 2.0 License.
This paper is available on arXiv under the CC BY 4.0 DEED license.
[2]https://laion.ai/blog/laion-5b/
[3]https://github.com/kakaobrain/coyo-dataset
[4]https://cocodataset.org/#home
[5]https://github.com/huggingface/diffusers
[6]https://github.com/openai/CLIP
[7]https://huggingface.co/stabilityai/stable-diffusion-2-base
[8]https://huggingface.co/hustvl/yolos-tiny
[9]https://huggingface.co/Salesforce/blip2-opt-2.7b
[10]https://github.com/open-mmlab/mmpose
[11]https://github.com/ViTAE-Transformer/ViTPose
[12]https://github.com/EPFL-VILAB/omnidata
[13]https://github.com/isl-org/MiDaS
[14]https://github.com/GaParmar/clean-fid
[15]https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
[16]https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
[17]https://github.com/christophschuhmann/improved-aesthetic-predictor