Authors:
(1) Xian Liu, Snap Inc. and CUHK (work done during an internship at Snap Inc.);
(2) Jian Ren, Snap Inc. (corresponding author: jren@snapchat.com);
(3) Aliaksandr Siarohin, Snap Inc.;
(4) Ivan Skorokhodov, Snap Inc.;
(5) Yanyu Li, Snap Inc.;
(6) Dahua Lin, CUHK;
(7) Xihui Liu, HKU;
(8) Ziwei Liu, NTU;
(9) Sergey Tulyakov, Snap Inc.

Table of Links

Abstract and 1 Introduction
2 Related Work
3 Our Approach and 3.1 Preliminaries and Problem Setting
3.2 Latent Structural Diffusion Model
3.3 Structure-Guided Refiner
4 Human Verse Dataset
5 Experiments
5.1 Main Results
5.2 Ablation Study
6 Discussion and References
A Appendix and A.1 Additional Quantitative Results
A.2 More Implementation Details and A.3 More Ablation Study Results
A.4 More User Study Details
A.5 Impact of Random Seed and Model Robustness and A.6 Broader Impact and Ethical Consideration
A.7 More Comparison Results and A.8 Additional Qualitative Results
A.9 Licenses

A.9 LICENSES

Image Datasets:

• LAION-5B[2] (Schuhmann et al., 2022): Creative Commons CC-BY 4.0 License.
• COYO-700M[3] (Byeon et al., 2022): Creative Commons CC-BY 4.0 License.
• MS-COCO[4] (Lin et al., 2014): Creative Commons Attribution 4.0 License.

Pretrained Models and Off-the-Shelf Annotation Tools:

• diffusers[5] (von Platen et al., 2022): Apache 2.0 License.
• CLIP[6] (Radford et al., 2021): MIT License.
• Stable Diffusion[7] (Rombach et al., 2022): CreativeML Open RAIL++-M License.
• YOLOS-Tiny[8] (Fang et al., 2021): Apache 2.0 License.
• BLIP2[9] (Guo et al., 2023): MIT License.
• MMPose[10] (Contributors, 2020): Apache 2.0 License.
• ViTPose[11] (Xu et al., 2022): Apache 2.0 License.
• Omnidata[12] (Eftekhar et al., 2021): OMNIDATA STARTER DATASET License.
• MiDaS[13] (Ranftl et al., 2022): MIT License.
• clean-fid[14] (Parmar et al., 2022): MIT License.
• SDv2-inpainting[15] (Rombach et al., 2022): CreativeML Open RAIL++-M License.
• SDXL-base-v1.0[16] (Podell et al., 2023): CreativeML Open RAIL++-M License.
• Improved Aesthetic Predictor[17]: Apache 2.0 License.

This paper is available on arXiv under the CC BY 4.0 DEED license.
[2] https://laion.ai/blog/laion-5b/
[3] https://github.com/kakaobrain/coyo-dataset
[4] https://cocodataset.org/#home
[5] https://github.com/huggingface/diffusers
[6] https://github.com/openai/CLIP
[7] https://huggingface.co/stabilityai/stable-diffusion-2-base
[8] https://huggingface.co/hustvl/yolos-tiny
[9] https://huggingface.co/Salesforce/blip2-opt-2.7b
[10] https://github.com/open-mmlab/mmpose
[11] https://github.com/ViTAE-Transformer/ViTPose
[12] https://github.com/EPFL-VILAB/omnidata
[13] https://github.com/isl-org/MiDaS
[14] https://github.com/GaParmar/clean-fid
[15] https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
[16] https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
[17] https://github.com/christophschuhmann/improved-aesthetic-predictor

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Hyper-Realistic Human Generation with Latent Structural Diffusion: Licenses
