2. Related Works
2.1. 2D Diffusion Models for 3D Generation
Recent compelling successes in 2D diffusion models [8, 22, 47] and large vision-language models (e.g., the CLIP model [45]) open up new possibilities for generating 3D assets by exploiting the strong priors of 2D diffusion models. The pioneering works DreamFusion [43] and SJC [59] propose to distill a 2D text-to-image generation model to produce 3D shapes from text, and many follow-up works adopt this per-shape optimization scheme.
For the task of text-to-3D [2, 5, 6, 23, 29, 48, 49, 57, 63, 65, 69, 77] or image-to-3D synthesis [38, 44, 46, 50, 54, 67], these methods typically optimize a 3D representation (e.g., a NeRF, mesh, or SDF) and use neural rendering to generate 2D images from various viewpoints. The rendered images are then fed into a 2D diffusion model or the CLIP model to compute Score Distillation Sampling (SDS) [43] losses, which guide the optimization of the 3D shape.
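For reference, the SDS gradient introduced in DreamFusion [43] takes the following form (sketched here in that paper's notation): x = g(θ) is an image rendered from the 3D representation with parameters θ, x_t is its noised version at diffusion timestep t, ε̂_φ(x_t; y, t) is the diffusion model's noise prediction under text condition y, ε is the injected Gaussian noise, and w(t) is a timestep-dependent weight:

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, \mathbf{x} = g(\theta)) = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(\mathbf{x}_t; y, t) - \epsilon\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right]

Intuitively, each optimization step nudges the rendered image toward regions the 2D diffusion model considers likely under the text prompt, back-propagating that signal into the 3D parameters.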
However, most of these methods suffer from low efficiency and the multi-face problem: each per-shape optimization takes tens of minutes, and the optimized geometry tends to produce multiple faces because explicit 3D supervision is lacking. The recent One-2-3-45 [15] leverages a generalizable neural reconstruction method, SparseNeuS [36], to directly produce 3D geometry from images generated by Zero123 [31]. Although this method achieves high efficiency, its results are of low quality and lack geometric details.
This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.
Authors:
(1) Xiaoxiao Long, The University of Hong Kong, VAST, and MPI Informatik (equal contribution);
(2) Yuan-Chen Guo, Tsinghua University and VAST (equal contribution);
(3) Cheng Lin, The University of Hong Kong (corresponding author);
(4) Yuan Liu, The University of Hong Kong;
(5) Zhiyang Dou, The University of Hong Kong;
(6) Lingjie Liu, University of Pennsylvania;
(7) Yuexin Ma, ShanghaiTech University;
(8) Song-Hai Zhang, The University of Hong Kong;
(9) Marc Habermann, MPI Informatik;
(10) Christian Theobalt, MPI Informatik;
(11) Wenping Wang, Texas A&M University (corresponding author).