Wonder3D: Learn More About Diffusion Models

by Ringi, January 1st, 2025

Too Long; Didn't Read

Diffusion models [22, 52] were first proposed to gradually recover images from a specifically designed degradation process, using a forward Markov chain and a reverse Markov chain.

Abstract and 1 Introduction

2. Related Works

2.1. 2D Diffusion Models for 3D Generation

2.2. 3D Generative Models and 2.3. Multi-view Diffusion Models

3. Problem Formulation

3.1. Diffusion Models

3.2. The Distribution of 3D Assets

4. Method and 4.1. Consistent Multi-view Generation

4.2. Cross-Domain Diffusion

4.3. Textured Mesh Extraction

5. Experiments

5.1. Implementation Details

5.2. Baselines

5.3. Evaluation Protocol

5.4. Single View Reconstruction

5.5. Novel View Synthesis and 5.6. Discussions

6. Conclusions and Future Works, Acknowledgements and References

3. Problem Formulation

3.1. Diffusion Models

Diffusion models [22, 52] were first proposed to gradually recover images from a specifically designed degradation process, using a forward Markov chain and a reverse Markov chain.


Given a sample z_0 drawn from the data distribution p(z), the forward process of denoising diffusion models yields a sequence of noised data {z_t | t ∈ (0, T)} with z_t = α_t z_0 + σ_t ϵ, where ϵ is random noise drawn from the distribution N(0, 1), and α_t, σ_t are fixed sequences defined by the noise schedule. The forward process is applied iteratively to the target image until it becomes pure Gaussian noise.
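The forward noising step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the text only says that α_t and σ_t come from a fixed noise schedule, so the linear-beta, variance-preserving schedule below (with α_t² + σ_t² = 1) and the names linear_beta_schedule and forward_noise are assumptions chosen for the example.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Assumed schedule (not specified in the text): linear betas,
    variance-preserving, so that alpha_t**2 + sigma_t**2 == 1."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)
    alphas = np.sqrt(alpha_bar)        # alpha_t: how much signal survives
    sigmas = np.sqrt(1.0 - alpha_bar)  # sigma_t: how much noise is mixed in
    return alphas, sigmas

def forward_noise(z0, t, alphas, sigmas, rng):
    """Sample z_t = alpha_t * z0 + sigma_t * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(z0.shape)
    zt = alphas[t] * z0 + sigmas[t] * eps
    return zt, eps

rng = np.random.default_rng(0)
alphas, sigmas = linear_beta_schedule(T=1000)
z0 = rng.standard_normal((3, 8, 8))    # stand-in for an image or latent
zt, eps = forward_noise(z0, t=500, alphas=alphas, sigmas=sigmas, rng=rng)
```

Under this assumed schedule, α_t shrinks toward zero as t approaches T, so the last samples in the chain are dominated by the noise term, which matches the description of the image ending up as Gaussian noise.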


Conversely, the reverse chain is then employed to iteratively denoise the corrupted image, i.e., to recover z_{t−1} from z_t by predicting the added random noise ϵ. Readers can refer to [22, 52] for more details about image diffusion models.
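One way to turn a noise prediction into a reverse step is shown below. The text does not say which sampler is used, so this sketch assumes a deterministic, DDIM-style update: estimate z_0 from z_t and the predicted noise, then re-noise that estimate to level t−1. The names reverse_step, denoise, and predict_noise are hypothetical; in practice predict_noise would be a trained network.

```python
import numpy as np

def reverse_step(zt, eps_hat, alpha_t, sigma_t, alpha_prev, sigma_prev):
    """Assumed DDIM-style deterministic update from z_t to z_{t-1}."""
    # Invert z_t = alpha_t * z0 + sigma_t * eps using the predicted noise.
    z0_hat = (zt - sigma_t * eps_hat) / alpha_t
    # Re-noise the estimate to the previous (less noisy) level.
    return alpha_prev * z0_hat + sigma_prev * eps_hat

def denoise(zT, predict_noise, alphas, sigmas):
    """Run the reverse chain from the noisiest index down to a clean estimate.

    predict_noise(z, t) is a placeholder for a learned noise predictor.
    """
    z = zT
    for t in range(len(alphas) - 1, 0, -1):
        eps_hat = predict_noise(z, t)
        z = reverse_step(z, eps_hat, alphas[t], sigmas[t],
                         alphas[t - 1], sigmas[t - 1])
    # Final step: remove the remaining noise at index 0.
    return (z - sigmas[0] * predict_noise(z, 0)) / alphas[0]
```

With a perfect noise predictor this chain recovers z_0 exactly; with a learned predictor, each step only approximately undoes the corresponding forward step.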


Figure 2. Overview of Wonder3D. Given a single image, Wonder3D takes the input image, the text embedding produced by the CLIP model [45], the camera parameters of multiple views, and a domain switcher as conditioning to generate consistent multi-view normal maps and color images. Subsequently, Wonder3D employs an innovative normal fusion algorithm to robustly reconstruct high-quality 3D geometry from the 2D representations, yielding high-fidelity textured meshes.


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Xiaoxiao Long, The University of Hong Kong, VAST, MPI Informatik (equal contribution);

(2) Yuan-Chen Guo, Tsinghua University, VAST (equal contribution);

(3) Cheng Lin, The University of Hong Kong (corresponding author);

(4) Yuan Liu, The University of Hong Kong;

(5) Zhiyang Dou, The University of Hong Kong;

(6) Lingjie Liu, University of Pennsylvania;

(7) Yuexin Ma, ShanghaiTech University;

(8) Song-Hai Zhang, Tsinghua University;

(9) Marc Habermann, MPI Informatik;

(10) Christian Theobalt, MPI Informatik;

(11) Wenping Wang, Texas A&M University (corresponding author).