Table of Links

Abstract and 1 Introduction

2. Related Works
2.1. 2D Diffusion Models for 3D Generation
2.2. 3D Generative Models and 2.3. Multi-view Diffusion Models

3. Problem Formulation
3.1. Diffusion Models
3.2. The Distribution of 3D Assets

4. Method and 4.1. Consistent Multi-view Generation
4.2. Cross-Domain Diffusion
4.3. Textured Mesh Extraction

5. Experiments
5.1. Implementation Details
5.2. Baselines
5.3. Evaluation Protocol
5.4. Single View Reconstruction
5.5. Novel View Synthesis and 5.6. Discussions

6. Conclusions and Future Works, Acknowledgements and References

4. Method

As per our problem formulation in Section 3.2, we propose a multi-view cross-domain diffusion scheme, which operates on two distinct domains to generate multi-view consistent normal maps and color images. The overview of our method is presented in Figure 2.

First, our method adopts a multi-view diffusion scheme to generate multi-view normal maps and color images, and enforces consistency across views using multi-view attention (see Section 4.1).

Second, our proposed domain switcher allows the diffusion model to operate on more than one domain, and its formulation does not require retraining an existing (potentially single-domain) diffusion model such as Stable Diffusion [45]. Thus, we can leverage the generalizability of large foundation models, which are trained on a large corpus of data. We also propose a cross-domain attention mechanism that propagates information between the normal domain and the color image domain, ensuring geometric and visual coherence between the two domains (see Section 4.2).

Finally, our novel geometry-aware normal fusion reconstructs high-quality geometry and appearance from the multi-view 2D normal maps and color images (see Section 4.3).

4.1. Consistent Multi-view Generation

Prior 2D diffusion models [31, 45] generate each image separately, so the resulting images are not geometrically and visually consistent across different views.
To enhance consistency among different views, similar to prior works such as SyncDreamer [33] and MVDream [51], we use an attention mechanism to propagate information across views, implicitly encoding multi-view dependencies (as illustrated in Figure 4). This is achieved by extending the original self-attention layers to be global-aware, allowing connections to other views within the attention layers. Keys and values from different views are connected to each other to facilitate the exchange of information. By sharing information across different views within the attention layers, the diffusion model perceives multi-view correlation and becomes capable of generating consistent multi-view color images and normal maps.

This paper is available on arXiv under CC BY-NC-ND 4.0 DEED license.

Authors:
(1) Xiaoxiao Long, The University of Hong Kong, VAST, MPI Informatik (equal contribution);
(2) Yuan-Chen Guo, Tsinghua University, VAST (equal contribution);
(3) Cheng Lin, The University of Hong Kong (corresponding author);
(4) Yuan Liu, The University of Hong Kong;
(5) Zhiyang Dou, The University of Hong Kong;
(6) Lingjie Liu, University of Pennsylvania;
(7) Yuexin Ma, ShanghaiTech University;
(8) Song-Hai Zhang, The University of Hong Kong;
(9) Marc Habermann, MPI Informatik;
(10) Christian Theobalt, MPI Informatik;
(11) Wenping Wang, Texas A&M University (corresponding author).
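
The multi-view attention described in Section 4.1 can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `multiview_self_attention`, the single-head layout, the random projection matrices, and the tensor sizes are all hypothetical, and it assumes that "connecting" keys and values means pooling them over every view so that each query token can attend to tokens from all views.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multiview_self_attention(tokens, w_q, w_k, w_v):
    """Single-head attention over all views (illustrative sketch).

    tokens: array of shape (V, N, C), i.e. V views with N tokens each.
    Queries stay per view. Keys and values are pooled across views
    (an assumption about how they are "connected"), so every token
    attends to V * N tokens instead of only the N tokens of its own view.
    """
    num_views, num_tokens, _ = tokens.shape
    q = tokens @ w_q                                            # (V, N, D)
    k = (tokens @ w_k).reshape(1, num_views * num_tokens, -1)   # (1, V*N, D)
    v = (tokens @ w_v).reshape(1, num_views * num_tokens, -1)   # (1, V*N, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])    # (V, N, V*N)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights                                 # (V, N, D), (V, N, V*N)


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
V, N, C, D = 6, 4, 8, 8
tokens = rng.normal(size=(V, N, C))
w_q, w_k, w_v = (rng.normal(size=(C, D)) for _ in range(3))

out, weights = multiview_self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch the output for one view depends on the tokens of every other view, which is the property the section relies on for cross-view consistency. A standard per-view self-attention layer would instead restrict each view's keys and values to its own N tokens.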


Wonder3D: A Look At Our Method and Consistent Multi-view Generation
