2. Related Works
2.1. 2D Diffusion Models for 3D Generation
2.2. 3D Generative Models and 2.3. Multi-view Diffusion Models
3. Problem Formulation
3.2. The Distribution of 3D Assets
4. Method and 4.1. Consistent Multi-view Generation
5. Experiments
5.4. Single View Reconstruction
5.5. Novel View Synthesis and 5.6. Discussions
6. Conclusions and Future Works, Acknowledgements and References
We adopt Zero123 [31], RealFusion [38], Magic123 [44], One-2-3-45 [30], Point-E [41], Shap-E [25], and the recent SyncDreamer [33] as baseline methods. Given an input image, Zero123 can generate novel views from arbitrary viewpoints, and it can be combined with the SDS loss [43] for 3D reconstruction (we adopt the implementation of ThreeStudio [20]). RealFusion [38] and Magic123 [44] leverage Stable Diffusion [47] and the SDS loss for single-view reconstruction. One-2-3-45 [30] directly predicts SDFs with SparseNeuS [36] from the multi-view images generated by Zero123 [31]. Point-E [41] and Shap-E [25] are 3D generative models trained on a large internal OpenAI 3D dataset; both convert a single-view image into a point cloud or an implicit representation. SyncDreamer [33] aims to generate multi-view-consistent images from a single image, from which 3D geometry can be derived.
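Several of these baselines (Zero123 with ThreeStudio, RealFusion, Magic123) lift a 2D diffusion prior to 3D by optimizing a 3D representation with the Score Distillation Sampling (SDS) loss. The snippet below is a minimal sketch of one SDS update, not the implementation used by any of the cited works; the renderer `render_fn`, the frozen noise-prediction network `diffusion_eps`, the schedule `alphas_cumprod`, and the conditioning input `cond` are all assumed placeholders.

```python
# Hedged sketch of one Score Distillation Sampling (SDS) step, assuming a
# differentiable renderer and a frozen image-space diffusion prior.
import torch

def sds_step(params, render_fn, diffusion_eps, alphas_cumprod, cond, optimizer):
    """One SDS optimization step on the parameters of a 3D representation."""
    x = render_fn(params)                               # render an image from the 3D model
    t = torch.randint(20, 980, (1,), device=x.device)   # random diffusion timestep
    noise = torch.randn_like(x)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_t.sqrt() * x + (1 - a_t).sqrt() * noise     # forward-diffuse the rendering
    with torch.no_grad():
        eps_pred = diffusion_eps(x_t, t, cond)          # frozen 2D diffusion prior
    w = 1 - a_t                                         # one common weighting choice
    grad = w * (eps_pred - noise)                       # SDS gradient, no backprop through the prior
    loss = (grad.detach() * x).sum()                    # surrogate loss whose gradient w.r.t. x is `grad`
    optimizer.zero_grad()
    loss.backward()                                     # gradient flows only through render_fn(params)
    optimizer.step()
```

The surrogate loss is constructed so that its gradient with respect to the rendered image equals the SDS gradient, which is then propagated to the 3D parameters through the renderer only.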
This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.
Authors:
(1) Xiaoxiao Long, The University of Hong Kong, VAST, MPI Informatik (equal contribution);
(2) Yuan-Chen Guo, Tsinghua University, VAST (equal contribution);
(3) Cheng Lin, The University of Hong Kong (corresponding author);
(4) Yuan Liu, The University of Hong Kong;
(5) Zhiyang Dou, The University of Hong Kong;
(6) Lingjie Liu, University of Pennsylvania;
(7) Yuexin Ma, ShanghaiTech University;
(8) Song-Hai Zhang, The University of Hong Kong;
(9) Marc Habermann, MPI Informatik;
(10) Christian Theobalt, MPI Informatik;
(11) Wenping Wang, Texas A&M University (corresponding author).