3 Method
3.1 Proxy-Guided 3D Conditioning for Diffusion
3.2 Interactive Generation Workflow
3.3 Volume Conditioned Reconstruction
4 Experiment
4.1 Comparison on Proxy-based and Image-based 3D Generation
5 Conclusions, Acknowledgments, and References
SUPPLEMENTARY MATERIAL
For personalized generation demands, we believe that using only text or images is insufficient and unintuitive for expressing the 3D structure of objects and their spatial relationships. Hence, granting the system 3D-aware controllability through a 3D proxy is necessary for 3D generation. As for acquiring 3D proxies, we believe this is not an obstacle for target users: a proxy can be easily assembled with beginner-friendly software such as Tinkercad, taken from 3D modeling games on SteamVR, or produced with LLM-driven procedural modeling instructions. Similarly, ControlNet accepts control images ranging from rough sketches to delicate line art, which likewise requires only basic painting skills.
First, the resolution of 3D-aware control is bounded by the size of the proxy feature volume, so our method cannot fully exploit the control offered by complex high-poly models; for example, we cannot generate a large-scale urban scene with satisfactory building details. Second, our method requires manually tuning the control strength to balance between over-constrained and under-constrained results. This is similar to ControlNet [Zhang et al. 2023], where the control strength mainly depends on the creator's aesthetic choices.
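To make the control-strength trade-off concrete, below is a minimal sketch of ControlNet-style strength scaling, where a scalar multiplies the conditioning residual before it is added back into the diffusion backbone. This is an illustrative assumption, not the paper's implementation: the names ControlledBlock, cond, and control_strength are hypothetical, and the conditioning features stand in for features sampled from the proxy volume.

```python
import torch
import torch.nn as nn

class ControlledBlock(nn.Module):
    """Hypothetical UNet block with a strength-scaled conditioning residual,
    in the spirit of ControlNet [Zhang et al. 2023]."""

    def __init__(self, channels: int):
        super().__init__()
        self.backbone = nn.Conv2d(channels, channels, 3, padding=1)
        # Zero-initialized projection so conditioning starts as a no-op,
        # mirroring ControlNet's zero-convolution trick.
        self.cond_proj = nn.Conv2d(channels, channels, 1)
        nn.init.zeros_(self.cond_proj.weight)
        nn.init.zeros_(self.cond_proj.bias)

    def forward(self, x, cond, control_strength: float = 1.0):
        # control_strength = 0.0 -> unconditional (under-constrained);
        # control_strength = 1.0 -> full proxy control (over-constrained
        # if the proxy is too coarse for the desired detail).
        return self.backbone(x) + control_strength * self.cond_proj(cond)

block = ControlledBlock(channels=64)
x = torch.randn(1, 64, 32, 32)     # backbone activations
cond = torch.randn(1, 64, 32, 32)  # features derived from the 3D proxy
out = block(x, cond, control_strength=0.5)
```

As the comments note, the creator sweeps control_strength by hand until the result is neither over- nor under-constrained, which is exactly the aesthetic tuning the limitation describes.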
Authors:
(1) Wenqi Dong, Zhejiang University (work done during his internship at PICO, ByteDance);
(2) Bangbang Yang, ByteDance (contributed equally with Wenqi Dong);
(3) Lin Ma, ByteDance;
(4) Xiao Liu, ByteDance;
(5) Liyuan Cui, Zhejiang University;
(6) Hujun Bao, Zhejiang University;
(7) Yuewen Ma, ByteDance;
(8) Zhaopeng Cui, Zhejiang University (corresponding author).
This paper is