ZeroShape: The Qualitative Results of Different Methods and Our Ablation Study

Abstract and 1 Introduction

2. Related Work

3. Method and 3.1. Architecture

3.2. Loss and 3.3. Implementation Details

4. Data Curation

4.1. Training Dataset

4.2. Evaluation Benchmark

5. Experiments and 5.1. Metrics

5.2. Baselines

5.3. Comparison to SOTA Methods

5.4. Qualitative Results and 5.5. Ablation Study

6. Limitations and Discussion

7. Conclusion and References


A. Additional Qualitative Comparison

B. Inference on AI-generated Images

C. Data Curation Details

5.4. Qualitative Results

We show qualitative results of different methods in Fig. 5. Generative approaches such as Point-E and Shap-E tend to produce sharper surfaces and more detailed generations. However, many of these details are erroneous hallucinations that do not accurately follow the input image, and the visible surfaces are often reconstructed incorrectly. Previous regression-based approaches such as MCC better follow the cues in the input image, but their hallucination of occluded surfaces is often inaccurate.


We observe that One-2-3-45, OpenLRM, and SS3D cannot always accurately capture details and concavities. Compared with prior art, the reconstructions of ZeroShape not only faithfully capture the global shape structure, but also accurately follow the local geometry cues from the input image. More qualitative results are included in the supplement.

5.5. Ablation Study

We analyze our method by ablating the design choices we made, constructing baselines that modify the corresponding modules. The results are shown in Tab. 4.


Explicit geometric reasoning. We first consider the baseline without any geometric reasoning (Ours w/o geo). We remove the projection unit together with the depth and camera pretraining losses. The number of parameters is controlled to be the same, and we train the model for the same number of total iterations. Comparing the first row to the last row, we see that enforcing explicit geometric reasoning in our model positively affects performance.


Alternative intermediate representations. Prior works [56, 64, 65] typically use depth as the 2.5D intermediate representation. To compare this with our projection-based representation, we consider a baseline where the latent vectors come directly from the depth map instead of a 3D projection map. As shown in Tab. 4 (Ours w/o unproj), depth leads to inferior performance compared to our intrinsic-guided projection map representation.
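To make the distinction concrete, the sketch below lifts a depth map into a 3D projection map (a per-pixel map of camera-frame 3D points) using the camera intrinsics. This is a minimal NumPy illustration of the standard pinhole unprojection, not the authors' implementation; `depth_to_point_map` is a hypothetical helper name.

```python
import numpy as np

def depth_to_point_map(depth, K):
    """Lift an HxW depth map to an HxWx3 map of 3D points
    in camera coordinates using the 3x3 intrinsics matrix K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T   # back-project pixels to rays
    return rays * depth[..., None]    # scale each ray by its depth

# Toy example: constant-depth 2x2 map, simple pinhole intrinsics
# (focal length 2, principal point at (1, 1)).
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
points = depth_to_point_map(np.full((2, 2), 4.0), K)
```

Unlike a raw depth map, this representation encodes the metric X/Y extent of the visible surface, which is why it depends on knowing (or estimating) the intrinsics.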


Intrinsic-guided projection. We propose jointly learning intrinsics with depth to more accurately estimate the 3D shape of the visible object surface. To study the impact of this, we compare our full model with a baseline without intrinsics learning, where the unprojection to 3D uses fixed intrinsics during both training and testing. This baseline (Ours w/o intr) performs similarly to using the depth intermediate representation and is worse than our full model. We also show qualitative examples of the estimated surface using our pretrained intrinsics estimator in Fig. 6. Compared with fixed intrinsics, unprojection with our estimated intrinsics leads to more accurate reconstruction of the visible surface.


Figure 6. Benefits of intrinsics learning. We show the reconstructed visible surfaces for two real image inputs. The visible surface is unprojected from estimated depths, with either fixed or predicted intrinsics. Using fixed intrinsics causes unrealistic deformations in the 3D aspect ratio of the visible object surface (e.g., objects appear compressed).
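The distortion in Fig. 6 follows directly from the pinhole model: for a pixel at offset du from the principal point with depth z, the lateral coordinate is x = z * du / f. Unprojecting with a wrong fixed focal length rescales X and Y by (true f) / (assumed f) while Z is unchanged, skewing the 3D aspect ratio. A minimal numeric sketch (illustrative values, not from the paper):

```python
# One point at pixel offset du from the principal point, true depth z.
z, du = 4.0, 10.0
f_true, f_fixed = 500.0, 1000.0  # assumed fixed focal is 2x too large

x_true = z * du / f_true    # lateral offset with correct intrinsics
x_fixed = z * du / f_fixed  # same pixel, fixed (wrong) intrinsics

# x_fixed is half of x_true while depth is unchanged, so the
# reconstructed surface appears laterally compressed.
```

This is why the baseline with fixed intrinsics cannot recover the correct 3D aspect ratio even when its depth estimates are accurate.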


This paper is available on arXiv under a CC BY 4.0 DEED license.

Authors:

(1) Zixuan Huang, University of Illinois at Urbana-Champaign (equal contribution);

(2) Stefan Stojanov, Georgia Institute of Technology (equal contribution);

(3) Anh Thai, Georgia Institute of Technology;

(4) Varun Jampani, Stability AI;

(5) James M. Rehg, University of Illinois at Urbana-Champaign.