NeRF Editing and Inpainting Techniques: Training View Pre-processing

by Writings, Papers and Blogs on Text ModelsJuly 18th, 2024

Too Long; Didn't Read

This paper proposes Inpaint4DNeRF to capitalize on state-of-the-art stable diffusion models for direct generation of the underlying completed background content

featured image - NeRF Editing and Inpainting Techniques: Training View Pre-processing

Authors:

(1) Han Jiang, HKUST and Equal contribution ([email protected]);

(2) Haosen Sun, HKUST and Equal contribution ([email protected]);

(3) Ruoxuan Li, HKUST and Equal contribution ([email protected]);

(4) Chi-Keung Tang, HKUST ([email protected]);

(5) Yu-Wing Tai, Dartmouth College, ([email protected]).

Table of Links

Abstract and 1. Introduction

2.3. Text-Guided Visual Content Generation

3. Method

3.1. Training View Pre-processing

3.2. Progressive Training

3.3. 4D Extension

4. Experiments and 4.1. Experimental Setups

4.2. Ablation and comparison

5. Conclusion and 6. References

3.1. Training View Pre-processing

Text-guided visual content generation is inherently a highly underdetermined problem: for a given text prompt, there are infinitely many object appearances that are feasible matches. In our method, we generate content in NeRF by first performing inpainting on its training images and then backward propagating the modified pixels of the images into NeRF.

If we inpaint each view independently, the inpainted content will not be consistent across multiple views. Some prior 3D generation works, using techniques including score distillation sampling [21], have demonstrated the possibility of multiview convergence based on independently modified training views. Constraining the generation problem by enforcing the training views to be strongly related to each other will simplify convergence to a large extent. Therefore, we first inpaint a small number of seed images associated with a coarse set of cameras that covers sufficiently wide viewing angles. For the other views, the inpainted content are strongly conditioned on these seed images.

Stable diffusion correction. Since our planar depth estimation is not always accurate, while the raw projected results are plausible in general, the many small artifacts make the relevant images unqualified for training. To make them look more natural, we propose to cover the projection artifacts with stable diffusion hallucinated details. First, we blend the raw projection with random noise in stable diffusion’s latent space for t timesteps, where t is relatively small to the total number of stable diffusion timesteps. Then, starting from the last t steps, we denoise with stable diffusion to generate the hallucinated image. The resulting image is then regarded as the initial training image. With this step, the training image pre-processing stage is completed.

This paper is available on arxiv under CC 4.0 license.

L O A D I N G
. . . comments & more!

About Author

Writings, Papers and Blogs on Text Models@textmodels

We publish the best academic papers on rule-based techniques, LLMs, & the generation of text that resembles human text.

Read my stories About @textmodels

TOPICS

machine-learning #neural-networks #neural-radiance-fields #nerf #inpaint4dnerf #stable-diffusion-models #generative-approach #seed-images #3d-geometry-proxies

THIS ARTICLE WAS FEATURED IN...

Permanent on Arweave

Terminal

Lite

Join HackerNoon

Latest technology trends. Customized Experience. Curated Stories. Publish Your Ideas

NeRF Editing and Inpainting Techniques: Training View Pre-processing

Too Long; Didn't Read

Table of Links

3.1. Training View Pre-processing

About Author

TOPICS

THIS ARTICLE WAS FEATURED IN...

RELATED STORIES