Authors:
(1) Han Jiang, HKUST and Equal contribution (hjiangav@connect.ust.hk);
(2) Haosen Sun, HKUST and Equal contribution (hsunas@connect.ust.hk);
(3) Ruoxuan Li, HKUST and Equal contribution (rliba@connect.ust.hk);
(4) Chi-Keung Tang, HKUST (cktang@cs.ust.hk);
(5) Yu-Wing Tai, Dartmouth College (yu-wing.tai@dartmouth.edu).

Table of Links

Abstract and 1. Introduction
2. Related Work
2.1. NeRF Editing and 2.2. Inpainting Techniques
2.3. Text-Guided Visual Content Generation
3. Method
3.1. Training View Pre-processing
3.2. Progressive Training
3.3. 4D Extension
4. Experiments and 4.1. Experimental Setups
4.2. Ablation and comparison
5. Conclusion and 6. References

4. Experiments

For our experiments, we select dynamic scenes from the Nvidia Dynamic Scenes Dataset [35]. Scenes in this dataset are captured with a sparse set of 12 stationary cameras arranged in two rows, producing images of resolution 1015×1920. Each static scene we use is a single frame taken from one of the dynamic scenes. For the backbone NeRF, we use the static and dynamic versions of K-Planes [25] implemented in nerfstudio [30]. For each scene, we conduct inpainting by replacing a foreground object with a text-prompted object of different geometry. We demonstrate the effectiveness of our method with qualitative intermediate and final results. In addition, we explain the different parts of our design through ablations and comparisons against our baseline.

4.1. Qualitative results

3D Examples. We show several 3D inpainting examples in Figure 2. For each inpainting task, we show 2 renderings of the final NeRF from different views to demonstrate multiview consistency. Additionally, we show the first seed image, another pre-processed image, and the RGB and depth maps at three stages: before training, after warmup training, and after convergence. These before-and-after images demonstrate the efficacy of each stage of our method. As shown in Figure 2, roughly consistent pre-processed images are enough to optimize a coarse inpainted NeRF during warmup training, and the geometry (represented by the depth map) converges during this stage. Fine convergence across views is then achieved in the final training stage.
All 3D inpainting tasks are trained on a single Nvidia RTX 4090 GPU. Warmup training takes approximately 0.5–1 hour, and the main training stage with IDU takes approximately 1–2 hours.

4D Example. We show a 4D inpainting example in Figure 3 to demonstrate that our method has the potential to generalize to dynamic NeRFs. In this example, we remove the foreground object from the seed-view video using E2FGVI [11], a flow-based method optimized through feature propagation and content hallucination. To transfer motion to the generated object, we track key points, estimate a rigid transformation between the tracked key points, and propagate the pixels along that transformation. The dynamic scene consists of 16 frames, and the first frame contains the first seed image. As shown in the figures, the generated object converges overall, with correct motion in all illustrated frames.

This paper is available on arXiv under CC 4.0 license.
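
The motion-transfer step in the 4D example (estimate a rigid transformation between tracked key points, then propagate pixels along it) can be illustrated with a minimal sketch. The text does not say which estimator the authors use or whether the transformation lives in 2D or 3D, so everything here is an assumption for illustration: the sketch works on 2D image coordinates, uses the closed-form least-squares fit of a rotation plus translation, and the names `estimate_rigid_2d` and `propagate` are hypothetical.

```python
import math


def estimate_rigid_2d(src, dst):
    """Least-squares rigid fit (assumed estimator, not taken from the paper).

    Returns (theta, tx, ty) such that rotating each src point by theta and
    translating by (tx, ty) best matches the corresponding dst point.
    """
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("need two non-empty, equal-length point lists")

    # Centroids of both key point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n

    # Accumulate dot and cross terms of the centered correspondences.
    dot = 0.0
    cross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy
        dot += x * u + y * v
        cross += x * v - y * u

    # The optimal rotation angle maximizes cos(theta)*dot + sin(theta)*cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


def propagate(points, theta, tx, ty):
    """Apply the rigid transformation to a list of (x, y) pixel coordinates."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Usage: key points tracked from one frame to the next (made-up values).
keypoints_a = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (3.0, 7.0)]
keypoints_b = propagate(keypoints_a, 0.3, 5.0, -2.0)
theta, tx, ty = estimate_rigid_2d(keypoints_a, keypoints_b)
object_pixels_next = propagate([(4.0, 4.0), (6.0, 2.0)], theta, tx, ty)
```

In this sketch, the pixels of the generated object in one frame would be moved to the next frame by applying the transformation fitted to the tracked key points. A real pipeline would also need to resample colors at the new, non-integer positions, which is omitted here.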

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

NeRF Editing and Inpainting Techniques: Experiments and Qualitative results
