NeRFiller: Completing Scenes via Generative 3D Inpainting

Google Research, UC Berkeley
CVPR 2024

TLDR: NeRFiller completes missing regions in 3D scenes via an off-the-shelf 2D generative inpainter.

We mark missing regions in pink and use NeRFiller to complete these areas, shown on the right.

Abstract

We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models, where they generate more 3D consistent inpaints when images form a 2x2 grid, and show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D consistent and plausible scene completions.

Completing Large Unknown Regions

Using NeRFiller, we can iteratively update a dataset to inpaint a scene. Our method is shown on the bottom right against various baselines. Click here to see all results of the baselines on 10 datasets. "LaMask" and "SD Image Cond" are our adaptations of SPIn-NeRF to our problem setting. "Inpaint + DU" is an adaptation of Instruct-NeRF2NeRF.

Interactive Comparisons

Select a method and dataset to see the results. Ours is on the right.

Reference-Based Inpainting

Using NeRFiller, you can inpaint one image and use it as a reference to guide the 3D infilling.
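
Below is a minimal sketch (not the authors' released code) of how a reference could steer the grid prior described in the Method section below: the already-inpainted reference occupies one tile of the 2x2 grid with an all-known mask, so the 2D inpainter can only fill the remaining three tiles, keeping them consistent with the reference. The helper names here are hypothetical.

import numpy as np

def tile_2x2(tiles: list) -> np.ndarray:
    """Arrange four same-sized arrays into one 2x2 grid."""
    top = np.concatenate(tiles[:2], axis=1)
    bottom = np.concatenate(tiles[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)

def make_reference_grid(reference, views, masks):
    """Build the grid image and grid mask for reference-guided inpainting.

    reference: HxWx3 image whose missing region was already inpainted.
    views:     three HxWx3 renders that still have missing regions.
    masks:     three HxW masks (1 = missing) matching `views`.
    """
    grid_image = tile_2x2([reference] + list(views))
    # The reference tile is fully known, so its mask is all zeros; the
    # inpainter is free to change only the other three tiles.
    grid_mask = tile_2x2([np.zeros_like(masks[0])] + list(masks))
    return grid_image, grid_mask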

Method

We find that tiling images in a 2x2 grid produces more 3D consistent inpaints. We extend this "Grid Prior" property to more than 4 images by averaging diffusion model predictions, and we show how to use it iteratively in the NeRF framework with dataset updates. Please see the paper for more details.
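
As a concrete illustration, here is a minimal sketch of one grid-prior inpainting step. It assumes the diffusers StableDiffusionInpaintPipeline as the off-the-shelf 2D inpainter; the checkpoint, tile resolution, and empty prompt are illustrative assumptions rather than the authors' exact settings.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def grid_inpaint(images, masks, prompt=""):
    """Jointly inpaint four views by tiling them into one 512x512 grid."""
    assert len(images) == len(masks) == 4
    grid = Image.new("RGB", (512, 512))
    grid_mask = Image.new("L", (512, 512))
    for i, (image, mask) in enumerate(zip(images, masks)):
        corner = (256 * (i % 2), 256 * (i // 2))  # 2x2 tile layout
        grid.paste(image.resize((256, 256)), corner)
        grid_mask.paste(mask.resize((256, 256)), corner)
    out = pipe(prompt=prompt, image=grid, mask_image=grid_mask).images[0]
    # Split the jointly inpainted grid back into the four individual views.
    return [out.crop((256 * (i % 2), 256 * (i // 2),
                      256 * (i % 2 + 1), 256 * (i // 2 + 1)))
            for i in range(4)]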

Multi-View Consistent Inpainting

With NeRFiller, we can inpaint many images jointly. We accomplish this by using our "Grid Prior" and "Joint Multi-View Inpainting" strategies. We compare various inpainting strategies, where we inpaint once and then train a NeRF on the result to check for 3D consistency. Click here to see all results for all baselines on all 8 NeRF-synthetic scenes, for the test splits. "SD" stands for Stable Diffusion. "Extended Attention" is introduced in Tune-A-Video.
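
To make the joint strategy concrete, here is a minimal sketch under stated assumptions: predict_grid_noise is a hypothetical stand-in for one call to an inpainting UNet on a tiled 2x2 grid (with the mask conditioning folded inside), and the DDIM schedule is schematic. Re-randomizing the grouping at every denoising step is what effectively averages the model's predictions across all views.

import torch

def joint_multiview_denoise(latents, predict_grid_noise, num_steps=50):
    """latents: (N, C, H, W) noisy per-view latents, with N divisible by 4."""
    n, (h, w) = latents.shape[0], latents.shape[-2:]
    # Schematic alpha_bar schedule running from mostly-noise to mostly-signal.
    alpha_bar = torch.linspace(1e-4, 0.9999, num_steps + 1)
    for step in range(num_steps):
        perm = torch.randperm(n)  # re-randomize the 2x2 groupings every step
        eps = torch.zeros_like(latents)
        for g in range(0, n, 4):
            i = perm[g:g + 4]
            # Tile four latents into one grid and predict their noise jointly.
            grid = torch.cat([torch.cat([latents[i[0]], latents[i[1]]], -1),
                              torch.cat([latents[i[2]], latents[i[3]]], -1)], -2)
            ge = predict_grid_noise(grid, step)
            eps[i[0]], eps[i[1]] = ge[..., :h, :w], ge[..., :h, w:]
            eps[i[2]], eps[i[3]] = ge[..., h:, :w], ge[..., h:, w:]
        # Deterministic DDIM update from alpha_bar[step] to alpha_bar[step + 1].
        a_t, a_next = alpha_bar[step], alpha_bar[step + 1]
        x0 = (latents - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        latents = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
    return latents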

Related Works

There's a lot of work that we found relevant and useful while working on this project.

Relevant works include SPIn-NeRF, InpaintNeRF360, "Removing Objects From Neural Radiance Fields", "Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting", and "Reference-guided Controllable Inpainting of Neural Radiance Fields". These works primarily focus on removing objects, while our work addresses the more general setting of scene completion. When we adapt SPIn-NeRF to our setting, we implement it by inpainting the dataset images once and training with a patch-based LPIPS perceptual loss in the Nerfstudio framework. "LaMask" inpaints once with LaMa and "SD Image Cond" inpaints once with Stable Diffusion.

We were also inspired by "Visual Prompting via Image Inpainting" and MultiDiffusion for various ideas in our method.

Instruct-NeRF2NeRF is relevant to how we perform dataset updates, which we use instead of Score Distillation Sampling (SDS). We differ by using an inpainting model, updating many images at a time, and linearly annealing the noise schedule.
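
For concreteness, here is a minimal sketch of this update loop under an assumed interface; the three callables are hypothetical placeholders, not a real API.

def dataset_update_loop(render_views, inpaint_batch, fit_nerf, num_updates=30):
    """Iteratively replace training images with inpainted renders.

    render_views():               render the current NeRF at the train cameras.
    inpaint_batch(images, noise): jointly inpaint the renders, starting the
                                  diffusion process at the given noise level.
    fit_nerf(images):             run some NeRF optimization steps on the
                                  updated target images.
    """
    for k in range(num_updates):
        # Linearly anneal the starting noise: early updates make large edits,
        # later updates only refine (in contrast to Instruct-NeRF2NeRF's
        # randomly sampled noise levels).
        noise = 1.0 - k / max(num_updates - 1, 1)
        fit_nerf(inpaint_batch(render_views(), noise))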

There is a related concurrent work called Inpaint3D, which uses an inpainting model and SDS to inpaint a scene.

Acknowledgements

This project is supported in part by IARPA DOI/IBC 140D0423C0035. We would like to thank Frederik Warburg, David McAllister, Qianqian Wang, Matthew Tancik, Grace Luo, Dave Epstein, and Riley Peterlinz for discussions and technical support. We also thank Ruilong Li, Evonne Ng, Adam Rashid, Alexander Kristoffersen, Rohan Mathur, and Jonathan Zakharov for proofreading drafts and providing feedback.

BibTeX

Please consider citing our work if you find it useful.

@inproceedings{weber2023nerfiller,
  title = {NeRFiller: Completing Scenes via Generative 3D Inpainting},
  author = {Ethan Weber and Aleksander Holynski and Varun Jampani and Saurabh Saxena and
    Noah Snavely and Abhishek Kar and Angjoo Kanazawa},
  booktitle = {CVPR},
  year = {2024},
}