Novel view reconstruction based on only 2 input images is an important but extremely challenging task. pixelSplat is a potential solution that was shown to deliver high-quality results at a competitive speed. We first evaluate pixelSplat on more challenging reconstruction tasks by applying camera positions that are further away from each other, and find that its performance is heavily impacted. We then explore 2D image enhancement methods to fix the corrupted novel view images. A diffusion model-based solution proves to be able to restore significantly impacted areas, but fails to stay consistent with the original scene even after long fine-tuning, resulting in flickering videos. An alternative solution based on an image restoration model results in pleasant videos and quantitative improvements in most metrics, but does not address all errors seen in the novel view images. We explore the underlying reasons for these shortcomings, and propose future research directions for fixing them.
This code builds upon the code from the paper pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann.
Check out their project website here.
- Training ControlNet
- Training InstructIR (the authors did not provide a training script, so we wrote one based on the information in their paper)
- Inference
- Testing ControlNet on out-of-domain data
- Creating demo video
!python3 -m src.main +experiment=re10k mode=test test.data_loader="train" test.output_path="outputs/re10k_train_data" data_loader.train.batch_size=1 checkpointing.load=checkpoints/re10k.ckpt
Installing the environment is not straightforward, as we don't have sudo privileges and many default packages are outdated. Newer versions of g++ are also not compatible. After cloning the repo, execute the following commands in order, starting each one only after the previous one has finished:
Step by step instructions
cd installation_jobs
The following job takes approximately 30 minutes; all others are much faster.
sbatch install_env.job
The following job will return an error on its first run; the debugging jobs below fix this.
sbatch install_packages.job
Debugging jobs:
sbatch debug.job
sbatch debug2.job
sbatch debug3.job
sbatch debug4.job
sbatch debug5.job
Now this should run without any errors.
sbatch install_packages.job
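The sequence above can be scripted so that each job starts only after the previous one has finished. The following is a minimal sketch, assuming the cluster runs a Slurm version whose `sbatch` supports the `--wait` flag (which blocks until the submitted job terminates). The function name `submit_in_order` and the `SUBMIT_CMD` override are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: submit jobs one at a time, blocking on each before starting the next.
# Assumption: sbatch supports --wait. SUBMIT_CMD is a hypothetical override
# that allows a dry run, e.g. SUBMIT_CMD=echo.

submit_in_order() {
  local submit="${SUBMIT_CMD:-sbatch --wait}"
  local job
  for job in "$@"; do
    # The first install_packages.job run is expected to fail, so a non-zero
    # exit status is reported as a warning but does not stop the sequence.
    $submit "$job" || echo "warning: $job exited with an error" >&2
  done
}

# Usage, from inside installation_jobs/:
#   submit_in_order install_env.job install_packages.job \
#     debug.job debug2.job debug3.job debug4.job debug5.job install_packages.job
```

Running with `SUBMIT_CMD=echo` prints the job names in submission order without contacting the scheduler, which is a cheap way to check the sequence first.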
pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the RealEstate10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.
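The extraction step can be sketched as follows, assuming `unzip` is installed. The helper name `extract_dataset` is hypothetical, and the archive file name depends on which subset you downloaded.

```shell
# Sketch: unzip a downloaded dataset archive into <project root>/datasets.
# extract_dataset is a hypothetical helper, not part of the repository.
extract_dataset() {
  local archive="$1"     # path to the downloaded .zip file
  local root="${2:-.}"   # project root directory (defaults to the current one)
  if [ ! -f "$archive" ]; then
    echo "error: archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$root/datasets"
  unzip -q "$archive" -d "$root/datasets"
}

# Usage, from the project root:
#   extract_dataset /path/to/downloaded_subset.zip
```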
The datasets that were used to finetune the diffusion model and InstructIR can be found on Hugging Face (https://huggingface.co/datasets/Wouter01/re10k_hard).
You can find pre-trained checkpoints here. You can find the checkpoints for the original codebase (without the improvements from the camera-ready version of the paper) here.
The finetuned diffusion and InstructIR models are also available on Hugging Face (https://huggingface.co/Wouter01/diffusion_re10k_hard, https://huggingface.co/Wouter01/InstructIR_re10k_hard).
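One way to fetch the Hugging Face repositories listed above is to clone them with git, since every Hugging Face repository is also a git repository. This is a minimal sketch, assuming `git` and `git-lfs` are installed. The helper names `hf_repo_url` and `fetch_hf_repo` are hypothetical.

```shell
# Sketch: build the clone URL for a Hugging Face repository.
# Dataset repositories live under /datasets/, model repositories at the top level.
hf_repo_url() {
  local kind="$1"   # "model" or "dataset"
  local repo="$2"   # e.g. Wouter01/re10k_hard
  if [ "$kind" = "dataset" ]; then
    echo "https://huggingface.co/datasets/$repo"
  else
    echo "https://huggingface.co/$repo"
  fi
}

# Sketch: clone a repository into an optional target directory
# (defaults to the repository name). Large files are stored with Git LFS.
fetch_hf_repo() {
  git clone "$(hf_repo_url "$1" "$2")" "${3:-$(basename "$2")}"
}

# Usage:
#   fetch_hf_repo dataset Wouter01/re10k_hard
#   fetch_hf_repo model Wouter01/diffusion_re10k_hard
#   fetch_hf_repo model Wouter01/InstructIR_re10k_hard
```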
@misc{wouter2024improve_nvs,
title={Improving novel view synthesis of 3D Gaussian splats using 2D image enhancement methods},
author={Wouter Bant and Ádám Divák and Jasper Eppink and Clio Feng and Roos Hutter},
year={2024},
url={https://github.com/adamdivak/diffusion_augmented_pixelsplat}
}
This code is mainly from https://dcharatan.github.io/pixelsplat
@inproceedings{charatan23pixelsplat,
title={pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction},
author={David Charatan and Sizhe Li and Andrea Tagliasacchi and Vincent Sitzmann},
year={2023},
booktitle={arXiv},
}