3D-Reconstruction from Single RGB Image

Project:

The model takes a single RGB image as input and attempts to create a 3D mesh of the scene visible in the image. It does so through panoptic segmentation, masking, Mesh R-CNN, and finally the concatenation of alignment-aware meshes into the output.

Input Image:

Output:

Table of Contents

About the Project:

Our approach first performs panoptic segmentation on the given image. This step associates each distinct object in the scene with a unique hue. These hue-to-object associations are then used to create masks of those objects from the input RGB image.
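The hue-to-mask step can be illustrated with a small NumPy sketch. It assumes the segmentation result is available as a color-coded array of shape (H, W, 3) in which every object has its own color. The helper names `masks_from_color_map` and `apply_mask` are hypothetical and are not taken from `segment_and_mask.py`.

```python
import numpy as np

def masks_from_color_map(seg_rgb):
    """Return {color: boolean mask} for every distinct color in a (H, W, 3) map."""
    h, w, _ = seg_rgb.shape
    flat = seg_rgb.reshape(-1, 3)
    # Each unique color is treated as one object (assumption of this sketch).
    colors, inverse = np.unique(flat, axis=0, return_inverse=True)
    inverse = inverse.reshape(h, w)
    return {tuple(int(c) for c in color): inverse == i
            for i, color in enumerate(colors)}

def apply_mask(image, mask):
    """Keep the pixels inside the mask and zero out everything else."""
    return image * mask[..., None].astype(image.dtype)

# Toy example: a 4x4 map with a red object, a green object and black background.
seg = np.zeros((4, 4, 3), dtype=np.uint8)
seg[:2, :2] = (255, 0, 0)
seg[2:, 2:] = (0, 255, 0)
image = np.full((4, 4, 3), 7, dtype=np.uint8)

masks = masks_from_color_map(seg)
red_only = apply_mask(image, masks[(255, 0, 0)])
```

Each entry of `masks` can then be saved as a separate mask image, one per object.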

We create these masks to aid the Mask R-CNN stage, which is responsible for producing the object masks; Mesh R-CNN then creates a mesh for each of the important objects in the image.

After the meshes are produced, they are concatenated to reconstruct the complete 3D scene. The concatenation is intended to keep the meshes aligned with each other and with the camera, as in the input RGB image.

We've used the ShapeNet dataset, which contains a large number of CAD models from diverse categories and is a standard dataset for building ML models for 3D applications. We've also evaluated our model on the challenging Pix3D dataset. Pix3D consists of real-life images paired with object models that are aligned with those images, which helps the model yield reasonable output even on real-life images.

Process Flow

  1. Input: The model takes a single RGB image as input. The image file can be in .jpg or .png format.

  2. Panoptic segmentation: Panoptic segmentation is applied to the input image. Because panoptic segmentation combines instance segmentation and semantic segmentation, we obtain both the region of each object instance and the class of every region in the scene.

  3. Generation of masks: Using the regions obtained from panoptic segmentation, we generate masks of the distinct object instances in the image. We perform this step specifically to help Mask R-CNN form better masks, which are the primary input for the mesh stage that creates a mesh for each object.

A separate mask is generated for every instance. Only instances of classes on which the model was trained are saved.

  4. Generation of individual meshes: From the refined masks, the mesh formation block of Mesh R-CNN creates a mesh for each individual object. A rough voxel grid is formed first and is then refined by the mesh refinement stage, following a coarse-to-fine approach.

Inference can be run on a Colab T4 GPU.

  5. Concatenation of meshes: We use functions from the Open3D library to build the final mesh. The final mesh consists of all the individual meshes, aligned with each other and with the camera as in the input image.
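The bookkeeping behind the concatenation step can be sketched in plain NumPy. This is only an illustration, not the Open3D code used in the project: it assumes every mesh is given as a (vertices, faces) pair and has already been aligned to the camera, and the helper name `concatenate_meshes` is hypothetical. In Open3D, `TriangleMesh` objects can be merged with the `+` operator, which handles the same index shifting.

```python
import numpy as np

def concatenate_meshes(meshes):
    """Merge a list of (vertices (N, 3), faces (M, 3)) pairs into one mesh."""
    all_vertices, all_faces = [], []
    offset = 0
    for vertices, faces in meshes:
        vertices = np.asarray(vertices, dtype=np.float64)
        faces = np.asarray(faces, dtype=np.int64)
        all_vertices.append(vertices)
        # Shift face indices past the vertices that were already added.
        all_faces.append(faces + offset)
        offset += len(vertices)
    return np.vstack(all_vertices), np.vstack(all_faces)

# Two single-triangle meshes standing in for two reconstructed objects.
tri_a = (np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]]), np.array([[0, 1, 2]]))
tri_b = (np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1]]), np.array([[0, 1, 2]]))

scene_vertices, scene_faces = concatenate_meshes([tri_a, tri_b])
```

The vertex positions are left untouched, so any alignment between objects has to be applied to each mesh before it is merged.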

File Structure

📦3D-Reconstruction 
 ┣ 📂assets                            # Contains gifs, objs and images of the results 
 ┣ 📂scripts                           # Python programs to run 
 ┃ ┣ segment_and_mask.py               # Used to create and save masks of objects from input image
 ┃ ┣ inference.ipynb                   # Run this notebook to get results
 ┣ 📜README.md
 ┣ 📜demo_video.gif                    # Demo Video
 ┣ 📜project_report.docx               # Project Report
 ┗ 📜requirements.txt                  # Requirements

Architecture and Dataset

The mesh generated from each masked image is based on the Mesh R-CNN architecture.

The model has been trained on the Pix3D dataset.
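In Mesh R-CNN, the coarse stage turns predicted voxel occupancies into an initial mesh (the paper calls this step "cubify"), which the refinement stage then deforms. The sketch below is a simplified NumPy illustration of that idea, not the project's or the library's implementation: it emits two triangles for every exposed voxel face, does not merge shared vertices, and does not enforce a consistent winding order. The threshold of 0.5 is an assumed value. PyTorch3D provides a full version as `pytorch3d.ops.cubify`.

```python
import numpy as np

# The 8 corners of a unit cube; corner index = 4*x + 2*y + z.
CUBE_CORNERS = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])
# The six neighbor directions and, for each, the corner ring of the face on that side.
NEIGHBORS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]
FACE_RINGS = [[0, 1, 3, 2], [4, 5, 7, 6], [0, 1, 5, 4],
              [2, 3, 7, 6], [0, 2, 6, 4], [1, 3, 7, 5]]

def cubify(occupancy_prob, threshold=0.5):
    """Turn a (D, H, W) occupancy grid into a triangle mesh of exposed voxel faces."""
    occupied = occupancy_prob > threshold
    # Pad with empty voxels so border lookups never go out of range.
    padded = np.pad(occupied, 1, constant_values=False)
    vertices, faces = [], []
    for x, y, z in np.argwhere(occupied):
        for (dx, dy, dz), ring in zip(NEIGHBORS, FACE_RINGS):
            if padded[x + 1 + dx, y + 1 + dy, z + 1 + dz]:
                continue  # the neighbor is occupied, so this face is interior
            base = len(vertices)
            vertices.extend(CUBE_CORNERS[ring] + (x, y, z))
            # Split the quad into two triangles.
            faces.append((base, base + 1, base + 2))
            faces.append((base, base + 2, base + 3))
    return (np.array(vertices, dtype=float).reshape(-1, 3),
            np.array(faces, dtype=int).reshape(-1, 3))

# Two adjacent voxels: the face they share is skipped.
grid = np.zeros((3, 3, 3))
grid[1, 1, 1] = 0.9
grid[1, 1, 2] = 0.8
coarse_vertices, coarse_faces = cubify(grid)
```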

Installations and Execution

The project was tested on Ubuntu 22.04 and on the T4 GPU offered by Google Colab.

Clone the repository

git clone https://github.com/lbhsnh/3D-Reconstruction.git

cd 3D-Reconstruction

Create a virtual environment for the project, then install the requirements

pip install -r requirements.txt

All remaining dependencies are handled by the scripts

cd scripts

python3 segment_and_mask.py

Then run the Colab notebook inference.ipynb

  • To view the .obj file

You can use Open3D to view the saved mesh, or use:

Online 3D Viewer
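Before opening a mesh in a viewer, it can help to confirm that the saved file actually contains geometry. The following is a minimal sketch that assumes a standard Wavefront OBJ file with `v` (vertex) and `f` (face) records. `summarize_obj` is a hypothetical helper and is not part of this repository.

```python
def summarize_obj(text):
    """Count the vertex and face records in Wavefront OBJ text."""
    counts = {"vertices": 0, "faces": 0}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "v":
            counts["vertices"] += 1
        elif parts[0] == "f":
            counts["faces"] += 1
    return counts

# A single triangle written as OBJ text.
sample = """\
# a single triangle
v 0 0 0
v 1 0 0
v 0 1 0
f 1 2 3
"""
summary = summarize_obj(sample)
```

For a saved mesh, pass the file contents instead, for example `summarize_obj(open(path).read())`, where `path` points to the .obj file produced by the notebook.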

Tech Stack

  • Open3D

  • PyTorch3D

  • Detectron2

  • Ultralytics

Future Prospects

  • So far we are able to create a combined mesh that aligns with the image. In future we aim to reconstruct the walls and floor as well, in order to recreate the entire scene.

  • The model is restricted to a fixed set of interior objects such as beds, couches and chairs, because of the limited number of classes in the Pix3D dataset. We aim to address this either by finding a more diverse dataset or by adding categories to the existing one.

  • Due to GPU constraints we were unable to train the model further to improve its output. We therefore plan to train on the new, more diverse dataset in order to improve both the variety and the quality of the meshes produced.

Mentor

Contributors

Acknowledgements and Resources

Open3D library documentation

Pixel2Mesh paper by Nanyang Wang et al.

For Image Segmentation Methods

Mesh R-CNN by Justin Johnson et al.

Detectron2