CVPR 2024
Yuqi Zhang · Guanying Chen · Jiaxing Chen · Shuguang Cui
Project Page
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images by lifting noisy 2D labels to 3D.
Overview
This repository contains the following components to train Aerial Lifting:
- Dataset processing scripts, including:
- far-view semantic label fusion;
- cross-view instance label grouping.
- Training and evaluation scripts.
Note: This is a preliminary release and there may still be some bugs.
Installation
Create new conda env (CUDA)
-
Clone this repo by:
git clone https://github.com/zyqz97/Aerial_lifting.git
-
Create a conda environment (installation via anaconda is recommended.
conda create -n aeriallift python=3.9 conda activate aeriallift
-
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
-
tiny-cuda-nn and others
pip install -r requirements.txt pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
-
Install the extension of torch-ngp
cd ./gp_nerf/torch_ngp/gridencoder python setup.py install cd ../raymarching python setup.py install cd ../shencoder python setup.py install
-
Follow the official neuralsim to install nr3d_lib.
-
Install SAM
git clone https://github.com/facebookresearch/segment-anything.git cd segment-anything pip install -e . cd tools/segment_anything wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Tested environments
Ubuntu 20.04 with torch 1.10.1 & CUDA 11.3 on A100 GPU.
Data Processing & Training Step
-
We take Yingrenshi dataset as an example. And you need to set 'dataset_path=$YOURPATH/Aerial_lifting_data/Yingrenshi' and 'config_file=configs/yingrenshi.yaml'.
-
We also provide the processed data in the next section. The training scripts (Step 1.1, Step 2.4, and Step 3.3) can be run directly if you download the processed data.
Step 1. Training Geometry
-
1.1 Train the geometry field.
Note: $exp_name denotes the logs_saving path (e.g. exp_name=logs/train_geo_yingrenshi)sh bash/train_geo.sh
Step 2. Training Semantic Field
-
2.1 Get Mask2former semantic labels
For generating semantic labels of Mask2former, please use our modified version of Mask2former from here. You need to create a new conda env. This code is largely based on MaskFormer and a modified version of Panapti-Lifting.
After installing the environment of Mask2former:
sh bash/2_1_m2f_labels.sh
-
2.2 Render far-view RGB images from the checkpoint of Step 1.
sh bash/2_2_get_far_view_images.sh
Note: need to specify $M2F_path, $exp_name, $ckpt_path
-
2.3 Get fusion semantic label.
sh bash/2_3_fusion.sh
-
2.4 Train the semantic field.
After processing or downloading the data, you can use the script below to train the semantic field.
sh bash/train_semantic.sh
3. Training Instance Field
-
3.1 Generate the SAM instance mask with geo-filter
sh bash/3_1_get_sam_mask_depth_filter.sh
-
3.2 Generate the cross-view guidance map
sh bash/3_2_cross_view_process.sh
-
3.3 Train the instance field.
After processing or downloading the data, you can use the script below to train the instance field.
sh bash/train_instance.sh
Processed Dataset & Trained Models.
Download the processed data and trained checkpoints.
We thank the authors for providing the datasets. If you find the datasets useful in your research, please cite the papers that provided the original aerial images:
@inproceedings{UrbanBIS,
title = {UrbanBIS: a Large-scale Benchmark for Fine-grained Urban Building Instance Segmentation},
author = {Guoqing Yang and Fuyou Xue and Qi Zhang and Ke Xie and Chi-Wing Fu and Hui Huang},
booktitle = {SIGGRAPH},
year = {2023},
}
@inproceedings{UrbanScene3D,
title={Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset},
author={Liqiang Lin and Yilin Liu and Yue Hu and Xingguang Yan and Ke Xie and Hui Huang},
booktitle={ECCV},
year={2022},
}
Citation
If you find this work useful for your research and applications, please cite our paper:
@inproceedings{zhang2024aerial,
title={Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery},
author={Zhang, Yuqi and Chen, Guanying and Chen, Jiaxing and Cui, Shuguang},
booktitle={CVPR},
year={2024}
}
Acknowledgements
Large parts of this codebase are based on existing work in the Mega-NeRF, torch-ngp, neuralsim, Panoptic-Lifting, Contrastive-Lift, SAM, Mask2Former. We thank the authors for releasing their code.