A deep learning method for building height estimation using high-resolution multi-view imagery over urban areas: A case study of 42 Chinese cities.
We introduce high-resolution ZY-3 multi-view images to estimate building height at a spatial resolution of 2.5 m. We propose a multi-spectral, multi-view, and multi-task deep network (called M3Net) for building height estimation, where ZY-3 multi-spectral and multi-view images are fused in a multi-task learning framework. By preprocessing the data from Amap (see Section 2 of the paper for details), we obtained 4723 samples from the 42 cities (Table 1) and randomly selected 70%, 10%, and 20% of them for training, validation, and testing, respectively. Paper link (website)
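The 70%/10%/20% split described above can be sketched as follows (a minimal illustration; the seed and the use of integer IDs are assumptions, not the exact procedure from the paper):

```python
import random

def split_samples(sample_ids, seed=0):
    """Randomly split sample IDs into 70% train, 10% validation, 20% test."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(0.7 * n)
    n_val = int(0.1 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# e.g. the 4723 samples mentioned above
train, val, test = split_samples(range(4723))
print(len(train), len(val), len(test))  # 3306 472 945
```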
- pytorch >= 1.8.0 (lower versions may also work)
- python >=3.6
See the sample directory. Due to copyright restrictions, the full dataset is not publicly available at present. However, the reference height data from Amap is accessible for research use (download link; extraction code: 4gn2). The provided data is the raw version, and preprocessing is needed before use.
Structure of the sample directory:
--img: the multi-spectral images with four bands (B, G, R, and NIR)
--lab: the building height (unit: meter)
--lab_floor: the number of floors of buildings
--tlc: the multi-view images with three bands (nadir, forward, and backward viewing angles)
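The four sub-directories can be paired per sample by filename for loading (a sketch assuming matching file names across img/, lab/, lab_floor/, and tlc/; the actual naming in the repo may differ):

```python
from pathlib import Path

def list_sample_pairs(root):
    """Pair multi-spectral, height, floor, and multi-view files by shared name."""
    root = Path(root)
    pairs = []
    for img_path in sorted((root / 'img').glob('*')):
        name = img_path.name
        lab = root / 'lab' / name
        floor = root / 'lab_floor' / name
        tlc = root / 'tlc' / name
        # keep only samples present in all four sub-directories
        if lab.exists() and floor.exists() and tlc.exists():
            pairs.append((img_path, lab, floor, tlc))
    return pairs
```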
Note that the open ZY-3 data from the ISPRS organization is a good starting point, see link.
Take Hong Kong, China for example:
This image can be used to test the performance of the pretrained building height model.
- References can be seen in https://www.cnblogs.com/enviidl/p/16541009.html
- Step-by-step workflow: ortho-rectification, image-to-image registration, pan-sharpening, radiometric correction (i.e., quick atmospheric correction (QUAC)), and image cropping.
- Software: ENVI 5.3
- The output resolution at each step is set to 2.5 m.
- The detailed procedures are shown below:
1. Orthorectification: apply the ENVI tool RPC Orthorectification Workflow to all ZY-3 images, including the multi-spectral, nadir, backward, and forward images.
2. Registration: apply the ENVI tool Image Registration Workflow with the nadir image as the reference and the other images as warp images, so that all warp images are registered to the reference image.
3. Pan-sharpening: apply the ENVI tool Gram-Schmidt Pan Sharpening to the original multi-spectral and nadir images; the two images are fused to generate high-resolution multi-spectral images.
4. Atmospheric correction: the original images from the data provider have been radiometrically corrected, but they still suffer from atmospheric effects, so apply the ENVI tool Quick Atmospheric Correction (QUAC) to the fused multi-spectral images from step 3.
5. Cropping and stacking: crop all images to the same size, then apply the ENVI tool Layer Stacking to the multi-spectral and multi-view images.
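The final stacking step can also be mimicked outside ENVI, e.g. with NumPy (a sketch assuming the pan-sharpened multi-spectral image has 4 bands and the registered multi-view image has 3 bands, both already cropped to the same extent; the array names are illustrative):

```python
import numpy as np

def layer_stack(mux, tlc):
    """Stack a 4-band multi-spectral array and a 3-band multi-view array
    along the channel axis, yielding a 7-band image (bands, rows, cols)."""
    if mux.shape[1:] != tlc.shape[1:]:
        raise ValueError("images must be cropped to the same size first")
    return np.concatenate([mux, tlc], axis=0)

mux = np.zeros((4, 256, 256), dtype=np.float32)  # B, G, R, NIR
tlc = np.zeros((3, 256, 256), dtype=np.float32)  # nadir, forward, backward
stacked = layer_stack(mux, tlc)
print(stacked.shape)  # (7, 256, 256)
```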
data_path = r'sample' # the path of images
resume = r'runs\tlcnetu_zy3bh\V1\finetune_298.tar' # the path of pretrained weights
- whole image
Use JupyterLab to run the following notebook (first `pip install jupyterlab`, then type `jupyter lab` in the command prompt):
0311_predict_tlcnetU_process_wholeimg.ipynb
- testset
python pred_zy3bh_tlcnetU.py # the proposed model with two encoders for multi-spectral and multi-view images
python pred_zy3bh_tlcnetU_mux.py # the model with one encoder for multi-spectral images
python pred_zy3bh_tlcnetU_tlc.py # the model with one encoder for multi-view images
python pred_zy3bh_tlcnetU_tlcmux.py # the model with one encoder for the stacking image from multi-spectral and multi-view images along the channel dimension
Project the building height into the footprint.
demo_deeppreed.m
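The footprint-projection step aggregates per-pixel height predictions to the building level; in Python this roughly corresponds to a zonal mean per footprint (a sketch assuming a rasterized footprint-ID map where 0 denotes background; it is an illustration, not a port of demo_deeppreed.m):

```python
import numpy as np

def height_per_footprint(height, footprint_id):
    """Average predicted pixel heights within each building footprint.
    Returns {footprint_id: mean_height}; background (id 0) is ignored."""
    result = {}
    for fid in np.unique(footprint_id):
        if fid == 0:
            continue
        result[int(fid)] = float(height[footprint_id == fid].mean())
    return result

height = np.array([[10., 10., 0.], [12., 12., 30.]])  # predicted heights (m)
ids = np.array([[1, 1, 0], [1, 1, 2]])                # rasterized footprint IDs
print(height_per_footprint(height, ids))  # {1: 11.0, 2: 30.0}
```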
python train_zy3bh_tlcnetU_loss.py
See the pretrained model in the runs/ directory.
python evaluate.py
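For height regression, the standard accuracy metrics can be sketched as follows (a minimal RMSE/MAE illustration; the metrics actually reported by evaluate.py may differ):

```python
import numpy as np

def rmse_mae(pred, ref):
    """Root-mean-square error and mean absolute error between
    predicted and reference building heights (meters)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    err = pred - ref
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))

rmse, mae = rmse_mae([10., 20., 30.], [12., 18., 30.])
print(rmse, mae)
```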
If there is any issue, please feel free to contact me. The email address is yinxcao@163.com or yinxcao@whu.edu.cn, and ResearchGate link
update on 2022.2.26
We directly applied the model trained on Chinese cities to Bangalore and obtained promising results, shown below.
Note that the acquisition dates of the ZY-3 images and the Google images differ, as do their spatial resolutions;
therefore, there are some differences between the Google images and our results.
The above results show that our method outperforms the random forest method and preserves rich building details.
If you find this repo useful for your research, please consider citing the paper:
@article{cao2021deep,
title={A deep learning method for building height estimation using high-resolution multi-view imagery over urban areas: A case study of 42 Chinese cities},
author={Cao, Yinxia and Huang, Xin},
journal={Remote Sensing of Environment},
volume={264},
pages={112590},
year={2021},
publisher={Elsevier}
}
Thanks to my supervisor Prof. Xin Huang, Dr. Mengmeng Li, Prof. Xuecao Li, and the anonymous reviewers for their advice.
@article{mshahsemseg,
Author = {Meet P Shah},
Title = {Semantic Segmentation Architectures Implemented in PyTorch.},
Journal = {https://github.com/meetshah1995/pytorch-semseg},
Year = {2017}
}