Official PyTorch implementation of the NeurIPS 2022 paper "Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation".
This code was developed with Python 3.6 and PyTorch 1.6.0. Install Habitat-Sim and Habitat-Lab following VLN-CE, then clone this repository and install the requirements (more details in `SETUP.md`):
```bash
git clone https://github.com/PeihaoChen/WS-MGMap.git
cd WS-MGMap
pip install -r requirements.txt
```
Follow the instructions in VLN-CE to download the Matterport3D scenes to the `data/scene_datasets` folder, and the VLN-CE datasets and the corresponding episode data to the `data/datasets` folder.
Download the cached ground-truth semantic maps here to the `data/map_data` folder; they serve as supervision for the semantic hallucination.
The pre-trained semantic segmentation model and the DD-PPO model used for navigation control can be found here. Download them to the `data/pretrain_model` folder.
The code expects all data files to be organized in the following structure:
```
WS-MGMap
├─ data
| ├─ datasets
| | ├─ R2R_VLNCE_v1-2
| | ├─ R2R_VLNCE_v1-2_preprocessed
| ├─ map_data
| | ├─ semantic
| | | ├─ train
| | | | ├─ ep_0.npy
| | | | ├─ ...
| | | ├─ train_aug
| | | | ├─ ep_0.npy
| | | | ├─ ...
| ├─ pretrain_model
| | ├─ ddppo-models
| | | ├─ gibson-2plus-resnet50.pth
| | ├─ unet-models
| | | ├─ 2021_02_14-23_42_50.pt
| ├─ scene_datasets
| | ├─ mp3d
| | | ├─ 1LXtFkjw3qL
| | | ├─ ...
```
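Before training or evaluation, the layout above can be checked with a small script. This is a minimal sketch that is not part of the repository: it assumes it is run from the WS-MGMap root (or given that root as its first argument), and it only tests for the directories and files named in the tree, not for the contents hidden behind the `...` entries.

```bash
#!/usr/bin/env bash
# Sketch (not part of WS-MGMap): report which paths from the expected data layout are missing.
# Assumed usage: bash check_data.sh [path/to/WS-MGMap]   (defaults to the current directory)

# Directories and files taken from the directory tree in this README.
expected_dirs=(
  data/datasets/R2R_VLNCE_v1-2
  data/datasets/R2R_VLNCE_v1-2_preprocessed
  data/map_data/semantic/train
  data/map_data/semantic/train_aug
  data/scene_datasets/mp3d
)
expected_files=(
  data/pretrain_model/ddppo-models/gibson-2plus-resnet50.pth
  data/pretrain_model/unet-models/2021_02_14-23_42_50.pt
)

# check_layout ROOT: prints one line per missing path; returns 0 only if nothing is missing.
check_layout() {
  local root="$1" missing=0 p
  for p in "${expected_dirs[@]}"; do
    if [ ! -d "$root/$p" ]; then
      echo "missing directory: $p"
      missing=$((missing + 1))
    fi
  done
  for p in "${expected_files[@]}"; do
    if [ ! -f "$root/$p" ]; then
      echo "missing file: $p"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

if check_layout "${1:-.}"; then
  echo "data layout looks complete"
fi
```

The check is deliberately shallow: a present but incomplete `mp3d` folder still passes, so it catches misplaced downloads rather than corrupted ones.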
We provide our trained models here to reproduce the results reported in the paper. Run the following to evaluate a trained model:
```bash
export CUDA_VISIBLE_DEVICES=0
python -m torch.distributed.launch --nproc_per_node=1 run.py \
--run-type eval \
-c vlnce_baselines/config/CMA_AUG_DA_TUNE.yaml \
-e $PATH_TO_SAVE_RESULT$ \
EVAL_CKPT_PATH_DIR $PATH_TO_TRAINED_MODEL$ \
NUM_PROCESSES 1 \
use_ddppo True
```
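The `$PATH_TO_SAVE_RESULT$` and `$PATH_TO_TRAINED_MODEL$` tokens are placeholders to be replaced by hand. If they are pasted into bash unchanged, `$PATH_TO_SAVE_RESULT` is expanded as a (normally unset, hence empty) variable and only the trailing `$` survives, so `-e` receives a lone `$`. The sketch below, which is not part of the repository, fills both placeholders from shell variables; the two directory names and the `RUN` switch are hypothetical. By default it only prints the substituted command; set `RUN=1` to launch it.

```bash
#!/usr/bin/env bash
# Hypothetical locations: replace with your own paths.
RESULT_DIR="results/ws_mgmap_eval"   # stands in for $PATH_TO_SAVE_RESULT$
MODEL_DIR="checkpoints/ws_mgmap"     # stands in for $PATH_TO_TRAINED_MODEL$

# Keep the arguments in an array so that paths containing spaces stay single words.
eval_args=(
  --run-type eval
  -c vlnce_baselines/config/CMA_AUG_DA_TUNE.yaml
  -e "$RESULT_DIR"
  EVAL_CKPT_PATH_DIR "$MODEL_DIR"
  NUM_PROCESSES 1
  use_ddppo True
)

if [ "${RUN:-0}" = 1 ]; then
  # Same launch as the README command, on GPU 0 only.
  CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 run.py "${eval_args[@]}"
else
  # Dry run (default): print the fully substituted command for inspection.
  printf '%q ' python -m torch.distributed.launch --nproc_per_node=1 run.py "${eval_args[@]}"
  echo
fi
```

Printing first makes it easy to confirm that both paths were substituted before a long evaluation starts.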
STAGE1: Run the following for teacher-forcing training on the augmented data:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2
python -m torch.distributed.launch --nproc_per_node=3 run.py \
-c vlnce_baselines/config/CMA_AUG.yaml \
-e $PATH_TO_SAVE_RESULT$ \
NUM_PROCESSES 6 \
DAGGER.BATCH_SIZE 8
```
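`torch.distributed.launch` starts `--nproc_per_node` processes on the machine, and in the commands in this README that number equals the number of GPUs listed in `CUDA_VISIBLE_DEVICES` (one for `0`, three for `0,1,2`). Assuming you want to keep one process per visible GPU when changing the device list, a minimal sketch can derive the value instead of editing it by hand; `count_gpus` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# count_gpus LIST: number of comma-separated entries in a CUDA_VISIBLE_DEVICES-style list.
count_gpus() {
  local IFS=','        # split on commas only inside this function
  local -a ids
  read -r -a ids <<< "$1"
  echo "${#ids[@]}"
}

export CUDA_VISIBLE_DEVICES=0,1,2
NPROC="$(count_gpus "$CUDA_VISIBLE_DEVICES")"   # 3 for the list above
echo "launching with --nproc_per_node=$NPROC"
# Then launch the STAGE1 command with --nproc_per_node="$NPROC" in place of the hard-coded 3.
```

This only keeps the process count in sync with the device list; it says nothing about how `NUM_PROCESSES` should be chosen.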
STAGE2: Run the following for DAgger training to fine-tune the model from STAGE1:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2
python -m torch.distributed.launch --nproc_per_node=3 run.py \
-c vlnce_baselines/config/CMA_AUG_DA_TUNE.yaml \
-e $PATH_TO_SAVE_RESULT$ \
NUM_PROCESSES 5 \
DAGGER.BATCH_SIZE 8 \
DAGGER.CKPT_TO_LOAD $PATH_TO_MODEL_FROM_STAGE1$
```
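STAGE2 needs the path of a STAGE1 checkpoint for `DAGGER.CKPT_TO_LOAD`. This README does not say where or under which names STAGE1 writes its checkpoints, so the sketch below assumes they are `.pth` files in a single directory and picks the most recently modified one. `latest_ckpt`, its default `*.pth` pattern, and the example directory are all hypothetical and should be adjusted to your run.

```bash
#!/usr/bin/env bash
# latest_ckpt DIR [PATTERN]: print the most recently modified file in DIR matching PATTERN.
# PATTERN defaults to '*.pth', an assumption about how STAGE1 names its checkpoints.
latest_ckpt() {
  local dir="$1" pattern="${2:-*.pth}" newest="" f
  for f in "$dir"/$pattern; do
    [ -f "$f" ] || continue   # also skips the literal pattern when nothing matches
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example with a hypothetical directory: use the result as DAGGER.CKPT_TO_LOAD.
if STAGE1_CKPT="$(latest_ckpt path/to/stage1/checkpoints)"; then
  echo "DAGGER.CKPT_TO_LOAD $STAGE1_CKPT"
else
  echo "no checkpoint found" >&2
fi
```

Taking the newest file is only a convenience; the best-performing STAGE1 checkpoint may be an earlier one, in which case pass its path explicitly.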
If you use or discuss WS-MGMap in your research, please consider citing the paper as follows:
```bibtex
@article{chen2022weakly,
title={Weakly-supervised multi-granularity map learning for vision-and-language navigation},
author={Chen, Peihao and Ji, Dongyu and Lin, Kunyang and Zeng, Runhao and Li, Thomas H and Tan, Mingkui and Gan, Chuang},
journal={arXiv preprint arXiv:2210.07506},
year={2022}
}
```