Code for our CVPR 2022 paper "Reinforced Structured State-Evolution for Vision-Language Navigation".
Contributed by Jinyu Chen, Chen Gao, Erli Meng, Qiong Zhang, Si Liu
- Install the Matterport3D simulator by following the instructions here.
- Clone this repository.
cd Matterport3DSimulator && mkdir methods && cd methods
git clone https://github.com/chenjinyubuaa/SEvol.git
- Install the requirements.
pip install -r requirements.txt
Please download the data and pretrained checkpoints from here. Put the img_features and task directories under the Matterport3DSimulator directory. The CLIP image features can be downloaded from here.
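Before training, it can help to confirm that the downloaded directories ended up in the right place. The following is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper, and it assumes only the two directory names mentioned above (img_features and task).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SEvol): report whether the two data
# directories named above exist under a given simulator root directory.
check_layout() {
    local root="$1"
    local missing=0
    local dir
    for dir in img_features task; do
        if [ -d "$root/$dir" ]; then
            echo "ok: $root/$dir"
        else
            echo "missing: $root/$dir"
            missing=1
        fi
    done
    # Exit status 0 when both directories are present, 1 otherwise.
    return "$missing"
}
```

After sourcing the function, calling `check_layout Matterport3DSimulator` from the parent directory prints one line per directory and returns a non-zero status if either is absent.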
Following Speaker-follower and EnvDrop, we train our model on R2R as follows:
- Train the speaker model from the Matterport3DSimulator directory:
bash methods/SEvol/run/train_speaker.sh 0
- Train the follower model:
bash methods/SEvol/run/train_r2r.sh 0
- Train with back-translation data augmentation:
bash methods/SEvol/run/train_r2r_bt.sh 0
For the third training stage, we use the speaker model with the best BLEU score and the follower model with the best SR (success rate) on the val-unseen split.
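The three stages above map to three scripts, and the third depends on checkpoints chosen by hand from the first two, so they are not meant to be chained blindly. As a rough sketch, a small wrapper can make the stage-to-script mapping explicit. The function name `stage_command` and the stage names `speaker`, `follower`, and `bt` are hypothetical and exist only for this illustration; the script paths and the trailing argument are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with SEvol): map a stage name to the
# training script listed above and print the command to run from the
# Matterport3DSimulator directory. Stage names are made up for this sketch.
stage_command() {
    local stage="$1"
    local arg="${2:-0}"   # passed through unchanged, as in the commands above
    local script
    case "$stage" in
        speaker)  script="train_speaker.sh" ;;
        follower) script="train_r2r.sh" ;;
        bt)       script="train_r2r_bt.sh" ;;
        *)        echo "unknown stage: $stage" >&2; return 2 ;;
    esac
    echo "bash methods/SEvol/run/$script $arg"
}
```

The function only prints the command, which leaves room to pick the best speaker and follower checkpoints before launching the back-translation stage.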
- Use valid.sh to test the checkpoints; just change the checkpoint path in it:
bash methods/SEvol/run/valid.sh 0
Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package.
@InProceedings{Chen_2022_CVPR,
author = {Chen, Jinyu and Gao, Chen and Meng, Erli and Zhang, Qiong and Liu, Si},
title = {Reinforced Structured State-Evolution for Vision-Language Navigation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {15450-15459}
}
CKR-nav is released under the MIT license. See LICENSE for additional details.
Some of the code is built upon NvEM and EnvDrop. We thank the authors for their great work!