The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
This is the repository of ORIST (ICCV 2021).
Some code in this repo is copied or modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, NVIDIA, and UNITER. The object features are extracted using BUTD, with the expanded object bounding boxes of REVERIE.
- Implemented distributed data parallel training (PyTorch).
- Optimized parts of the code for faster training.
- Install Docker with GPU support (there are lots of tutorials online; just google it).
- Pull the docker image:
docker pull qykshr/ubuntu:orist
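After pulling the image, the container needs GPU access and the code and data mounted inside it. A minimal launch sketch, assuming hypothetical host paths for the cloned repository and the downloaded data (adjust to your setup):

```bash
# Hypothetical mount paths; replace with the locations of the cloned repo and downloaded data.
docker run --gpus all -it \
    -v /path/to/ORIST:/workspace/ORIST \
    -v /path/to/data:/workspace/ORIST/data \
    qykshr/ubuntu:orist bash
```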
- Download the processed data and pretrained models:
- Processed data:
- For evaluation only:
- For training:
- Build the Matterport3D simulator:
Build the OSMesa version using CMake:

    mkdir build && cd build
    cmake -DOSMESA_RENDERING=ON ..
    make
    cd ../
Instructions for other versions can be found here.
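Before running the scripts, it can help to verify that the simulator bindings import correctly. A minimal check, assuming the CMake build places the MatterSim Python module under `build/`:

```bash
# Assumes the build directory contains the compiled MatterSim Python bindings.
export PYTHONPATH=$PWD/build:$PYTHONPATH
python -c "import MatterSim; print(MatterSim.Simulator)"
```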
- Run inference:
sh eval_scripts/xxx.sh
- Run training:
sh run_scripts/xxx.sh
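Training uses PyTorch distributed data parallel (see the feature list above), so multi-GPU runs go through a PyTorch distributed launcher. A rough sketch of such a launch, with a hypothetical entry point `train.py` standing in for whatever the scripts in `run_scripts/` actually invoke:

```bash
# Hypothetical launch; the real entry script, arguments, and GPU count are set in run_scripts/.
python -m torch.distributed.launch --nproc_per_node=4 train.py
```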
If this code or data is useful for your research, please consider citing:
@inproceedings{orist,
author = {Yuankai Qi and
Zizheng Pan and
Yicong Hong and
Ming{-}Hsuan Yang and
Anton van den Hengel and
Qi Wu},
title = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
booktitle = {ICCV},
pages = {1655--1664},
year = {2021}
}
@inproceedings{reverie,
author = {Yuankai Qi and
Qi Wu and
Peter Anderson and
Xin Wang and
William Yang Wang and
Chunhua Shen and
Anton van den Hengel},
title = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor
Environments},
booktitle = {CVPR},
pages = {9979--9988},
year = {2020}
}