By Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, and Roozbeh Mottaghi
In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals. To this end, we introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions. By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
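For intuition only, here is a minimal PyTorch sketch of the kind of prediction the NIE performs: given features of an object and the agent's action, predict how the object's state changes. This is not the implementation used in the paper; the dimensions, module names, and the choice of predicting a 3D translation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NIESketch(nn.Module):
    """Illustrative stand-in for a Neural Interaction Engine:
    given visual features of an object and an embedding of the agent's
    action, predict how the object's state will change."""

    def __init__(self, visual_dim=512, num_actions=8, hidden_dim=256):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        # Predict a 3D translation of the object caused by the action
        # (a simplification of predicting the full state change).
        self.delta_head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),
        )

    def forward(self, object_features, action):
        a = self.action_embed(action)          # (B, hidden_dim)
        v = self.visual_proj(object_features)  # (B, hidden_dim)
        return self.delta_head(torch.cat([v, a], dim=-1))  # (B, 3)

if __name__ == "__main__":
    nie = NIESketch()
    feats = torch.randn(4, 512)          # fake object features
    actions = torch.randint(0, 8, (4,))  # fake action indices
    print(nie(feats, actions).shape)     # torch.Size([4, 3])
```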
-
Requirements
This codebase was developed on Ubuntu 18.04.3 LTS and has also been tried on Ubuntu 16.
In addition, this codebase needs to be executed on GPU(s).
-
Clone this repository
git clone git@github.com:KuoHaoZeng/Interactive_Visual_Navigation.git
-
Install xorg if the machine does not have it.
Note: This codebase should be executed on GPU(s). Thus, we need xserver for GPU rendering.
# Need sudo permission to install xserver
sudo apt-get install xorg
Then, configure the xserver for the GPU:
sudo python startx.py
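After the xserver is up, a quick sanity check can help before moving on. The snippet below is only a sketch; it assumes nvidia-smi is available and that startx.py exposed an X display (commonly :0, though this varies by setup).

```python
import os
import subprocess

# The X display created by startx.py is typically ":0", but this depends
# on your machine; AI2-THOR needs a working display for GPU rendering.
print("DISPLAY =", os.environ.get("DISPLAY", "<not set>"))

# Confirm the NVIDIA driver can see the GPU(s) (assumes nvidia-smi is installed).
subprocess.run(["nvidia-smi"], check=True)
```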
-
Using conda, create an environment.
Note: The python version needs to be above 3.6, since python 2.x may have issues with some required packages.
conda env create --file environment.yml --name thor-ivn
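Once the environment is created, activate it with conda activate thor-ivn. The short check below is a sketch that assumes torch, ai2thor, and allenact are among the packages installed by environment.yml; adjust if your environment differs.

```python
# Run inside the thor-ivn conda environment.
import torch
import ai2thor
import allenact

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("AI2-THOR:", ai2thor.__version__)
print("AllenAct imported from:", allenact.__file__)
```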
We consider two downstream tasks in the physics-enabled, visually rich AI2-THOR environment:
(1) reaching a target while the path to the target is blocked
(2) moving an object to a target location by pushing it.
We use Kitchens, Living Rooms, Bedrooms, and Bathrooms for our experiments (120 scenes in total).
To collect the datasets, we use 20 categories of objects, including
alarm clock, apple, armchair, box, bread, chair, desk, dining table, dog bed, laptop,
garbage can, lettuce, microwave, pillow, pot, side table, sofa, stool, television, tomato.
Overall, we collect 10K training instances, 2.5K validation instances, and 2.5K testing instances for each task.
The data for ObjPlace and ObsNav are available in the dataset folder.
For more information about how to control an agent to interact with the environment in AI2-iTHOR, please visit this webpage.
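As a minimal, illustrative example (not taken from this repository; action names such as PushObject and their arguments can differ across AI2-THOR versions), stepping and interacting with the environment looks roughly like this:

```python
from ai2thor.controller import Controller

# Start a controller in a kitchen scene.
controller = Controller(scene="FloorPlan1")

# Basic navigation actions.
event = controller.step(action="MoveAhead")
event = controller.step(action="RotateRight")

# Interaction: push a visible, moveable object out of the way.
objects = event.metadata["objects"]
target = next((o for o in objects if o["visible"] and o["moveable"]), None)
if target is not None:
    event = controller.step(
        action="PushObject",
        objectId=target["objectId"],
        moveMagnitude=100.0,
    )
    print("Push succeeded:", event.metadata["lastActionSuccess"])

controller.stop()
```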
We currently provide the following pretrained models:
ObsNav

| Model | Download |
|---|---|
| PPO | Link |
| NIE | Link |

ObjPlace

| Model | Download |
|---|---|
| PPO | Link |
| NIE | Link |
| MaskRCNN | Link |
These models can be downloaded from the above links and should be placed into the pretrained_model_ckpts
directory. You can then, for example, run inference for the NIE
model on ObsNav using AllenAct by running:
export CURRENT_TIME=$(date '+%Y-%m-%d_%H-%M-%S') # This is just to record when you ran this inference
allenact -s 23456 configs/ithor_ObsNav/obstacles_nav_rgbdk_vNIE.py -c pretrained_model_ckpts/{THE_DOWNLOADED_MODEL}.pt -t $CURRENT_TIME
Note: Make sure to download the pretrained MaskRCNN model to pretrained_model_ckpts/maskRCNN/model.pth. In addition, you have to turn on the using_mask_rcnn option in configs/ithor_ObsNav/obstacles_nav_base.py to use the pretrained MaskRCNN model during the evaluation stage when testing the NIE model.
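What "turning on" the flag looks like depends on how the config is organized; purely as a hypothetical illustration, the change would be along these lines:

```python
# In configs/ithor_ObsNav/obstacles_nav_base.py (illustrative only; the actual
# flag may be an attribute of a config class rather than a module-level variable):
using_mask_rcnn = True  # enable the pretrained MaskRCNN during evaluation
```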
We use the AllenAct framework for training the baseline models and our NIE
models; this framework is automatically installed when installing the requirements for this project.
Before running training or inference you'll first have to add the Interactive_Visual_Navigation
directory to your PYTHONPATH
(so that python
and AllenAct
know where to look for the various modules). To do this you can run the following:
cd YOUR/PATH/TO/Interactive_Visual_Navigation
export PYTHONPATH=$PYTHONPATH:$PWD
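To confirm the path was picked up, a trivial check (nothing project-specific is assumed) is:

```python
import os
import sys

print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<not set>"))
print("Repo on sys.path:", any("Interactive_Visual_Navigation" in p for p in sys.path))
```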
Let's say you want to train an NIE
model on the ObjPlace task. This can be easily done by running the command:
allenact -s 23456 -o out -b . configs/ithor_ObjPlace/placement_rgbdk_vNIE.py
If you find this project useful in your research, please consider citing our paper:
@inproceedings{khz2021interact,
author = {Zeng, Kuo-Hao and Farhadi, Ali and Weihs, Luca and Mottaghi, Roozbeh},
title = {Pushing it out of the Way: Interactive Visual Navigation},
booktitle = {CVPR},
year = {2021}
}