This repository is deprecated in favor of the newest version of the LED task, which operates on top of the navigation graph. Please see the New Repository.
This repository is the original implementation of the paper Where Are You? Localization from Embodied Dialog [project website].
This project is developed with Python 3.6 and PyTorch
Clone this repository and install the rest of the dependencies:
git clone https://github.com/batra-mlp-lab/WAY.git
cd WAY
python -m pip install -r requirements.txt
python nltk_requirements.py
mkdir data/logs
mkdir data/logs/tensorboard
mkdir lingUnet/vis
You will need to download the WAY dataset described here into the data folder.
Dataset | Extract path | Size |
---|---|---|
way_splits.zip | data/way_splits/ | 2 MB |
word_embeddings.zip | data/word_embeddings/ | 13 MB |
floorplans.zip | data/floorplans/ | 103 MB |
connectivity.zip | data/connectivity/ | 1 MB |
Downloading the dataset:
cd data
# WAY Splits
mkdir way_splits/
cd way_splits
gdown 'https://drive.google.com/uc?id=19env7HjYpgimenS8CJA_1iqoCi_kVDux'
unzip way_splits.zip
rm way_splits.zip
cd ..
# Word Embeddings
gdown 'https://drive.google.com/uc?id=1gC6Y4jqFOFkKFLSiqkt_ZGU4MM0vYIW7'
unzip word_embeddings.zip
rm word_embeddings.zip
# Floorplans
gdown 'https://drive.google.com/uc?id=1_JHaTxty1cnZHnBKUWcNIgAPyCFx0nR7'
unzip floorplans.zip
rm floorplans.zip
# Graph Connectivity
gdown 'https://drive.google.com/uc?id=1LQ__PGY1KSNjfmGK_YqZezkSwqtdYu9c'
unzip connectivity.zip
rm connectivity.zip
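After downloading, it can help to confirm that each archive landed in the extract path listed in the table. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it only checks that each folder exists and is non-empty, not that its contents are complete.

```python
from pathlib import Path

# Extract paths taken from the dataset table; the folder names are the only
# thing this check knows about the dataset.
EXPECTED_DIRS = ["way_splits", "word_embeddings", "floorplans", "connectivity"]


def missing_dataset_dirs(data_root):
    """Return the expected sub-directories that are absent or empty under data_root."""
    root = Path(data_root)
    missing = []
    for name in EXPECTED_DIRS:
        folder = root / name
        # A folder counts as present only if it exists and has at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```

Run it from the repository root; any folder it reports should be downloaded and extracted again.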
We provide a trained LingUNet-Skip model, described in the paper, for the LED task. The model is hosted on Google Drive and can be downloaded as follows:
cd data
mkdir models
# LingUNet-Skip (65MB)
gdown 'https://drive.google.com/uc?id=1WTHyDEpn-4wRnvGkXCm_g7bm5_gBB8oQ'
- In the paper we report accuracy on the LED task as defined by Euclidean distance. To obtain these results, run the eval.sh script with the default parameters and the provided model.
- For future analysis we now recommend using geodesic distance to calculate Localization Error (LE). This allows better comparison across different map representations during evaluation and makes it possible to calculate distances between predictions in multi-story environments. We have added code that snaps the pixel prediction to a node in the scene graph and then calculates the geodesic distance to the true location over that graph. We now evaluate accuracy at 0m, 5m, and 10m together with geodesic localization error. Under this metric the 0m accuracy goes up and the 5m accuracy goes down, which is to be expected.
Results from LingUNet with geodesic distance and snap to scene graph - over single floor only (final floor was provided during evaluation):
Model | LE | 0m | 5m | 10m |
---|---|---|---|---|
LingUNet-Skip | 7.62+-0.6 | 0.23+-0.024 | 0.567+-0.028 | 0.76+-0.024 |
Random Node* | 15.08+-0.68 | 0.016+-0.007 | 0.174+-0.022 | 0.37+-0.028 |
Model | LE | 0m | 5m | 10m |
---|---|---|---|---|
LingUNet-Skip | 9.9+-0.39 | 0.092+-0.012 | 0.375+-0.02 | 0.655+-0.02 |
Random Node* | 12.35+-0.41 | 0.019+-0.006 | 0.225+-0.017 | 0.499+-0.021 |
The lingUnet/run.py script handles training and evaluation for all model configurations.
For testing, use lingUnet/run_scripts/eval.sh.
For training, use lingUnet/run_scripts/base.sh.
Before running these scripts, change the BASEDIR path to the location of this repo.
You can also use these files to change the model parameters, which are set to default values in lingUnet/cfg.py.
For evaluation, run:
./lingUnet/run_scripts/eval.sh
Set the model to evaluate in the eval.sh file, and change the BASEDIR path to the location of this repo. The script evaluates the val splits and creates a file of predictions for the test set.
In the paper we report accuracy on the LED task as defined by Euclidean distance. To reproduce these results, run the eval.sh script with the default parameters and the provided model, with the parameter distance_metric set to "euclidean". We now suggest running with geodesic distance; to obtain those results, set distance_metric to "geodesic". See above for an explanation of this change and for the results in terms of geodesic distance.
To run the ablation experiments presented in the paper, or to run the model with other parameters, change the arguments in /lingUnet/run_scripts/{}.sh or in /lingUnet/run_scripts/cfg.py
- To make language changes, set the parameter language_change. Options: locator_only, observer_only, first_half, second_half, none
- To zero out the inputs or use data augmentation, set the boolean parameters blind_lang, blind_vis, and data_aug
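To give a feel for what the language_change options select, here is a minimal sketch. It is an assumption about the behavior, not the repository's implementation: the dialog format (a list of `(speaker, utterance)` tuples), the function name `apply_language_change`, and the rule of splitting halves by turn count are all hypothetical.

```python
def apply_language_change(dialog, language_change="none"):
    """Filter a dialog given as a list of (speaker, utterance) tuples.

    Assumed semantics (hypothetical): *_only keeps one speaker's turns,
    first_half / second_half split the dialog by number of turns.
    """
    if language_change == "none":
        return list(dialog)
    if language_change in ("locator_only", "observer_only"):
        speaker = language_change.split("_")[0]  # "locator" or "observer"
        return [turn for turn in dialog if turn[0] == speaker]
    midpoint = len(dialog) // 2
    if language_change == "first_half":
        return list(dialog[:midpoint])
    if language_change == "second_half":
        return list(dialog[midpoint:])
    raise ValueError("Unknown language_change option: " + repr(language_change))


# Made-up dialog for illustration only.
example_dialog = [
    ("locator", "Where are you?"),
    ("observer", "I am in a kitchen next to a fridge."),
    ("locator", "Is there a door nearby?"),
    ("observer", "Yes, on my left."),
]

print(apply_language_change(example_dialog, "observer_only"))
print(apply_language_change(example_dialog, "first_half"))
```

Check lingUnet/cfg.py and the data loading code for the actual behavior of each option before relying on this reading.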
We presented four non-learning baselines in the paper: random pixel, random viewpoint, center pixel, and a heuristic sliding window approach.
- To run the heuristic sliding window approach, go into Baselines and run python run_sliding_window.py
- To run the random and center baselines, go into Baselines and run python paper_baselines.py
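For intuition, the two simplest baselines can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in paper_baselines.py: the function names are hypothetical, a floor plan is treated as a `height x width` pixel grid, and error is measured in pixels rather than meters.

```python
import math
import random


def center_pixel(height, width):
    """Always predict the middle of the floor plan, as (row, col)."""
    return (height // 2, width // 2)


def random_pixel(height, width, rng=random):
    """Predict a uniformly random pixel, as (row, col)."""
    return (rng.randrange(height), rng.randrange(width))


def pixel_error(prediction, target):
    """Euclidean distance between two (row, col) pixels, in pixels."""
    return math.hypot(prediction[0] - target[0], prediction[1] - target[1])


# Hypothetical 700 x 1200 floor plan with a made-up target location.
target = (350, 604)
print(pixel_error(center_pixel(700, 1200), target))
print(pixel_error(random_pixel(700, 1200, random.Random(0)), target))
```

Passing a seeded `random.Random` instance keeps the random baseline reproducible across runs.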
If you find something wrong or have a question, feel free to open an issue. If you would like to contribute, please install pre-commit before making commits in a pull request:
python -m pip install pre-commit
pre-commit install
If you use the WAY dataset in your research, please cite the following paper:
@inproceedings{hahn2020you,
title={Where Are You? Localization from Embodied Dialog},
author={Hahn, Meera and Krantz, Jacob and Batra, Dhruv and Parikh, Devi and Rehg, James and Lee, Stefan and Anderson, Peter},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={806--822},
year={2020}
}