This repository contains the source code for the paper Where am I?: Scene Retrieval with Language.
First, download the model weights from here and place them in /playground/graph_models/model_checkpoints/graph2graph/.
Next, download the required data files from here and place them in /playground/graph_models/.
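Before launching any script, it can help to confirm that the downloads landed where the steps above expect them. The sketch below is a hypothetical helper (check_layout is not part of this repository). It assumes the leading "/" in the paths above denotes the repository root. Because the data file names are not listed here, it only checks that the data directory exists and that the checkpoint directory is non-empty.

```python
from __future__ import annotations

from pathlib import Path

# Paths from the setup steps, assumed to be relative to the repository root.
CHECKPOINT_DIR = Path("playground/graph_models/model_checkpoints/graph2graph")
DATA_DIR = Path("playground/graph_models")


def check_layout(repo_root: str | Path = ".") -> list[str]:
    """Hypothetical helper: return layout problems found (empty list if none)."""
    root = Path(repo_root)
    problems: list[str] = []

    ckpt = root / CHECKPOINT_DIR
    if not ckpt.is_dir():
        problems.append(f"missing checkpoint directory: {ckpt}")
    elif not any(ckpt.iterdir()):
        # The directory exists but the weights were not copied into it.
        problems.append(f"checkpoint directory is empty: {ckpt}")

    data = root / DATA_DIR
    if not data.is_dir():
        problems.append(f"missing data directory: {data}")

    return problems


if __name__ == "__main__":
    issues = check_layout()
    for issue in issues:
        print(issue)
    print("layout OK" if not issues else f"{len(issues)} problem(s) found")
```

Run it from the repository root. It only inspects directories, so it cannot tell whether the right weights or data files were downloaded.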
Then run the run_eval.sh script in /shell/.
Run the run.sh script in /shell/.
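If you prefer to launch these scripts from Python (for example, from a notebook), a thin wrapper around subprocess.run is enough. This is a sketch under stated assumptions: run_script is a hypothetical helper, and the sketch assumes the scripts are bash scripts meant to be started from inside the shell/ directory. Check the scripts themselves for the working directory and arguments they actually expect.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def run_script(name: str, repo_root: str | Path = ".") -> int:
    """Hypothetical helper: run shell/<name> with bash and return its exit code."""
    shell_dir = Path(repo_root) / "shell"
    script = shell_dir / name
    if not script.is_file():
        raise FileNotFoundError(script)
    # Assumption: the script expects shell/ as its working directory.
    result = subprocess.run(["bash", name], cwd=shell_dir, check=False)
    return result.returncode


if __name__ == "__main__":
    # Usage: python run_script.py run_eval.sh   (or: python run_script.py run.sh)
    sys.exit(run_script(sys.argv[1] if len(sys.argv) > 1 else "run_eval.sh"))
```

The wrapper returns the exit code instead of raising on failure (check=False), so the caller can decide how to handle a failed run.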
The CLIP2CLIP baseline can be found and run in the /baselines/CLIP2CLIP/ folder. The Text2Pos baseline can be found in this fork.
The model weights for the fine-tuned version of Text2Pos can be found here, and those for the version trained from scratch on the 3DSSG dataset can be found here.
To run the Text2Pos models, use run_text2pos.sh for training and run_eval_text2pos.sh for evaluation.