L3MVN: Leveraging Large Language Models for Visual Target Navigation

This work is based on our paper. We propose a new framework that uses a Large Language Model to explore an unknown environment and search for a target object. Our work builds on SemExp and llm_scene_understanding and is implemented in PyTorch.

Authors: Bangguo Yu, Hamidreza Kasaei, and Ming Cao

Affiliation: University of Groningen

Frontier Semantic Exploration Framework

Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches, robots still lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task learn these priors during training and typically require significant resources and time to do so. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLMs) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities gained from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.
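To make the frontier idea in the abstract more concrete, here is a minimal sketch of picking a frontier as the long-term goal based on how relevant nearby objects are to the target. It is not the repository's implementation: the toy grid encoding, the find_frontiers and select_frontier helpers, the radius parameter, and the hand-written relevance scores (standing in for whatever a language model would return) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Toy grid encoding (assumed for this sketch): 0 = free, 1 = obstacle, -1 = unknown.
FREE, OBSTACLE, UNKNOWN = 0, 1, -1
Cell = Tuple[int, int]


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return free cells that touch at least one unknown cell (4-connectivity)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN
                   for nr, nc in neighbours):
                frontiers.append((r, c))
    return frontiers


def select_frontier(frontiers: List[Cell], objects: Dict[str, Cell],
                    relevance: Dict[str, float], radius: int = 2) -> Cell:
    """Pick the frontier whose nearby observed objects are most relevant to the target."""
    def score(cell: Cell) -> float:
        # Sum the relevance of every object within `radius` (Manhattan distance).
        return sum(relevance.get(name, 0.0)
                   for name, (orow, ocol) in objects.items()
                   if abs(orow - cell[0]) + abs(ocol - cell[1]) <= radius)
    return max(frontiers, key=score)


grid = [
    [0, 0, -1, -1],
    [0, 0, 1, -1],
    [0, 0, 0, 0],
    [-1, 0, 0, 0],
]
# Objects seen so far and their (hypothetical) relevance to a target such as "tv".
objects = {"sofa": (1, 2), "toilet": (3, 0), "chair": (3, 3)}
relevance = {"sofa": 0.9, "toilet": 0.05, "chair": 0.6}

frontiers = find_frontiers(grid)
print(frontiers)                                       # [(0, 1), (2, 0), (2, 3), (3, 1)]
print(select_frontier(frontiers, objects, relevance))  # (2, 3)
```

In this toy example the frontier at (2, 3) wins because both the sofa and the chair lie within the radius, while the frontiers near the toilet score low.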

Docker

Clone repository

git clone https://github.com/edwardjjj/l3mvn.git l3mvn

Install the utility for downloading the HM3D dataset

conda create -n l3mvn
conda install habitat-sim-challenge-2022 headless -c conda-forge -c aihabitat -n l3mvn
conda activate l3mvn
python -m habitat_sim.utils.datasets_download --username <api-token-id> --password <api-token-secret> --uids hm3d --data-path l3mvn/data
ln -s -f ../versioned_data/hm3d-1.0/hm3d l3mvn/data/scene_datasets/hm3d

Download the segmentation model from here. Put the downloaded file in l3mvn/data.
Download the test set from here. Unzip it, rename the folder to objectgoal_hm3d, and place it in l3mvn/data.

Build Docker image

docker build -t l3mvn:1.0 .

Run the image

docker run --gpus all -v .:/home/app -it l3mvn:1.0

Inside the container, run the following to test the feed-forward method:

cd l3mvn
. activate habitat
python main_llm_vis.py --split val --eval 1 --auto_gpu_config 0 \
-n 8 --num_eval_episodes 250 --load pretrained_models/llm_model.pt \
--use_gtsem 0 --num_local_steps 10

Run the following to test the zero-shot method:

cd l3mvn
. activate habitat
python main_llm_zeroshot.py --split val --eval 1 --auto_gpu_config 0 \
-n 5 --num_eval_episodes 400 --num_processes_on_first_gpu 5 \
--use_gtsem 0 --num_local_steps 10 --exp_name exp_llm_hm3d_zero

Installation

The code has been tested only with Python 3.7 on Ubuntu 20.04.

Installing Dependencies

We use the challenge-2022 versions of habitat-sim and habitat-lab, as specified below.

Installing habitat-sim:

git clone https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim; git checkout tags/challenge-2022
pip install -r requirements.txt
# Run only one of the two install commands below, depending on your system.
python setup.py install --headless # (headless build)
python setup.py install # (for Mac OS)

Installing habitat-lab:

git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab; git checkout tags/challenge-2022; 
pip install -e .

Install PyTorch according to your system configuration. The code has been tested with PyTorch v1.7.0 and cudatoolkit v11.4. If you are using conda:

conda install pytorch==1.7.0 torchvision==0.8.1 cudatoolkit=11.4 #(Linux with GPU)
conda install pytorch==1.7.0 torchvision==0.8.1 -c pytorch #(Mac OS)

Install detectron2 according to your system configuration.
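Once the dependencies above are installed, a quick check like the following can confirm that they are importable before running anything heavier. This is only a sketch: the missing_modules helper is a name introduced here, and the list assumes the usual import names for these packages (habitat-lab is imported as habitat).

```python
import importlib.util
import sys

# Import names for the packages listed above (habitat-lab installs as "habitat").
REQUIRED_MODULES = ["habitat_sim", "habitat", "torch", "torchvision", "detectron2"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The README reports testing with Python 3.7 only, so print the running version.
print("Python", ".".join(map(str, sys.version_info[:3])))
missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it stays fast and does not need a GPU.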

Download HM3D datasets:

Habitat Matterport

Download the HM3D dataset using the download utility and instructions:

python -m habitat_sim.utils.datasets_download --username <api-token-id> --password <api-token-secret> --uids hm3d_minival

Download additional datasets

Download the segmentation model and place it in the RedNet/model directory.

Setup

Clone the repository and install other requirements:

git clone https://github.com/ybgdgh/L3MVN
cd L3MVN/
pip install -r requirements.txt

Setting up datasets

The code requires the datasets in a data folder in the following format (same as habitat-lab):

L3MVN/
  data/
    scene_datasets/
    matterport_category_mappings.tsv
    object_norm_inv_perplexity.npy
    versioned_data
    objectgoal_hm3d/
        train/
        val/
        val_mini/

For evaluation:

To evaluate the pre-trained model:

python main_llm_vis.py --split val --eval 1 --auto_gpu_config 0 \
-n 1 --num_eval_episodes 2000 --load pretrained_models/llm_model.pt \
--use_gtsem 0 --num_local_steps 10
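If you want to launch the same evaluation from a script (for example, to vary the number of processes or episodes), the command above can be assembled in Python. This is a convenience sketch only: build_eval_command is a hypothetical helper, it uses just the flags that appear in the command above, and it assumes it is run from the repository root.

```python
import shlex


def build_eval_command(num_processes=1, num_episodes=2000,
                       checkpoint="pretrained_models/llm_model.pt"):
    """Assemble the evaluation command shown above as an argument list."""
    return [
        "python", "main_llm_vis.py",
        "--split", "val", "--eval", "1", "--auto_gpu_config", "0",
        "-n", str(num_processes),
        "--num_eval_episodes", str(num_episodes),
        "--load", checkpoint,
        "--use_gtsem", "0", "--num_local_steps", "10",
    ]


command = build_eval_command()
# Print the command in copy-pasteable form; nothing is executed here.
print(" ".join(shlex.quote(part) for part in command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems if the checkpoint path contains spaces.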

Demo Video

video
