This directory contains the implementation of the paper "Video based Object 6D Pose Estimation using Transformers", accepted into the NeurIPS 2022 Workshop on Vision Transformers: Theory and Applications.
If this code helps with your work, please cite:
@article{beedu2022video,
  title={Video based Object 6D Pose Estimation using Transformers},
  author={Beedu, Apoorva and Alamri, Huda and Essa, Irfan},
  journal={arXiv preprint arXiv:2210.13540},
  year={2022}
}
Please install all the requirements using requirements.txt
pip3 install -r requirements.txt
Create ./evaluation_results_video, ./wandb, ./logs, ./output, and ./model folders.
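If it helps, here is a minimal sketch that creates these folders at the repository root (assuming no script in the repository already does this):

```python
# Sketch: create the folders listed above at the repository root.
import os

for folder in ["evaluation_results_video", "wandb", "logs", "output", "model"]:
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
```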
Arguments and their defaults are in arguments.py. The main ones are listed below; a sketch of how they might be declared follows the list.
- backbone: swin or beit
- use_depth: use ground-truth depth during training
- restore_file: name of the file in --model_dir_path containing weights to reload before training
- lr: learning rate for the optimiser
- batch_size: batch size for the dataset
- workers: num_workers for the dataloader
- env_name: environment name for wandb, which is also the checkpoint name
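As a rough illustration (not the actual contents of arguments.py), these options might be declared with argparse along these lines; the defaults shown are assumptions taken from the example commands in this README:

```python
# Hypothetical sketch of how arguments.py might declare these options with
# argparse. The real defaults live in arguments.py; the values below are
# assumptions taken from the example commands in this README.
import argparse

parser = argparse.ArgumentParser(description="Video-based 6D object pose estimation")
parser.add_argument("--backbone", default="swin", choices=["swin", "beit"],
                    help="feature backbone")
parser.add_argument("--use_depth", type=int, default=0,
                    help="use ground-truth depth during training")
parser.add_argument("--restore_file", default=None,
                    help="file in --model_dir_path with weights to reload before training")
parser.add_argument("--lr", type=float, default=1e-4,
                    help="learning rate for the optimiser")
parser.add_argument("--batch_size", type=int, default=8,
                    help="batch size for the dataset")
parser.add_argument("--workers", type=int, default=12,
                    help="num_workers for the dataloader")
parser.add_argument("--env_name", default="Videopose",
                    help="wandb environment name; also the checkpoint name")
args = parser.parse_args()
```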
Download the entire YCB dataset from https://rse-lab.cs.washington.edu/projects/posecnn/
Download the checkpoint from https://drive.google.com/drive/folders/1lQh3G7KN-SHb7B-NYpqWj55O1WD4E9s6?usp=sharing
Place the checkpoint at ./model/Videopose/last_checkpoint_0000.pt and pass --restore_file=Videopose during training to start from the checkpoint. If no --start_epoch is given, training will restart from the last checkpoint.
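As a rough illustration, restoring such a checkpoint in PyTorch typically looks like the sketch below; the dictionary keys are assumptions about the saved format, not this repository's actual layout:

```python
# Hypothetical sketch of restoring a checkpoint. The dict keys
# ("state_dict", "epoch") are assumptions about the saved format; the
# actual restore logic lives in the training code.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # placeholder; the real model comes from this repository
checkpoint = torch.load("./model/Videopose/last_checkpoint_0000.pt", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
start_epoch = checkpoint.get("epoch", 0)  # resume from the saved epoch
```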
The data folder looks like:
├── train_eval.py
├── dataloader.py
├── data
│   └── YCB
│       ├── data
│       │   ├── 0000
│       │   └── 0001
│       ├── models
│       ├── train.txt
│       ├── keyframe.txt
│       └── val.txt
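For orientation, here is a sketch of how this layout might be consumed, assuming train.txt lists frame prefixes such as 0000/000001 in the PoseCNN/YCB-Video convention (dataloader.py is the authoritative implementation):

```python
# Hypothetical sketch: read the training split and build per-frame paths.
# Assumes train.txt lists prefixes like "0000/000001", one per line, and
# that frames follow the PoseCNN/YCB-Video naming (-color.png, -depth.png).
# dataloader.py is the authoritative version.
import os

root = "./data/YCB"
with open(os.path.join(root, "train.txt")) as f:
    prefixes = [line.strip() for line in f if line.strip()]

for prefix in prefixes[:3]:  # first few frames only
    color = os.path.join(root, "data", prefix + "-color.png")
    depth = os.path.join(root, "data", prefix + "-depth.png")
    print(color, depth)
```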
The project uses wandb for visualisation. The main branch uses -posecnn.mat files, which I generated manually for every frame in the dataset using the PoseCNN repository. If you do not have those files, use the v1 branch.
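For context, a minimal sketch of the wandb set-up this implies; the project name here is an assumption, and the run name mirrors the --env_name argument above:

```python
# Hypothetical sketch of the wandb logging set-up. The project name is an
# assumption; the run name mirrors the --env_name argument described above.
import wandb

wandb.init(project="videopose", name="Videopose")
wandb.log({"train/loss": 0.0})  # illustrative metric only
```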
To train, run:
python3 train_eval.py --batch_size=8 --lr=0.0001 --backbone=swin --predict_future=1 --use_depth=1 --video_length=5 --workers=12
Evaluation currently runs on a single GPU only. To evaluate, run:
python3 train_eval.py --batch_size=8 --backbone=swin --predict_future=1 --use_depth=1 --video_length=5 --workers=12 --restore_file=Videopose --split=eval
This command creates several .mat files for the keyframes and also saves images into a folder. To evaluate the .mat files, please use the YCBToolBox.
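To sanity-check the generated files before running the toolbox, something like the following sketch can help; the output directory and field names are assumptions about what train_eval.py writes:

```python
# Hypothetical sketch for inspecting one generated .mat file. The output
# directory and file name are assumptions; check where train_eval.py
# actually writes its results (e.g. ./evaluation_results_video).
import scipy.io as sio

result = sio.loadmat("./evaluation_results_video/000001.mat")
print(sorted(result.keys()))  # arrays stored for this keyframe
```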