Commit 88d7804 (initial commit, 0 parents): 231 changed files with 35,196 additions and 0 deletions.

.gitignore:
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

data
CLIP/
lvis-api/
workdirs
.history
.vscode
.idea
.DS_Store

# custom
*.pkl
*.pkl.json
*.log.json

# Pytorch
*.pth
*.pt
*.py~
*.sh~
changelog.py

checkpoints
analysis_results*
outputs
lightning_logs
notebooks
proposals

README.md:
#### **Table of contents**
1. [Introduction](#pytorch-implementation-of-lp-ovod-open-vocabulary-object-detection-by-linear-probing-wacv-2024)
2. [Requirements](#requirements)
3. [Preparation](#preparation)
4. [Training and Testing](#training-and-testing)
5. [Contacts](#contacts)

# **PyTorch implementation of LP-OVOD: Open-Vocabulary Object Detection by Linear Probing (WACV 2024)**
<a href="https://arxiv.org/abs/2310.17109"><img src="https://img.shields.io/badge/arxiv-2310.17109-red?style=for-the-badge"></a>

Chau Pham, Truong Vu, Khoi Nguyen<br>
**VinAI Research, Vietnam**

> **Abstract:**
> Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work.

![teaser.png](./assets/approach_official.png)
Details of the model architecture and experimental results can be found in [our paper](https://arxiv.org/abs/2310.17109).<br>
Please **CITE** our paper whenever this repository is used to help produce published results or is incorporated into other software.
```bibtex
@inproceedings{pham2024lp,
  title={LP-OVOD: Open-Vocabulary Object Detection by Linear Probing},
  author={Pham, Chau and Vu, Truong and Nguyen, Khoi},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={779--788},
  year={2024}
}
```

## Requirements
+ Python 3.8
+ PyTorch 1.7.0
+ CUDA 11.0
+ OpenAI CLIP
+ faiss

```
pip install -r requirements/build.txt
pip install -e .
pip install git+https://github.com/openai/CLIP.git
pip uninstall pycocotools -y
pip uninstall mmpycocotools -y
pip install mmpycocotools
pip install git+https://github.com/lvis-dataset/lvis-api.git
pip install mmcv-full==1.2.5 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
conda install -c pytorch faiss-gpu
```

## **Preparation**
### Data
Download the [COCO](https://cocodataset.org/#home) dataset.

All models use the backbone pretrained with [SoCo](https://github.com/hologerry/SoCo). Download the [pretrained backbone](https://drive.google.com/file/d/1z6Tb2MPFJDv9qpEyn_J0cJcXOguKTiL0/view) and save it to the folder `weights`. Also save the pretrained CLIP model to `weights`.
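
For example, the CLIP checkpoint can be fetched straight into `weights/` via the `download_root` argument of `clip.load` (the ViT-B/32 variant here is an assumption; use whichever variant the scripts expect):

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# The first call downloads the checkpoint into ./weights; later calls load it from there.
model, preprocess = clip.load("ViT-B/32", device=device, download_root="weights")
```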

### Code structure
```
├── configs
├── mmdet
├── weights
├── tools
├── prepare
├── retrieval
├── scripts
├── ovd_coco_text_embedding.pth
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
```

### Extract the CLIP text embeddings for COCO classes (Optional)
```
python ./prepare/clip_utils.py
```
This creates a file `ovd_coco_text_embedding.pth` (we have already extracted this file for you).
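
As a rough sketch of what this step computes (the prompt template and the three example class names are illustrative assumptions, not the exact code of `prepare/clip_utils.py`), the embeddings come from CLIP's text encoder:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device, download_root="weights")

class_names = ["person", "bicycle", "car"]  # in practice, all COCO classes
tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # L2-normalize

torch.save(text_emb.cpu(), "ovd_coco_text_embedding.pth")
```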

### Download OLN proposals
Download the OLN proposals from [this link](#).

### Extract the CLIP visual embeddings on pre-computed proposals
These embeddings will be used for computing the knowledge distillation loss and for retrieving novel proposals.
```
python -m torch.distributed.launch --nproc_per_node=4 prepare/extract_coco_embeddings_clip.py \
    --data_root=path_to_data_root \
    --clip_root=weights \
    --proposal_file=path_to_oln_proposals \
    --num_worker=48 \
    --batch_size=128 \
    --split=train \
    --save_path=coco_clip_emb_train.pth
```
Change `num_worker` and `batch_size` according to your machine.
A file `coco_clip_emb_train.pth` (over 100GB) will be created, so check that you have enough disk space before extracting.
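
Conceptually, the script crops each pre-computed proposal out of the image and encodes it with CLIP's image encoder. A simplified single-image sketch (the file name and box coordinates are hypothetical, and the distributed batching of the real script is omitted):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, download_root="weights")

image = Image.open("data/coco/train2017/000000000009.jpg").convert("RGB")
boxes = [(10, 20, 200, 180), (50, 60, 300, 240)]  # hypothetical (x1, y1, x2, y2) proposals

# Crop each proposal, apply CLIP preprocessing, and embed the batch.
crops = torch.stack([preprocess(image.crop(box)) for box in boxes]).to(device)
with torch.no_grad():
    vis_emb = model.encode_image(crops)
    vis_emb = vis_emb / vis_emb.norm(dim=-1, keepdim=True)  # L2-normalize
```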

## Training and Testing
### Pretraining for Base Classes
```
sh ./scripts/vild_sigmoid.sh
```

### Few-shot Fine-tuning for Novel Classes
```
sh ./scripts/vild_sigmoid_ft.sh
```

### Test the Model on Both Base and Novel Classes
```
sh ./scripts/vild_sigmoid_test.sh
```

## **Contacts**
If you have any questions about this project, contact truongvu0911nd@gmail.com or open an issue in this repository.

Dataset configuration file (Python, mmdetection-style):
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
# img_norm_cfg = dict(
#     mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# CLIP image-preprocessing statistics scaled to the [0, 255] range.
img_norm_cfg = dict(
    mean=[122.770935, 116.74601, 104.093735],
    std=[68.500534, 66.63216, 70.323166],
    to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    dict(type='LoadAnnotations', with_bbox=True),
    # dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                   (1333, 768), (1333, 800)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    # dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Pad', size_divisor=32),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img', 'proposals', 'objectness']),
        ])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_b.json',
        proposal_file=data_root + 'proposals/instances_val2017_proposals.pkl',
        img_prefix=data_root + 'val2017/',
        proposal_id_map=data_root + 'annotations/val_proposal_id_map.json',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_oln=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_novel=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_t.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_novel_oln=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_t.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_lvis=dict(
        type=dataset_type,
        # Not prefixed with data_root, which already points at data/coco/
        # (the original concatenation would have produced data/coco/data/lvis_v1/...).
        ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_lvis_oln=dict(
        type=dataset_type,
        ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix='data/lvis_v1',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')
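
Configs like this one are consumed through mmcv's `Config` API; a minimal usage sketch (the config path here is hypothetical):

```python
from mmcv import Config

cfg = Config.fromfile("configs/your_dataset_config.py")  # hypothetical path
print(cfg.data.train.ann_file)  # -> data/coco/annotations/ovd_ins_val2017_b.json
print(cfg.evaluation)           # -> {'interval': 1, 'metric': 'bbox'}
```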