First commit
truongvu2000nd committed Jan 10, 2024
0 parents · commit 88d7804
Showing 231 changed files with 35,196 additions and 0 deletions.
132 changes: 132 additions & 0 deletions .gitignore
@@ -0,0 +1,132 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

data
CLIP/
lvis-api/
workdirs
.history
.vscode
.idea
.DS_Store

# custom
*.pkl
*.pkl.json
*.log.json

# Pytorch
*.pth
*.pt
*.py~
*.sh~
changelog.py

checkpoints
analysis_results*
outputs
lightning_logs
notebooks
proposals
128 changes: 128 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
#### **Table of contents**
1. [Introduction](#pytorch-implementation-of-lp-ovod-open-vocabulary-object-detection-by-linear-probing-wacv-2024)
2. [Requirements](#requirements)
3. [Preparation](#preparation)
4. [Training and Testing](#training-and-testing)
5. [Contacts](#contacts)


# **PyTorch implementation of LP-OVOD: Open-Vocabulary Object Detection by Linear Probing (WACV 2024)**
<a href="https://arxiv.org/abs/2310.17109"><img src="https://img.shields.io/badge/arxiv-2310.17109-red?style=for-the-badge"></a>

Chau Pham, Truong Vu, Khoi Nguyen<br>
**VinAI Research, Vietnam**

> **Abstract:**
> Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work.

![teaser.png](./assets/approach_official.png)
Details of the model architecture and experimental results can be found in [our paper](https://arxiv.org/abs/2310.17109).<br>
Please **CITE** our paper whenever this repository is used to help produce published results or is incorporated into other software.
```bibtex
@inproceedings{pham2024lp,
  title={LP-OVOD: Open-Vocabulary Object Detection by Linear Probing},
  author={Pham, Chau and Vu, Truong and Nguyen, Khoi},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={779--788},
  year={2024}
}
```

## Requirements
+ Python 3.8
+ PyTorch 1.7.0
+ CUDA 11.0
+ OpenAI CLIP
+ faiss

Install the dependencies with:

```bash
pip install -r requirements/build.txt
pip install -e .
pip install git+https://github.com/openai/CLIP.git
pip uninstall pycocotools -y
pip uninstall mmpycocotools -y
pip install mmpycocotools
pip install git+https://github.com/lvis-dataset/lvis-api.git
pip install mmcv-full==1.2.5 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
conda install -c pytorch faiss-gpu
```
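
After installing, a quick sanity check such as the following (not part of the repository) can confirm that the pinned versions are in place:

```python
# Hypothetical sanity check: verifies the pinned versions listed above.
import torch
import mmcv
import clip
import faiss

print(torch.__version__)        # expect 1.7.0
print(torch.version.cuda)       # expect 11.0
print(mmcv.__version__)         # expect 1.2.5
print(clip.available_models())  # e.g. ['RN50', ..., 'ViT-B/32']
print(faiss.get_num_gpus())     # > 0 if the GPU build of faiss is active
```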

## **Preparation**
### Data
Download the [COCO](https://cocodataset.org/#home) dataset.

All models use the backbone pretrained with [SoCo](https://github.com/hologerry/SoCo). Download the [pretrained backbone](https://drive.google.com/file/d/1z6Tb2MPFJDv9qpEyn_J0cJcXOguKTiL0/view) and save it to the `weights` folder. Save the pretrained CLIP model to `weights` as well.
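
To verify the download, a minimal sketch like the one below can inspect the checkpoint; the file name `weights/soco_backbone.pth` is a placeholder for whatever the downloaded file is actually called:

```python
import torch

# 'weights/soco_backbone.pth' is a placeholder name for the downloaded backbone.
ckpt = torch.load('weights/soco_backbone.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)  # some checkpoints nest weights under 'state_dict'
print(f'{len(state)} tensors, e.g. {list(state)[:3]}')
```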

### Code structure
```
├── configs
├── mmdet
├── weights
├── tools
├── prepare
├── retrieval
├── scripts
├── ovd_coco_text_embedding.pth
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
```

### Extract the CLIP text embeddings for COCO classes (Optional)
```bash
python ./prepare/clip_utils.py
```
A file `ovd_coco_text_embedding.pth` will be created (we have already extracted this for you).
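
For reference, here is a minimal sketch of the idea behind `prepare/clip_utils.py`, assuming the standard OpenAI CLIP API; the class names, prompt template, and CLIP checkpoint below are illustrative, not the script's actual choices:

```python
import torch
import clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _ = clip.load('ViT-B/32', device=device)  # checkpoint choice is an assumption

class_names = ['person', 'bicycle', 'car']  # illustrative subset of the COCO classes
tokens = clip.tokenize([f'a photo of a {c}' for c in class_names]).to(device)

with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # L2-normalize

torch.save(text_emb.cpu(), 'ovd_coco_text_embedding.pth')
```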

### Download OLN proposals
Download the OLN proposals from [this link](#).
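
The config in `configs/_base_/datasets/coco_detection.py` expects the proposals at `data/coco/proposals/oln_proposals.pkl`. A quick way to check the download (whether the pickle holds a list of per-image arrays or a dict keyed by image id is an assumption to verify against the actual file):

```python
import pickle

with open('data/coco/proposals/oln_proposals.pkl', 'rb') as f:
    proposals = pickle.load(f)

# Print the container type to confirm the expected layout before training.
print(type(proposals))
```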

### Extract the CLIP visual embeddings on pre-computed proposals
These embeddings are used to compute the knowledge distillation loss and to retrieve novel proposals.
```bash
python -m torch.distributed.launch --nproc_per_node=4 prepare/extract_coco_embeddings_clip.py \
    --data_root=path_to_data_root \
    --clip_root=weights \
    --proposal_file=path_to_oln_proposals \
    --num_worker=48 \
    --batch_size=128 \
    --split=train \
    --save_path=coco_clip_emb_train.pth
```
Change `num_worker` and `batch_size` according to your machine.
A file `coco_clip_emb_train.pth` of over 100 GB will be created, so make sure you have enough disk space before extracting.
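
Conceptually, the extraction amounts to encoding each pre-computed proposal crop with CLIP. A simplified, single-process sketch (the function and variable names are illustrative, not the script's API):

```python
import torch
import clip
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/32', device=device)  # checkpoint is an assumption

def proposal_embeddings(image_path, boxes):
    """Encode proposal crops; boxes are (x1, y1, x2, y2) tuples."""
    image = Image.open(image_path).convert('RGB')
    crops = torch.stack([preprocess(image.crop(box)) for box in boxes]).to(device)
    with torch.no_grad():
        emb = model.encode_image(crops)
    return emb / emb.norm(dim=-1, keepdim=True)  # normalized, as CLIP similarity expects
```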

## Training and Testing
### Pretraining for Base Classes
```bash
sh ./scripts/vild_sigmoid.sh
```

### Few-shot Fine-tuning for Novel Classes
```bash
sh ./scripts/vild_sigmoid_ft.sh
```

### Test the Model on Both Base and Novel Classes
```bash
sh ./scripts/vild_sigmoid_test.sh
```

## **Contacts**
If you have any questions about this project, contact us via truongvu0911nd@gmail.com or open an issue in this repository.
Binary file added assets/approach_official.png
105 changes: 105 additions & 0 deletions configs/_base_/datasets/coco_detection.py
@@ -0,0 +1,105 @@
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
# img_norm_cfg = dict(
# mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

img_norm_cfg = dict(
    mean=[122.770935, 116.74601, 104.093735],
    std=[68.500534, 66.63216, 70.323166],
    to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    dict(type='LoadAnnotations', with_bbox=True),
    # dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                   (1333, 768), (1333, 800)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    # dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Pad', size_divisor=32),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img', 'proposals', 'objectness']),
        ])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_b.json',
        proposal_file=data_root + 'proposals/instances_val2017_proposals.pkl',
        img_prefix=data_root + 'val2017/',
        proposal_id_map=data_root + 'annotations/val_proposal_id_map.json',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_oln=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_novel=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_t.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_novel_oln=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_t.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_lvis=dict(
        type=dataset_type,
        # LVIS annotations live outside data_root; prefixing with data_root
        # would resolve to 'data/coco/data/lvis_v1/...'
        ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_lvis_oln=dict(
        type=dataset_type,
        ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix='data/lvis_v1',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')
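
Not part of this commit, but for orientation: a config like the one above is typically consumed through the mmcv 1.x API (matching the pinned `mmcv-full==1.2.5`), for example:

```python
from mmcv import Config
from mmdet.datasets import build_dataset

cfg = Config.fromfile('configs/_base_/datasets/coco_detection.py')
train_set = build_dataset(cfg.data.train)  # resolves ann_file, proposal_file, pipeline
print(len(train_set))
```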