Commit 88d7804 (initial commit, 0 parents): 231 changed files with 35,196 additions and 0 deletions.

.gitignore:
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

data
CLIP/
lvis-api/
workdirs
.history
.vscode
.idea
.DS_Store

# custom
*.pkl
*.pkl.json
*.log.json

# Pytorch
*.pth
*.pt
*.py~
*.sh~
changelog.py

checkpoints
analysis_results*
outputs
lightning_logs
notebooks
proposals

README.md:
#### **Table of contents**
1. [Introduction](#pytorch-implementation-of-lp-ovod-open-vocabulary-object-detection-by-linear-probing-wacv-2024)
2. [Requirements](#requirements)
3. [Preparation](#preparation)
4. [Training and Testing](#training-and-testing)
5. [Contacts](#contacts)

# **PyTorch implementation of LP-OVOD: Open-Vocabulary Object Detection by Linear Probing (WACV 2024)**
<a href="https://arxiv.org/abs/2310.17109"><img src="https://img.shields.io/badge/arxiv-2310.17109-red?style=for-the-badge"></a>

Chau Pham, Truong Vu, Khoi Nguyen<br>
**VinAI Research, Vietnam**

> **Abstract:**
> Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work.

![teaser.png](./assets/approach_official.png)
Details of the model architecture and experimental results can be found in [our paper](https://arxiv.org/abs/2310.17109).<br>
Please **CITE** our paper whenever this repository is used to help produce published results or is incorporated into other software.
```bibtex
@inproceedings{pham2024lp,
  title={LP-OVOD: Open-Vocabulary Object Detection by Linear Probing},
  author={Pham, Chau and Vu, Truong and Nguyen, Khoi},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={779--788},
  year={2024}
}
```

## Requirements
+ Python 3.8
+ PyTorch 1.7.0
+ CUDA 11.0
+ OpenAI CLIP
+ faiss

```
pip install -r requirements/build.txt
pip install -e .
pip install git+https://github.com/openai/CLIP.git
pip uninstall pycocotools -y
pip uninstall mmpycocotools -y
pip install mmpycocotools
pip install git+https://github.com/lvis-dataset/lvis-api.git
pip install mmcv-full==1.2.5 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
conda install -c pytorch faiss-gpu
```

## **Preparation**
### Data
Download the [COCO](https://cocodataset.org/#home) dataset.

All models use the backbone pretrained with [SoCo](https://github.com/hologerry/SoCo). Download the [pretrained backbone](https://drive.google.com/file/d/1z6Tb2MPFJDv9qpEyn_J0cJcXOguKTiL0/view) and save it to the folder `weights`. Also save the pretrained CLIP model to `weights`.
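
For example, the CLIP checkpoint can be fetched straight into `weights/` via the `download_root` argument of `clip.load` (the ViT-B/32 variant here is an assumption; use whichever variant the scripts expect):

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# The first call downloads the checkpoint into ./weights; later calls load it from there.
model, preprocess = clip.load("ViT-B/32", device=device, download_root="weights")
```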

### Code structure
```
├── configs
├── mmdet
├── weights
├── tools
├── prepare
├── retrieval
├── scripts
├── ovd_coco_text_embedding.pth
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
```

### Extract the CLIP text embeddings for COCO classes (Optional)
```
python ./prepare/clip_utils.py
```
This creates a file `ovd_coco_text_embedding.pth` (we have already extracted this file for you).
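
As a rough sketch of what this step computes (the prompt template and the three example class names are illustrative assumptions, not the exact code of `prepare/clip_utils.py`), the embeddings come from CLIP's text encoder:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device, download_root="weights")

class_names = ["person", "bicycle", "car"]  # in practice, all COCO classes
tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # L2-normalize

torch.save(text_emb.cpu(), "ovd_coco_text_embedding.pth")
```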

### Download OLN proposals
Download the OLN proposals from [this link](#).

### Extract the CLIP visual embeddings on pre-computed proposals
These embeddings will be used for computing the knowledge distillation loss and for retrieving novel proposals.
```
python -m torch.distributed.launch --nproc_per_node=4 prepare/extract_coco_embeddings_clip.py \
    --data_root=path_to_data_root \
    --clip_root=weights \
    --proposal_file=path_to_oln_proposals \
    --num_worker=48 \
    --batch_size=128 \
    --split=train \
    --save_path=coco_clip_emb_train.pth
```
Change `num_worker` and `batch_size` according to your machine.
A file `coco_clip_emb_train.pth` (over 100GB) will be created, so check that you have enough disk space before extracting.
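
Conceptually, the script crops each pre-computed proposal out of the image and encodes it with CLIP's image encoder. A simplified single-image sketch (the file name and box coordinates are hypothetical, and the distributed batching of the real script is omitted):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, download_root="weights")

image = Image.open("data/coco/train2017/000000000009.jpg").convert("RGB")
boxes = [(10, 20, 200, 180), (50, 60, 300, 240)]  # hypothetical (x1, y1, x2, y2) proposals

# Crop each proposal, apply CLIP preprocessing, and embed the batch.
crops = torch.stack([preprocess(image.crop(box)) for box in boxes]).to(device)
with torch.no_grad():
    vis_emb = model.encode_image(crops)
    vis_emb = vis_emb / vis_emb.norm(dim=-1, keepdim=True)  # L2-normalize
```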

## Training and Testing
### Pretraining for Base Classes
```
sh ./scripts/vild_sigmoid.sh
```

### Few-shot Fine-tuning for Novel Classes
```
sh ./scripts/vild_sigmoid_ft.sh
```

### Test the Model on Both Base and Novel Classes
```
sh ./scripts/vild_sigmoid_test.sh
```

## **Contacts**
If you have any questions about this project, contact truongvu0911nd@gmail.com or open an issue in this repository.

Dataset configuration file (Python, mmdetection-style):
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
# img_norm_cfg = dict(
#     mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# CLIP image-preprocessing statistics scaled to the [0, 255] range.
img_norm_cfg = dict(
    mean=[122.770935, 116.74601, 104.093735],
    std=[68.500534, 66.63216, 70.323166],
    to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    dict(type='LoadAnnotations', with_bbox=True),
    # dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                   (1333, 768), (1333, 800)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    # dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Pad', size_divisor=32),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img', 'proposals', 'objectness']),
        ])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_b.json',
        proposal_file=data_root + 'proposals/instances_val2017_proposals.pkl',
        img_prefix=data_root + 'val2017/',
        proposal_id_map=data_root + 'annotations/val_proposal_id_map.json',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_oln=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_all.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_novel=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_t.json',
        proposal_file='checkpoints/predictions/rpn_rgb.pkl',
        proposal_id_map='data/coco/annotations/val_proposal_id_map_ovd_gt.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_novel_oln=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/ovd_ins_val2017_t.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_lvis=dict(
        type=dataset_type,
        # Not prefixed with data_root, which already points at data/coco/
        # (the original concatenation would have produced data/coco/data/lvis_v1/...).
        ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test_lvis_oln=dict(
        type=dataset_type,
        ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
        proposal_file='data/coco/proposals/oln_proposals.pkl',
        proposal_id_map='data/coco/annotations/oln/oln_id_map.json',
        img_prefix='data/lvis_v1',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')
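
Configs like this one are consumed through mmcv's `Config` API; a minimal usage sketch (the config path here is hypothetical):

```python
from mmcv import Config

cfg = Config.fromfile("configs/your_dataset_config.py")  # hypothetical path
print(cfg.data.train.ann_file)  # -> data/coco/annotations/ovd_ins_val2017_b.json
print(cfg.evaluation)           # -> {'interval': 1, 'metric': 'bbox'}
```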