The official repository for the 8th NVIDIA AI City Challenge (Track 4: Road Object Detection in Fish-Eye Cameras) from team Netspresso (Nota Inc.).
We use Co-DETR as the detection baseline.
# git clone this repository
git clone https://github.com/nota-github/AIC2024_Track4_Nota.git
cd AIC2024_Track4_Nota
# Build a docker container
docker build -t aic2024_track4_nota .
docker run --name aic2024_track4_nota_0 --gpus '"device=0,1,2,3,4,5,6,7"' --shm-size=8g -it -v path_to_local_repository/:/AIC2024_Track4_Nota aic2024_track4_nota
# Install Co-DETR dependencies
cd Co-DETR
pip install -v -e .
pip install fvcore einops albumentations ensemble_boxes
cd sahi
pip install -e ".[dev]" # Refer to the original SAHI repo for details
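SAHI performs sliced inference: each full fisheye frame is cut into overlapping tiles, the detector runs on every tile, and per-tile detections are merged back into full-image coordinates. A minimal sketch of the tile-grid computation (the slice size and overlap here are illustrative, not the values used by this repository's configs):

```python
def slice_grid(img_w, img_h, slice_w=640, slice_h=640, overlap=0.2):
    """Return (x1, y1, x2, y2) tile windows covering the image with overlap."""
    step_x = int(slice_w * (1 - overlap))
    step_y = int(slice_h * (1 - overlap))
    tiles = []
    y = 0
    while True:
        y2 = min(y + slice_h, img_h)
        x = 0
        while True:
            x2 = min(x + slice_w, img_w)
            # clamp the window to the image border, keeping its full size
            tiles.append((max(0, x2 - slice_w), max(0, y2 - slice_h), x2, y2))
            if x2 >= img_w:
                break
            x += step_x
        if y2 >= img_h:
            break
        y += step_y
    return tiles

# a 1280x960 Fisheye8K-sized frame yields a 3x2 grid of overlapping tiles
tiles = slice_grid(1280, 960)
```

Each detection from a tile is then offset by that tile's `(x1, y1)` to recover full-image coordinates before merging.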
Download the training and challenge datasets:
- Fisheye8K dataset into Co-DETR/data/Fisheye8K
- AI City test set into Co-DETR/data/aicity_images
AIC2024_Track4_Nota
|── Co-DETR
    |── data
        |── aicity_images
        |── aicity_images_sr  # 2.0x upscaled AI City test images (using SR)
        |── Fisheye8K
        |   |── test
        |   |   |── images
        |   |   |── test_lvis.json
        |   |   |── test.json
        |   |── train
        |       |── images
        |       |── train_lvis.json
        |       |── train_sr.json
        |       |── train.json
        ...
- We use a semi-supervised dataset (background labels from the LVIS dataset) and an upscaled SR dataset. Each JSON file can be downloaded from Google Drive.
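The semi-supervision JSONs presumably extend the Fisheye8K COCO-format annotation file with extra background-only images drawn from LVIS, which act as hard negatives. A rough sketch of how such a merged file could be built (field names follow the COCO format; the exact LVIS filtering behind the provided JSONs is not shown here):

```python
def add_background_images(coco, bg_file_names, start_id=10**6):
    """Append background-only images (with no annotations) to a COCO-format dict."""
    merged = dict(coco)
    merged["images"] = list(coco["images"])
    for i, name in enumerate(bg_file_names):
        merged["images"].append({
            "id": start_id + i,   # keep ids disjoint from existing image ids
            "file_name": name,
            "width": 1280,        # illustrative size, not read from the file
            "height": 960,
        })
    # nothing is added to "annotations": these frames are pure negatives
    return merged

coco = {"images": [{"id": 1, "file_name": "cam1_00001.png",
                    "width": 1280, "height": 960}],
        "annotations": [],
        "categories": [{"id": 0, "name": "Bus"}]}
merged = add_background_images(coco, ["lvis_000001.jpg", "lvis_000002.jpg"])
```

Training on such a file teaches the detector what the target classes do *not* look like without requiring any new box labels.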
Download checkpoints from Google Drive.
- ViT-L backbone download
# Co-DINO (ViT-L) + SAHI
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye.py \
co_dino_5scale_vit_large_fisheye.pth \
--out-file output \
--device cuda:0 \
--dataset fisheye8k
# Co-DINO (ViT-L) + SAHI + histogram equalization
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye.py \
co_dino_5scale_vit_large_fisheye.pth \
--out-file output \
--device cuda:0 \
--dataset fisheye8k \
--use_hist_equal \
--score-thr 0.3
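The `--use_hist_equal` flag pre-processes each frame with histogram equalization before inference, which helps with low-contrast night-time fisheye images. The transform in essence (a pure-Python sketch on a flat grayscale pixel list; the script presumably applies it per channel or via OpenCV):

```python
def hist_equalize(pixels, levels=256):
    """Map gray levels through the normalized CDF of the image histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)  # smallest non-zero CDF value
    n = len(pixels)
    # classic equalization: round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]

# a dark, low-contrast patch is stretched to the full 0..255 range
out = hist_equalize([52, 55, 61, 59, 79, 61, 76, 61])
```

Stretching the intensity range this way makes dim objects easier for the detector to separate from the background.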
- ViT-L backbone + basic augmentation download
# Co-DINO (ViT-L) + basic augmentation + SAHI
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye_basic_aug.py \
co_dino_5scale_vit_large_fisheye_basic_aug.pth \
--out-file output \
--device cuda:0 \
--dataset fisheye8k
# Co-DINO (ViT-L) + basic augmentation + SAHI + histogram equalization
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye_basic_aug.py \
co_dino_5scale_vit_large_fisheye_basic_aug.pth \
--out-file output \
--device cuda:0 \
--dataset fisheye8k \
--use_hist_equal
- ViT-L backbone + rotation augmentation download
# Co-DINO (ViT-L) + image rotation + SAHI
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye_rotate.py \
co_dino_5scale_vit_large_fisheye_rotate.pth \
--out-file output_rotate \
--device cuda:0 \
--dataset fisheye8k \
--score-thr 0.6
# Co-DINO (ViT-L) + image rotation + SAHI + histogram equalization
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye_rotate.py \
co_dino_5scale_vit_large_fisheye_rotate.pth \
--out-file output_rotate \
--device cuda:0 \
--dataset fisheye8k \
--use_hist_equal \
--score-thr 0.6
- Swin backbone + semi-supervision download
# Co-DINO (Swin-L) + image rotation + semi-supervision + SAHI
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_swin_fisheye8k_lvis_add_ann.py \
co_dino_swin_fisheye8k_lvis_add_ann.pth \
--out-file output_lvis \
--device cuda:0 \
--dataset fisheye8klvis
# Co-DINO (Swin-L) + image rotation + semi-supervision + SAHI + histogram equalization
python demo/submit_demo_sahi.py data/aicity_images \
projects/configs/AIC24/co_dino_swin_fisheye8k_lvis_add_ann.py \
co_dino_swin_fisheye8k_lvis_add_ann.pth \
--out-file output_lvis \
--device cuda:0 \
--dataset fisheye8klvis \
--use_hist_equal
- ViT-L backbone + SR download
# Co-DINO (ViT-L) + SR + SAHI
python demo/submit_demo_sahi.py data/aicity_images_sr \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye_sr.py \
co_dino_5scale_vit_large_fisheye_sr.pth \
--out-file output_sr \
--device cuda:0 \
--dataset fisheye8k \
--use_super_resolution True \
--aicity_test_images_dir data/aicity_images
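Because this model runs on the 2.0x SR-upscaled copies while the submission must be in original test-image coordinates, detections have to be scaled back down, which is presumably what `--use_super_resolution` together with `--aicity_test_images_dir` handles inside the script. The coordinate mapping itself is just a division by the upscale factor:

```python
def rescale_boxes(boxes, scale=2.0):
    """Map [x, y, w, h] boxes predicted on an upscaled image back to the original."""
    return [[v / scale for v in box] for box in boxes]

# a box found on the 2.0x image corresponds to half-sized coordinates originally
orig = rescale_boxes([[200.0, 100.0, 50.0, 40.0]])
```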
- We use weighted boxes fusion (WBF) for ensembling, combining a total of 9 output JSON files:
- Co-DINO (ViT-L) + SAHI
- Co-DINO (ViT-L) + basic augmentation + SAHI
- Co-DINO (ViT-L) + image rotation + SAHI
- Co-DINO (Swin-L) + image rotation + semi-supervision + SAHI
- Co-DINO (ViT-L) + SAHI + histogram equalization
- Co-DINO (ViT-L) + basic augmentation + SAHI + histogram equalization
- Co-DINO (ViT-L) + image rotation + SAHI + histogram equalization
- Co-DINO (Swin-L) + image rotation + semi-supervision + SAHI + histogram equalization
- Co-DINO (ViT-L) + SR + SAHI
# path_to_json_dir: the directory containing the 9 output JSON files above
python ensemble.py \
    --test_dataset_path data/aicity_images \
    --target_json_dir path_to_json_dir \
    --out_name ensemble.json \
    --iou_thr 0.4 \
    --score_thr 0.4
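Unlike NMS, WBF does not suppress overlapping boxes from different models; it averages their coordinates weighted by confidence. The repository installs the `ensemble_boxes` package for the real fusion; the idea can be sketched in simplified pure Python (single class, `[x1, y1, x2, y2]` boxes; the greedy clustering and fused-score rule here are illustrative, not the package's exact algorithm):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def weighted_fusion(boxes, scores, iou_thr=0.4):
    """WBF-style fusion: cluster boxes by IoU, average coords weighted by score."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    clusters = []
    for i in order:
        for cl in clusters:
            if iou(boxes[cl[0]], boxes[i]) > iou_thr:
                cl.append(i)
                break
        else:
            clusters.append([i])
    fused = []
    for cl in clusters:
        w = sum(scores[i] for i in cl)
        box = [sum(scores[i] * boxes[i][k] for i in cl) / w for k in range(4)]
        fused.append((box, w / len(cl)))  # fused score = mean member score
    return fused

# two models agree on one object (IoU ~0.68); a third box stays separate
fused = weighted_fusion([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]],
                        [0.9, 0.8, 0.7])
```

Averaging instead of suppressing lets complementary models refine each other's localizations, which is why WBF tends to outperform NMS for ensembles.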
- Prepare the pre-trained checkpoint from the original Co-DETR repository.
- For the Co-DETR with ViT-Large checkpoint, refer to this.
- Modify the config file and enter the appropriate dataset path (refer to MMDetection's official instructions).
bash tools/dist_train.sh \
${CONFIG_FILE} \
${GPU_NUM} \
[optional arguments]
# Example
bash tools/dist_train.sh \
projects/configs/AIC24/co_dino_5scale_vit_large_fisheye.py \
8 \
work_dirs/vit_l
- We utilize a super-resolution (SR) technique to obtain high-resolution images for training and testing, using the pre-trained StableSR model.
- Configure the environment by referring to the installation guide in the StableSR repository.
- Upscale the images of the Fisheye8K dataset using the provided pre-trained model.
# Change --ckpt if you need a different model; --init-img is the dataset image path
python scripts/sr_val_ddim_text_T_negativeprompt_canvas_tile.py \
    --config configs/stableSRNew/v2-finetune_text_T_768v.yaml \
    --ckpt stablesr_768v_000139.ckpt \
    --vqgan_ckpt vqgan_cfw_00011.ckpt \
    --init-img /home/data/fisheye_train \
    --outdir ./outputs_fisheye_train/ \
    --ddim_steps 10 \
    --dec_w 0.5 \
    --colorfix_type wavelet \
    --scale 7.0 \
    --use_negative_prompt \
    --upscale 1.5 \
    --seed 42 \
    --n_samples 1 \
    --input_size 768 \
    --tile_overlap 48 \
    --ddim_eta 1.0 \
    --fold 0
The model published in this repository was developed by combining several modules (e.g., an object detector and a super-resolution model). Commercial use of any modifications, additions, or newly trained parameters made to combine these modules is not allowed. However, commercial use of the unmodified modules is allowed under their respective licenses. If you wish to use the individual modules commercially, refer to their original repositories and licenses provided below.