An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training
This repository contains the implementation of DART, an automated end-to-end object detection pipeline featuring:
- Data Diversification based on DreamBooth with Stable Diffusion XL
- Open-vocabulary bounding box Annotation via GroundingDINO
- LMM-based Review of pseudo-labels and image photorealism using InternVL-1.5 and GPT-4o
- Real-time object detector Training for YOLOv8 and YOLOv10
The current instantiation of DART significantly increases the average precision (AP) from 0.064 to 0.832 for a YOLOv8n model on the Liebherr Product dataset, demonstrating the effectiveness of our approach.
This repository also contains a self-collected dataset of construction machines named Liebherr Product (LP), which comprises over 15K high-quality images across 23 categories. This extensive collection covers a diverse range of construction machinery from Liebherr products, including articulated dump trucks, bulldozers, combined piling and drilling rigs, various types of cranes, excavators, loaders, and more. A list of all 23 classes can be found in classes.json. For detailed information on the data collection, curation, and preprocessing of this dataset, please check out our paper. The images can be downloaded and processed by following the instructions in this section.
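The class list in classes.json can be loaded programmatically to build the class-ID mapping used by YOLO-style labels; a minimal sketch, assuming the file contains a flat JSON list of class names (the actual schema and names beyond those mentioned above may differ):

```python
import json

# Hypothetical content mirroring classes.json; the real file ships with the repo.
classes_json = '["articulated_dump_truck", "bulldozer", "crawler_crane", "excavator", "wheel_loader"]'

class_names = json.loads(classes_json)

# Map each class name to a numeric ID, as needed for YOLO-format labels.
name_to_id = {name: idx for idx, name in enumerate(class_names)}
print(name_to_id["excavator"])  # → 3
```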
This repository contains the following folders and files, each serving a specific purpose:
- diversification: contains the code for training and inference of SDXL with DreamBooth, as well as generated class_data and collected instance_data.
- contains figures used in the repo.
- Liebherr_Product: the dataset folder. images should be downloaded separately (following the instructions in this section). This folder also includes lists and statistics of pseudo-labels, metadata containing useful information extracted during dataset preprocessing, responses from GPT-4o-based reviews, a questionnaire used for evaluating GPT-4o's performance, and general tools for facilitating interaction with the dataset.
- lmm: contains code for the two LMM-based reviews: GPT-4o-based pseudo-label review and InternVL-Chat-V1-5-based image-photorealism review of generated data.
- ovd: contains code for bounding box generation with Grounding DINO and label processing.
- contains figures used in the paper and their corresponding code.
- yolo: contains code and commands for data splitting, hyperparameter fine-tuning, training, and prediction with YOLOv8.
- Clone the repository:
git clone https://github.com/your-repo/dart.git
- Create an Anaconda environment, e.g. named "dart":
conda create -n dart python=3.10
conda activate dart
- Follow this link to install Grounding DINO.
- Install the other required dependencies:
pip install -r requirements.txt
- Download the dataset via this link, and extract the images folder to ./Liebherr_Product/images/.
- Collect instance data and store it in ./diversification/instance_data/{class_name}/{instance_name}, e.g. ./diversification/instance_data/articulated_dump_truck/TA230.
- Change the default paths in the following scripts or specify them as arguments when running.
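The {class_name}/{instance_name} layout can be sanity-checked before launching the DreamBooth scripts; a minimal sketch that builds an illustrative tree in a temporary directory and enumerates it (the instance name R9800 is an illustrative example, not taken from the repo):

```python
import tempfile
from pathlib import Path

# Build an illustrative instance_data tree: instance_data/{class_name}/{instance_name}
root = Path(tempfile.mkdtemp()) / "instance_data"
(root / "articulated_dump_truck" / "TA230").mkdir(parents=True)
(root / "excavator" / "R9800").mkdir(parents=True)

# Enumerate the (class, instance) pairs the per-instance scripts would iterate over.
pairs = sorted(
    (cls.name, inst.name)
    for cls in root.iterdir() if cls.is_dir()
    for inst in cls.iterdir() if inst.is_dir()
)
print(pairs)  # → [('articulated_dump_truck', 'TA230'), ('excavator', 'R9800')]
```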
- Annotate the collected data with the "original" and "co-occurring" prompts:
python ovd/labeling.py -p one
- Annotate the collected data with the "synonym" and "co-occurring" prompts:
python ovd/labeling_sep.py -p one
- Process the labels:
python ovd/label_processing.py
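Processing open-vocabulary detections typically involves de-duplicating overlapping boxes from multiple prompts; a generic non-maximum-suppression sketch for illustration (an assumption, not the exact logic in label_processing.py):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedily keep the highest-scoring box among mutually overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```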
- Identify annotations that need to be processed by GPT-4o:
jupyter notebook Liebherr_Product/tools/check_anns.ipynb
- Review pseudo-labels with GPT-4o:
python lmm/gpt4.py
- Parse GPT-4o's responses:
jupyter notebook parse_gpt4_response.ipynb
- Convert annotations to YOLO format:
jupyter notebook Liebherr_Product/tools/convert_to_yolo.ipynb
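The YOLO label format stores, per object, a line of `class_id x_center y_center width height` with coordinates normalized by the image size; a minimal conversion sketch from absolute [x_min, y_min, x_max, y_max] boxes (the notebook's actual logic may differ):

```python
def to_yolo(box, img_w, img_h):
    """Convert [x_min, y_min, x_max, y_max] to normalized (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return cx, cy, w, h

# A 100x200-pixel box in a 640x640 image, written as a YOLO label line for class 0:
cx, cy, w, h = to_yolo([150, 200, 250, 400], 640, 640)
line = f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
print(line)  # → 0 0.312500 0.468750 0.156250 0.312500
```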
- Split the data into train/val/test sets:
jupyter notebook yolo/data_split.ipynb
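A reproducible random split can be sketched as follows (the 80/10/10 ratios and seed are illustrative; the notebook may use different ratios or stratify by class):

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle deterministically, then partition into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # → 800 100 100
```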
- Generate scripts for DreamBooth training of each instance:
jupyter notebook diversification/dreambooth/sdxl.ipynb
- Run the DreamBooth training scripts in bulk:
python diversification/dreambooth/run_command_bulk.py
- Generate data with the trained DreamBooth models in bulk:
python diversification/dreambooth/data_generation_bulk.py
- (Optional) Generate data with the trained DreamBooth models for specific scenarios:
python diversification/dreambooth/data_generation_obj_partial_prompts.py
- Convert the images and create an ID-to-name mapping:
jupyter notebook diversification/dreambooth/id_to_name.ipynb
- Annotate the generated data:
python ovd/labeling_gen.py
- Review the generated data with InternVL-Chat-V1-5:
python lmm/InternVL-Chat-V1-5_judge.py
- Parse the responses:
jupyter notebook lmm/parse_lmm_response.ipynb
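Parsing LMM responses typically means reducing free-form text to a binary verdict; a hedged sketch, assuming the model was prompted to answer with a leading Yes/No (the actual prompt and parsing in parse_lmm_response.ipynb are not shown here and may be more involved):

```python
import re

def parse_judgment(response: str) -> bool:
    """Return True if the response opens with an affirmative verdict.

    Assumption for illustration: the reviewer model answers with a
    leading "Yes"/"No" before any explanation.
    """
    return bool(re.match(r"\s*yes\b", response, flags=re.IGNORECASE))

print(parse_judgment("Yes, the image looks photorealistic."))  # → True
print(parse_judgment("No. The lighting is implausible."))      # → False
```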
- Process the labels for the generated data:
python ovd/label_processing_gen.py
- (Optional) Plot the annotations:
jupyter notebook ovd/annotate_gen.ipynb
- Process labels for manually diversified data in the original dataset:
python label_processing.py --label_dir labels_background --id_types b
- Merge labels and statistics of the generated and original data:
jupyter notebook Liebherr_Product/tools/merge_labels_stats_dict.ipynb
- Convert annotations to YOLO format:
jupyter notebook Liebherr_Product/tools/convert_to_yolo_gen.ipynb
- Split all data into train/val/test sets:
jupyter notebook yolo/data_split_gen.ipynb
- Create dataset configs according to the experiments:
# Example: cfg/datasets/train.yaml
# Example: cfg/datasets/fine-tune.yaml
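An Ultralytics-style dataset config lists the dataset root, the split paths, and the class names; a hypothetical example of what such a file might contain (paths and names are placeholders, not the repo's actual configs):

```yaml
# Example: cfg/datasets/train.yaml (hypothetical values)
path: Liebherr_Product      # dataset root
train: images/train         # training images, relative to path
val: images/val
test: images/test
names:
  0: articulated_dump_truck
  1: bulldozer
  2: excavator
  # ... remaining classes as listed in classes.json
```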
- Fine-tune hyperparameters:
python yolo/raytune.py --cfg fine-tune.yaml
or
python yolo/tune.py --cfg fine-tune.yaml
- Train and evaluate the model with the best hyperparameter set:
yolo detect train data=cfg/datasets/train_gen_0.75.yaml model=yolov8n.pt epochs=60 imgsz=640 optimizer=AdamW lr0=2e-4 lrf=0.5 warmup_epochs=2 batch=64 cos_lr=True
- Predict with the trained models:
jupyter notebook yolo/predict.ipynb
Here are some sample results. Please read our paper for more!
@article{xin2024dart,
title={DART: An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training},
author={Xin, Chen and Hartel, Andreas and Kasneci, Enkelejda},
journal={Expert Systems with Applications},
pages={125124},
year={2024},
publisher={Elsevier}
}