An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training
This repository contains the implementation of DART, an automated end-to-end object detection pipeline featuring:
- Data Diversification based on DreamBooth with Stable Diffusion XL
- Open-vocabulary bounding box Annotation via GroundingDINO
- LMM-based Review of pseudo-labels and image photorealism using InternVL-1.5 and GPT-4o
- Real-time object detector Training for YOLOv8 and YOLOv10
The current instantiation of DART significantly increases the average precision (AP) from 0.064 to 0.832 for a YOLOv8n model on the Liebherr Product dataset, demonstrating the effectiveness of our approach.
This repository also contains a self-collected dataset of construction machines named Liebherr Product (LP), which comprises over 15K high-quality images across 23 categories. This extensive collection covers a diverse range of construction machinery from Liebherr products, including articulated dump trucks, bulldozers, combined piling and drilling rigs, various types of cranes, excavators, loaders, and more. A list of all 23 classes can be found in classes.json. For detailed information on the data collection, curation, and preprocessing of this dataset, please check out our paper. The images can be downloaded and processed by following the instructions in this section.
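The class list in classes.json can be loaded programmatically to build the class-ID mapping used by YOLO-style labels; a minimal sketch, assuming the file contains a flat JSON list of class names (the actual schema and names beyond those mentioned above may differ):

```python
import json

# Hypothetical content mirroring classes.json; the real file ships with the repo.
classes_json = '["articulated_dump_truck", "bulldozer", "crawler_crane", "excavator", "wheel_loader"]'

class_names = json.loads(classes_json)

# Map each class name to a numeric ID, as needed for YOLO-format labels.
name_to_id = {name: idx for idx, name in enumerate(class_names)}
print(name_to_id["excavator"])  # → 3
```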
This repository contains the following folders and files, each serving a specific purpose:
- diversification: contains the code for training and inference of SDXL with DreamBooth, as well as generated class_data and collected instance_data.
- contains figures used in the repo.
- Liebherr_Product: the dataset folder. images should be downloaded separately (following the instructions in this section). This folder also includes lists and statistics of pseudo-labels, metadata containing useful information extracted during dataset preprocessing, responses from GPT-4o-based reviews, a questionnaire used for evaluating GPT-4o's performance, and general tools for facilitating interaction with the dataset.
- lmm: contains code for the two LMM-based reviews: GPT-4o-based pseudo-label review and InternVL-Chat-V1-5-based image-photorealism review of generated data.
- ovd: contains code for bounding box generation with Grounding DINO and label processing.
- contains figures used in the paper and their corresponding code.
- yolo: contains code and commands for data splitting, hyperparameter fine-tuning, training, and prediction with YOLOv8.
- Clone the repository:
git clone https://github.com/your-repo/dart.git
- Create an Anaconda environment, e.g. named "dart":
conda create -n dart python=3.10
conda activate dart
- Follow this link to install Grounding DINO.
- Install the other required dependencies:
pip install -r requirements.txt
- Download the dataset via this link, and extract the images folder to ./Liebherr_Product/images/.
- Collect instance data and store it in ./diversification/instance_data/{class_name}/{instance_name}, e.g. ./diversification/instance_data/articulated_dump_truck/TA230.
- Change the default paths in the following scripts or specify them as arguments when running.
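The {class_name}/{instance_name} layout can be sanity-checked before launching the DreamBooth scripts; a minimal sketch that builds an illustrative tree in a temporary directory and enumerates it (the instance name R9800 is an illustrative example, not taken from the repo):

```python
import tempfile
from pathlib import Path

# Build an illustrative instance_data tree: instance_data/{class_name}/{instance_name}
root = Path(tempfile.mkdtemp()) / "instance_data"
(root / "articulated_dump_truck" / "TA230").mkdir(parents=True)
(root / "excavator" / "R9800").mkdir(parents=True)

# Enumerate the (class, instance) pairs the per-instance scripts would iterate over.
pairs = sorted(
    (cls.name, inst.name)
    for cls in root.iterdir() if cls.is_dir()
    for inst in cls.iterdir() if inst.is_dir()
)
print(pairs)  # → [('articulated_dump_truck', 'TA230'), ('excavator', 'R9800')]
```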
- Annotate the collected data with the "original" and "co-occurring" prompts:
python ovd/labeling.py -p one
- Annotate the collected data with the "synonym" and "co-occurring" prompts:
python ovd/labeling_sep.py -p one
- Process the labels:
python ovd/label_processing.py
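Processing open-vocabulary detections typically involves de-duplicating overlapping boxes from multiple prompts; a generic non-maximum-suppression sketch for illustration (an assumption, not the exact logic in label_processing.py):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedily keep the highest-scoring box among mutually overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```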
- Identify annotations that need to be processed by GPT-4o:
jupyter notebook Liebherr_Product/tools/check_anns.ipynb
- Review pseudo-labels with GPT-4o:
python lmm/gpt4.py
- Parse GPT-4o's responses:
jupyter notebook parse_gpt4_response.ipynb
- Convert annotations to YOLO format:
jupyter notebook Liebherr_Product/tools/convert_to_yolo.ipynb
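The YOLO label format stores, per object, a line of `class_id x_center y_center width height` with coordinates normalized by the image size; a minimal conversion sketch from absolute [x_min, y_min, x_max, y_max] boxes (the notebook's actual logic may differ):

```python
def to_yolo(box, img_w, img_h):
    """Convert [x_min, y_min, x_max, y_max] to normalized (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return cx, cy, w, h

# A 100x200-pixel box in a 640x640 image, written as a YOLO label line for class 0:
cx, cy, w, h = to_yolo([150, 200, 250, 400], 640, 640)
line = f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
print(line)  # → 0 0.312500 0.468750 0.156250 0.312500
```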
- Split the data into train/val/test sets:
jupyter notebook yolo/data_split.ipynb
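A reproducible random split can be sketched as follows (the 80/10/10 ratios and seed are illustrative; the notebook may use different ratios or stratify by class):

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle deterministically, then partition into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # → 800 100 100
```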
- Generate scripts for DreamBooth training of each instance:
jupyter notebook diversification/dreambooth/sdxl.ipynb
- Run the DreamBooth training scripts in bulk:
python diversification/dreambooth/run_command_bulk.py
- Generate data with the trained DreamBooth models in bulk:
python diversification/dreambooth/data_generation_bulk.py
- (Optional) Generate data with the trained DreamBooth models for specific scenarios:
python diversification/dreambooth/data_generation_obj_partial_prompts.py
- Convert the images and create an ID-to-name mapping:
jupyter notebook diversification/dreambooth/id_to_name.ipynb
- Annotate the generated data:
python ovd/labeling_gen.py
- Review the generated data with InternVL-Chat-V1-5:
python lmm/InternVL-Chat-V1-5_judge.py
- Parse the responses:
jupyter notebook lmm/parse_lmm_response.ipynb
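Parsing LMM responses typically means reducing free-form text to a binary verdict; a hedged sketch, assuming the model was prompted to answer with a leading Yes/No (the actual prompt and parsing in parse_lmm_response.ipynb are not shown here and may be more involved):

```python
import re

def parse_judgment(response: str) -> bool:
    """Return True if the response opens with an affirmative verdict.

    Assumption for illustration: the reviewer model answers with a
    leading "Yes"/"No" before any explanation.
    """
    return bool(re.match(r"\s*yes\b", response, flags=re.IGNORECASE))

print(parse_judgment("Yes, the image looks photorealistic."))  # → True
print(parse_judgment("No. The lighting is implausible."))      # → False
```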
- Process the labels for the generated data:
python ovd/label_processing_gen.py
- (Optional) Plot the annotations:
jupyter notebook ovd/annotate_gen.ipynb
- Process labels for manually diversified data in the original dataset:
python label_processing.py --label_dir labels_background --id_types b
- Merge labels and statistics of the generated and original data:
jupyter notebook Liebherr_Product/tools/merge_labels_stats_dict.ipynb
- Convert annotations to YOLO format:
jupyter notebook Liebherr_Product/tools/convert_to_yolo_gen.ipynb
- Split all data into train/val/test sets:
jupyter notebook yolo/data_split_gen.ipynb
- Create dataset configs according to the experiments:
# Example: cfg/datasets/train.yaml
# Example: cfg/datasets/fine-tune.yaml
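An Ultralytics-style dataset config lists the dataset root, the split paths, and the class names; a hypothetical example of what such a file might contain (paths and names are placeholders, not the repo's actual configs):

```yaml
# Example: cfg/datasets/train.yaml (hypothetical values)
path: Liebherr_Product      # dataset root
train: images/train         # training images, relative to path
val: images/val
test: images/test
names:
  0: articulated_dump_truck
  1: bulldozer
  2: excavator
  # ... remaining classes as listed in classes.json
```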
- Fine-tune hyperparameters:
python yolo/raytune.py --cfg fine-tune.yaml
or
python yolo/tune.py --cfg fine-tune.yaml
- Train and evaluate the model with the best hyperparameter set:
yolo detect train data=cfg/datasets/train_gen_0.75.yaml model=yolov8n.pt epochs=60 imgsz=640 optimizer=AdamW lr0=2e-4 lrf=0.5 warmup_epochs=2 batch=64 cos_lr=True
- Predict with the trained models:
jupyter notebook yolo/predict.ipynb
Here are some sample results. Please read our paper for more!
@article{xin2024dart,
title={DART: An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training},
author={Xin, Chen and Hartel, Andreas and Kasneci, Enkelejda},
journal={Expert Systems with Applications},
pages={125124},
year={2024},
publisher={Elsevier}
}