chen-xin-94/DART

DART

An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training

Paper | Dataset

Overview

This repository contains the implementation of DART, an automated end-to-end object detection pipeline featuring:

  • Data Diversification based on DreamBooth with Stable Diffusion XL
  • Open-vocabulary bounding box Annotation via GroundingDINO
  • LMM-based Review of pseudo-labels and image photorealism using InternVL-1.5 and GPT-4o
  • Real-time object detector Training for YOLOv8 and YOLOv10

The current instantiation of DART significantly increases the average precision (AP) from 0.064 to 0.832 for a YOLOv8n model on the Liebherr Product dataset, demonstrating the effectiveness of our approach.

Liebherr Product Dataset

This repository contains a self-collected dataset of construction machines named Liebherr Product (LP), which contains over 15K high-quality images across 23 categories. This extensive collection focuses on a diverse range of construction machinery from Liebherr products, including articulated dump trucks, bulldozers, combined piling and drilling rigs, various types of cranes, excavators, loaders, and more. A list of all 23 classes can be found in classes.json. For detailed information on the data collection, curation, and preprocessing of this dataset, please check out our paper. The images can be downloaded and processed by following the instructions in this section.

Repository Structure

This repository contains the following folders and files, each serving a specific purpose:

./diversification

contains the code for DreamBooth training and inference with SDXL, as well as the generated class_data and the collected instance_data.

./figures

contains figures used in the repo.

./Liebherr_Product

the dataset folder. Images must be downloaded separately (following the instructions in this section). This folder also includes lists and statistics of pseudo-labels, metadata extracted during dataset preprocessing, responses from GPT-4-based reviews, the questionnaire used for evaluating GPT-4's performance, and general tools for interacting with the dataset.

./lmm

contains code for two LMM-based reviews: GPT-4o-based review of pseudo-labels and InternVL-1.5-based review of the photorealism of generated images.

./ovd

contains code for bounding box generation with Grounding DINO and for label processing.

./vis

contains figures used in the paper and their corresponding code.

./yolo

contains code and commands for data splitting, hyperparameter tuning, training, and prediction with YOLOv8.

Setup

  1. Clone the repository:

    git clone https://github.com/your-repo/dart.git
  2. Create an Anaconda environment, e.g. named "dart":

    conda create -n dart python=3.10
    conda activate dart
  3. Follow this link to install Grounding DINO.

  4. Install other required dependencies:

    pip install -r requirements.txt

Usage

Data preparation

  1. Download the dataset via this link, and extract the images folder to ./Liebherr_Product/images/.
  2. Collect instance data and store them in ./diversification/instance_data/{class_name}/{instance_name}, e.g. ./diversification/instance_data/articulated_dump_truck/TA230.
  3. Change the default paths in the scripts used below, or pass them as arguments at run time.
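
The instance data layout from step 2 can be sketched in a few lines of Python. This is only an illustration of the `{class_name}/{instance_name}` convention; the helper names `instance_dir` and `parse_instance_dir` are hypothetical and are not part of this repository.

```python
from pathlib import Path


def instance_dir(root, class_name, instance_name):
    """Build the folder path for one instance: root/class_name/instance_name.

    Hypothetical helper; it only mirrors the layout described in step 2.
    """
    return Path(root) / class_name / instance_name


def parse_instance_dir(path):
    """Recover (class_name, instance_name) from an instance folder path."""
    p = Path(path)
    return p.parent.name, p.name


if __name__ == "__main__":
    folder = instance_dir(
        "diversification/instance_data", "articulated_dump_truck", "TA230"
    )
    print(folder.as_posix())
    print(parse_instance_dir(folder))
```

Keeping the class and instance names in the path means that later steps can derive both from the folder location alone, without a separate index file.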

Annotation and review for collected data

  1. Annotate collected data with "original" and "co-occurring" prompts:

    python ovd/labeling.py -p one
  2. Annotate collected data with "synonym" and "co-occurring" prompts:

    python ovd/labeling_sep.py -p one
  3. Process labels:

    python ovd/label_processing.py
  4. Identify annotations that need to be processed by GPT-4o:

    jupyter notebook Liebherr_Product/tools/check_anns.ipynb
  5. Review pseudo-labels with GPT-4o:

    python lmm/gpt4.py
  6. Parse GPT-4o's responses:

    jupyter notebook parse_gpt4_response.ipynb
  7. Convert annotations to YOLO format:

    jupyter notebook Liebherr_Product/tools/convert_to_yolo.ipynb
  8. Split data into train/val/test sets:

    jupyter notebook yolo/data_split.ipynb
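
For orientation, the YOLO label format targeted in step 7 stores one line per box: a class index followed by the box center, width, and height, all normalized by the image size. The sketch below shows that arithmetic. It assumes the input boxes are pixel coordinates in `(x_min, y_min, x_max, y_max)` order, which may differ from what the notebook actually reads, and the function names are hypothetical.

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to
    normalized YOLO (x_center, y_center, width, height).

    Assumption: the input is absolute pixel coordinates in xyxy order.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one annotation as a line of a YOLO label .txt file."""
    x_c, y_c, w, h = xyxy_to_yolo(box, img_w, img_h)
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A 200x200 px box in a 1000x800 image, labeled with class index 3.
    print(yolo_label_line(3, (100, 200, 300, 400), 1000, 800))
```

Each image gets a `.txt` file with the same base name, containing one such line per object.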

Annotation and review for generated diversified data

  1. Generate scripts for DreamBooth training of each instance:

    jupyter notebook diversification/dreambooth/sdxl.ipynb
  2. Run DreamBooth training scripts in bulk:

    python diversification/dreambooth/run_command_bulk.py
  3. Generate data using the trained DreamBooth model in bulk:

    python diversification/dreambooth/data_generation_bulk.py
  4. (Optional) Generate data using the trained DreamBooth model for specific scenarios:

    python diversification/dreambooth/data_generation_obj_partial_prompts.py
  5. Convert images and create ID to name mapping:

    jupyter notebook diversification/dreambooth/id_to_name.ipynb
  6. Annotate generated data:

    python ovd/labeling_gen.py
  7. Review generated data with InternVL-Chat-V1-5:

    python lmm/InternVL-Chat-V1-5_judge.py
  8. Parse the responses:

    jupyter notebook lmm/parse_lmm_response.ipynb
  9. Process labels for generated data:

    python ovd/label_processing_gen.py
  10. (Optional) Plot annotations:

    jupyter notebook ovd/annotate_gen.ipynb
  11. Process labels for manually diversified data in the original dataset:

    python label_processing.py --label_dir labels_background --id_types b
  12. Merge labels and stats of generated and original data:

    jupyter notebook Liebherr_Product/tools/merge_labels_stats_dict.ipynb
  13. Convert annotations to YOLO format:

    jupyter notebook Liebherr_Product/tools/convert_to_yolo_gen.ipynb
  14. Split all data into train/val/test sets:

    jupyter notebook yolo/data_split_gen.ipynb
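
The train/val/test split in the final step can be illustrated with a small, reproducible sketch. The split fractions, the seed, and the function name `split_ids` are assumptions made for illustration; the notebooks in this repository may use different ratios or a different strategy (for example, splitting per class or keeping generated images out of the test set).

```python
import random


def split_ids(image_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle image IDs with a fixed seed and cut them into three subsets.

    Assumption: a plain random split with illustrative fractions.
    """
    ids = sorted(image_ids)             # sort first so input order does not matter
    random.Random(seed).shuffle(ids)    # local RNG keeps the split reproducible
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }


if __name__ == "__main__":
    splits = split_ids([f"img_{i:04d}" for i in range(100)])
    print({name: len(ids) for name, ids in splits.items()})
```

Sorting before shuffling and using a dedicated `random.Random` instance means the same set of IDs always yields the same split, regardless of the order in which the files were listed.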

Training and fine-tuning

  1. Create dataset configs according to experiments:

    # Example: cfg/datasets/train.yaml
    # Example: cfg/datasets/fine-tune.yaml
  2. Fine-tune hyperparameters:

    python yolo/raytune.py --cfg fine-tune.yaml

    or

    python yolo/tune.py --cfg fine-tune.yaml
  3. Train and evaluate the model with the best hyperparameter set:

    yolo detect train data=cfg/datasets/train_gen_0.75.yaml model=yolov8n.pt epochs=60 imgsz=640 optimizer=AdamW lr0=2e-4 lrf=0.5 warmup_epochs=2 batch=64 cos_lr=True
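
The dataset configs from step 1 follow the Ultralytics dataset YAML layout: a `path` root, `train`/`val`/`test` entries relative to it, and a `names` mapping from class index to class name. The sketch below renders such a file as text. The directory names and the two class names are placeholders rather than the repository's actual values (the full class list is in classes.json), and `render_dataset_yaml` is a hypothetical helper.

```python
def render_dataset_yaml(root, class_names,
                        train="images/train",
                        val="images/val",
                        test="images/test"):
    """Render an Ultralytics-style dataset config as YAML text.

    Assumption: the split folder names are placeholders for illustration.
    """
    lines = [
        f"path: {root}",
        f"train: {train}",
        f"val: {val}",
        f"test: {test}",
        "names:",
    ]
    # One "index: name" entry per class, indented under "names:".
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative class names only.
    print(render_dataset_yaml("Liebherr_Product",
                              ["articulated_dump_truck", "bulldozer"]))
```

The resulting file is what the `data=` argument of `yolo detect train` points to.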

Inference

  1. Run prediction with the trained models:

    jupyter notebook yolo/predict.ipynb

Results

Here are some sample results. Please check out our paper for more!

Object detection results with and without DART on test set images.

with_or_without_DART_1 with_or_without_DART_2

Visualization of data diversification and bounding box annotation

approved_annotated_generated_images_app

Images annotated by Grounding DINO and approved by GPT-4o

image_grid_orig_aa

Citation

@article{xin2024dart,
  title={DART: An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training},
  author={Xin, Chen and Hartel, Andreas and Kasneci, Enkelejda},
  journal={Expert Systems with Applications},
  pages={125124},
  year={2024},
  publisher={Elsevier}
}
