ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection

paper

Abstract

Driver distraction detection is an important computer vision problem that can play a crucial role in enhancing traffic safety and reducing traffic accidents. This paper proposes a novel semi-supervised method for detecting driver distractions based on Vision Transformer (ViT). Specifically, a multi-modal Vision Transformer (ViT-DD) is developed that exploits the inductive information contained in the training signals of both distraction detection and driver emotion recognition. Further, a self-learning algorithm is designed to incorporate driver data without emotion labels into the multi-task training of ViT-DD. Extensive experiments on the SFDDD and AUCDD datasets demonstrate that ViT-DD outperforms state-of-the-art approaches for driver distraction detection by 6.5% and 0.9%, respectively.

Results

Experiments	Accuracy	NLL	Checkpoints
AUCDD	0.9359	0.2399	link
SFDDD split-by-driver	0.9251	0.3900	link
SFDDD split-by-image	0.9963	0.0171	link
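
The table reports accuracy and NLL without defining how they are computed. As a minimal sketch, assuming NLL means the mean negative log-likelihood of the true class (natural logarithm) over the evaluation set, the two metrics could be derived from predicted class probabilities as follows. The function name `accuracy_and_nll` and the `eps` clamp are illustrative and not taken from the repository.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_nll(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    eps: float = 1e-12,
) -> Tuple[float, float]:
    """Return (accuracy, mean negative log-likelihood).

    probs[i] holds the predicted class probabilities for sample i and
    labels[i] is its ground-truth class index. The definition of NLL
    used here is an assumption, not confirmed by the source.
    """
    if len(probs) != len(labels) or len(labels) == 0:
        raise ValueError("probs and labels must be non-empty and equally long")

    correct = 0
    total_nll = 0.0
    for p, y in zip(probs, labels):
        # Predicted class is the index of the largest probability.
        pred = max(range(len(p)), key=p.__getitem__)
        correct += int(pred == y)
        # Clamp to eps so a zero probability does not produce log(0).
        total_nll -= math.log(max(p[y], eps))

    n = len(labels)
    return correct / n, total_nll / n
```

Lower NLL indicates that the model assigns higher probability to the correct class, so it complements accuracy by also reflecting how confident the predictions are.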

Usage

Prerequisites

The code is built with the following libraries:

Python >= 3.8
PyTorch
Lightning
timm
seaborn
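
Before training, it can help to confirm that the environment satisfies this list. The following is a small sketch using only the standard library. The import names `torch`, `lightning`, `timm`, and `seaborn` are assumptions about how the packages above are imported (Lightning in particular may be installed as `pytorch_lightning` in some setups), and the helper names are hypothetical.

```python
import sys
from importlib.util import find_spec
from typing import Iterable, List, Tuple

# Assumed top-level import names for the libraries listed above.
REQUIRED_MODULES = ("torch", "lightning", "timm", "seaborn")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


def python_ok(min_version: Tuple[int, int] = (3, 8)) -> bool:
    """Check that the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    if not python_ok():
        print("Python >= 3.8 is required")
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found")
```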

Data Preparation

Please organize the data using the directory structures listed below:

data_root
|-- AUCDD
    |-- v2
        |-- cam1
            |-- test
            |-- train
                |-- c0
                |-- ...
                |-- c9
                    |-- 188.jpg
                    |-- ...
|-- SFDDD
    |-- imgs
        |-- train
            |-- c0
            |-- ...
            |-- c9
                |-- img_19.jpg
                |-- ...

pseudo_label_path
|-- AUCDD
    |-- emo_list.csv
    |-- imgs
        |-- c0
        |-- ...
        |-- c9
            |-- 0_face.jpg
            |-- ...
|-- SFDDD
    |-- emo_list.csv
    |-- imgs
        |-- img_5_face.jpg
        |-- ...

We provide our generated pseudo emotion labels as well as cropped images of drivers' faces for the AUCDD and SFDDD datasets here.
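
To sanity-check a `data_root` layout like the one above, one option is to walk the class folders and collect image paths with their labels. The sketch below assumes that the folders `c0` through `c9` map to class indices 0 through 9 and that images use the `.jpg` extension, as in the listing. The function `list_labelled_images` is hypothetical and is not part of the repository's `datasets` package.

```python
from pathlib import Path
from typing import List, Tuple, Union

NUM_CLASSES = 10  # class folders c0 ... c9, as shown in the directory listing


def list_labelled_images(train_dir: Union[str, Path]) -> List[Tuple[Path, int]]:
    """Collect (image_path, class_index) pairs from a train directory.

    Expects train_dir to contain the sub-folders c0 ... c9. The mapping
    from folder name to integer label is an assumption.
    """
    train_dir = Path(train_dir)
    samples: List[Tuple[Path, int]] = []
    for label in range(NUM_CLASSES):
        class_dir = train_dir / f"c{label}"
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Sort for a deterministic order across runs and platforms.
        samples.extend((path, label) for path in sorted(class_dir.glob("*.jpg")))
    return samples
```

For the layout above, the function would be called on `data_root/SFDDD/imgs/train` or `data_root/AUCDD/v2/cam1/train`. A missing class folder raises an error early instead of silently producing a dataset with fewer classes.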

Citation

If you find ViT-DD useful or relevant to your research, please cite our paper:

@article{Ma2022MultiTaskVT,
  title={Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection},
  author={Yunsheng Ma and Ziran Wang},
  journal={arXiv},
  year={2022}
}
