zhao-chunyu/SaliencyMamba

We collect datasets and the code of other popular models.
We provide a series of instructions and demo files.
We will release all of our code once the paper is accepted.

🔥Update

  • 2024/11/08: We update the supplementary materials. Details

    • We release all the runnable code. 🎉🎊
    • Comparison of runtime and GPU memory: better than the lightest network in the compared paper (Deng et al. 2019).
    • Driver attention-shift cases (+15 cases).
    • Performance of $SalM^2$ at different input resolutions ($256^2$ and $512^2$); see the table below.
| Dataset | Image size | AUC_B↑ | AUC_J↑ | NSS↑ | CC↑ | SIM↑ | KLD↓ | FLOPs (G)↓ |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| TrafficGaze (📆2024.11.08) | 3×256×256 | 0.92 | 0.98 | 5.90 | 0.94 | 0.78 | 0.28 | 4.45 |
| TrafficGaze (📆2024.11.08) | 3×512×512 | 0.92 | 0.98 | 6.04 | 0.95 | 0.80 | 0.26 | 4.72 |
| DrFixD-rainy (📆2024.11.10) | 3×256×256 | 0.89 | 0.95 | 4.31 | 0.86 | 0.68 | 0.47 | 4.45 |
| DrFixD-rainy (📆2024.11.10) | 3×512×512 | 0.90 | 0.96 | 4.26 | 0.86 | 0.69 | 0.45 | 4.72 |
| BDDA (📆2024.11.12) | 3×256×256 | - | - | - | 0.64 | 0.47 | 1.08 | 4.45 |
| BDDA (📆2024.11.12) | 3×512×512 | - | - | - | 0.64 | 0.47 | 1.09 | 4.72 |
  • 2024/10/23: We release a unified saliency dataset loader. You can use it via `from utils.datasets import build_dataset`; a usage sketch is given after this list.

  • 2024/07/24: All the code and models are complete.

  • 2024/07/05: We collect candidate datasets and build a unified dataloader.

  • 2024/06/14: 🤩Our model is proposed!
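The unified loader can be used roughly as follows. Only the import path is confirmed above; the argument names (`category`, `root`, `mode`) and the batch contents are assumptions inferred from the CLI flags in the Deployment section.

```python
# Hedged usage sketch of the unified loader. Only the import path is
# confirmed by this README; argument names and batch contents are assumed.
from torch.utils.data import DataLoader
from utils.datasets import build_dataset

# Assumed signature, mirroring the --category/--root CLI flags below.
train_set = build_dataset(category="TrafficGaze", root="./TrafficGaze", mode="train")
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

for frames, gaze_maps in train_loader:
    # frames: input images; gaze_maps: ground-truth saliency (shapes dataset-dependent)
    break
```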

💬Motivation

(1) Using semantic information to guide driver attention.

Solution: We propose a dual-branch network that extracts semantic information and image information separately. The semantic information then guides the image information at the deepest level of image feature extraction.
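As a rough illustration of what "semantic guidance at the deepest level" can look like, here is a minimal PyTorch sketch in which semantic features gate the image features. This is our own toy module, not the actual $SalM^2$ fusion block, and all shapes are made up for the demo.

```python
import torch
import torch.nn as nn

class SemanticGuidedFusion(nn.Module):
    """Toy illustration of 'top-down' guidance: semantic features gate
    image features. NOT the actual SalM^2 fusion module."""

    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.proj(sem_feat))  # per-pixel gates in (0, 1)
        return img_feat * gate                     # semantics re-weight image features

# Deepest-level features from both branches (channel count and resolution assumed)
fusion = SemanticGuidedFusion(channels=96)
out = fusion(torch.randn(1, 96, 8, 8), torch.randn(1, 96, 8, 8))
```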

(2) Reducing model parameters and computational complexity.

Solution: We develop a highly lightweight saliency prediction network based on the recent Mamba framework, with only 0.0785M parameters (an 88% reduction compared to SOTA) and 4.45G FLOPs (a 37% reduction compared to SOTA).
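These figures can be checked with a standard parameter count once the model is instantiated; the `SalMM()` constructor below is a placeholder, and FLOPs are commonly measured with a third-party profiler such as `thop`.

```python
import torch

def count_parameters_m(model: torch.nn.Module) -> float:
    """Trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# model = SalMM()                    # placeholder: build the network as the repo does
# x = torch.randn(1, 3, 256, 256)    # input size used in the tables above
# print(f"{count_parameters_m(model):.4f}M params")  # expect ~0.0785M
# from thop import profile           # optional third-party profiler
# flops, _ = profile(model, inputs=(x,))
# print(f"{flops / 1e9:.2f}G FLOPs") # expect ~4.45G at 256x256
```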

⚡Proposed Model

We propose a saliency Mamba model, named $SalM^2$, which uses "top-down" driving-scene semantic information to guide "bottom-up" driving-scene image information, simulating human drivers' attention allocation.

📖Datasets

| Name | Train (video/frame) | Valid (video/frame) | Test (video/frame) |
| :--- | :---: | :---: | :---: |
| TrafficGaze | 49080 | 6655 | 19135 |
| DrFixD-rainy | 52291 | 9816 | 19154 |
| BDDA | 286251 | 63036 | 93260 |
【note】For each dataset we provide both our download link and the official link. Please choose according to your needs.

(1) TrafficGaze: We uploaded this dataset at link. We crop 5 frames at the beginning and end of each video. Official website: link.

(2) DrFixD-rainy: We uploaded this dataset at link. We crop 5 frames at the beginning and end of each video. Official website: link.

(3) BDDA: We uploaded this dataset at link. Some camera videos and gazemap videos have inconsistent frame rates, so we matched and cropped them; some camera videos have no corresponding gazemap video, so we filtered them out. Official website: link.
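For readers reproducing this kind of preprocessing, below is a minimal OpenCV sketch of frame-rate matching by uniform resampling. The repository's actual matching and cropping logic is not documented here, so treat this as an assumption-laden illustration only.

```python
import cv2

def sample_frames(video_path: str, num_frames: int):
    """Uniformly sample num_frames frames from a video.

    Illustrative only: the repository's actual camera/gazemap matching
    and cropping procedure may differ.
    """
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```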

The expected directory structure for each dataset (TrafficGaze, DrFixD-rainy, BDDA) is shown below.
./TrafficGaze
  |——fixdata
  |  |——fixdata1.mat
  |  |——fixdata2.mat
  |  |—— ... ...
  |  |——fixdata16.mat
  |——trafficframe
  |  |——01
  |  |  |——000001.jpg
  |  |  |—— ... ...
  |  |——02
  |  |—— ... ...
  |  |——16
  |——test.json
  |——train.json
  |——valid.json
./DrFixD-rainy
  |——fixdata
  |  |——fixdata1.mat
  |  |——fixdata2.mat
  |  |—— ... ...
  |  |——fixdata16.mat
  |——trafficframe
  |  |——01
  |  |  |——000001.jpg
  |  |  |—— ... ...
  |  |——02
  |  |—— ... ...
  |  |——16
  |——test.json
  |——train.json
  |——valid.json
./BDDA
  |——camera_frames
  |  |——0001
  |  |  |——0001.jpg
  |  |  |—— ... ...
  |  |——0002
  |  |—— ... ...
  |  |——2017
  |——gazemap_frames
  |  |——0001
  |  |  |——0001.jpg
  |  |  |—— ... ...
  |  |——0002
  |  |—— ... ...
  |  |——2017
  |——test.json
  |——train.json
  |——valid.json
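
Before training, it can help to sanity-check a dataset root against the layouts above. The sketch below assumes the `*.json` split files are JSON arrays of sample entries; that schema is an assumption, and only the file and folder names come from the trees above.

```python
import json
import os

def check_layout(root: str, subdirs: tuple, splits=("train", "valid", "test")):
    """Verify a dataset root matches the layout shown above."""
    for d in subdirs:
        assert os.path.isdir(os.path.join(root, d)), f"missing directory: {d}"
    for s in splits:
        path = os.path.join(root, f"{s}.json")
        assert os.path.isfile(path), f"missing split file: {s}.json"
        with open(path) as f:
            entries = json.load(f)  # schema assumed to be a JSON array
        print(f"{s}: {len(entries)} entries")

# check_layout("./TrafficGaze", ("fixdata", "trafficframe"))
# check_layout("./BDDA", ("camera_frames", "gazemap_frames"))
```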

🛠️ Deployment 🔁

Run train

👉If you wish to train our model, please follow the steps below.

  1. Train our model. Use --category to switch datasets (TrafficGaze, DrFixD-rainy, BDDA); --b sets the batch size and --g sets the CUDA device id.

python train.py --network salmm --b 32 --g 0 --category xxx --root xxx

  2. Train a comparison model. If the model is a static prediction method, run the following command.

python train.py --network xxx --b 32 --g 1 --category xxx --root xxx

  3. Train a comparison model. If the model is a dynamic prediction method, set --seq_len and run the following command.

python train.py --network xxx --b 32 --seq_len 6 --g 2 --category xxx --root xxx
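For orientation, the flags used in these commands correspond to an argument parser along the following lines. This is a reconstruction inferred from the commands above, not the repository's actual `train.py`.

```python
import argparse

# Hypothetical reconstruction of the CLI from the commands above.
parser = argparse.ArgumentParser(description="Train a saliency prediction model")
parser.add_argument("--network", type=str, default="salmm", help="model name")
parser.add_argument("--b", type=int, default=32, help="batch size")
parser.add_argument("--g", type=int, default=0, help="CUDA device id")
parser.add_argument("--seq_len", type=int, default=1,
                    help="input sequence length (dynamic prediction methods)")
parser.add_argument("--category", type=str,
                    choices=["TrafficGaze", "DrFixD-rainy", "BDDA"], help="dataset")
parser.add_argument("--root", type=str, help="dataset root directory")
args = parser.parse_args()
```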

Run test

👉If you wish to make predictions directly with our trained model, follow the steps below.

  1. Test our model.

  (a) Download our trained model file from link and put it in the specified folder path.

  (b) Use --category to switch datasets (TrafficGaze, DrFixD-rainy, BDDA), then run the following command.

python evaluate-metrics.py --network salmm --b 1 --g 0 --category xxx --root xxx --test_weight xxx

👉If you are unable to set up a suitable environment, you can also download our predictions directly.

  2. Download prediction results.

| $SalM^2$ for TrafficGaze | $SalM^2$ for DrFixD-rainy | $SalM^2$ for BDDA |
| :---: | :---: | :---: |
| prediction results: link | prediction results: link | prediction results: link |
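If you score downloaded predictions yourself, CC and KLD (two of the metrics reported in the tables above) are standard saliency measures and can be computed roughly as follows. This is a generic sketch, not the repository's `evaluate-metrics.py`.

```python
import numpy as np

def cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pearson correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def kld(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """KL divergence of gt from pred; both maps assumed non-negative
    and normalized here to probability distributions."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(g / (p + eps) + eps)).sum())
```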

🚀 Live Demo 🔁

[Demo animations]

✨ Downstream Tasks

Some interesting downstream tasks are shown here; our work may be of significant research interest for these directions.

  • Salient object detection: the saliency map guides object detection.
  • Event recognition: the saliency map guides event recognition.
  • Other downstream tasks...
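As a toy example of how a saliency map could "guide" a downstream detector, one might threshold the map into candidate regions. This is purely illustrative and not a method from the paper.

```python
import cv2
import numpy as np

def salient_boxes(saliency: np.ndarray, thresh: float = 0.5):
    """Toy example: derive candidate boxes from a saliency map to guide
    a downstream detector. Purely illustrative, not the paper's method."""
    mask = (saliency >= thresh * saliency.max()).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    # stats rows: [x, y, width, height, area]; row 0 is the background
    return [tuple(stats[i, :4]) for i in range(1, num)]
```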

⭐️ Cite 🔁

If you find this repository useful, please use the following BibTeX entry for citation.

(The BibTeX entry will be provided once the paper is accepted.)
