ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation

Sara-Ahmed/ASiT

This repository contains the official PyTorch code for self-supervised pre-training, finetuning, and evaluation of ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation.

The finetuning strategy is adopted from AST (Audio Spectrogram Transformer).

Self-supervised pre-training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main_ASiT.py --batch_size 20 --epochs 100 --data_path 'path/to/audio/files' --data-train 'path/to/json/file'

Models pre-trained with ASiT in a self-supervised manner can be downloaded from here.

Data Preparation

We mainly used AudioSet, which consists of audio from YouTube videos, for ASiT pre-training. Please follow the link to download and process the AudioSet data.
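
The pre-training command takes a JSON file through `--data-train`, but this README does not describe its layout. As a rough sketch only, the snippet below writes a manifest in the `{"data": [{"wav": ...}]}` style used by the AST codebase. That layout is an assumption here, and `build_manifest` is a hypothetical helper, not part of this repository; check the data loader in `main_ASiT.py` for the format it actually expects.

```python
import json
from pathlib import Path


def build_manifest(audio_dir, out_json, extensions=(".wav", ".flac")):
    """Write a JSON manifest listing every audio file found under audio_dir.

    Assumption: an AST-style layout, {"data": [{"wav": "<path>"}, ...]}.
    The format expected by main_ASiT.py is not confirmed by this README.
    Returns the number of files listed.
    """
    audio_dir = Path(audio_dir)
    # Collect matching files recursively, sorted for a reproducible order.
    entries = [
        {"wav": str(path)}
        for path in sorted(audio_dir.rglob("*"))
        if path.is_file() and path.suffix.lower() in extensions
    ]
    with open(out_json, "w") as f:
        json.dump({"data": entries}, f, indent=1)
    return len(entries)
```

If the layout matches, the resulting file could be passed as `--data-train`, with `--data_path` pointing at the audio directory.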

If you use this code for a paper, please cite:

@article{atito2022asit,
  title={ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation},
  author={Atito, Sara and Awais, Muhammad and Wang, Wenwu and Plumbley, Mark D and Kittler, Josef},
  journal={arXiv preprint arXiv:2211.13189},
  year={2022}
}
