Where are the 2M AudioSet data and pretrain_audioset2M.sh? #21
Comments
Same issue. I checked the website and also found only the features, not the original waveforms. How can we get the raw data, or has it not been released at all?
My stupid solution is:
You can also use the .wav data provided on Hugging Face: https://huggingface.co/datasets/confit/audioset-full, or on Baidu: https://pan.baidu.com/s/13WnzI1XDSvqXZQTS-Kqujg, password: 0vc2 (source: https://github.com/qiuqiangkong/audioset_tagging_cnn). In the Hugging Face dataset (eval) there is one broken file, ID YmW3... (if I remember right). Delete that one; it can cause headaches :D It must look like this: (screenshot not preserved)
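To find unreadable files like the broken eval clip mentioned above without checking each one by hand, a small scan can help. This is a sketch using only Python's standard `wave` module, assuming the clips are uncompressed PCM WAV; `find_unreadable_wavs` is a hypothetical helper, not part of AudioMAE or the dataset. Valid WAVs that `wave` cannot parse (for example 32-bit float) are also reported, so review the list before deleting anything.

```python
import wave
from pathlib import Path
from typing import List


def find_unreadable_wavs(root: str) -> List[Path]:
    """Return .wav files under `root` that the stdlib wave module cannot fully read.

    ASSUMPTION: files are PCM WAV; non-PCM files are reported as unreadable too.
    """
    broken: List[Path] = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as w:
                n_frames = w.getnframes()
                data = w.readframes(n_frames)
                expected = n_frames * w.getsampwidth() * w.getnchannels()
                # A truncated file has a header promising more bytes than exist.
                if len(data) < expected:
                    broken.append(path)
        except (wave.Error, EOFError):
            # Bad/missing RIFF header, unsupported format, or file too short.
            broken.append(path)
    return broken
```

For example, `find_unreadable_wavs("/path/to/audioset/eval")` (a placeholder path) returns the offending files in sorted order.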
The label mapping for the wav files can be done with the AudioSet ontology (https://github.com/audioset/ontology) and the CSV files (balanced_train_segments.csv, etc.) provided on the AudioSet website: https://research.google.com/audioset/download.html. Good luck!
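A minimal sketch of that mapping, assuming the segments CSV layout (`#` comment header lines, then rows like `YTID, start_seconds, end_seconds, "label1,label2"`) and `ontology.json` from the ontology repo (a JSON list of objects with `id` and `name` fields). The helper names are hypothetical, and the `Y` file-name prefix is an assumption based on the `Y...` IDs mentioned above; adjust it to match how your downloaded files are named.

```python
import csv
import json
from typing import Dict, Iterable, List, Tuple

# (YTID, start_seconds, end_seconds, list of label MIDs)
Segment = Tuple[str, float, float, List[str]]


def load_ontology_names(ontology_json: str) -> Dict[str, str]:
    """Map label MIDs such as "/m/09x0r" to display names from ontology.json text."""
    return {entry["id"]: entry["name"] for entry in json.loads(ontology_json)}


def parse_segments_csv(lines: Iterable[str]) -> List[Segment]:
    """Parse balanced_train_segments.csv-style lines, skipping '#' header comments."""
    data_lines = (line for line in lines if line.strip() and not line.startswith("#"))
    segments: List[Segment] = []
    # skipinitialspace lets csv parse the quoted label list that follows ", "
    for ytid, start, end, labels in csv.reader(data_lines, skipinitialspace=True):
        segments.append((ytid, float(start), float(end), labels.split(",")))
    return segments


def build_label_index(
    segments: Iterable[Segment],
    mid_to_name: Dict[str, str],
    file_prefix: str = "Y",  # ASSUMPTION: wav files are named <prefix><YTID>.wav
) -> Dict[str, List[str]]:
    """Map each wav file stem to its human-readable label names.

    MIDs missing from the ontology are kept as-is.
    """
    return {
        file_prefix + ytid: [mid_to_name.get(mid, mid) for mid in mids]
        for ytid, _start, _end, mids in segments
    }
```

Typical use: read `ontology.json` into `load_ontology_names`, pass the open CSV file (opened with `newline=""`) to `parse_segments_csv`, then look up each wav file's stem in the dictionary returned by `build_label_index`.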
Thank you, Meta, for your hard work on the AudioMAE implementation.
I want to train with the 2M data, but AudioSet only releases features, so I couldn't get the raw audio. I was finally able to get 20k clips from another website. Where can I download the 2M data? Also, I can't find pretrain_audioset2M.sh. Please check.