This is a PyTorch code repository accompanying the following paper:
Christof Weiß and Geoffroy Peeters
Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings
IEEE/ACM Transactions on Audio, Speech & Language Processing, 2022
https://ieeexplore.ieee.org/document/9865174
This repository contains only example code and pre-trained models for most of the paper's experiments, as well as some individual examples. All datasets used in the paper are publicly available (at least partially):
- MusicNet
- Schubert Winterreise Dataset (SWD)
- TRIOS
- Bach10
- PHENICX-Anechoic
- Choral Singing Dataset
- RWC Classical
For details and references, please see the paper.
In addition, we provide information on version duplicates in MusicNet (MusicNet_stats.md) and detailed information on the different training-test splits used in our experiments (as JSON and Markdown files in folder dataset_splits).
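The exact layout of the JSON split files is not described in this README. As a minimal sketch, assuming each file maps subset names (such as "train" and "test") to lists of file identifiers, a split could be loaded and checked for files that occur in more than one subset like this. The assumed layout and the helper names are hypothetical:

```python
import json
from pathlib import Path


def check_split(split):
    """Raise ValueError if a file identifier occurs in more than one subset.

    Assumes `split` maps subset names (e.g. "train", "test") to lists of
    file identifiers; this layout is a hypothetical example.
    """
    seen = {}
    for subset, files in split.items():
        for name in files:
            if name in seen and seen[name] != subset:
                raise ValueError(
                    f"{name!r} occurs in both {seen[name]!r} and {subset!r}"
                )
            seen[name] = subset
    return split


def load_split(path):
    """Load a split from a JSON file (assumed layout, see check_split)."""
    with Path(path).open(encoding="utf-8") as f:
        return check_split(json.load(f))
```

Pass the path of one of the JSON files in dataset_splits to load_split; if the files turn out to be structured differently, check_split needs to be adapted accordingly.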
In this top folder, two Jupyter notebooks (01_precompute_features and 02_predict_with_pretrained_model) demonstrate how to preprocess audio files for running our models and how to load a pretrained model for predicting pitches.
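How the model's frame-wise pitch probabilities are turned into pitch estimates is not spelled out here. As a rough sketch, assuming the output is a matrix of per-frame, per-pitch probabilities, the code below binarizes it with a fixed threshold and scores it against a reference using frame-wise precision, recall, and F-measure, a common metric for multi-pitch estimation. The threshold of 0.5 and the lowest MIDI pitch of 24 are placeholder values, not taken from the repository, and plain lists stand in for the tensors a PyTorch model would return:

```python
def binarize(posteriorgram, threshold=0.5):
    """Threshold per-frame, per-pitch probabilities into 0/1 activations.

    The default threshold is a placeholder, not the paper's setting.
    """
    return [[1 if p >= threshold else 0 for p in frame] for frame in posteriorgram]


def active_pitches(activations, lowest_midi=24):
    """List (frame index, MIDI pitch) pairs; lowest_midi is a hypothetical offset."""
    return [
        (t, lowest_midi + k)
        for t, frame in enumerate(activations)
        for k, on in enumerate(frame)
        if on
    ]


def frame_prf(estimate, reference):
    """Frame-wise precision, recall and F-measure for 0/1 pitch activations."""
    tp = fp = fn = 0
    for est_frame, ref_frame in zip(estimate, reference):
        for e, r in zip(est_frame, ref_frame):
            if e and r:
                tp += 1  # pitch active in both estimate and reference
            elif e:
                fp += 1  # estimated but not in the reference
            elif r:
                fn += 1  # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure


probs = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]  # 2 frames x 3 pitches (made-up values)
reference = [[1, 0, 0], [0, 1, 1]]
estimate = binarize(probs)
print(active_pitches(estimate))  # [(0, 24), (0, 26), (1, 25)]
print(frame_prf(estimate, reference))  # each value is approximately 0.667
```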
The experiments folder contains all experiment scripts as well as the log files (subfolder logs) and the file-wise results (subfolder results_filewise). The folder models_pretrained contains pre-trained models for the main experiments, and the subfolder predictions contains example model predictions for two of the experiments. Please note that re-training requires a GPU as well as the pre-processed training data (see the notebook 01_precompute_features for an example). All scripts must be started from the repository top folder so that the relative paths resolve correctly.
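Because the scripts rely on relative paths, a guard like the following could be placed at the start of a script to fail early when it is launched from the wrong folder. This is a minimal sketch, not part of the repository; it assumes that the folders experiments and dataset_splits sit directly in the top folder:

```python
from pathlib import Path

# Folders assumed to exist in the repository top folder (adjust if needed).
EXPECTED_DIRS = ("experiments", "dataset_splits")


def check_repo_root(cwd=None):
    """Raise RuntimeError unless `cwd` (default: current dir) looks like the top folder."""
    root = Path.cwd() if cwd is None else Path(cwd)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    if missing:
        raise RuntimeError(
            f"Start scripts from the repository top folder; "
            f"missing in {root}: {', '.join(missing)}"
        )
    return root
```

Calling check_repo_root() without arguments checks the current working directory.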
The experiment files' names relate to the paper's results in the following way:
Experiments from Section IV.B (Table II / Fig. 4) - Model Architectures and Sizes. The suffix _rerun denotes additional training/test runs of a model.
- CNN:XS exp126a_musicnet_cnn_basic
- CNN:S exp126b_musicnet_cnn_wide
- CNN:M exp126c_musicnet_cnn_verywide
- CNN:L exp126d_musicnet_cnn_extremelywide
- DCNN:S exp127a_musicnet_cnn_deepbasic
- DCNN:M exp127b_musicnet_cnn_deepwide
- DCNN:L exp127c_musicnet_cnn_deepverywide
- DRCNN:S exp128a_musicnet_cnn_deepresnetbasic
- DRCNN:M exp128b_musicnet_cnn_deepresnetwide
- DRCNN:L exp128c_musicnet_cnn_deepresnetverywide
- — exp128c_musicnet_cnn_deepresnetverywide_rerun1
- — exp128c_musicnet_cnn_deepresnetverywide_rerun2
- Unet:S exp160d2_musicnet_unet_large_bugfix
- Unet:M exp160g_musicnet_unet_medium_bugfix
- — exp160g_musicnet_unet_medium_bugfix_rerun1
- — exp160g_musicnet_unet_medium_bugfix_rerun2
- Unet:L exp160e3_musicnet_unet_verylarge_bugfix_scaled
- — exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun1
- — exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun2
- Unet:XL exp160f_musicnet_unet_veryverylarge
- — exp160f_musicnet_unet_veryverylarge_rerun1
- — exp160f_musicnet_unet_veryverylarge_rerun2
- SAUnet:M exp180b_musicnet_unet_verylarge_doubleselfattn
- SAUnet:L exp180d_musicnet_unet_extremelylarge_doubleselfattn
- — exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun1
- — exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun2
- — exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun3
- — exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun4
- SAUnet:XL exp180e_musicnet_unet_insanelylarge_doubleselfattn
- — exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun1
- — exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun2
- SAUnet:XXL exp180f_musicnet_unet_intermedlarge_doubleselfattn
- — exp180f_musicnet_unet_intermedlarge_doubleselfattn_rerun
- SAUSnet:M exp181b_musicnet_unet_verylarge_doubleselfattn_twolayers
- SAUSnet:L exp181d_musicnet_unet_verylarge_doubleselfattn_twolayers
- SAUSnet:XL exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers
- — exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun1
- — exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun2
- SAUSnet:XXL exp181e_musicnet_unet_insanelylarge_doubleselfattn_twolayers
- BLUnet:M exp186b_musicnet_unet_verylarge_blstm
- BLUnet:L exp186d_musicnet_unet_extremelylarge_blstm
- BLUnet:XXL exp186e_musicnet_unet_insanelylarge_blstm
- PUnet:M exp195g_musicnet_unet_extremelylarge_polyphony_softmax
- PUnet:L exp195e3_musicnet_unet_extremelylarge_polyphony_softmax
- PUnet:XL exp195f_musicnet_unet_extremelylarge_polyphony_softmax
- — exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
- — exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2
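Since reruns differ from the original experiment name only by the _rerun suffix (with or without a number), the runs belonging to one model can be collected by stripping that suffix. A minimal sketch; the helper name is hypothetical:

```python
import re
from collections import defaultdict

# Matches "_rerun", "_rerun1", "_rerun2", ... at the end of an experiment name.
RERUN_SUFFIX = re.compile(r"_rerun\d*$")


def group_runs(experiment_names):
    """Map each base experiment name to the list of its runs, in input order."""
    groups = defaultdict(list)
    for name in experiment_names:
        groups[RERUN_SUFFIX.sub("", name)].append(name)
    return dict(groups)


names = [
    "exp128c_musicnet_cnn_deepresnetverywide",
    "exp128c_musicnet_cnn_deepresnetverywide_rerun1",
    "exp128c_musicnet_cnn_deepresnetverywide_rerun2",
    "exp180f_musicnet_unet_intermedlarge_doubleselfattn",
    "exp180f_musicnet_unet_intermedlarge_doubleselfattn_rerun",
]
for base, runs in group_runs(names).items():
    print(base, len(runs))
```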
Experiments from Section IV.C (Table IV) - Model Generalization (more training samples, other test sets). The suffix _rerun denotes additional training/test runs of a model.
- Unet:XL exp160f_musicnet_unet_veryverylarge_moresamples
- — exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
- — exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
- SAUnet:L exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
- — exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
- — exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
- SAUSnet:XL exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
- PUnet:XL exp195f_musicnet_unet_extremelylarge_polyphony_softmax_moresamples
- Unet:XL RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples
- — RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
- — RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
- SAUnet:L RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
- — RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
- — RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
- SAUSnet:XL RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
- — RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun1
- — RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun2
- PUnet:XL RETRAIN_exp195f_musicnet_unet_extremelylarge_polyphony_softmax
- see models from (a) Test set MuN-10a
- SAUnet:L RETRAIN2_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
- SAUnet:L RETRAIN3_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
- CNN:M RETRAIN4_exp127c_musicnet_cnn_verywide_moresamples
- DRCNN:L RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples
- — RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun1
- — RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun2
- Unet:M RETRAIN4_exp160g_musicnet_unet_medium_moresamples
- Unet:XL RETRAIN4_exp160f_musicnet_unet_veryverylarge_moresamples
- SAUnet:L RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
- — RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
- — RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
- SAUSnet:XL RETRAIN4_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
- BLUnet:L RETRAIN4_exp186d_musicnet_unet_extremelylarge_blstm_moresamples
- PUnet:XL RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax
- — RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
- — RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2
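The experiment names follow a recurring pattern: an optional RETRAIN prefix (RETRAIN, RETRAIN2, ...), an experiment ID, the dataset, descriptors of the architecture and size, and optional _moresamples and _rerun suffixes. The sketch below splits a name into these parts with a regular expression inferred from the names listed in this file; it does not interpret what the individual RETRAIN variants mean, and both the pattern and the function name are assumptions:

```python
import re

NAME_PATTERN = re.compile(
    r"^(?:(?P<retrain>RETRAIN\d*)_)?"  # optional RETRAIN, RETRAIN2, ... prefix
    r"(?P<exp_id>exp\d+[a-z]\d*)_"     # e.g. exp126a, exp160d2, exp195e3
    r"(?P<dataset>[a-z]+)_"            # e.g. musicnet, schubert, bigmix
    r"(?P<config>.+?)"                 # architecture and size descriptors
    r"(?:_(?P<rerun>rerun\d*))?$"      # optional rerun suffix
)


def parse_experiment_name(name):
    """Split an experiment name into its parts; unmatched optional parts are None."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized experiment name: {name!r}")
    info = m.groupdict()
    info["moresamples"] = info["config"].endswith("_moresamples")
    return info


print(parse_experiment_name(
    "RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun2"
))
```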
Experiments from Section IV.D (Fig. 6) - Cross-Version Study on Schubert Winterreise.
- Version split: exp200a_schubert_versionsplit_cnn_verywide
- Song split: exp200b_schubert_songsplit_cnn_verywide
- Neither split: exp200c_schubert_neithersplit_cnn_verywide
- Version split: exp201a_schubert_versionsplit_unet_extremelylarge_doubleselfattn
- Song split: exp201b_schubert_songsplit_unet_extremelylarge_doubleselfattn
- Neither split: exp201c_schubert_neithersplit_unet_extremelylarge_doubleselfattn
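One reading of the three split types: a version split tests on versions (performances) not seen in training, a song split tests on songs not seen in training, and a neither split tests on songs and versions that are both unseen. The sketch below encodes this reading for a list of (song, version) pairs. The song and version IDs are placeholders; the concrete splits used in the paper are documented in the folder dataset_splits:

```python
from itertools import product

MODES = ("version", "song", "neither")


def cross_version_split(pieces, test_songs, test_versions, mode):
    """Split (song, version) pairs into training and test lists.

    version: test = held-out versions of all songs, train = remaining versions.
    song:    test = held-out songs in all versions, train = remaining songs.
    neither: test = held-out songs in held-out versions; train uses only songs
             and versions that are not held out (mixed pairs are dropped).
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    train, test = [], []
    for song, version in pieces:
        held_song = song in test_songs
        held_version = version in test_versions
        if mode == "version":
            (test if held_version else train).append((song, version))
        elif mode == "song":
            (test if held_song else train).append((song, version))
        elif held_song and held_version:
            test.append((song, version))
        elif not held_song and not held_version:
            train.append((song, version))
    return train, test


# Placeholder IDs: 3 songs x 3 versions.
pieces = list(product(["song1", "song2", "song3"], ["verA", "verB", "verC"]))
for mode in MODES:
    train, test = cross_version_split(pieces, {"song3"}, {"verC"}, mode)
    print(mode, len(train), len(test))
```

With these placeholders, the neither split keeps 4 of the 9 pairs for training and 1 for testing and discards the 4 mixed pairs, which illustrates why this setting leaves the least training data.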
Experiments from Section IV.E (Fig. 7) - Cross-Dataset Study on the Big Mix dataset, compiled from all source datasets. The suffix _rerun denotes additional training/test runs of a model.
- CNN:M exp216c_bigmix_cnn_verywide
- — exp216c_bigmix_cnn_verywide_rerun1
- — exp216c_bigmix_cnn_verywide_rerun2
- DRCNN:L exp214c_bigmix_cnn_deepresnetwide
- — exp214c_bigmix_cnn_deepresnetwide_rerun1
- — exp214c_bigmix_cnn_deepresnetwide_rerun2
- Unet:M exp213g_bigmix_unet_medium
- — exp213g_bigmix_unet_medium_rerun1
- — exp213g_bigmix_unet_medium_rerun2
- Unet:XL exp212f_bigmix_unet_veryverylarge
- — exp212f_bigmix_unet_veryverylarge_rerun1
- — exp212f_bigmix_unet_veryverylarge_rerun2
- SAUnet:L exp210d_bigmix_unet_extremelylarge_doubleselfattn
- — exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun1
- — exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun2
- SAUSnet:XL exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers
- — exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun1
- — exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun2
- BLUnet:L exp217d_bigmix_unet_extremelylarge_blstm
- — exp217d_bigmix_unet_extremelylarge_blstm_rerun1
- — exp217d_bigmix_unet_extremelylarge_blstm_rerun2
- PUnet:XL exp215f_bigmix_unet_extremelylarge_polyphony_softmax
- — exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun1
- — exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun2
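Where a model was trained and tested several times (original run plus reruns), one way to summarize it is the mean and standard deviation of a metric over all runs. A minimal sketch using Python's statistics module; the scores are made-up placeholders, not results from the paper:

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of one metric over runs.

    With a single run, the standard deviation is reported as 0.0.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Placeholder scores for an original run and two reruns of one model.
runs = {
    "exp216c_bigmix_cnn_verywide": 0.70,
    "exp216c_bigmix_cnn_verywide_rerun1": 0.72,
    "exp216c_bigmix_cnn_verywide_rerun2": 0.74,
}
m, s = summarize_runs(runs.values())
print(f"{m:.3f} +/- {s:.3f}")  # prints 0.720 +/- 0.020
```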
For example, run a script from the repository top folder with the following commands:
conda activate multipitch_architectures
export CUDA_VISIBLE_DEVICES=1
python experiments/Exp1_SectionIV-B/exp126a_musicnet_cnn_basic.py
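The same launch can be scripted from Python, for instance to run several experiments one after another on the same GPU. The sketch below is an assumption, not a tool shipped with the repository: it copies the current environment, sets CUDA_VISIBLE_DEVICES, and runs each script with the interpreter of the active environment (so multipitch_architectures should be activated first):

```python
import os
import subprocess
import sys


def build_command(script, gpu=1):
    """Return the command and environment for running one experiment script."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return [sys.executable, script], env


def run_experiments(scripts, gpu=1):
    """Run experiment scripts sequentially; stop at the first failing script.

    Must be called from the repository top folder, since the scripts use
    relative paths.
    """
    for script in scripts:
        cmd, env = build_command(script, gpu)
        subprocess.run(cmd, env=env, check=True)
```

For example, run_experiments(["experiments/Exp1_SectionIV-B/exp126a_musicnet_cnn_basic.py"]) corresponds to the shell commands above.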