6x Training

Important remark: At Audiveris, we were provided directly by ZHAW with the trained models for the page and patch classifiers; we never had the opportunity to train them on our own.

Therefore, this chapter is essentially a collection of references to public ZHAW information. We do, however, indicate some possible improvements to the training data sets.

ZHAW Data Set

DeepScores is the project name for this ZHAW data set.

It is a collection of 300 000 pages of digitally rendered music scores.

The music is real but the images are synthetic, rendered with 5 different musical fonts. MusicXML scores from the MuseScore library were all fed into Lilypond to produce 3 kinds of artifacts:

images_png: Page image in .png format
pix_annotations_png: Pixel labelling using symbol index as gray value
xml_annotations: Collection of symbol tuples (symbol name, bounding box within image)
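
To illustrate how the pixel labelling relates to bounding boxes, here is a minimal sketch in Python. It assumes the label image has already been loaded as a row-major grid of gray values, and that gray value 0 means background; both the function name and the background convention are assumptions for illustration, not taken from the DeepScores documentation. Note that it yields one box per symbol index, so several instances of the same symbol are merged into a single box.

```python
from typing import Dict, List, Tuple

# (left, top, right, bottom), inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def boxes_per_symbol_index(labels: List[List[int]],
                           background: int = 0) -> Dict[int, Box]:
    """Return one bounding box per gray value found in a label image.

    `labels` is a row-major grid of gray values, each value being
    interpreted as a symbol index. The `background` value (assumed to
    be 0 here) is skipped.
    """
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(labels):
        for x, value in enumerate(row):
            if value == background:
                continue
            if value in boxes:
                # Grow the existing box so that it includes this pixel
                left, top, right, bottom = boxes[value]
                boxes[value] = (min(left, x), min(top, y),
                                max(right, x), max(bottom, y))
            else:
                # First pixel seen for this symbol index
                boxes[value] = (x, y, x, y)
    return boxes


# Tiny hand-made grid, not taken from DeepScores
toy = [
    [0, 0, 0, 0, 0],
    [0, 7, 7, 0, 0],
    [0, 7, 7, 0, 3],
    [0, 0, 0, 0, 3],
]
print(boxes_per_symbol_index(toy))
```

Separating individual symbol instances would require an additional step such as connected-component labelling, which is left out of this sketch.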

Main pointers:

Technical report: "DeepScores - A Dataset for Segmentation, Detection and Classification of Tiny Objects", latest version at https://arxiv.org/pdf/1804.00525.pdf
GitHub (initial?) umbrella page: https://tuggeluk.github.io/deepscores/
Full data set (beware of the 69 GB to download): DeepScoresArchives at https://drive.google.com/drive/folders/1KFxqi0rO-bJrd03rLk87fF1iOmnjpaoG
GitHub repository for related code: https://github.com/tuggeluk/DeepScoresExamples
Directions for evolution were recently presented in the article "DeepScores and Deep Watershed Detection current state and open issues": https://www.groundai.com/project/deepscores-and-deep-watershed-detection-current-state-and-open-issues/

Additional Data Sets

DeepScores is a very large data set, but its images are somewhat far from the day-to-day reality of scores to be OMR'ed:

All images are synthetic outputs and thus of perfect quality.
They were all rendered by the same music renderer (Lilypond), resulting in similar layouts.

Although the large DeepScores data set is suitable for a massive "pre-training", the resulting models might benefit from a final training phase on smaller but different data sources.

MUSCIMA++

MUSCIMA++ is a data set of handwritten music notation for optical music recognition.
Pointers are gathered at https://muscima.readthedocs.io/en/latest/

Although not part of the DeepScores project, this set was used during the training of the ZHAW Detection service.

MuseScore

The idea is again to start from MuseScore's large library of real music, but this time using MuseScore's own music renderer. This is likely to result in layouts somewhat different from the Lilypond-rendered DeepScores images.

This approach was sketched out by MuseScore and Audiveris during the Salzburg 2017 Music Hackday. It is documented at https://github.com/Audiveris/omr-dataset-tools

Thanks to Animesh Tewari, who spent his GSoC (Google Summer of Code) 2018 on this task, we now have a first set of 4000 scores. I need to review this material thoroughly before MuseScore launches a larger production.

IMSLP / Audiveris

The idea is to start from the IMSLP library at https://imslp.org/ which gathers a huge collection of printed scores, most of them resulting from scans of engraved music and thus not biased by any music renderer.

The major downside of this collection is of course the lack of related ground truth.

This was actually the first motivation for developing Audiveris 5.1: with a decent end-user interface that allows easy validation and correction of OMR output, Audiveris can now be used to gradually populate a real-world training data set.

See the "Annotate Book Symbols" section in the Audiveris 5.1 Handbook.

Training the models

Please refer to the publication "Deep Watershed Detector for Music Object Recognition", available at https://arxiv.org/abs/1805.10548

The related training code is available at https://github.com/tuggeluk/DeepWatershedDetection

My understanding is that both classifiers (page and patch) were trained on DeepScores data, and that both used ResNet101 as an implementation basis.

Page Model

The trained model is at https://drive.google.com/open?id=1P_jWBP9Z0bad1wuzqiVAj3sUR67mSwa0

Patch Model

The trained model is at https://drive.google.com/open?id=1iXr3KGCVgzCGP9CUo1tefis3GFBwCxSQ

Software licensed under the GNU Affero General Public License (AGPL) Version 3
© 2000-2023 Audiveris. Logo designed by Katka.

Updated for 5.3 release
