Perspective

Hervé Bitteur edited this page May 16, 2018 · 5 revisions

The 5.0 release focused on the OMR engine, which could run in batch or interactive mode; 5.1 is the first release in the 5.x series with true user-interface features.

A graphical interface was indeed available in 5.0's interactive mode, but it was rather limited and, moreover, targeted at developers, for analyzing, checking, and tuning the OMR engine.

We initially thought that Audiveris could address only the engine features and then export the resulting data to external tools such as music editors, programs specifically designed for end-user interaction. Clear and simple!

Reality is more nuanced:

  • OMR export: Audiveris stores its internal data in .omr project files, which are zip archives of structured XML files, using a specific format for OMR data that is both public and documented. Any external program can therefore access the OMR information, but there is as yet no standard for representing OMR data.
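    Because a .omr project file is just a zip archive of XML entries, any language's standard zip and XML tooling suffices to inspect it. Below is a minimal sketch in Python; the entry names (book.xml, sheet1.xml) are hypothetical placeholders, not Audiveris' actual internal layout.

    ```python
    import io
    import zipfile

    # Build a toy archive standing in for a .omr project file.
    # Entry names here are hypothetical, not Audiveris' real layout.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("book.xml", "<book name='demo'/>")
        zf.writestr("sheet1.xml", "<sheet number='1'/>")

    # Any external program can read the data back with ordinary zip tools.
    with zipfile.ZipFile(buf) as zf:
        names = zf.namelist()
        book = zf.read("book.xml").decode("utf-8")

    print(names)  # ['book.xml', 'sheet1.xml']
    print(book)   # <book name='demo'/>
    ```

    In practice you would open the .omr file by path instead of an in-memory buffer; the access pattern is the same.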

  • MusicXML export: Audiveris can export data as .mxl files, which are zip archives of structured XML files using MusicXML format version 3.0. MusicXML is a de facto standard, meant for exchanging symbolic data between music applications, which can usually edit and play back the music. However, it lacks much of the physical information that OMR deals with: for example, a note head can be located with respect to its staff, but the staff lines themselves are not defined.

    Many score printouts are of poor quality, and OMR is still a research area, so some errors will always remain. Yet even to export valuable MusicXML, Audiveris needs to reach fairly good OMR results.
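    A compressed MusicXML (.mxl) archive can be read the same way: per the MusicXML container specification, a META-INF/container.xml entry points at the root score file inside the zip. The sketch below builds a minimal .mxl in memory and then locates and parses the root file; the score content is a bare placeholder.

    ```python
    import io
    import zipfile
    import xml.etree.ElementTree as ET

    # Build a minimal .mxl archive in memory. META-INF/container.xml
    # declares where the root score file lives inside the zip.
    container = (
        "<container><rootfiles>"
        "<rootfile full-path='score.xml'/>"
        "</rootfiles></container>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("score.xml", "<score-partwise version='3.0'/>")

    # Reading it back: resolve the root file, then parse it.
    with zipfile.ZipFile(buf) as zf:
        meta = ET.fromstring(zf.read("META-INF/container.xml"))
        path = meta.find("rootfiles/rootfile").get("full-path")
        score = ET.fromstring(zf.read(path))

    print(path)       # score.xml
    print(score.tag)  # score-partwise
    ```

    A real .mxl produced by Audiveris would of course carry a full MusicXML 3.0 score as the root file, but the container lookup is the same.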

  • Error correction: To validate a score or correct the remaining errors, the user needs tools designed for this task. While most music editors focus on score creation, here we need score validation, comparison with the original image, error detection, and error correction.

    To this end, physical information is key, beginning with the precise location of each element in the original image, to allow visual checking of differences. This requires strict fidelity in music layout, something most music editors don't provide. And if the original image exhibits wavy staff lines, a MusicXML-based rendering will never allow a clean overlay.

  • OMR dataset: To train the coming deep-learning classifiers (which we name the patch classifier and the full-page classifier), we will need large amounts of symbolic training data. This data must come from validated scores: both synthetic scores and manually validated OMR'ed printed scores, the former for pre-training, the latter for fine-tuning.

    To produce validated/corrected OMR'ed scores, we need a user interface, and today it can work only on the Audiveris OMR format. For this simple reason, we set the UI priority on the key training items: symbols with fixed shape and size.