Logo now with white outer stroke (#180)
* logo now with white outer stroke + svg files

* fixing docs + adding favicon

* minor change docs index

* minor fixes MIDI docs

* MMM `add_to_vocab` called for both base_tokenizer and self

* docs fix in `DatasetMIDI`

* MMM now preprocesses scores without merging tracks with the same programs, fix in MMM splitting TokSequence per bar/beat, fix REMI ProgramChange/bar token type graph, `adapt_ref_score_for_tests_assertion` no longer merges tracks (was useless)

* `is_track_empty` method and deleting tracks with unsupported programs in `preprocess_score`

* handling special case where there are no tracks but only tempos/ts

* adding argument values in data aug report

* `get_bars_ticks` and `get_beats_ticks` handling non-positive time signature denominators

* fix on last commit

* fix on last commit

* fix on data augmentation `_filter_offset_tuples_to_score` + speeding it up with numpy

* setting copy back to symusic 0.4.8

* docs fixes
Natooz authored Jun 19, 2024
1 parent 0671ca1 commit 7b63a8c
Showing 24 changed files with 291 additions and 106 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/pytest.yml
@@ -33,8 +33,6 @@ jobs:
        run: |
          # Install local package with tests dependencies extras
          python -m pip install --upgrade pip
-         # TODO remove this when v0.5.0 is out
-         pip install git+https://github.com/Yikai-Liao/symusic
          pip install -e ".[tests]"
      - name: Test with pytest
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

Python package to tokenize music files, introduced at the ISMIR 2021 LBDs.

-![MidiTok Logo](docs/assets/logo.png?raw=true "")
+![MidiTok Logo](docs/assets/miditok_logo_stroke.png?raw=true "")

[![PyPI version fury.io](https://badge.fury.io/py/miditok.svg)](https://pypi.python.org/pypi/miditok/)
[![Python 3.8](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/)
Binary file added docs/assets/favicon.png
Binary file removed docs/assets/logo.png
Binary file added docs/assets/miditok_logo.png
Binary file added docs/assets/miditok_logo.pxd
21 changes: 21 additions & 0 deletions docs/assets/miditok_logo.svg
Binary file added docs/assets/miditok_logo_stroke.png
Binary file added docs/assets/miditok_logo_stroke.pxd
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -50,4 +50,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "furo"
+html_title = "MidiTok's docs"
+html_logo = "assets/miditok_logo_stroke.png"
+html_favicon = "assets/favicon.png"
# tikz_proc_suite = "GhostScript" # required for readthedocs, produce png, not svg
5 changes: 2 additions & 3 deletions docs/index.rst
@@ -6,14 +6,13 @@
Welcome to MidiTok's documentation!
=========================================

-.. image:: /assets/logo.png
+.. image:: /assets/miditok_logo_stroke.png
   :width: 600
   :alt:

**MidiTok** is a Python package for MIDI file tokenization, introduced at the ISMIR 2021 LBDs `(paper) <https://archives.ismir.net/ismir2021/latebreaking/000005.pdf>`_.
It tokenizes symbolic music files (MIDI, abc), i.e. converts them into sequences of tokens ready to be fed to models such as Transformers, for any generation, transcription or MIR task.
MidiTok features most known MIDI :ref:`tokenizations`, and is built around the idea that they all share common methods. Tokenizers can be trained with BPE, Unigram or WordPiece (:ref:`Training a tokenizer`) and be pushed to and pulled from the Hugging Face hub!
-`Github repository <https://github.com/Natooz/MidiTok>`_

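A minimal usage sketch (not part of this commit; the choice of the REMI tokenizer, its default configuration and the file path are illustrative only):

.. code-block:: python

    from pathlib import Path

    from miditok import REMI, TokenizerConfig

    # Instantiate a REMI tokenizer with default parameters (illustrative choice)
    tokenizer = REMI(TokenizerConfig())
    # Convert a music file into a sequence of tokens (path is hypothetical)
    tokens = tokenizer(Path("path", "to", "file.mid"))
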
Installation
==================
@@ -29,7 +28,7 @@ Citation

If you use MidiTok for your research, a citation in your manuscript would be gladly appreciated. ❤️

-You can also find BibTeX :ref:`citations` of tokenizations.
+You can also find in this documentation BibTeX :ref:`citations` of related research works.

.. code-block:: bib
12 changes: 6 additions & 6 deletions docs/midi.rst
@@ -10,10 +10,10 @@ A MIDI file allows to store MIDI messages as a symbolic music file. It is the mo
History of MIDI
-----------------------------

-MIDI first appeared in the early eighties, when digital instrument manufacturers needed a digital protocol for communication between devices such as synthesizers and computers. It was standardized in 1983 by the first specifications, and is currently maintained by the `MIDI Manufacturers Association <https://www.midi.org>`_\. Meanwhile `new specifications <https://www.midi.org/specifications>`_ were made, the two major ones and still the norm today being the General MIDI 1 (GM1) and General MIDI 2. These specifications aim to guide the manufacturers to design digital music devices compatible with the ones from other manufacturers.
+MIDI first appeared in the early eighties, when digital instrument manufacturers needed a digital protocol for communication between devices such as synthesizers and computers. It was standardized in 1983 by the first specifications, and is currently maintained by the `MIDI Manufacturers Association <https://www.midi.org>`_\. Meanwhile, `new specifications <https://www.midi.org/specifications>`_ were made, the two major ones and still the norm today being the General MIDI 1 (GM1) and General MIDI 2 (GM2). These specifications aim to guide manufacturers to design digital music devices compatible with those of other manufacturers, by making sure they implement the protocol following the same recommendations.

The MIDI protocol allows to represent **notes, tempos, time signatures, key signatures, instruments (called programs) and effects (called controls) such as sustain pedal, pitch bend or modulation.**
-MIDI is an event based protocol. It consists of a series of messages, which can occur in multiple channels. Each message is composed of two key information, 1) the delta time expressed, which is the distance in ticks with the previous event (in the same channel) and so represents its position in time, 2) a message which represents its content.
+MIDI is an event-based protocol. It consists of a series of messages, which can occur in multiple channels. Each message is composed of two key pieces of information: 1) the delta time, which is the distance in ticks from the previous event (in the same channel) and so represents its position in time, and 2) a series of bytes which represent its content.
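
A sketch of these two parts of a track event (the ``MidiEvent`` container below is hypothetical, for illustration only; the byte values are standard MIDI):

.. code-block:: python

    from dataclasses import dataclass

    @dataclass
    class MidiEvent:
        delta_ticks: int  # distance in ticks from the previous event on the channel
        message: bytes    # status byte followed by the data bytes

    # A C4 note (pitch 60) played at velocity 100 on channel 1, lasting 480 ticks:
    note_on = MidiEvent(delta_ticks=0, message=bytes([0x90, 60, 100]))
    note_off = MidiEvent(delta_ticks=480, message=bytes([0x80, 60, 0]))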

The latest evolution of the MIDI protocol is the MIDI Polyphonic Expression (shortly called MPE). This new norm allows manufacturers to create MIDI devices on which a specific channel is assigned to each note allowing the user to apply pitch bend and modulation on each key independently. These devices are typically built with touch-sensitive keys. The MIDI Manufacturers Association released the complete `specifications <https://www.midi.org/midi-articles/midi-polyphonic-expression-mpe>`_ on March 2018.

@@ -30,14 +30,14 @@ A message expresses an event or an information. It takes the form of a series of
- *Program Change*: specifies the current instrument being played;
- *Control Change*: a control parameter is modified or applied. The modulation wheel, foot sustain pedal, volume control or bank select are for instance effects transcribed into Control Change messages.

-Note that these messages are "voice messages", which means that each of them is applied within a channel that is specified in its status byte. The MIDI protocol handles up to sixteen channels which allows to connect multiple instruments that are playing and communicating all together. The channel 10 is reserved for drums, which is a specific "program" in which the pitch values corresponds to drum sounds like kicks, snares, or hi-hats.
+Note that these messages are "voice messages", which means that each of them is applied within a channel that is specified in its status byte. The MIDI protocol handles up to sixteen channels, which allows connecting multiple devices that play and communicate simultaneously. Channel 10 is reserved for drums, a specific "program" in which the pitch values correspond to drum sounds like kicks, snares or hi-hats.
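
A sketch of how a voice message's status byte encodes both the message type (high nibble) and the channel (low nibble); the helper below is hypothetical:

.. code-block:: python

    MESSAGE_TYPES = {
        0x8: "Note Off",
        0x9: "Note On",
        0xA: "Polyphonic Aftertouch",
        0xB: "Control Change",
        0xC: "Program Change",
        0xD: "Channel Aftertouch",
        0xE: "Pitch Bend",
    }

    def parse_status_byte(status: int) -> tuple[str, int]:
        """Split a status byte into (message type, 1-based channel number)."""
        return MESSAGE_TYPES[status >> 4], (status & 0x0F) + 1

    print(parse_status_byte(0x99))  # ('Note On', 10): a note on the drums channel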

Time in MIDI
-----------------------------

-Time in MIDI is determined by its **time division**, which is a clock signal expressed in **ticks per quarter note** (tpq), and can be seen as a time resolution. Common time division values are 384 or 480 tpq.
-The time division can also be set in ticks per second, but this solution is more rarely encountered as it makes less sense to use seconds while the tempo and time signature are known.
-The time division is the first information that can be read at the beginning of a file, and a MIDI can only have one time division.
+Time in MIDI is determined by its **time division**, which is a clock signal expressed in **ticks per quarter note** (tpq), and can be seen as a time resolution. Common time division values are 384, 480 and 960 tpq, as they are divisible by 3, 4, 6 and 8, which are common time signature numerators and denominators.
+The time division can also be set in ticks per second, but this option is more rarely encountered, as using seconds makes less sense when the tempo and time signature are known in MIDI.
+The time division is the first information that can be read at the beginning of a file, and a MIDI file can only have one time division.

The number of ticks per bar and ticks per beat can be calculated from the MIDI's time division (:math:`time_{div}`) and the current time signature (:math:`\frac{ts_{num}}{ts_{denom}}`):
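
A sketch of this computation in Python (standard MIDI arithmetic; the function names are illustrative):

.. code-block:: python

    def ticks_per_beat(time_div: int, ts_denom: int) -> int:
        # A beat spans 4 / ts_denom quarter notes, and time_div is in ticks per quarter note
        return time_div * 4 // ts_denom

    def ticks_per_bar(time_div: int, ts_num: int, ts_denom: int) -> int:
        return ts_num * ticks_per_beat(time_div, ts_denom)

    # With a 480 tpq time division and a 6/8 time signature:
    print(ticks_per_beat(480, 8))    # 240 ticks per beat
    print(ticks_per_bar(480, 6, 8))  # 1440 ticks per bar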

10 changes: 5 additions & 5 deletions docs/music_formats.rst
@@ -2,7 +2,7 @@
Music formats
===================================

-This page introduces the basic concepts of music, the MIDI protocol and sequential deep learning models. It aims to bring the basic knowledge around this subjects in order to understand how to use music with AI models, without going into too specific details, for which more comprehensive references are attached.
+This page introduces the two representations of music and symbolic music file formats. It aims to present the basic differences between audio and symbolic music in order to better understand how they can be used with AI models, without going too much into detail, for which more comprehensive references are attached.

Music: symbolic and audio
---------------------------
@@ -15,25 +15,25 @@ Symbolic music represents the successions of notes, arranged in time and along w
:width: 800
:alt: A sheet music.

-The `pianoroll <https://en.wikipedia.org/wiki/Piano_roll>`_ is another symbolic representation which consists of a two axis grid with one axis for the time and one for the note pitches. It was originally used in player pianos, and is now used in most `Digital Audio Wordstation (DAW) <https://en.wikipedia.org/wiki/Digital_audio_workstation>`_ softwares to show the notes and other effects of a track.
+The `pianoroll <https://en.wikipedia.org/wiki/Piano_roll>`_ is another symbolic representation, which consists of a two-axis grid with one axis for time and one for the note pitches. It was originally used in player pianos, and is now used in most `Digital Audio Workstation (DAW) <https://en.wikipedia.org/wiki/Digital_audio_workstation>`_ software to show the notes and other effects of a track.

.. image:: /assets/bases/pianoroll_daw.png
:width: 800
:alt: A piano roll view in the Logic Pro X DAW.
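
A toy pianoroll sketched as a matrix in Python (the time resolution of 4 steps per beat is an arbitrary choice):

.. code-block:: python

    import numpy as np

    # Rows are the 128 MIDI pitches, columns are time steps
    pianoroll = np.zeros((128, 16), dtype=np.uint8)
    pianoroll[60, 0:4] = 100   # C4 held for one beat at velocity 100
    pianoroll[64, 4:8] = 100   # E4 on the second beat
    pianoroll[67, 8:16] = 100  # G4 held for two beats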

Audio on the other hand represents the *physical* form of music, i.e. a sound signal, more specifically vibrations propagating in a material. Audio music is usually represented as waveforms (time domain) or spectrograms (frequency domain).

-A waveform is stricticly the amplitude of a sound as a function of time. In the real world, a waveform is purely continuous. A digital audio waveform as found in audio files such as mp3s will feature a sampling frequency which indicates the number of samples per second used to represent this waveform. This time resolution is usually at least 44.1k samples per seconds, following the `Nyquist–Shannon theorem <https://en.wikipedia.org/wiki/Nyquist–Shannon_sampling_theorem>`_ .
+A waveform is strictly the amplitude of a sound as a function of time. In the real world, a waveform is purely continuous. A digital audio waveform, as found in audio files such as mp3s, will feature a sampling frequency which indicates the number of samples per second used to represent the waveform. This time resolution is usually at least 44.1k samples per second, following the `Nyquist–Shannon theorem <https://en.wikipedia.org/wiki/Nyquist–Shannon_sampling_theorem>`_.

-A sound, wether from an instrument, a human voice or a music arrangement, is a superposition of many periodic frequencies, defined by their wavelength, amplitude and phase. A spectrogram depicts the intensity in dB of the frequencies as a function of time. It allow to have a representation of these frequencies which is useful when analyzing sound. It can be computed with a `Fourier Transform <https://en.wikipedia.org/wiki/Fourier_transform>`_ , usually a `Short Time Fourier Transform (STFT) <https://ieeexplore.ieee.org/document/1164317>`_ .
+A sound, whether from an instrument, a human voice or a music arrangement, is a superposition of many periodic frequencies, defined by their wavelength, amplitude and phase. A spectrogram depicts the intensity in dB of these frequencies as a function of time. It provides a representation of the frequencies which is useful when analyzing sound. It can be computed with a `Fourier Transform <https://en.wikipedia.org/wiki/Fourier_transform>`_, usually a `Short Time Fourier Transform (STFT) <https://ieeexplore.ieee.org/document/1164317>`_.

.. image:: /assets/bases/spectrogram.png
:width: 800
:alt: The spectrogram of a sound, abscissa is time, ordinate is frequency and the color represents the intensity in dB.
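
A minimal STFT spectrogram sketch using only numpy (real projects would typically rely on scipy.signal or librosa; the window size and hop length below are arbitrary):

.. code-block:: python

    import numpy as np

    def stft_magnitude_db(signal: np.ndarray, frame_size: int = 1024, hop: int = 256) -> np.ndarray:
        """Return the magnitude spectrogram in dB, shape (n_frames, frame_size // 2 + 1)."""
        window = np.hanning(frame_size)
        n_frames = 1 + (len(signal) - frame_size) // hop
        frames = np.stack(
            [signal[i * hop : i * hop + frame_size] * window for i in range(n_frames)]
        )
        magnitudes = np.abs(np.fft.rfft(frames, axis=1))
        return 20 * np.log10(magnitudes + 1e-10)  # epsilon avoids log(0)

    # One second of a 440 Hz sine sampled at 44.1 kHz, the standard rate mentioned above
    sr = 44100
    t = np.arange(sr) / sr
    spectrogram = stft_magnitude_db(np.sin(2 * np.pi * 440 * t))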

Symbolic music can be seen as both discrete and continuous, as it represents discrete notes that nonetheless feature "continuous-like" attributes, potentially with a high time resolution (in samples per beat or another specific time duration). **For this reason, it is more commonly used with discrete sequential models** (which we introduce in :ref:`sequential-models-label`) **by being represented as sequences of tokens**, which is the purpose of MidiTok. Pianoroll has also been used with `Convolutional Neural Networks (CNNs) <https://en.wikipedia.org/wiki/Convolutional_neural_network>`_ in past works (e.g. `MuseGan <https://aaai.org/papers/11312-musegan-multi-track-sequential-generative-adversarial-networks-for-symbolic-music-generation-and-accompaniment/>`_) but is now uncommon due to the limitations it imposes on the representation of musical elements.

-On the other hand, audio is by nature a continuous modality, as it represent the waveform of the sound itself. From a practical point of view, modeling raw waveforms with neural networks is often intractable due to the high time resolution of audio, despite works that achieved to do it (`WaveNet <https://arxiv.org/pdf/1609.03499>`_ , `Jukebox <https://openai.com/index/jukebox/>`_ ). For this reason, audio has been more commonly formatted as spectrograms when used with neural networks, and used with CNNs as it conventiently takes the form of a 2-dimensional matrix with distinct continuous patterns like images.
+On the other hand, audio is by nature a continuous modality, as it represents the waveform of the sound itself. From a practical point of view, modeling raw waveforms with neural networks is often intractable due to the high time resolution of audio, although some works have achieved it (`WaveNet <https://arxiv.org/pdf/1609.03499>`_, `Jukebox <https://openai.com/index/jukebox/>`_). For this reason, audio has more commonly been formatted as spectrograms when used with neural networks, and used with CNNs, as it conveniently takes the form of a 2-dimensional matrix with distinct continuous patterns, like images.
Research in neural audio codecs has allowed audio waveforms to be "compressed" into a reduced number of discrete values, making it possible to use waveforms as sequences of tokens with discrete models such as Transformers. For more details, see `SoundStream <https://ieeexplore.ieee.org/document/9625818>`_ and `EnCodec <https://openreview.net/forum?id=ivCd8z8zR2>`_, which are respectively used by `MusicLM <https://arxiv.org/abs/2301.11325>`_ and `MusicGen <https://proceedings.neurips.cc/paper_files/paper/2023/hash/94b472a1842cd7c56dcb125fb2765fbd-Abstract-Conference.html>`_.

