Hi! This is related to the changes in #68 on the handling of program tokens with MIDILike tokenization.
With use_programs enabled, miditok currently adds a program token before each pitch token. To be consistent with previous MIDI-like tokenizations in Oore et al. and Huang et al., program tokens should only be added where program change messages occur in the original MIDI file. The resulting encoding is much more compact and practical: for a single-track MIDI file with a single specified instrument, there would be only one program token in the tokenized stream rather than one per note.
This should be doable. But as we rely on MIDIToolkit, we do not have access to the "real" program change messages. Instead, we can add a ProgramChange token whenever a note is played by an instrument other than the one that played the previous note, which should be equivalent anyway. I can work on this tomorrow.
Also, adding a Program token for each note significantly increases the sequence length, but this is greatly mitigated by BPE, to the point where it isn't a big issue.
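To make the idea concrete, here is a rough sketch of the two strategies. This is not miditok's actual code: the `(program, pitch)` note tuples, the function names and the token strings are all made up for illustration, and the notes are assumed to be already sorted by onset time.

```python
from typing import List, Optional, Tuple

# Hypothetical note representation: (program, pitch), sorted by onset time.
Note = Tuple[int, int]


def tokens_program_per_note(notes: List[Note]) -> List[str]:
    """Behavior described in the issue: one program token before every pitch token."""
    tokens: List[str] = []
    for program, pitch in notes:
        tokens.append(f"Program_{program}")
        tokens.append(f"Pitch_{pitch}")
    return tokens


def tokens_program_on_change(notes: List[Note]) -> List[str]:
    """Proposed behavior: emit a token only when the program differs from the last one."""
    tokens: List[str] = []
    current_program: Optional[int] = None  # no program seen yet
    for program, pitch in notes:
        if program != current_program:
            # Illustrative token name, not necessarily the library's vocabulary.
            tokens.append(f"ProgramChange_{program}")
            current_program = program
        tokens.append(f"Pitch_{pitch}")
    return tokens


# Single instrument: the program is announced once, before the first note.
single_track = [(0, 60), (0, 62), (0, 64)]
# Two instruments interleaved: a token appears at each switch.
mixed = [(0, 60), (0, 64), (0, 67), (25, 48), (0, 72)]
```

With `single_track`, the first function yields six tokens and the second yields four, with a single program token at the start. Real tokenization would also interleave time-shift, velocity and note-off tokens, which are left out here to keep the comparison readable.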
Hi 👋,
The feature is added to main.
I'll wait some time, hoping to catch issues or feature requests, before releasing the next version, as this one doesn't fix anything major.
You can still try it by installing from git