This release introduces the PerTok tokenizer by Lemonaide AI, attribute control tokens, and minor fixes.
## Highlights
### PerTok: Performance Tokenizer
(associated paper to be released)
PerTok was developed by Julian Lenz (@JLenzy) at Lemonaide AI to capture expressive timing in symbolic scores while keeping sequence lengths competitively low. It does so by dividing time differences into Macro and Micro categories and introducing a new MicroTime token type. Subtle deviations from the quantized beat are represented with these tokens.
Furthermore, PerTok can encode an unlimited number of note subdivisions by allowing multiple, overlapping values within the `beat_res` parameter of the `TokenizerConfig`.
The micro timing tokens will be extended to all tokenizers in a future update.
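To make the Macro/Micro split concrete, here is a minimal sketch in plain Python. It is an illustration of the idea only, not PerTok's implementation: the helper names `build_grid` and `split_time`, the 480 ticks per beat, and the choice of 4 and 3 subdivisions per beat are all assumptions made for this example.

```python
def build_grid(ticks_per_beat: int, resolutions: list[int]) -> list[int]:
    """Return the sorted union of tick positions for several subdivisions of one beat."""
    positions = {0, ticks_per_beat}
    for res in resolutions:
        # Assumption: ticks_per_beat is divisible by every resolution.
        step = ticks_per_beat // res
        positions.update(range(0, ticks_per_beat + 1, step))
    return sorted(positions)


def split_time(delta_ticks: int, grid: list[int]) -> tuple[int, int]:
    """Split a time difference into a quantized (macro) part and a signed (micro) remainder."""
    # The macro part is the nearest grid position; ties resolve to the earlier position.
    macro = min(grid, key=lambda pos: abs(pos - delta_ticks))
    return macro, delta_ticks - macro


# Overlapping resolutions: sixteenth notes (4 per beat) and eighth-note triplets (3 per beat).
grid = build_grid(480, [4, 3])
macro, micro = split_time(170, grid)
print(grid)          # [0, 120, 160, 240, 320, 360, 480]
print(macro, micro)  # 160 10
```

Here a note played 170 ticks after the previous event snaps to the triplet position at 160 ticks, and the remaining 10 ticks are what a micro timing token would carry. Combining several resolutions into one grid keeps the macro vocabulary small while still covering both straight and triplet subdivisions.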
### Attribute Control tokens
Attribute controls are additional tokens that let you train a model so that it can be steered at inference time: they condition the model to predict music with specific features.
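The general idea can be sketched in plain Python as follows. This is a hedged illustration rather than MidiTok's API: the function `add_density_controls`, the token name `AC_NoteDensity_*`, and the toy token strings are hypothetical, and note density per bar is just one possible attribute.

```python
def add_density_controls(bars: list[list[str]], max_density: int = 8) -> list[str]:
    """Prepend a (hypothetical) note-density control token to each bar of tokens."""
    sequence: list[str] = []
    for bar in bars:
        # Measure the attribute on the training data: number of notes in the bar.
        density = sum(token.startswith("Pitch_") for token in bar)
        # Clip so that the control vocabulary stays finite.
        sequence.append(f"AC_NoteDensity_{min(density, max_density)}")
        sequence.extend(bar)
    return sequence


bars = [["Bar_None", "Pitch_60", "Pitch_64"], ["Bar_None", "Pitch_67"]]
tokens = add_density_controls(bars)
print(tokens)
```

A model trained on sequences like this learns to associate each control token with the content that follows it. At inference, writing the desired control token into the prompt asks the model for a bar with that feature.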
## What's Changed
- updates to Example_HuggingFace_Mistral_Transformer.ipynb by @briane412 in #164
- `_model_name` is now a protected property by @Natooz in #165
- Fixing docs for tokenizer training by @Natooz in #167
- Default `continuing_subword_prefix` when splitting token sequences by @Natooz in #168
- small bug fix in MIDI pretokenization by @shenranwang in #170
- adding `no_preprocess_score` argument when tokenizing by @Natooz in #172
- `TokSequence` summable, `concatenate_track_sequences` arg for MMM by @Natooz in #173
- Docs update by @Natooz in #175
- Fixing split methods for empty files (no tracks and/or no notes) by @Natooz in #177
- Logo now with white outer stroke by @Natooz in #180
- Attribute controls feature by @helloWorld199 in #181
- better distinction between `one_token_stream` and `config.one_token_stream_for_programs` by @Natooz in #182
- making sure MMM token sequences are not concatenated when splitting them per bar/beat in tokenizer_training_iterator.py by @Natooz in #183
- rST Documentation fixes by @scottclowe in #184
- Bump actions/stale from 5.1.1 to 9.0.0 by @dependabot in #185
- Bump actions/download-artifact from 3 to 4 by @dependabot in #186
- Bump codecov/codecov-action from 3.1.0 to 4.5.0 by @dependabot in #187
- Bump actions/upload-artifact from 3 to 4 by @dependabot in #188
- Fixing bugs caused by changes from symusic v0.5.0 by @Natooz in #192
- `use_velocities` and `use_duration` configuration parameters by @Natooz in #193
- collator now handles decoder input ids (seq2seq models) by @Natooz in #194
- PerTok Tokenizer by @JLenzy in #191
## New Contributors
- @briane412 made their first contribution in #164
- @helloWorld199 made their first contribution in #181
- @scottclowe made their first contribution in #184
- @dependabot made their first contribution in #185
Full Changelog: v3.0.3...v3.0.4