Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PerTok Tokenizer #191

Merged
merged 19 commits into from
Sep 10, 2024
Merged

PerTok Tokenizer #191

merged 19 commits into from
Sep 10, 2024

Conversation

JLenzy
Copy link
Contributor

@JLenzy JLenzy commented Jul 22, 2024

This is Lemonaide's new Tokenizer, called PerTok (performance tokenizer)


📚 Documentation preview 📚: https://miditok--191.org.readthedocs.build/en/191/

@Natooz Natooz added the enhancement New feature or request label Jul 22, 2024
Copy link
Owner

@Natooz Natooz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from here we can take two paths:

  1. implement the basic micro-timing (without overlapping beat_res) in the microtiming-draft branch and make it work, then pull these changes to this branch and make the few adaptations, that shouldn't requite to change much things;
  2. Put the microtiming-draft and MT feature aside from now and just focus on PerTok, which would however need to fetch it's parameters solely from config.additional_params (for now) and undo the PerTok-only changes made to the config and MusicTokenizer classes.

In both cases we'll have to make the test pass.

For 1., I can take care of it (I don't know when, I'm quite busy for now, but hopefully this shouldn't take long, most of it is already implemented).
WDYT?

miditok/classes.py Outdated Show resolved Hide resolved
miditok/classes.py Outdated Show resolved Hide resolved
miditok/classes.py Outdated Show resolved Hide resolved
miditok/constants.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/tokenizations/pertok.py Outdated Show resolved Hide resolved
miditok/tokenizations/pertok.py Outdated Show resolved Hide resolved
miditok/tokenizations/pertok.py Outdated Show resolved Hide resolved
@JLenzy
Copy link
Contributor Author

JLenzy commented Aug 12, 2024

implement the basic micro-timing (without overlapping beat_res) in the microtiming-draft branch and make it work, then pull these changes to this branch and make the few adaptations, that shouldn't requite to change much things;
Put the microtiming-draft and MT feature aside from now and just focus on PerTok, which would however need to fetch it's parameters solely from config.additional_params (for now) and undo the PerTok-only changes made to the config and MusicTokenizer classes.

At this stage I would prefer to focus on the second option, as my experience has been that the duration/microtiming issue quickly becomes a rabbit-hole when it is implemented through the base classes. This would help get this update ready sooner and I do have to be conscious of time constraints here.

One other question I have: Would it be okay for me to add something like "use_full_resolution" to the TokenizerConfig, in addition to "use_microtiming"?
The reason being that PerTok needs to have full-resolution (not downsampled) files, regardless of whether it is using MicroShift tokens or not. This is how it is able to implement the overlapping Timeshift/Duration values. Right now, if I set 'use_microtiming=False' then it breaks the entire tokenizer, because the original MidiTok downsampling logic comes into play.

@Natooz
Copy link
Owner

Natooz commented Aug 12, 2024

Noted, let’s focus on the second one then!

For ‘use_full_resolution’, as long as it can be implemented only in PerTok that’s possible

miditok/classes.py Outdated Show resolved Hide resolved
Copy link
Owner

@Natooz Natooz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a few lint suggestions for the main files

miditok/classes.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
Copy link
Owner

@Natooz Natooz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lint fixes in suggestions

miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
miditok/midi_tokenizer.py Outdated Show resolved Hide resolved
tests/utils_tests.py Outdated Show resolved Hide resolved
tests/utils_tests.py Show resolved Hide resolved
miditok/tokenizations/pertok.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 79.09091% with 69 lines in your changes missing coverage. Please review.

Project coverage is 90.15%. Comparing base (9d9781c) to head (07b3d1d).
Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
miditok/tokenizations/pertok.py 77.82% 55 Missing ⚠️
tests/utils_tests.py 76.66% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #191      +/-   ##
==========================================
- Coverage   90.65%   90.15%   -0.50%     
==========================================
  Files          40       41       +1     
  Lines        6011     6575     +564     
==========================================
+ Hits         5449     5928     +479     
- Misses        562      647      +85     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Natooz Natooz merged commit 967209a into Natooz:main Sep 10, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants