tokenize_midi_dataset reproducing source file tree, overwrite_mode #82
Codecov Report
Coverage diff, main vs. #82:
              main     #82      +/-
Coverage    90.49%  90.41%   -0.09%
Files           31      31
Lines         4463    4536      +73
Hits          4039    4101      +62
Misses         424     435      +11
@leleogere I'll leave this open a few days in case you want to review / test 😄
miditok/midi_tokenizer.py (outdated)
    if isinstance(midi_paths, str):
        midi_paths = Path(midi_paths)
        root_dir = midi_paths
        midi_paths = sum(list(midi_paths.glob(f"**/*.{ext}")) for ext in MIDI_FILES_EXTENSIONS)
Did you try this? I suspect it would cause an error. I think it should be something more like the following (note the extra parentheses around the generator and the [] passed as second argument):
    sum((list(midi_paths.glob(f"**/*.{ext}")) for ext in MIDI_FILES_EXTENSIONS), [])
EDIT: I confirm after testing that this raises an error. The [] is needed to give sum the start value of the aggregation.
EDIT2: You should remove a dot either in the glob pattern or in the MIDI_FILES_EXTENSIONS items, as there are currently 2 dots and midi_paths is therefore empty after that.
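Both points from this review can be checked in isolation. The following is a minimal sketch, not the code from the PR: collect_midi_paths is a hypothetical helper, and the MIDI_FILES_EXTENSIONS values below are an assumption (items stored with their leading dot, as the review implies).

```python
from pathlib import Path

# Assumption: extensions keep their leading dot, as implied by the review.
MIDI_FILES_EXTENSIONS = [".mid", ".midi"]


def collect_midi_paths(root_dir, extensions=MIDI_FILES_EXTENSIONS):
    """Hypothetical helper: recursively list files matching the extensions."""
    root_dir = Path(root_dir)
    # Each extension already carries its dot, so the pattern is "**/*" + ext.
    # Writing "**/*.{ext}" here would give "**/*..mid" and match nothing.
    # The [] start value makes sum concatenate lists; without it, sum starts
    # from the integer 0 and raises a TypeError on the first list.
    return sum((list(root_dir.glob(f"**/*{ext}")) for ext in extensions), [])


if __name__ == "__main__":
    for path in collect_midi_paths("."):
        print(path)
```

For longer extension lists, itertools.chain.from_iterable would avoid the repeated list copies that sum performs, but for two or three extensions the difference is negligible.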
Well spotted, thank you!
For the extension, we should just remove the dot in the line above.
Do you want to make the code change suggestion, or should I make the changes myself?
Thank you for the review, waiting for your response.
Suggest proposed changes (Now that I know how to do a code suggestion, I'll do that directly next time!)
Committed!
I'll try to improve coverage before merging
Co-authored-by: leleogere <71326140+leleogere@users.noreply.github.com>
2a5cd40 to 03a2582
Following #79, this PR makes tokenize_midi_dataset reproduce the file tree of the source files in out_dir.
It also handles file overwriting when files already exist in the saving path.
The file tree reproduction still has to be done for data_augmentation_dataset.
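To illustrate what reproducing the source tree means, here is a rough sketch under stated assumptions. It is not the implementation from this PR: mirrored_out_path and should_write are hypothetical helpers, the .json suffix is an assumption, and the boolean overwrite flag only stands in for the PR's overwrite_mode, whose exact semantics are not shown here.

```python
from pathlib import Path


def mirrored_out_path(src_file, root_dir, out_dir, suffix=".json"):
    """Hypothetical helper: map a source file to the same relative location
    under out_dir, swapping the file suffix (".json" is an assumption)."""
    # relative_to raises ValueError if src_file is not located under root_dir.
    relative = Path(src_file).relative_to(root_dir)
    return (Path(out_dir) / relative).with_suffix(suffix)


def should_write(out_path, overwrite=True):
    """Hypothetical helper: skip existing files when overwrite is False."""
    return overwrite or not Path(out_path).exists()


if __name__ == "__main__":
    out_path = mirrored_out_path("data/rock/song.mid", "data", "tokens")
    print(out_path)
    if should_write(out_path, overwrite=False):
        # Create the mirrored sub-directories before saving anything there.
        out_path.parent.mkdir(parents=True, exist_ok=True)
```

Keeping the path mapping separate from the overwrite check means the same mapping could be reused for data_augmentation_dataset.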