Adding func_to_get_labels argument to DatasetTok #80
Codecov Report
All modified lines are covered by tests ✅

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main      #80      +/-   ##
    ==========================================
    + Coverage   90.29%   90.42%   +0.13%
    ==========================================
      Files          31       31
      Lines        4305     4334      +29
    ==========================================
    + Hits         3887     3919      +32
    + Misses        418      415       -3
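As a quick check of the numbers in the report above: Codecov's coverage ratio is hits divided by all tracked lines (hits + partials + misses). No partial lines are listed here, so the sketch below assumes partials = 0. The helper name coverage is illustrative only.

```python
def coverage(hits: int, misses: int, partials: int = 0) -> float:
    """Coverage percentage: hit lines over all tracked lines."""
    return 100 * hits / (hits + misses + partials)


base = coverage(3887, 418)  # main: 3887 + 418 = 4305 tracked lines
head = coverage(3919, 415)  # #80:  3919 + 415 = 4334 tracked lines

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f})")  # 90.29% -> 90.42% (+0.13)
```

The +0.13% gain comes from the 29 new lines all being hit, plus 3 previously missed lines now being covered (+32 hits, -3 misses).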
I have not tried it on my data yet, but at first glance it looks good!
Do not merge this immediately: I am getting an error when using DatasetTok with the DataCollator when trying to …
The issue is in the last line of https://github.com/Natooz/MidiTok/blob/main/miditok/pytorch_data/collators.py#L78-L82:

    if y is not None:
        if isinstance(y[0], LongTensor):
            y = _pad_batch(y, self.labels_pad_idx, self.pad_on_left)
        else:  # classification
            y = torch.stack(y)

Changing that last line to y = torch.stack([torch.tensor(l) for l in y]) makes it work, although I'm not sure this is the right place to do it. However, this might cause an issue when training models, as I think most models usually use a labels vector of shape …
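The failure in the classification branch can be reproduced in isolation. A minimal sketch, assuming the labels reach the collator as plain Python ints (the batch y below is hypothetical): torch.stack only accepts tensors, while wrapping each label first, as proposed above, gives a 1-D batch of shape (batch_size,).

```python
import torch

# Hypothetical batch of classification labels, one Python int per sample.
y = [2, 0, 1]

# torch.stack expects a sequence of tensors, so raw ints raise a TypeError.
try:
    torch.stack(y)
except TypeError as err:
    print(f"torch.stack(y) failed: {err}")

# Wrapping each label yields 0-d tensors, which stack into shape (3,).
stacked = torch.stack([torch.tensor(label) for label in y])
print(stacked.shape)  # torch.Size([3])

# torch.tensor(y) builds the same 1-D int64 tensor in a single call.
direct = torch.tensor(y)
```

Both forms produce the same tensor; torch.tensor(y) simply avoids creating one intermediate tensor per sample.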
(My apologies, wrong operation.) I'm working on passing the labels as tensors. Indeed, the data loader will output a batch with labels of shape …
I did not have the time to have a look earlier, but I think there is still an issue with the …

    Traceback (most recent call last):
      File "/home/gerel/Documents/GrooveMIDI/data_preprocessing.py", line 133, in <module>
        model.run_training(
      File "/home/gerel/Documents/GrooveMIDI/models.py", line 60, in run_training
        writer.add_graph(self, next(iter(train_loader))['input_ids'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/gerel/anaconda3/envs/groove/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
        data = self._next_data()
               ^^^^^^^^^^^^^^^^^
      File "/home/gerel/anaconda3/envs/groove/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
        data = self._dataset_fetcher.fetch(index) # may raise StopIteration
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/gerel/anaconda3/envs/groove/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
        return self.collate_fn(data)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/home/gerel/anaconda3/envs/groove/lib/python3.11/site-packages/miditok/pytorch_data/collators.py", line 80, in __call__
        y = _pad_batch(y, self.labels_pad_idx, self.pad_on_left)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/gerel/anaconda3/envs/groove/lib/python3.11/site-packages/miditok/pytorch_data/collators.py", line 155, in _pad_batch
        length_of_first = batch[0].size(0)
                          ^^^^^^^^^^^^^^^^
    IndexError: Dimension specified as 0 but tensor has no dimensions

The problem comes from the line …
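The traceback shows _pad_batch being called on labels that are 0-d tensors: a scalar label wrapped with torch.tensor(int) has no dimension 0, so batch[0].size(0) fails, yet it was still routed to the padding branch. A minimal sketch of a guard, assuming labels arrive either as 1-D tensors (sequence labels) or as ints / 0-d tensors (classification). collate_labels is a hypothetical helper, not MidiTok's code or the fix merged here, and torch.nn.utils.rnn.pad_sequence stands in for MidiTok's private _pad_batch (right padding only; pad_on_left is not reproduced).

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate_labels(y, pad_idx=0):
    """Hypothetical helper: batch a list of per-sample labels.

    Sequence labels (1-D tensors) are right-padded to shape (batch, max_len);
    scalar labels (ints or 0-d tensors) are stacked to shape (batch,).
    """
    first = y[0]
    # Only tensors with at least one dimension can be padded; a 0-d tensor
    # has no size(0), which is what raised the IndexError above.
    if isinstance(first, torch.Tensor) and first.dim() > 0:
        return pad_sequence(list(y), batch_first=True, padding_value=pad_idx)
    # Classification: convert every label to a 0-d int64 tensor, then stack.
    return torch.stack([torch.as_tensor(label, dtype=torch.long) for label in y])


# A 0-d tensor reproduces the same IndexError as in the traceback.
try:
    torch.tensor(3).size(0)
except IndexError as err:
    print(err)

print(collate_labels([torch.tensor(1), torch.tensor(2)]).shape)  # torch.Size([2])
print(collate_labels([torch.tensor([1, 2, 3]), torch.tensor([4])], pad_idx=-100))
```

Checking dim() rather than the tensor's dtype is what separates the two cases: both classification and sequence labels can be int64, but only sequence labels have a length to pad.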
Thanks for pointing it out, I'll take a look tonight and hopefully fix it.
It should be fixed now! (Sorry for being late 😅)
Following #78, this PR adds the ability to read labels from DatasetTok with a provided func_to_get_labels method. @leleogere you can review it if desired before merging the PR. Otherwise I'll merge in a few days :)
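This thread does not show the exact arguments DatasetTok passes when it calls func_to_get_labels. As a minimal sketch, assuming the callable receives each file's Path and returns an integer class id, a labeling function for a dataset sorted into one folder per class could look like the following. GENRES, GENRE_TO_ID, label_from_parent_dir and the folder layout are all hypothetical.

```python
from pathlib import Path

# Hypothetical label vocabulary: one class per parent directory name.
GENRES = ["funk", "jazz", "rock"]
GENRE_TO_ID = {name: idx for idx, name in enumerate(GENRES)}


def label_from_parent_dir(file_path: Path) -> int:
    """Hypothetical func_to_get_labels: map a file's parent folder to a class id.

    The single-Path signature is an assumption for illustration; check
    DatasetTok's docstring for the arguments it actually passes.
    """
    return GENRE_TO_ID[file_path.parent.name]


print(label_from_parent_dir(Path("data/jazz/take_01.mid")))  # 1
```

A function like this would then be given as the func_to_get_labels argument of DatasetTok, so each sample carries its class id alongside its token ids.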