It seems there is a bug in split_dataset #209

Closed
aiboys opened this issue Jan 10, 2022 · 1 comment · Fixed by #210

Comments

aiboys commented Jan 10, 2022

When the dataset dict is built in split_dataset, the transforms set for the train and val splits overwrite each other, because the split datasets share the same underlying dataset.

I fixed the bug with the following code:

import copy
dataset_dict[sub_dataset_id] = copy.deepcopy(sub_dataset) 

yoshitomo-matsubara (Owner) commented

Hi @aiboys

Thank you for reporting that!

It seems that the Subset class in PyTorch does not copy the original dataset; it only keeps a reference to it, so every split ends up pointing at the same dataset object. In PR #210, I deep-copy the dataset before the for-loop to fix the issue, rather than at the point of assignment to dataset_dict. Otherwise, other splits might still be affected by changes made before the dataset_dict[sub_dataset_id] = copy.deepcopy(sub_dataset) line you showed above.
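
To make the shared-reference problem concrete, here is a minimal sketch. It is not the code from PR #210. ToyDataset, split_shared, split_with_deepcopy, double, and negate are hypothetical names, and the small Subset class is only a stand-in that mimics how torch.utils.data.Subset stores its dataset and indices attributes, so the snippet runs without PyTorch installed.

```python
import copy


class ToyDataset:
    """Hypothetical dataset with a mutable `transform` attribute."""

    def __init__(self, values, transform=None):
        self.values = values
        self.transform = transform

    def __getitem__(self, index):
        value = self.values[index]
        return value if self.transform is None else self.transform(value)

    def __len__(self):
        return len(self.values)


class Subset:
    """Stand-in for torch.utils.data.Subset: keeps a reference, not a copy."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, index):
        return self.dataset[self.indices[index]]

    def __len__(self):
        return len(self.indices)


def split_shared(dataset, train_indices, val_indices):
    # Both subsets wrap the very same underlying dataset object.
    return Subset(dataset, train_indices), Subset(dataset, val_indices)


def split_with_deepcopy(dataset, train_indices, val_indices):
    # Each subset wraps its own independent copy of the dataset.
    return (
        Subset(copy.deepcopy(dataset), train_indices),
        Subset(copy.deepcopy(dataset), val_indices),
    )


def double(x):
    return x * 2


def negate(x):
    return -x


base = ToyDataset([1, 2, 3, 4])

# Without copying: setting the val transform also changes the train split.
train, val = split_shared(base, [0, 1, 2], [3])
train.dataset.transform = double
val.dataset.transform = negate
shared_train_item = train[0]  # computed with negate, not double

# With deep copies: each split keeps the transform assigned to it.
train, val = split_with_deepcopy(base, [0, 1, 2], [3])
train.dataset.transform = double
val.dataset.transform = negate
isolated_train_item = train[0]
isolated_val_item = val[0]
```

In the shared case the last transform assigned wins for every split, which matches the behavior reported above. Copying before the per-split changes are applied keeps the splits independent, at the cost of holding one dataset copy per split in memory.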
