Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug the tortuous logic in _prepare_dataset function #464

Merged
merged 3 commits into from
Jun 24, 2023

Conversation

BeibinLi
Copy link
Contributor

There are two issues with the previous _prepare_dataset function.

  1. Tortuous and burdensome logic: the is_already_dataset variable is confusing and not helpful. So, remove it.
  2. The comments and the logics do not match.

For instance, in the previous version, the comments said "check if torch dataset ... and do nothing". However, when "dataset" is a torch.utils.data.Dataset and packing = True? It will still move into the _prepare_non_packed_dataloader(...) function call.

The corrected version will do nothing if the dataset is already a torch dataloader/dataset/ConstantLengthDataset.

There are two issues with the previous `_prepare_dataset` function.

1. Tortuous and burdensome logic: the `is_already_dataset` variable is confusing and not helpful. So, remove it.
2. The comments and the logics do not match. 

For instance, in the previous version, the comments said "check if torch dataset ... and do nothing". However, when "dataset" is a torch.utils.data.Dataset and `packing = True`? It will still move into the _prepare_non_packed_dataloader(...) function call. 

The corrected version will do nothing if the dataset is already a torch dataloader/dataset/ConstantLengthDataset.
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 23, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @BeibinLi for proposing this refactor, can you run the styling checks in order to trigger the CI tests?

make style && make quality

Or alternatively

make precommit

If you cloned the repo after: #448
After that we should be good to merge

@BeibinLi
Copy link
Contributor Author

Thanks a lot @BeibinLi for proposing this refactor, can you run the styling checks in order to trigger the CI tests?

make style && make quality

Or alternatively

make precommit

If you cloned the repo after: #448 After that we should be good to merge

Thanks! I just re-ran the styling checks and pushed the latest code.

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @BeibinLi !
This is much more cleaner now!

@younesbelkada younesbelkada merged commit 009b824 into huggingface:main Jun 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants