Add the schema to the output of the `.repartition()` method #192

sararb · 2022-12-21T15:08:44Z

For multi-GPU training support in Transformers4Rec, we use `.repartition()` to split the dataset equally across the number of available GPUs, i.e. `global_size` (in this line).
The multi-GPU script is broken because the `repartition` method returns a new `Dataset` object without copying the original dataset's schema. That schema is where we set properties such as `is_ragged=False` and `value_count`, which ensure the dataloader returns dense tensors instead of a tuple representation.
This PR proposes a quick fix: the original schema is passed to the new `Dataset` object returned by `dataset.repartition()`.
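
To illustrate the failure mode and the proposed fix, here is a minimal sketch in plain Python. `ToySchema`, `ToyDataset`, `repartition_without_schema`, and the `item_id_list` column are hypothetical stand-ins invented for this example. They are not the merlin-core classes or the actual diff; they only mimic the behaviour described above, assuming a dataset that falls back to an empty inferred schema when none is passed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToySchema:
    # Hypothetical stand-in for a schema: column name -> column properties.
    properties: Dict[str, dict] = field(default_factory=dict)


class ToyDataset:
    """Hypothetical stand-in for a dataset object (not merlin.io.Dataset)."""

    def __init__(self, rows, npartitions: int = 1, schema: Optional[ToySchema] = None):
        self.rows = list(rows)
        self.npartitions = npartitions
        # Assumption: with no explicit schema, an empty "inferred" schema is
        # used, so any manually set properties are lost.
        self.schema = schema if schema is not None else ToySchema()

    def repartition_without_schema(self, npartitions: int) -> "ToyDataset":
        # Mimics the reported bug: the new object does not receive the schema.
        return ToyDataset(self.rows, npartitions)

    def repartition(self, npartitions: int) -> "ToyDataset":
        # Mimics the proposed fix: forward the original schema to the new object.
        return ToyDataset(self.rows, npartitions, schema=self.schema)


schema = ToySchema(
    {"item_id_list": {"is_ragged": False, "value_count": {"min": 20, "max": 20}}}
)
dataset = ToyDataset(range(8), schema=schema)
global_size = 2  # number of available GPUs in this toy example

broken = dataset.repartition_without_schema(global_size)
fixed = dataset.repartition(global_size)

print("without schema:", broken.schema.properties)
print("with schema:", fixed.schema.properties)
```

In the sketch, only the second variant keeps `is_ragged` and `value_count` on the repartitioned object, which is the property the dataloader depends on to return dense tensors.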

github-actions · 2022-12-21T15:16:03Z

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-192

add schema parameter to the repartition method

401f7e7

sararb added the bug and P0 labels Dec 21, 2022

sararb requested a review from edknv December 21, 2022 15:08

sararb self-assigned this Dec 21, 2022

sararb changed the title ~~Add the schema to the output of the . repartition() method~~ Add the schema to the output of the .repartition() method Dec 21, 2022

edknv approved these changes Dec 21, 2022

karlhigley approved these changes Dec 21, 2022

karlhigley merged commit cfbd860 into main Dec 21, 2022

sararb added a commit that referenced this pull request Dec 29, 2022

add schema parameter to the repartition method (#192)

2fc6889

rnyak mentioned this pull request Jan 11, 2023

[BUG] Multi-gpu notebook gives error when it is run with multi-gpu NVIDIA-Merlin/Transformers4Rec#582

Closed

karlhigley mentioned this pull request Jan 25, 2023

[BUG] repartition on Dataset removes tags from schema #179

Open
