-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update padding of ragged features to enable dataloader change #647
Update padding of ragged features to enable dataloader change #647
Conversation
Documentation previewhttps://nvidia-merlin.github.io/Transformers4Rec/review/pr-647 |
conts=None, | ||
labels=None, | ||
): | ||
schema = schema.select_by_name(conts + cats + labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should remove the =None
for cats, conts & labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's clearer without the default now. it wouldn't have worked as None before this change either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I've put the None back and added support for the None to the method by setting to an empty list if not provided.
It doesn't appear to be captured by any tests, but this would enable training a model with only continuous or only categorical features. Currently you need to have at least one of each for the current version of this MerlinDataloader to work.
|
||
batch_padded = {} | ||
for k, values in batch.items(): | ||
if k.endswith("__values"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to put __values
and __offsets
as constants somewhere in merlin-core? Or some util-method, like is_value
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps TensorTable
would be useful here since that seems to handle both the values/offsets dictionary and tuple variants for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in 43d535d
return pad_fn | ||
|
||
@staticmethod | ||
def _augment_schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this from schema_utiils
to here so that it's closer to the only place it's called in the codebase.
Goals ⚽
Implementation Details 🚧
Testing Details 🔍