-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add H&M fashion recommendation dataset #2708
Conversation
for more information, see https://pre-commit.ci
ludwig/datasets/loaders/utils.py
Outdated
return neg_items.tolist(), max(0, samples_required - available_samples) | ||
|
||
|
||
def negative_sample( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're exceeding the scope of the datasets API by performing negative sampling here. The datasets API intent is to return standard benckmark datasets 'as-is' (or as close to the original as we can get).
IMO the sampling implementation belongs in ludwig.data
, and should be performed as a separate phase after the dataset is loaded (also sampling has hyperparameters neg_pos_ratio
and log_pct
).
Is this possible, or would it be too inefficient to merge article and customer features prior to negative sampling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - removed the negative sampling from the loader and moved it to ludwig.data
in this PR
Unit Test Results 6 files ± 0 6 suites ±0 3h 29m 52s ⏱️ - 4m 11s For more details on these failures, see this check. Results for commit 3f2ab35. ± Comparison against base commit 3504538. ♻️ This comment has been updated with latest results. |
I extracted the negative sampling into a separate PR |
No description provided.