Add H&M fashion recommendation dataset #2708

Merged
merged 10 commits into from
Oct 26, 2022

Conversation

jppgks
Contributor

@jppgks jppgks commented Oct 25, 2022

No description provided.

ludwig/datasets/loaders/dataset_loader.py (resolved)
ludwig/datasets/kaggle.py (resolved)
ludwig/datasets/loaders/hm_fashion_recommendations.py (outdated, resolved)
return neg_items.tolist(), max(0, samples_required - available_samples)


def negative_sample(
Contributor

I think we're exceeding the scope of the datasets API by performing negative sampling here. The intent of the datasets API is to return standard benchmark datasets 'as-is' (or as close to the original as we can get).

IMO the sampling implementation belongs in ludwig.data and should be performed as a separate phase after the dataset is loaded (sampling also has its own hyperparameters, neg_pos_ratio and log_pct).

Is this possible, or would it be too inefficient to merge article and customer features prior to negative sampling?
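
For illustration, the separate phase suggested above might look roughly like the sketch below. It is plain Python and not Ludwig's implementation: the function signature, the (customer, article) pair layout, and the `seed` parameter are assumptions made for the example, and `log_pct` is left out because its meaning is not spelled out in this thread. Only the name `negative_sample`, the `neg_pos_ratio` hyperparameter, and the idea of capping the request at the number of available negatives are taken from the diff and the comment.

```python
import random
from collections import defaultdict


def negative_sample(interactions, all_items, neg_pos_ratio=1, seed=0):
    """Return (customer, article, label) rows with sampled negatives added.

    Hypothetical signature: `interactions` is an iterable of
    (customer, article) pairs that were already loaded 'as-is'.
    """
    rng = random.Random(seed)

    # Group the observed (positive) articles by customer.
    positives = defaultdict(set)
    for customer, article in interactions:
        positives[customer].add(article)

    rows = []
    for customer, pos_items in positives.items():
        # Candidates are articles this customer never interacted with.
        candidates = sorted(set(all_items) - pos_items)
        # Cap the request at the number of negatives actually available.
        n_neg = min(len(candidates), neg_pos_ratio * len(pos_items))
        rows += [(customer, article, 1) for article in sorted(pos_items)]
        rows += [(customer, article, 0) for article in rng.sample(candidates, n_neg)]
    return rows
```

Because the function only needs the loaded interaction pairs, it can run after the loader returns, and article and customer features can be merged onto the sampled rows afterwards.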

Contributor Author

I agree - removed the negative sampling from the loader and moved it to ludwig.data in this PR

github-actions bot commented Oct 25, 2022

Unit Test Results

         6 files  ±  0           6 suites  ±0   3h 29m 52s ⏱️ - 4m 11s
  3 504 tests ±  0    3 383 ✔️ ±  0    79 💤 ±  0  42 ±0 
10 512 runs  +56  10 213 ✔️ +43  257 💤 +13  42 ±0 

For more details on these failures, see this check.

Results for commit 3f2ab35. ± Comparison against base commit 3504538.

♻️ This comment has been updated with latest results.

Contributor Author

jppgks commented Oct 26, 2022

I extracted the negative sampling into a separate PR

@jppgks jppgks requested a review from dantreiman October 26, 2022 11:19
@jppgks jppgks merged commit abfdc05 into master Oct 26, 2022
@jppgks jppgks deleted the hm-recs branch October 26, 2022 17:54
jppgks added a commit that referenced this pull request Nov 4, 2022
jppgks added a commit that referenced this pull request Nov 4, 2022

2 participants