Add H&M fashion recommendation dataset #2708

jppgks · 2022-10-25T21:30:23Z

No description provided.


ludwig/datasets/loaders/dataset_loader.py

ludwig/datasets/kaggle.py

ludwig/datasets/loaders/hm_fashion_recommendations.py

dantreiman · 2022-10-25T22:34:54Z

ludwig/datasets/loaders/utils.py

+    return neg_items.tolist(), max(0, samples_required - available_samples)
+
+
+def negative_sample(


I think we're exceeding the scope of the datasets API by performing negative sampling here. The intent of the datasets API is to return standard benchmark datasets as-is (or as close to the original as we can get).

IMO the sampling implementation belongs in ludwig.data and should run as a separate phase after the dataset is loaded (sampling also has its own hyperparameters, neg_pos_ratio and log_pct).

Is this possible, or would it be too inefficient to merge article and customer features prior to negative sampling?

jppgks · 2022-10-26

I agree. I removed the negative sampling from the loader and moved it to ludwig.data in this PR.
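To make the proposed split concrete, here is a minimal sketch of negative sampling as a standalone phase that runs on (customer, article) interaction pairs after loading. It is not the ludwig.data implementation: the names `negative_sample` and `sample_user_negatives` are used here for illustration only, the per-user helper's return shape simply mirrors the `return neg_items.tolist(), max(0, samples_required - available_samples)` line in the diff above, and only `neg_pos_ratio` is modeled because the thread does not describe what `log_pct` controls. As a further simplification, the item universe is taken from the interactions themselves rather than from a full articles table.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def sample_user_negatives(
    positives: Set[Hashable],
    all_items: Sequence[Hashable],
    samples_required: int,
    rng: random.Random,
) -> Tuple[List[Hashable], int]:
    """Sample items this user never interacted with.

    Returns the sampled items plus the shortfall, i.e. how many requested
    negatives could not be drawn because too few unseen items exist.
    """
    candidates = [item for item in all_items if item not in positives]
    available_samples = len(candidates)
    neg_items = rng.sample(candidates, min(samples_required, available_samples))
    return neg_items, max(0, samples_required - available_samples)


def negative_sample(
    interactions: Sequence[Tuple[Hashable, Hashable]],
    neg_pos_ratio: int = 1,
    seed: int = 0,
) -> List[Tuple[Hashable, Hashable, int]]:
    """Label observed (user, item) pairs 1 and add sampled unobserved pairs labeled 0."""
    rng = random.Random(seed)

    # Collect each user's distinct positive items (repeat purchases count once).
    positives_by_user: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for user, item in interactions:
        positives_by_user[user].add(item)

    # Deterministic item order so a fixed seed gives reproducible samples.
    all_items = sorted({item for _, item in interactions}, key=str)

    rows = [(user, item, 1) for user, item in interactions]
    for user, positives in positives_by_user.items():
        required = neg_pos_ratio * len(positives)
        negatives, _shortfall = sample_user_negatives(positives, all_items, required, rng)
        rows.extend((user, item, 0) for item in negatives)
    return rows
```

Because this sketch only touches ID pairs, article and customer features could be joined onto the labeled rows afterwards, which is one possible answer to the efficiency question about merging features before sampling.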

ludwig/datasets/loaders/hm_fashion_recommendations.py

github-actions · 2022-10-25T22:49:50Z

Unit Test Results

6 files ±0 · 6 suites ±0 · 3h 29m 52s ⏱️ -4m 11s
3 504 tests ±0 · 3 383 ✔️ ±0 · 79 💤 ±0 · 42 ❌ ±0
10 512 runs +56 · 10 213 ✔️ +43 · 257 💤 +13 · 42 ❌ ±0

For more details on these failures, see this check.

Results for commit 3f2ab35. ± Comparison against base commit 3504538.

♻️ This comment has been updated with latest results.

jppgks · 2022-10-26T10:30:39Z

I extracted the negative sampling into a separate PR.


jppgks added 4 commits October 25, 2022 21:32

allow individual file downloads from kaggle

d707ef6

pipe download_filenames to kaggle download fn

60f472c
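The two commits above add per-file Kaggle downloads and thread a `download_filenames` list through to the download call. A minimal sketch of that idea follows, assuming a client that exposes the kaggle package's `KaggleApi.competition_download_files` and `KaggleApi.competition_download_file` methods. The function `download_competition` and the `KaggleCompetitionClient` protocol are hypothetical names for illustration, not the code in ludwig/datasets/kaggle.py.

```python
from pathlib import Path
from typing import Optional, Protocol, Sequence


class KaggleCompetitionClient(Protocol):
    """The two download methods this sketch needs from kaggle's KaggleApi."""

    def competition_download_files(self, competition: str, path: Optional[str] = None) -> None: ...

    def competition_download_file(
        self, competition: str, file_name: str, path: Optional[str] = None
    ) -> None: ...


def download_competition(
    api: KaggleCompetitionClient,
    competition: str,
    download_dir: str,
    download_filenames: Optional[Sequence[str]] = None,
) -> None:
    """Download the whole competition archive, or only the named files."""
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    if not download_filenames:
        # No filter given: fall back to fetching every competition file.
        api.competition_download_files(competition, path=download_dir)
        return
    for file_name in download_filenames:
        # Fetch each requested file individually, skipping the rest.
        api.competition_download_file(competition, file_name, path=download_dir)
```

With the real client you would pass an authenticated `KaggleApi` from `kaggle.api.kaggle_api_extended`. For this dataset that might look like competition `h-and-m-personalized-fashion-recommendations` with `download_filenames=["articles.csv", "customers.csv", "transactions_train.csv"]`, though the slug and file names here are assumptions. Taking the client as a parameter keeps the filtering logic testable without network access.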

add dataset config for H&M Fashion Recommendations

417fea1

add custom loader

51e14fe

jppgks force-pushed the hm-recs branch from bfa8404 to 51e14fe October 25, 2022 21:32

jppgks requested a review from dantreiman October 25, 2022 21:32

[pre-commit.ci] auto fixes from pre-commit.com hooks

386c70e

for more information, see https://pre-commit.ci

dantreiman requested changes Oct 25, 2022


jppgks added 3 commits October 26, 2022 10:09

use local backend instead of mock

8ebf370

add docstring for sample

83ccf97

fix titanic test

eb00421

move negative_sample to ludwig.data

df440cf

jppgks force-pushed the hm-recs branch from 2382d47 to df440cf October 26, 2022 10:33

do not negative sample in loader

3f2ab35

jppgks requested a review from dantreiman October 26, 2022 11:19

dantreiman approved these changes Oct 26, 2022


jppgks merged commit abfdc05 into master Oct 26, 2022

jppgks deleted the hm-recs branch October 26, 2022 17:54

jppgks added a commit that referenced this pull request Nov 4, 2022

Revert "Add H&M fashion recommendation dataset (#2708)"

aec761b

This reverts commit abfdc05.

jppgks added a commit that referenced this pull request Nov 4, 2022

Revert "Add H&M fashion recommendation dataset (#2708)" (#2724)

9894c4c

This reverts commit abfdc05.
