Add Kaggle test splits #2675

abidwael · 2022-10-19T08:50:48Z

Adds missing test splits from a number of Kaggle datasets.

github-actions · 2022-10-19T09:51:31Z

Unit Test Results

6 files ±0 · 6 suites ±0 · 3h 46m 1s ⏱️ +16m 56s
3 504 tests ±0 · 3 383 ✔️ ±0 · 79 💤 ±0 · 42 ❌ ±0
10 456 runs -56 · 10 170 ✔️ -43 · 244 💤 -13 · 42 ❌ ±0

For more details on these failures, see this check.

Results for commit 5c27609. ± Comparison against base commit d365d84.

♻️ This comment has been updated with latest results.

dantreiman

I think we might want to define a different category for these "test" sets, because these are the test sets for submission to Kaggle and they don't have truth labels included. Thus we can't use them as a "test" split in Ludwig.

Note that train.csv has a loss column (representing the monetary loss from the insurance claim), while test.csv does not.
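To illustrate the point about missing labels, here is a minimal sketch (not part of the PR) of how one might check whether a CSV split contains the label column before treating it as a labeled test set. The helper `has_label_column` and the tiny inline CSV snippets are hypothetical stand-ins; only the `loss` column name comes from the comment above.

```python
import csv
import io


def has_label_column(csv_text: str, label: str) -> bool:
    """Return True if the CSV header row contains the given label column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return label in header


# Made-up stand-ins for train.csv and test.csv (not the real Kaggle data).
train_csv = "id,cat1,cont1,loss\n1,A,0.5,2213.18\n"
test_csv = "id,cat1,cont1\n4,A,0.3\n"

print(has_label_column(train_csv, "loss"))  # True
print(has_label_column(test_csv, "loss"))   # False
```

A split that fails this check can still be used for inference or a contest submission, but not for computing held-out metrics.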

dantreiman

I'm concerned that it might mislead users to call splits without a label column "test" -- since in Ludwig's process the test set is a held-out labeled set. We should probably introduce a new category to support unlabeled data for inference or contest submissions.

ludwig/datasets/configs/mercedes_benz_greener.yaml

ludwig/datasets/configs/allstate_claims_severity.yaml

abidwael · 2022-10-20T17:07:39Z

Good catch @dantreiman! I removed the test files from some of the other datasets that had them. We can add functionality for submitting to Kaggle in the future.

Wael Abid added 2 commits October 18, 2022 17:27

add test set for otto_group_product

2cc4cb4

add test splits

5c27609

abidwael requested a review from dantreiman October 19, 2022 08:50

dantreiman requested changes Oct 19, 2022

abidwael requested a review from dantreiman October 20, 2022 17:15

abidwael changed the title ~~Add test splits for tabular datasets~~ Remove Kaggle test splits Oct 20, 2022

abidwael force-pushed the add-benchmarking-datasets branch from 91e2e33 to 5c27609 October 25, 2022 07:32

abidwael changed the title ~~Remove Kaggle test splits~~ Add Kaggle test splits Oct 25, 2022

dantreiman approved these changes Oct 25, 2022

abidwael merged commit d49b4d7 into master Oct 25, 2022

abidwael deleted the add-benchmarking-datasets branch October 25, 2022 20:50
