Add Kaggle test splits #2675
Unit Test Results: 6 files ±0, 6 suites ±0, 3h 46m 1s ⏱️ +16m 56s. For more details on these failures, see this check. Results for commit 5c27609. ± Comparison against base commit d365d84. ♻️ This comment has been updated with latest results.
I think we might want to define a different category for these "test" sets, because these are the test sets for submission to Kaggle and they don't have truth labels included. Thus we can't use them as a "test" split in Ludwig.
Note that `train.csv` has a `loss` column (representing the monetary loss from the insurance claim), while `test.csv` does not.
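To illustrate the distinction, here is a minimal sketch of how a split file could be checked for a label column before it is treated as a "test" split. This is not Ludwig's API: the helper names, the `"unlabeled"` category name, and every column other than `loss` are hypothetical and used only for illustration.

```python
import csv
import io


def has_label_column(csv_text: str, label: str) -> bool:
    """Return True if the CSV header row contains the given label column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    return label in header


def classify_split(csv_text: str, label: str) -> str:
    """Hypothetical categorization: only labeled data counts as a 'test' split."""
    return "test" if has_label_column(csv_text, label) else "unlabeled"


# Toy stand-ins for train.csv and test.csv; columns other than `loss` are made up.
train_csv = "id,cat1,cont1,loss\n1,A,0.5,2213.18\n"
test_csv = "id,cat1,cont1\n4,A,0.3\n"

print(classify_split(train_csv, "loss"))
print(classify_split(test_csv, "loss"))
```

A check like this only inspects the header row, so it would not catch a label column that is present but empty.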
I'm concerned that calling splits without a label column "test" might mislead users, since in Ludwig's process the test set is a held-out labeled set. We should probably introduce a new category to support unlabeled data for inference or contest submissions.
Good catch @dantreiman! I removed the test files from some of the other datasets that had them. We can add functionality for submitting to Kaggle in the future.
91e2e33 to 5c27609
Adds missing test splits from a number of Kaggle datasets.