Aligns training and testing data #33
base: master
I think this is a good start. However, we've seen cases where the divisions are the same and yet the number of rows in each partition still differs. I think that in that case we still raise a non-informative error.
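As a rough illustration of the kind of check described above, here is a minimal sketch in plain Python. The function name `check_partition_lengths` and its arguments are hypothetical and not part of this PR; it assumes the per-partition row counts have already been collected (for a dask dataframe, something like `map_partitions(len).compute()` could supply them).

```python
def check_partition_lengths(x_lengths, y_lengths):
    """Raise an informative error if per-partition row counts disagree.

    Hypothetical helper: x_lengths and y_lengths are sequences holding the
    number of rows in each partition of the features and the labels.
    """
    x_lengths = list(x_lengths)
    y_lengths = list(y_lengths)

    # Different partition counts are reported separately from row mismatches.
    if len(x_lengths) != len(y_lengths):
        raise ValueError(
            f"X has {len(x_lengths)} partitions but y has {len(y_lengths)}."
        )

    # Collect every partition whose row counts differ, not just the first.
    mismatched = [
        (i, nx, ny)
        for i, (nx, ny) in enumerate(zip(x_lengths, y_lengths))
        if nx != ny
    ]
    if mismatched:
        details = ", ".join(
            f"partition {i}: X={nx}, y={ny}" for i, nx, ny in mismatched
        )
        raise ValueError(
            f"X and y have different numbers of rows in some partitions ({details})."
        )


# Same number of partitions and same total rows, but rows are split differently.
try:
    check_partition_lengths([3, 3, 4], [3, 4, 3])
    message = ""
except ValueError as err:
    message = str(err)
```

Listing every mismatched partition in the message tells the user which partitions need to be realigned, which a bare shape error does not.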
Thanks for the feedback @mrocklin! I've added a new |
I've also added some tests, but am running into issues with test failures. Some failures seem to be related to changes in this PR, while other failures are also in For example, running test_classifier traceback
While test_basic traceback
Any thoughts you may have here would be very appreciated.
It would be good to verify that we compute things only once, otherwise we may load and preprocess our data many times. In practice this can be annoying. There are currently two issues stopping this:
We have to persist the data in memory in the Generally I find things like this by trying them out on a small problem and watching the diagnostic dashboard. |
This PR ensures that training and testing data have balanced partitions.
Closes #32