Currently, Ludwig's dataset splitting works in one of two ways:
- The dataset is split randomly into train, validation, and test, according to `preprocessing.split_probabilities`.
- The dataset is split according to a special metadata column in the data, `split`, with a fixed set of values that map rows to splits: `0` for train, `1` for validation, `2` for test.
We should extend this API to enable users to customize splitting by other columns and values.
preprocessing:
    split:
        (moved) force_split: false
        (moved) split_probabilities: [0.7, 0.1, 0.2]
        (new) split_column: split  # Name of the column that should be used for splitting
        (new) train_values: [0]  # Values in split_column that should be associated with the training split
        (new) validation_values: [1]  # Values in split_column that should be associated with the validation split
        (new) test_values: [2]  # Values in split_column that should be associated with the test split
Note: we may want to revisit this API if/when we support multiple test sets.
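
To make the proposed semantics concrete, here is a minimal sketch in plain Python of how the new keys could drive the split. It is not Ludwig's implementation: the function `split_rows`, the list-of-dicts row format, the `seed` argument, and the choice to raise on unmapped values are all assumptions made for illustration. It also assumes `force_split: true` means "ignore the split column and split randomly".

```python
import random

SPLIT_NAMES = ("train", "validation", "test")
# Defaults mirror the proposed config above: 0 -> train, 1 -> validation, 2 -> test.
DEFAULT_VALUES = {"train": [0], "validation": [1], "test": [2]}


def split_rows(rows, split_config, seed=42):
    """Hypothetical helper: assign each row (a dict) to train/validation/test.

    If the configured split column is present in every row and force_split
    is false, rows are assigned by looking up their column value. Otherwise
    rows are assigned randomly according to split_probabilities.
    """
    column = split_config.get("split_column", "split")

    # Build a lookup from column value to split name.
    value_to_split = {}
    for name in SPLIT_NAMES:
        for value in split_config.get(f"{name}_values", DEFAULT_VALUES[name]):
            value_to_split[value] = name

    splits = {name: [] for name in SPLIT_NAMES}
    has_column = bool(rows) and all(column in row for row in rows)

    if has_column and not split_config.get("force_split", False):
        for row in rows:
            value = row[column]
            if value not in value_to_split:
                # Assumption: unmapped values are an error rather than dropped.
                raise ValueError(f"No split configured for {column}={value!r}")
            splits[value_to_split[value]].append(row)
        return splits

    # Random fallback, weighted by split_probabilities.
    rng = random.Random(seed)
    probabilities = split_config.get("split_probabilities", [0.7, 0.1, 0.2])
    for row in rows:
        name = rng.choices(SPLIT_NAMES, weights=probabilities, k=1)[0]
        splits[name].append(row)
    return splits


# Example: split on a custom "fold" column with string values.
rows = [
    {"text": "a", "fold": "A"},
    {"text": "b", "fold": "B"},
    {"text": "c", "fold": "C"},
    {"text": "d", "fold": "A"},
]
config = {
    "split_column": "fold",
    "train_values": ["A"],
    "validation_values": ["B"],
    "test_values": ["C"],
}
result = split_rows(rows, config)
```

With this mapping, several values can point at the same split (for example `train_values: ["A", "D"]`), which is the main thing the fixed `0`/`1`/`2` convention cannot express.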