During creation it checks if function has only 2 positional arguments. For shuffling to be used it should also accept a third argument, seed or seeds. Otherwise an exception is thrown when trying to pass shuffle=True to get_dataset().
marton-avrios changed the title from "FunctionDataSource does not allow function with 3 positional arguments shuffle does not work" to "FunctionDataSource does not allow function with 3 positional arguments thus shuffling does not work" on Aug 23, 2022
Hi @marton-avrios, this is working as intended iiuc. The first validation checks that the dataset_fn has at least split and shuffle_files as args, since seed may not be an arg. The second validation is triggered only when the user passes a seed when loading the dataset, meaning that the dataset_fn is expected to have a "seed" arg. "seeds" isn't supported. If your dataset_fn needs multiple seeds, then you can create new ones from the initial seed.
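As a rough illustration of creating several seeds from one initial seed, here is a minimal sketch in plain Python. It uses only the standard library, not seqio or TensorFlow, and the names derive_seeds, file_seed and example_seed are hypothetical, chosen for this example only.

```python
import random


def derive_seeds(seed, n):
    """Derive n child seeds deterministically from one initial seed."""
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return [rng.randrange(2**31) for _ in range(n)]


def dataset_fn(split, shuffle_files, seed=None):
    # Hypothetical dataset_fn: it accepts a single `seed` argument and
    # fans it out into the separate seeds it needs internally.
    file_seed, example_seed = derive_seeds(seed, 2)
    return {
        "split": split,
        "shuffle_files": shuffle_files,
        "file_seed": file_seed,
        "example_seed": example_seed,
    }
```

The same initial seed always yields the same child seeds, so the function stays reproducible while still exposing only the one supported "seed" arg.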
Maybe the fn needs a better name and documentation: it doesn't validate that the fn has exactly the same args as expected_pos_args, only that the first len(expected_pos_args) args are exactly the same.
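To make the "first len(expected_pos_args) args" behaviour concrete, here is a minimal sketch of a prefix check written with the standard inspect module. It is not seqio's actual implementation; validate_arg_prefix and the example functions are hypothetical names used only to show the idea.

```python
import inspect


def validate_arg_prefix(fn, expected_pos_args):
    """Raise ValueError unless fn's leading positional args match expected_pos_args."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    pos_names = tuple(
        p.name
        for p in inspect.signature(fn).parameters.values()
        if p.kind in positional_kinds
    )
    # Only the leading args are compared; extra trailing args are allowed.
    prefix = pos_names[: len(expected_pos_args)]
    if prefix != tuple(expected_pos_args):
        raise ValueError(
            f"expected leading args {tuple(expected_pos_args)}, got {pos_names}"
        )


def two_arg_fn(split, shuffle_files):
    return None


def three_arg_fn(split, shuffle_files, seed=None):
    return None


# Both pass the first check: only the first two names are compared.
validate_arg_prefix(two_arg_fn, ("split", "shuffle_files"))
validate_arg_prefix(three_arg_fn, ("split", "shuffle_files"))
# A check that also expects "seed" passes only for the three-arg function.
validate_arg_prefix(three_arg_fn, ("split", "shuffle_files", "seed"))
```

Under this kind of check, a function with a third seed argument is accepted at creation time, and a stricter check that includes "seed" only needs to run when a seed is actually passed.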
The creation-time argument check is at seqio/seqio/dataset_providers.py, line 341 in 71e47ac.
Also, it only allows seed, and not seeds, later. But this never comes into effect, since the whole thing fails during creation. See seqio/seqio/dataset_providers.py, line 373 in 71e47ac.