All dataset classes now have a preprocessor parameter, which can be either a path or a preprocessor object. As a result, all manual preprocessor configs and paths have been removed from the dataset configs.
Dataset configs now have an hf_load_kwargs field whose contents are passed to datasets.load_dataset().
Dataset config names are now supported either through hf_load_kwargs (hf_load_kwargs={"name": "fa"}) or by appending them to the path, as in Dataset.load("<path>:<config_name>").
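To illustrate how the two forms can express the same request, here is a minimal sketch that splits a "<path>:<config_name>" string and builds keyword arguments using the name key mentioned above. The helpers split_dataset_path and to_load_kwargs, and the example path, are hypothetical names for illustration only. They are not the library's actual implementation.

```python
from typing import Dict, Optional, Tuple


def split_dataset_path(path: str) -> Tuple[str, Optional[str]]:
    """Split "<path>:<config_name>" into (path, config_name).

    Hypothetical helper: the library's real parsing may differ.
    """
    # Split on the last colon so the config name is always the final segment.
    base, sep, name = path.rpartition(":")
    if not sep:
        # No colon found: the whole string is the path, with no config name.
        return path, None
    return base, name or None


def to_load_kwargs(path: str) -> Dict[str, str]:
    """Build keyword arguments equivalent to passing the name explicitly."""
    base, name = split_dataset_path(path)
    kwargs = {"path": base}
    if name is not None:
        kwargs["name"] = name
    return kwargs


# "org/dataset" is a placeholder path, not a real dataset.
print(to_load_kwargs("org/dataset:fa"))
print(to_load_kwargs("org/dataset"))
```

Under these assumptions, "org/dataset:fa" carries the same information as the path "org/dataset" combined with hf_load_kwargs={"name": "fa"}.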
Datasets now have a _load() method, which is responsible for loading the data files, either from the Hub or through custom data-reading logic.
Datasets now have a max_size field in their config that overrides the length of the dataset (all __len__ implementations have moved to the base class). The value can also be a fraction, e.g., 0.3 means 30% of the original length.
LR scheduling in the Trainer has been improved.
The docs have been rewritten or improved a lot!
log_steps and save_steps now accept float values between 0 and 1, representing a fraction.
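The fractional convention used by max_size, log_steps and save_steps can be sketched as a single resolution step. The function resolve_fraction below is a hypothetical helper, not the library's code. It assumes that a float strictly between 0 and 1 is a fraction of some total, that any other value is an absolute count, and that the result is rounded and kept at 1 or more. The rounding rule and the choice of total are assumptions.

```python
from typing import Union


def resolve_fraction(value: Union[int, float], total: int) -> int:
    """Turn a fractional or absolute setting into an absolute count.

    Hypothetical helper: rounding and bounds here are assumptions.
    """
    if isinstance(value, bool):
        raise TypeError("value must be an int or a float, not a bool")
    if isinstance(value, float) and 0 < value < 1:
        # Treat as a fraction of the total; never resolve to zero.
        return max(1, round(total * value))
    # Anything else is taken as an absolute count.
    return int(value)


# A max_size of 0.3 on a 1000-sample dataset keeps 300 samples.
print(resolve_fraction(0.3, 1000))
# An absolute value passes through unchanged.
print(resolve_fraction(500, 1000))
```

With this convention, one setting works unchanged across datasets or training runs of different sizes, because the absolute count is derived when the total is known.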