Release 0.39.0 · hezarai/hezar

All dataset classes now have a `preprocessor` parameter that accepts either a path or a preprocessor object. As a result, all manual preprocessor configs and paths have been removed from the dataset configs.
Dataset configs now have an `hf_load_kwargs` field whose contents are passed to `datasets.load_dataset()`.
Dataset config names are now supported either through `hf_load_kwargs` (e.g., `hf_load_kwargs={"name": "fa"}`) or by appending them to the path, as in `Dataset.load("<path>:<config_name>")`.
Datasets now have a `_load()` method that is responsible for loading the data files, either from the Hub or through custom data reading.
Datasets now have a `max_size` config option that overrides the length of the dataset (all `__len__` implementations have been moved to the base class). The value can also be a fraction, e.g., 0.3 means 30% of the original length.
Improved learning rate (LR) scheduling in the `Trainer`.
The docs have been extensively rewritten and improved!
`log_steps` and `save_steps` now accept float values between 0 and 1, representing a fraction rather than an absolute number of steps.
Other bug fixes and improvements.
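The `preprocessor` parameter described above accepts either a path or an object. A minimal sketch of how such a parameter can be normalized follows. It is not Hezar's implementation: `resolve_preprocessor` and the caller-supplied `load_preprocessor` callable are hypothetical stand-ins for whatever loader the library actually uses.

```python
import os
from typing import Any, Callable


def resolve_preprocessor(
    preprocessor: Any,
    load_preprocessor: Callable[[str], Any],
) -> Any:
    """Return a preprocessor object, loading it first if a path was given.

    Hypothetical helper: `load_preprocessor` is supplied by the caller and
    stands in for the library's real loading logic.
    """
    if isinstance(preprocessor, (str, os.PathLike)):
        # A path (local directory or Hub repo id): delegate to the loader.
        return load_preprocessor(os.fspath(preprocessor))
    # Anything else is assumed to already be a ready-to-use preprocessor.
    return preprocessor
```

Accepting both forms lets a dataset config store a plain path while code that already holds a preprocessor instance can pass it directly without reloading it.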
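The two ways of selecting a dataset config name (inside `hf_load_kwargs` or as a `<path>:<config_name>` suffix) can be reconciled into a single set of keyword arguments for `datasets.load_dataset()`, whose `path` and `name` parameters take the dataset path and config name. The sketch below is illustrative only: the helper names are hypothetical, it splits on the last `:` (so Windows drive letters such as `C:\data` are not handled), and letting the path suffix win over a `name` already in `hf_load_kwargs` is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def split_config_name(path: str) -> Tuple[str, Optional[str]]:
    """Split "<path>:<config_name>" into (path, config_name).

    Hypothetical helper: splits on the last ":" only; a path without ":"
    yields a config name of None.
    """
    if ":" in path:
        base, config_name = path.rsplit(":", 1)
        return base, config_name
    return path, None


def build_load_kwargs(
    path: str, hf_load_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Merge a config name from the path into a copy of hf_load_kwargs."""
    base, config_name = split_config_name(path)
    kwargs = dict(hf_load_kwargs or {})  # copy so the caller's dict is untouched
    if config_name is not None:
        # Assumption: a name given in the path overrides one in hf_load_kwargs.
        kwargs["name"] = config_name
    kwargs["path"] = base
    return kwargs
```

The resulting dict could then be unpacked as `datasets.load_dataset(**kwargs)`.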
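The `_load()` and `max_size` changes fit a common pattern: subclasses only decide how data is read, while the base class owns `__len__`. The following is a minimal sketch under that assumption, not Hezar's actual `Dataset` class. `BaseDatasetSketch` and `InMemoryDataset` are hypothetical names, and rounding fractional sizes down is an assumed behavior.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List, Optional, Union


class BaseDatasetSketch(ABC):
    """Hypothetical base class: subclasses implement `_load()`, the base owns `__len__`."""

    def __init__(self, max_size: Optional[Union[int, float]] = None):
        self.max_size = max_size
        # Subclasses decide how data is obtained (Hub download, custom file reading, ...).
        self.data: List[Any] = self._load()

    @abstractmethod
    def _load(self) -> List[Any]:
        """Return the dataset's examples."""

    def __len__(self) -> int:
        n = len(self.data)
        if self.max_size is None:
            return n
        if isinstance(self.max_size, float) and 0 < self.max_size <= 1:
            # Fraction of the original length; rounding down is an assumption.
            return int(n * self.max_size)
        # Absolute cap that never exceeds the real length.
        return min(int(self.max_size), n)

    def __getitem__(self, index: int) -> Any:
        # Respect the (possibly reduced) length so iteration stops at max_size.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.data[index]


class InMemoryDataset(BaseDatasetSketch):
    """Toy subclass whose `_load()` just copies rows given at construction."""

    def __init__(self, rows: Iterable[Any], max_size: Optional[Union[int, float]] = None):
        self._rows = rows  # must be set before the base __init__ calls _load()
        super().__init__(max_size=max_size)

    def _load(self) -> List[Any]:
        return list(self._rows)
```

In this sketch a float such as `1.0` means 100% while the int `1` means a single example; how the library itself treats that edge case is not stated in these notes.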
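Fractional `log_steps` and `save_steps` values have to be turned into concrete step intervals at some point. The sketch below assumes the fraction is taken relative to the total number of training steps, which these notes do not state. `resolve_steps` is a hypothetical helper, not a Trainer method.

```python
from typing import Union


def resolve_steps(value: Union[int, float], total_steps: int) -> int:
    """Turn a step interval into an absolute number of steps.

    Hypothetical helper: floats in (0, 1] are read as a fraction of
    `total_steps` (assumed to be the total number of training steps).
    """
    if isinstance(value, float) and 0 < value <= 1:
        # Round down, but never return 0 so logging/saving still happens.
        return max(1, int(total_steps * value))
    # Integers (and larger floats) are treated as absolute step counts.
    return int(value)
```

A training loop could then log whenever `step % resolve_steps(log_steps, total_steps) == 0`.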
