0.39.0

github-actions released this 14 Jun 11:18

Main Changes

  • All dataset classes now have a preprocessor parameter, which can be a path or an object. As a result, all manual preprocessor configs and paths have been removed from the dataset configs.
  • Dataset configs now have an hf_load_kwargs parameter, which is passed to datasets.load_dataset().
  • Dataset config names are now supported either through hf_load_kwargs (hf_load_kwargs={"name": "fa"}) or by appending them to the path, as in Dataset.load("<path>:<config_name>").
  • Datasets now have a _load() method, which is responsible for loading data files, either from the Hub or through custom data reading.
  • Datasets now have a max_size in their config to override the length of the dataset (all __len__ implementations have been moved to the base class). This value can also be a fraction, e.g., 0.3 means 30% of the original length.
  • Better LR scheduling implemented in the Trainer.
  • The docs have been extensively rewritten and improved.
  • log_steps and save_steps now accept float values between 0 and 1, representing a fraction.
  • Other bug fixes and improvements.
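
To make the "<path>:<config_name>" and max_size conventions above concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper names split_dataset_path and resolve_max_size are hypothetical. The rounding, the capping at the original length, and the split on the last colon are assumptions made for illustration only.

```python
from typing import Optional, Tuple, Union


def split_dataset_path(path: str) -> Tuple[str, Optional[str]]:
    """Split "<path>:<config_name>" into (path, config_name).

    Hypothetical helper: assumes the config name follows the LAST colon
    and that the path itself contains no other meaningful colon.
    """
    base, sep, config_name = path.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the path, no config name.
        return path, None
    return base, config_name


def resolve_max_size(max_size: Union[int, float, None], original_length: int) -> int:
    """Turn a max_size setting into an effective dataset length.

    Hypothetical helper: assumes a float strictly between 0 and 1 is a
    fraction of the original length, and any other value is an absolute
    count capped at the original length.
    """
    if max_size is None:
        return original_length
    if isinstance(max_size, float) and 0 < max_size < 1:
        # Rounding is an assumption; the release notes do not specify it.
        return round(max_size * original_length)
    return min(int(max_size), original_length)


# "my-org/my-dataset" is a placeholder path, not a real repository.
path, config_name = split_dataset_path("my-org/my-dataset:fa")
print(path, config_name)
print(resolve_max_size(0.3, 1000))
```

Under these assumptions, the config name recovered from the path would play the same role as hf_load_kwargs={"name": "fa"}, and a max_size of 0.3 on a 1000-sample dataset would yield an effective length of 300.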