0.35.0
This is a big one! We made a lot of changes and improvements in Hezar.
Improvements
- Add support for `accelerate` for distributed training
- Add resume from checkpoint feature to Trainer
- Improve saving/logging capabilities in Trainer
- Improve `print_info()`
- Add `ImageCaptioningDataset` and `ImageCaptioningDataCollator`
- Enhance padding in tokenizers
- Rewrite contribution docs
- Add tests workflow to actions
- Add `cache_dir` parameter to all `load()` methods
- Improve `OCRDataset` and fix bugs
- Add training scripts for image captioning
- Add training script for CRNN training
- Clean `registry.py`
- Change license from MIT to Apache 2.0
- Some improvements and bug fixes in `ViTRobertaImage2Text`
- Bug fixes in tests
- Safe class var handling in configs
- Add `return_scores` to `CRNNImage2Text`
- Add `get_state_dict_from_hub` to support loading from any (non-Hezar) model on the Hub
- Set a default LR scheduler (reduce on plateau) for `Trainer`
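
For readers unfamiliar with the reduce-on-plateau policy mentioned in the last item, here is a minimal, framework-free sketch of the idea. The `ReduceOnPlateau` class, its parameter names and its default values are hypothetical and chosen for illustration only; they are not Hezar's `Trainer` API or its actual defaults.

```python
class ReduceOnPlateau:
    """Illustrative reduce-on-plateau schedule (hypothetical helper, not part of Hezar)."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 2, min_lr: float = 1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplier applied when the loss plateaus
        self.patience = patience    # non-improving steps tolerated before reducing
        self.min_lr = min_lr        # lower bound for the learning rate
        self.best = float("inf")    # best (lowest) loss seen so far
        self.bad_steps = 0          # consecutive steps without improvement

    def step(self, loss: float) -> float:
        """Record a new validation loss and return the (possibly reduced) learning rate."""
        if loss < self.best:
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_steps = 0
        return self.lr
```

Calling `step()` once per evaluation with the validation loss keeps the learning rate unchanged while the loss improves and shrinks it by `factor` once the loss has stalled for more than `patience` evaluations.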
Bug fixes
- Fix image captioning decoding bug
- Fix mixed precision bug on CPU
- Fix embedding config bug
Deletions
- Delete empty models modules
- Remove all `Union` annotations and replace with `|`