Based on transformers v3.5.1

### New
- Modular & custom prediction heads for flex head models (@hSterz via #88)
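
The modular flex head API lets several named prediction heads share one base model. Below is a minimal usage sketch: `AutoModelWithHeads`, `add_classification_head`, `add_qa_head`, and `active_head` follow the library's documented flex head interface, while the checkpoint and head names are purely illustrative.

```python
# Minimal sketch of the modular flex head workflow extended in #88.
# Head names and the checkpoint are illustrative; the add_*_head methods
# and the active_head attribute follow the flex head interface.
from transformers import AutoModelWithHeads

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

# Add multiple named prediction heads on top of the same base model.
model.add_classification_head("sentiment", num_labels=2)
model.add_qa_head("squad")

# Select which head runs in the forward pass.
model.active_head = "sentiment"
```

Custom heads, the second half of #88, are added by subclassing the library's prediction head base class and registering that class on the model; since the exact base class path and registration call vary between versions, they are omitted here in favor of the release documentation.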

### Fixed
- Fix DistilBERT layer norm and AdapterFusion issues (@calpt via #102)
- Fix reloading of full models with AdapterFusion (@calpt via #110)
- Fix attention and logits output for flex head models (@calpt via #103 & #111)
- Fix loss output of flex head models with a QA head (@hSterz via #88)