v4.8.0 Integration with the Hub and Flax/JAX support
Integration with the Hub
Our example scripts and Trainer are now optimized for publishing your model on the Hugging Face Hub, with TensorBoard training metrics and an automatically authored model card that contains all the relevant metadata, including evaluation results.
Trainer Hub integration
Use the --push_to_hub flag to create a model repo for your training: the model will be saved there, with all relevant metadata, at the end of training. Other flags are:

- `push_to_hub_model_id` to control the repo name
- `push_to_hub_organization` to specify an organization
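Since the example scripts parse these flags into TrainingArguments, the same options can be set in code when using the Trainer directly. A minimal sketch (the output directory, repo name and organization below are placeholders):

```python
from transformers import TrainingArguments

# Minimal sketch: only the Hub-related arguments are shown; output_dir,
# the repo name and the organization are placeholders.
training_args = TrainingArguments(
    output_dir="my-model",
    push_to_hub=True,                    # create a repo and push at the end of training
    push_to_hub_model_id="my-model",     # controls the repo name
    push_to_hub_organization="my-org",   # push under an organization namespace
)
```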
Visualizing training metrics on huggingface.co (based on TensorBoard)
By default, if you have `tensorboard` installed, the training scripts will use it for logging, and the logging traces folder is conveniently located inside your model output directory, so the traces are pushed to your model repo by default.

Any model repo that contains TensorBoard traces will spawn a TensorBoard server, which makes it very convenient to see how the training went! See this model repo for an example. This Hub feature is in beta, so let us know if anything looks weird :)
Model card generation
The model card contains info about the datasets used, the eval results, and more. Many users were already adding their eval results to their model cards in Markdown format, but this is a more structured way of adding them, which will make them easier to parse and, for example, represent in leaderboards such as the ones on Papers With Code!
We use a format specified in collaboration with [Papers with Code](https://github.com/huggingface/huggingface_hub/blame/main/modelcard.md); see also this repo.
Models, tokenizers and configurations
All models, tokenizers and configurations now have a revamped `push_to_hub()` method, as well as a `push_to_hub` argument in their `save_pretrained()` method. The workflow of this method has changed a bit to be more git-like, with a local clone of the repo in a folder of the working directory, to make it easier to apply patches (use `use_temp_dir=True` to clone in temporary folders for the same behavior as the experimental API).
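For instance (a minimal sketch; the checkpoint and repo names below are placeholders, and you need to be authenticated with the Hub):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder checkpoint and repo names, for illustration only.
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# By default the repo is cloned in a folder of the working directory;
# use_temp_dir=True clones in a temporary folder instead.
model.push_to_hub("my-bert-model", use_temp_dir=True)
tokenizer.push_to_hub("my-bert-model", use_temp_dir=True)

# Alternatively, push while saving:
model.save_pretrained("my-bert-model", push_to_hub=True)
```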
Flax/JAX support
Flax/JAX is becoming a fully supported backend of the Transformers library, with more and more models getting an implementation. BART, CLIP and T5 join the already existing models; find the whole list here. A short usage sketch follows the list of related PRs below.
- [Flax] FlaxAutoModelForSeq2SeqLM #12228 (@patil-suraj)
- [FlaxBart] few small fixes #12247 (@patil-suraj)
- [FlaxClip] fix test from/save pretrained test #12284 (@patil-suraj)
- [Flax] [WIP] allow loading head model with base model weights #12255 (@patil-suraj)
- [Flax] Fix flax test save pretrained #12256 (@patrickvonplaten)
- [Flax] Add jax flax to env command #12251 (@patrickvonplaten)
- add FlaxAutoModelForImageClassification in main init #12298 (@patil-suraj)
- Flax T5 #12150 (@vasudevgupta7)
- [Flax T5] Fix weight initialization and fix docs #12327 (@patrickvonplaten)
- Flax summarization script #12230 (@patil-suraj)
- FlaxBartPretrainedModel -> FlaxBartPreTrainedModel #12313 (@sgugger)
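As a quick taste of the new backend, here is a minimal sketch of a forward pass with one of the new Flax seq2seq models, using the FlaxAutoModelForSeq2SeqLM class added in #12228 (it assumes `jax` and `flax` are installed and that the checkpoint ships Flax weights):

```python
from transformers import AutoTokenizer, FlaxAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = FlaxAutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Flax models consume NumPy arrays rather than framework-specific tensors.
inputs = tokenizer("Hello, my dog is cute", return_tensors="np")

# Decoder inputs are derived from input_ids when not provided explicitly.
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```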
General improvements and bug fixes
- AutoTokenizer: infer the class from the tokenizer config if possible #12208 (@sgugger)
- update desc for map in all examples #12226 (@bhavitvyamalik)
- Depreciate pythonic Mish and support PyTorch 1.9 version of Mish #12240 (@digantamisra98)
- [t5 doc] make the example work out of the box #12239 (@stas00)
- Better CI feedback #12279 (@LysandreJik)
- Fix for making student ProphetNet for Seq2Seq Distillation #12130 (@vishal-burman)
- [DeepSpeed] don't ignore --adafactor #12257 (@stas00)
- Tensorflow QA example #12252 (@Rocketknight1)
- [tests] reset report_to to none, avoid deprecation warning #12293 (@stas00)
- [trainer + examples] set log level from CLI #12276 (@stas00)
- [tests] multiple improvements #12294 (@stas00)
- Trainer: adjust wandb installation example #12291 (@stefan-it)
- Fix and improve documentation for LEDForConditionalGeneration #12303 (@ionicsolutions)
- [Flax] Main doc for event orga #12305 (@patrickvonplaten)
- [trainer] 2 bug fixes and a rename #12309 (@stas00)
- [docs] performance #12258 (@stas00)
- Add CodeCarbon Integration #12304 (@JetRunner)
- Optimizing away the `fill-mask` pipeline. #12113 (@Narsil)
- Add output in a dictionary for TF `generate` method #12139 (@stancld)
- Rewrite ProphetNet to adapt converting ONNX friendly #11981 (@jiafatom)
- Add mention of the huggingface_hub methods for offline mode #12320 (@LysandreJik)
- [Flax/JAX] Add how to propose projects markdown #12311 (@patrickvonplaten)
- [TFWav2Vec2] Fix docs #12283 (@chenht2010)
- Add all XxxPreTrainedModel to the main init #12314 (@sgugger)
- Conda build #12323 (@LysandreJik)
- Changed modeling_fx_utils.py to utils/fx.py for clarity #12326 (@michaelbenayoun)