fix(deps): update dependency transformers to v4.43.3 #1226
Merged
This PR contains the following updates:
`4.42.4` -> `4.43.3`
Release Notes
huggingface/transformers (transformers)
v4.43.3: Patch deepspeed (Compare Source)
Patch release v4.43.3:
We still saw some bugs, so @zucchini-nlp added:
Other fixes:
v4.43.2: Patch release (Compare Source)
v4.43.1: Patch release (Compare Source)
v4.43.0: Llama 3.1, Chameleon, ZoeDepth, Hiera (Compare Source)
Llama
The Llama 3.1 models are released by Meta and come in three flavours: 8B, 70B, and 405B.
To get an overview of Llama 3.1, please visit the Hugging Face announcement blog post.
We release a repository of llama recipes showcasing inference as well as full and partial fine-tuning of the different variants.
Chameleon
The Chameleon model was proposed in Chameleon: Mixed-Modal Early-Fusion Foundation Models by the Meta AI Chameleon Team. Chameleon is a vision-language model that uses vector quantization to tokenize images, which enables the model to generate multimodal output. The model takes images and text as input, including in an interleaved format, and generates textual responses.
ZoeDepth
The ZoeDepth model was proposed in ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller. ZoeDepth extends the DPT framework for metric (also called absolute) depth estimation. ZoeDepth is pre-trained on 12 datasets using relative depth and fine-tuned on two domains (NYU and KITTI) using metric depth. A lightweight head is used with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier.
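The routing described above can be sketched in a few lines: a classifier decides which domain-specific head processes an input. All names below are hypothetical stand-ins for illustration, not the actual ZoeDepth implementation.

```python
import math

def nyu_head(x):
    # Indoor domain (NYU): toy head with depths capped near 10 m.
    return 10.0 / (1.0 + math.exp(-x))

def kitti_head(x):
    # Outdoor domain (KITTI): toy head with depths up to ~80 m.
    return 80.0 / (1.0 + math.exp(-x))

def latent_classifier(x):
    # Stand-in for the learned latent classifier: a real model would
    # classify image features, not threshold a scalar.
    return "nyu" if x < 0 else "kitti"

def predict_depth(x):
    # Route the input to the head chosen by the classifier.
    heads = {"nyu": nyu_head, "kitti": kitti_head}
    return heads[latent_classifier(x)](x)
```

The point of the design is that each head can specialize its metric range (its "metric bins") per domain, while routing stays automatic at inference time.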
Hiera
Hiera was proposed in Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer
The paper introduces “Hiera,” a hierarchical Vision Transformer that simplifies the architecture of modern hierarchical vision transformers by removing unnecessary components without compromising on accuracy or efficiency. Unlike traditional transformers that add complex vision-specific components to improve supervised classification performance, Hiera demonstrates that such additions, often termed “bells-and-whistles,” are not essential for high accuracy. By leveraging a strong visual pretext task (MAE) for pretraining, Hiera retains simplicity and achieves superior accuracy and speed both in inference and training across various image and video recognition tasks. The approach suggests that spatial biases required for vision tasks can be effectively learned through proper pretraining, eliminating the need for added architectural complexity.
Agents
Our ReactAgent has a specific way to return its final output: it calls the tool final_answer, added to the user-defined toolbox upon agent initialization, with the answer as the tool argument. We found that even for a one-shot agent like CodeAgent, using a specific final_answer tool helps the llm_engine find what to return: so we generalized the final_answer tool to all agents.
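The mechanism can be sketched as follows. This is a toy loop with hypothetical names, not the transformers Agents implementation: the toolbox gets a final_answer tool at init, and whatever the "LLM" passes to that tool becomes the run's return value.

```python
class FinalAnswer(Exception):
    """Carries the agent's final output up out of the step loop."""
    def __init__(self, value):
        self.value = value

def final_answer(answer):
    raise FinalAnswer(answer)

class TinyAgent:
    def __init__(self, llm_engine, tools=None):
        self.tools = dict(tools or {})
        self.tools["final_answer"] = final_answer  # added for every agent
        self.llm_engine = llm_engine

    def run(self, task, max_steps=5):
        for _ in range(max_steps):
            code = self.llm_engine(task)  # the LLM emits code calling tools
            try:
                exec(code, {"tools": self.tools})
            except FinalAnswer as done:
                return done.value
        return None

# A fake llm_engine that immediately calls the final_answer tool.
fake_llm = lambda task: "tools['final_answer'](2 + 2)"
agent = TinyAgent(fake_llm)
print(agent.run("add 2 and 2"))  # 4
```

Routing the answer through a dedicated tool gives the model one unambiguous way to terminate, instead of having to infer which expression is "the" output.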
Now if your code-based agent (like ReactCodeAgent) defines a function at step 1, it will remember the function definition indefinitely. This means your agent can create its own tools for later re-use!
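Persistence across steps can be sketched by keeping one namespace for the whole run, so a function defined in an earlier step's code is still callable later. Hypothetical class, not the actual ReactCodeAgent internals:

```python
class PersistentExecutor:
    def __init__(self):
        self.namespace = {}  # survives across all steps of the run

    def step(self, code):
        # Each step executes against the same namespace, so definitions
        # from earlier steps remain visible.
        exec(code, self.namespace)
        return self.namespace.get("_result")

ex = PersistentExecutor()
ex.step("def double(x):\n    return 2 * x")  # step 1: agent defines a helper
result = ex.step("_result = double(21)")     # step 2: agent reuses it
print(result)  # 42
```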
This is a transformative PR: it allows the agent to regularly run a specific step for planning its actions in advance. This gets activated if you set an int for planning_interval upon agent initialization. At step 0, a first plan is made. At later steps (e.g. steps 3, 6, 9 if you set planning_interval=3), this plan is updated by the agent based on the history of previous steps. More detail soon!
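The schedule described above can be captured in one small predicate (assumed semantics, inferred from the description: an initial plan at step 0, then an update every planning_interval steps):

```python
def is_planning_step(step, planning_interval):
    # No interval set: planning is disabled entirely.
    if planning_interval is None:
        return False
    # Step 0 produces the initial plan; every interval-th step updates it.
    return step % planning_interval == 0

schedule = [s for s in range(10) if is_planning_step(s, 3)]
print(schedule)  # [0, 3, 6, 9]
```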
Notable changes to the codebase
A significant RoPE refactor was done to make it model agnostic and more easily adaptable to any architecture.
It is only applied to Llama for now but will be applied to all models using RoPE over the coming days.
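To make concrete what the refactor is abstracting, here is a minimal, simplified RoPE sketch (plain Python, not the transformers implementation): each pair of channels is rotated by a position-dependent angle, so attention scores end up depending only on relative positions.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles."""
    dim = len(vec)  # assumed even
    out = []
    for i in range(0, dim, 2):
        # Lower channel pairs rotate faster; higher pairs rotate slower.
        theta = position * base ** (-i / dim)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

q = [1.0, 0.0, 0.5, 0.5]
rotated = rope_rotate(q, position=7)
# Rotations preserve the vector's norm, and position 0 is the identity.
print(rope_rotate(q, position=0) == q)  # True
```

Because only the angle computation varies across architectures, factoring it out of each model file is what makes the embedding model-agnostic.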
Breaking changes
TextGenerationPipeline and tokenizer kwargs
🚨🚨 This PR changes the code to rely on the tokenizer's defaults when these flags are unset. This means some models using `TextGenerationPipeline` previously did not add a `<bos>` by default, which (negatively) impacted their performance. In practice, this is a breaking change.
Example of a script changed as a result of this PR:
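Independently of the elided script, the shape of the change can be sketched with a toy tokenizer (hypothetical classes, not the real pipeline code): when add_special_tokens is not set explicitly, defer to the tokenizer's own default instead of hard-coding a value.

```python
class ToyTokenizer:
    def __init__(self, add_special_tokens_default):
        # Each tokenizer ships its own default for special-token handling.
        self.add_special_tokens_default = add_special_tokens_default

    def encode(self, text, add_special_tokens=None):
        if add_special_tokens is None:
            # Flag unset: fall back to the tokenizer's default (the new
            # behavior) rather than a pipeline-wide hard-coded choice.
            add_special_tokens = self.add_special_tokens_default
        tokens = text.split()
        return ["<bos>"] + tokens if add_special_tokens else tokens

tok = ToyTokenizer(add_special_tokens_default=True)
print(tok.encode("hello world"))                            # ['<bos>', 'hello', 'world']
print(tok.encode("hello world", add_special_tokens=False))  # ['hello', 'world']
```

Models whose tokenizers default to adding `<bos>` now get it through the pipeline too, which is exactly why this is a behavioral (breaking) change.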
Bugfixes and improvements
- `get_seq_length` method by @sanchit-gandhi in #31661
- `keras-nlp<0.14` pin by @gante in #31684
- `tests/test_xxx_utils.py` to `tests/utils` by @ydshieh in #31730
- `pytest_num_workers=4` for some CircleCI jobs by @ydshieh in #31764
- `sdpa` support for SigLIP by @qubvel in #31499
- `TFBlipModelTest::test_pipeline_image_to_text` by @ydshieh in #31827
- `TrainingArguments` by @andstor in #31812
- `vocab_size` in other two VLMs by @zucchini-nlp in #31681
- `.generate()` by @voidism in #29619
- `_init_weights` for `ResNetPreTrainedModel` by @ydshieh in #31851
- `_init_weights` for `ResNetPreTrainedModel`" by @ydshieh in #31868
- `duplicate` field definitions in some classes by @Sai-Suraj-27 in #31888
- `push_to_hub=True` in `TrainingArguments` by @SunMarc in #31808
- `warnings` in a `with` block to avoid flaky tests by @ydshieh in #31893
- [`ConvertSlow`] make sure the order is preserved for added tokens by @ArthurZucker in #31902
- [`Gemma2`] Support FA2 softcapping by @ArthurZucker in #31887
- `1st argument` name in classmethods by @Sai-Suraj-27 in #31907
- `SlidingWindowCache.reset()` by @gante in #31917
- `Trainer.get_optimizer_cls_and_kwargs` to be overridden by @apoorvkh in #31875
- `GenerationMixin.generate` compatibility with the PyTorch profiler by @fxmarty in #31935
- `Cache` and `cache_position` being default by @gante in #31898
- `sigmoid_focal_loss()` function call by @Sai-Suraj-27 in #31951
- `logits_warper` update in models with custom generate fn by @gante in #31957
- `create_repo()` function call by @Sai-Suraj-27 in #31947
- `test_stage3_nvme_offload` by @faaany in #31881
- `src/transformers/__init__.py` by @Sai-Suraj-27 in #31993
- log messages that are resulting in a TypeError due to too many arguments by @Sai-Suraj-27 in #32017
- `SeamlessM4Tv2ConformerEncoderLayer.forward()` when gradient checkpointing is enabled by @anferico in #31945
- `sdpa` and FA2 for CLIP by @qubvel in #31940
- `numpy<2.0` by @ydshieh in #32018
- `head_dim` through config (and do not require `head_dim * num_heads == hidden_size`) by @xenova in #32050
- duplicate entries in a dictionary by @Sai-Suraj-27 in #32041
- `huggingface_hub` 0.24 by @Wauplin in #32054
- `mktemp()` function by @Sai-Suraj-27 in #32123
- `ko/_toctree.yml` and remove `custom_tools.md` to reflect latest changes by @jungnerd in #31969
- `TypeError` instead of `ValueError` for invalid type by @Sai-Suraj-27 in #32111
- `trust_remote_code` when loading Libri Dummy by @sanchit-gandhi in #31748
- `GPTNeoX` and `GPT2` by @vasqu in #31944

Significant community contributions
The following contributors have made significant changes to the library over the last release:
- `.generate()` (#29619)

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.