Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation #40837
Merged
207 commits
a855b64
init
ErfanBaghaei 2109ccf
added TopH
ErfanBaghaei 9902115
Update TopH logits_process.py
ErfanBaghaei 519d675
Update logits_process.py
ErfanBaghaei d56a261
Update test_logits_process.py
ErfanBaghaei cd30869
Update test_logits_process.py
ArminAzizi98 c7c1472
added test No. 4
ErfanBaghaei febfc04
Merge branch 'main' into Top-H-Decoding
ErfanBaghaei 91bc1b7
Resolving __init__.py issues
ErfanBaghaei 009aa73
Resolving configuration_utils.py Issues
ErfanBaghaei 872bd47
Resolving logits_process.py Issues
ErfanBaghaei 2054fb6
Resolving utils.py Issues
ErfanBaghaei 5bc900d
Resolving test_logits_process.py Issues
ErfanBaghaei 768bda6
Resolving __init__.py issues
ErfanBaghaei d843f1c
Resolving logits_process.py Issues
ErfanBaghaei 290f97d
Resolving __init__.py issues
ErfanBaghaei 2b785ad
Updated Docs
ErfanBaghaei f35b6ce
Updated Docstring
ErfanBaghaei a566561
style: autoformat with make fixup
ErfanBaghaei 49a611d
Fixing Docstring
ErfanBaghaei 3fb3a87
Update logits_process.py removed defaults
ErfanBaghaei 4917572
Variable H name -> cumulative_entropy
ErfanBaghaei 11ef0a2
Using torch.distributions.Categorical
ErfanBaghaei 90a3d94
Improve torch_dtype checks (#40808)
cyyever 2db9152
Add VideoProcessors to auto-backend requirements (#40843)
Cyrilvallez e71afc5
Adds Causal Conv 1D kernel for mamba models (#40765)
MekkCyber c19ca3e
Update no split modules in T5Gemma model (#40810)
npuichigo 1817410
Replace image classification loss functions to `self.loss_function` (…
qubvel fb2795e
Fix the misalignment between the l2norm in GDN of Qwen3-Next and the …
bozheng-hit a300d04
Fixes for continuous batching (#40828)
remi-or d5ab59f
[tests] re-enable aria fast tests (#40846)
gante e25fcbf
[SAM2] Fix inconsistent results with original implementation with inp…
yonigozlan 1e83816
[Sam2Video] Fix video inference with batched boxes and add test (#40797)
yonigozlan 55d3458
add: differential privacy research model (#40851)
RyanMullins 1814fa6
[test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
eustlb d78b3a9
[tests] move generative tests away from `test_modeling_common.py` (#4…
gante a790005
[generate] Always use decoder config to init cache (#40772)
gante e135660
Use checkpoint in auto_class_docstring (#40844)
cyyever c274330
Fix TrainingArguments.parallelism_config NameError with accelerate<1.…
albertvillanova 33c51a8
Redirect MI355 CI results to dummy dataset (#40862)
ahadnagy c8416c8
[Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
greg-kwasniewski1 6cf9c59
[docstrings / type hints] Update outdated annotations for `past_key_v…
gante 23e87bb
fix florence kwargs (#40826)
SunMarc 7e29410
fix: XIELU act parameters not being casted to correct dtype (#40812)
NanoCode012 bb5b768
Update model tags and integration references in bug report (#40881)
ArthurZucker d69d754
[Qwen3 Next] Use numerically stable `rsqrt` (#40848)
thalahors 4d3d07f
Adding Support for Qwen3-VL Series (#40795)
JJJYmmm f8b3311
[`VaultGemma`] Update expectations in integration tests (#40855)
vasqu dd64685
Fix modular consistency (#40883)
Cyrilvallez d8a69ff
🔴 Move variable output controls to `_prepare_generation_config ` (#40…
manueldeprada 777b559
Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
cyyever b13c6d8
Use torch.expm1 and torch.log1p for better numerical results (#40860)
cyyever 332286f
Add Fast PromptDepthAnything Processor (#40602)
SamuelBarryCS a4d417c
Fix deta loading & dataclass (#40878)
Cyrilvallez 493dd21
Remove dict branch of attention_mask in sdpa_attention_paged_forward …
cyyever aa8fea4
🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
HyunZ118 cbe9f2e
🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)
HyunZ118 23772dc
[generate] remove docs of a feature that no longer exists (#40895)
gante b4d7f5f
Make debugging failing tests (check and update expect output values) …
ydshieh 60c9553
Fixing the call to kernelize (#40628)
MekkCyber 294ec23
Fix getter regression (#40824)
molbap dcb52bf
Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)
ydshieh b947b60
[cache] Merge static sliding and static chunked layer (#40893)
Cyrilvallez f096c5b
Harmonize CacheLayer names (#40892)
Cyrilvallez 15d5f49
[cache] Only use scalars in `get_mask_sizes` (#40907)
Cyrilvallez 288352d
Set seed for `Glm4vIntegrationTest` (#40905)
ydshieh a418ac8
Add Olmo3 model (#40778)
2015aroras 8534f2d
remove dummy EncodingFast (#40864)
cyyever 1f0df5f
Improve module name handling for local custom code (#40809)
XuehaiPan b067650
Remove `runner_map` (#40880)
ydshieh 5c7684e
disable `test_fast_is_faster_than_slow` (#40909)
ydshieh f8fb8a5
[gemma3] `Gemma3ForConditionalGeneration` compatible with assisted ge…
gante 4248a67
[generate] misc fixes (#40906)
gante cc4f313
🔴Make `center_crop` fast equivalent to slow (#40856)
yonigozlan c689f16
Fix dtype in Paligemma (#40912)
zucchini-nlp cf7356b
[Docs] Adding documentation of MXFP4 Quantization (#40885)
ariG23498 053228a
Processor load with multi-processing (#40786)
zucchini-nlp 030af75
[Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer…
yaswanth19 3e2e555
Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokeni…
akshay-babbar 1575c03
[torchao safetensors] renaming get_state_dict function (#40774)
liangel-02 901e5d7
Adding activation kernels (#40890)
MekkCyber 8b942de
Minor fix for #40727 (#40929)
ydshieh 1935c22
Add support for Florence-2 training (#40914)
ducviet00 2e287d1
Add LongCat-Flash (#40730)
molbap 9baa3d6
[DOC] Add missing dates in model cards (#40922)
yonigozlan dccd2df
[models] remove unused `import torch.utils.checkpoint` (#40934)
gante da501ec
Intel CPU dockerfile (#40806)
jiqing-feng 385aeb6
docs(i18n): Correct the descriptive text in the README_zh-hans.md (#4…
lilin-1 e7a14d9
Fix trainer tests (#40823)
SunMarc d8d78c6
Fix `Glm4vMoeIntegrationTest` (#40930)
ydshieh f0150ad
Raise error instead of warning when using meta device in from_pretrai…
Cyrilvallez 8b8b353
Consistent naming for images kwargs (#40834)
zucchini-nlp 5301d16
Remove nested import logic for torchvision (#40940)
yonigozlan b5cbfd5
Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)
ydshieh b8207cb
Update expected values for some `test_speculative_generation` (#40949)
ydshieh f962aaf
Standardize audio embedding function name for audio multimodal models…
jackzhxng cd1a661
Add FlexOlmo model (#40921)
2015aroras 3ab94a1
Don't list dropout in eager_paged_attention_forward (#40924)
cyyever 9f65eab
Update expected values for one more `test_speculative_generation` aft…
ydshieh 4a5f348
FIX(trainer): ensure final checkpoint is saved when resuming training…
rangehow b38d52a
Add new model LFM2-VL (#40624)
zucchini-nlp 4d4932e
Fix outdated version checks of accelerator (#40969)
cyyever 9104de8
Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)
hamishs b9ad602
[Trainer] Fix DP loss (#40799)
SunMarc 55e48bf
[timm_wrapper] better handling of "Unknown model" exception in timm (…
harshaljanjani ca8eed3
Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate to…
brandenkmurray 3373554
[tests] Really use small models in all fast tests (#40945)
Cyrilvallez 1e8b8d3
Add captured actual outputs to CI artifacts (#40965)
ydshieh e5da669
Revert change in `compile_friendly_resize` (#40645)
qubvel 740ff67
Track the CI (model) jobs that don't produce test output files (proce…
ydshieh c9b01c3
Using torch.distributions.Categorical
ErfanBaghaei 345c86a
Remove `set_model_tester_for_less_flaky_tests` (#40982)
Cyrilvallez b16d054
Benchmarking v2 GH workflows (#40716)
ahadnagy e0fb372
🔴[`Attention`] Bert-based Models Attention Refactor (#38301)
vasqu 0dbfde2
Remove [[autodoc]] refs to TF/Flax objects (#40996)
Cyrilvallez 46922b3
ENH: Enable readline support for transformers chat (#40911)
BenjaminBossan dbc0952
[testing] test `num_hidden_layers` being small in model tester (#40992)
ydshieh 17be25b
blt wip (#38579)
itazap 4a17be0
[docs] rm stray tf/flax autodocs references (#40999)
gante e08f64c
[`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)
vasqu 40dcb51
Make `EfficientLoFTRModelTest` faster (#41000)
ydshieh 85702fd
Fix typoes in src and tests (#40845)
cyyever d471b2e
Fix more dates in model cards and wrong modalities in _toctree.yml (#…
yonigozlan ae88512
RUFF fix on CI scripts (#40805)
cyyever c52a158
fix dict like init for ModelOutput (#41002)
SunMarc 425b2b4
🚨 [v5] remove generate output retrocompatibility aliases (#40998)
gante 4e05e80
[tests] update `test_left_padding_compatibility` (and minimize overwr…
gante 387fb9a
Patch more `unittest.case.TestCase.assertXXX` methods (#41008)
ydshieh e1c13bc
🚨 [v5] remove deprecated entry point (#40997)
gante 9896a3f
🚨 [lightglue] fix: matches order changed because of early stopped ind…
sbucaille b16b156
Fix `PhimoeIntegrationTest` (#41007)
ydshieh 002d853
Fix Glm4v test (#41011)
Cyrilvallez 0f598ff
Update after #41007 (#41014)
ydshieh 00aa6c7
Fix benchmark runner argument name (#41012)
ahadnagy ceefb54
Adding support for Qwen3Omni (#41025)
BakerBunker 2f2d193
Making compute_loss_func always take priority in Trainer (#40632)
Flakes342 21031f5
Modify Qwen3Omni parameter name since VL changed it (#41045)
BakerBunker 17f5a92
Fix Qwen video tests (#41049)
zucchini-nlp 2e07406
[testing] Fix `qwen2_audio` (#41018)
ydshieh 73f6379
Fix typing of tuples (#41028)
cyyever a945d26
Remove optax (#41030)
cyyever 755a1e5
Fix typos in English/Chinese documentation (#41031)
cyyever 586c487
Use torch.autocast (#40975)
cyyever 0cfc691
docs: improved RoPE function Docstrings (#41004)
RyanMullins e832420
Fix condition for emitting warning when generation exceeds max model …
yannicks1 8b26d9f
Fix outdated torch version check (#40925)
cyyever e5e269e
Remove doc of tf and flax (#41029)
cyyever 0bedf8a
Add Whole Word Masking and Padding Strategy to DataCollatorForLanguag…
rjgleaton 54810d7
[testing] Fix `seed_oss` (#41052)
ydshieh 973b3fc
Remove repeated import (#40937)
cyyever c036a71
Simplify unnecessary Optional typing (#40839)
cyyever a062de7
Add write token for uploading benchmark results to the Hub (#41047)
ahadnagy edf22db
Ci utils (#40978)
remi-or 126962e
Remove <frameworkcontent> and <pt> tags from documentation (#41055)
cyyever fa3c2d7
Fix CI jobs being all red 🔴 (false positive) (#41059)
ydshieh 7d90855
Update quantization CI (#41068)
SunMarc 0f21b54
[i18n-bn] Add Bengali language README file (#40935)
saidurpulok f84f441
Improve documentation and errors in Mamba2-based models (#41063)
mapmeld b4f0c46
Update team member list for some CI workflows (#41094)
ydshieh 9b9fb23
fix crash when using chat to send 2+ request to gptoss (#40536)
sywangyi 6a8b33a
Minor addition, no split modules for VideoMAEE (#41051)
DuyguA 33aaccc
Switch to `python:3.10-slim` for CircleCI docker images (#41067)
ydshieh 8115fbd
Fix argument name in benchmarking script (#41086)
ahadnagy eb22858
Remove mention of TensorFlow/Flax/JAX from English documentation (#41…
cyyever 6c08b04
Fix typos in documentation (#41087)
cyyever 71a8ad0
Fix typing (#40788)
cyyever 6766e81
Remove unused arguments (#40916)
cyyever cd36b9b
Remove tf and flax from Chinese documentation (#41057)
cyyever f82b096
fix wrong height and width when read video use torchvision (#41091)
Juude 824415f
docs: Fix Tool Use links and remove dead RAG links (#41104)
RyanMullins 6a94124
🚨 [generate] update paligemma mask updates (and other assisted genera…
gante 78c6f7a
[tests] gpt2 + `CausalLMModelTester` (#41003)
gante 384b671
Fix `_get_test_info` for inherited tests (#41106)
ydshieh fe09b8a
Remove bad test skips (#41109)
Cyrilvallez e1b55ff
Format empty lines and white space in markdown files. (#41100)
cyyever 2dd5e73
Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
cyyever e450e0d
🚨 [V5] Remove deprecated training arguments (#41017)
cyyever 20a4c45
Support loading LFM2 GGUF (#41111)
HaroldBenoit f0b7d24
[torchao safetensors] integrate torchao safetensors support with tran…
liangel-02 34fd896
[Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule a…
notkisk ffa6a76
Fix the error where a keyword argument appearing before *args (#41099)
cyyever 6558e75
Fix broken `` expressions in markdown files (#41113)
cyyever 0ab9d77
Remove self-assignment (#41062)
cyyever 7d70f39
🚨Refactor: Update text2text generation pipelines to use max_new_token…
lilin-1 13f9a7d
Fixed MXFP4 model storage issue (#41118)
YangKai0616 0f312b2
Fixed loading LongT5 from legacy checkpoints (#40724)
Szustarol 295cf0b
dummy commit (#41133)
ydshieh 212e827
Fix loading logic flaw with regards to unexpected and missing keys (#…
LysandreJik 18941ba
Using torch.distributions.Categorical
ErfanBaghaei 94336c5
Resolving logits_process.py Issues
ErfanBaghaei 643d9c2
style: autoformat with make fixup
ErfanBaghaei 2cc41c6
Update logits_process.py removed defaults
ErfanBaghaei 5255a72
Variable H name -> cumulative_entropy
ErfanBaghaei 75c809c
Merge branch 'main' into Top-H-Decoding
ErfanBaghaei 70214c1
Resolving format error
ErfanBaghaei 9dad329
Correction of the loop variables in logit processor
ErfanBaghaei bf23aef
Vectorized the loop in logits_process
ErfanBaghaei 5829189
formatted logits_process
ErfanBaghaei cd9f22e
paper reference and stopping rule comment logits_process
ErfanBaghaei 116c55d
Trigger CI rerun
ErfanBaghaei 6b3eea3
Update logits_process.py
ArminAzizi98 0ebb99d
added test_TopH_example_integration
ErfanBaghaei f4ea5e4
added test_TopH_example_integration
ErfanBaghaei 5e7a92d
Update README.md
souvikku 0c83d0e
Restore CI config to match main (remove accidental changes)
ErfanBaghaei aa15f5d
Restore CI config to match upstream main (no diffs)
ErfanBaghaei 0d977a9
Merge branch 'main' into Top-H-Decoding
ErfanBaghaei