[Temporary] Add compressed-tensors HFQuantizer implementation #101

Open: wants to merge 936 commits into base: upstream-a564d10af

Changes from 9 commits

Commits (936)
19e6e80
support qwen2-vl (#32318)
simonJJJ Aug 26, 2024
93e0e1a
CI: add torchvision to the consistency image (#32941)
gante Aug 26, 2024
894d421
Test: add higher `atol` in `test_forward_with_num_logits_to_keep` (#3…
gante Aug 26, 2024
72d4a3f
mps: add `isin_mps_friendly`, a wrapper function for `torch.isin` (#3…
gante Aug 26, 2024
a378a54
Add changes for uroman package to handle non-Roman characters (#32404)
nandwalritik Aug 26, 2024
3562772
fix: Fixed `pydantic` required version in dockerfiles to make it comp…
Sai-Suraj-27 Aug 26, 2024
26f043b
quickfix documentation (#32566)
molbap Aug 26, 2024
9578c25
Fixup py 38 type hints for mps friendly (#33128)
muellerzr Aug 26, 2024
3bf6dd8
fix: Fixed CodeGenTokenizationTest::test_truncation failing test (#32…
Sai-Suraj-27 Aug 27, 2024
7562366
fix: multilingual midel convert to tflite get wrong token (#32079)
Ayaa17 Aug 27, 2024
3806faa
disable scheduled daily CI temporarily (#33136)
ydshieh Aug 27, 2024
ab0ac3b
CI: fix `efficientnet` pipeline timeout and prevent future similar is…
gante Aug 27, 2024
746e114
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/jax-pr…
dependabot[bot] Aug 27, 2024
892d51c
Log additional test metrics with the CometCallback (#33124)
Lothiraldan Aug 27, 2024
6f0ecf1
[docs] add quick usage snippet to Whisper. (#31289)
Vaibhavs10 Aug 27, 2024
d1f39c4
Update stateful_callbacks state before saving checkpoint (#32115)
pedrobrs Aug 27, 2024
834ec7b
fix Idefics2VisionConfig type annotation (#33103)
chenzizhao Aug 27, 2024
9956c2b
Add a fix for custom code tokenizers in pipelines (#32300)
Rocketknight1 Aug 27, 2024
c6b23fd
Llama: make slow tests green 🟢 (#33138)
gante Aug 27, 2024
d47a9e8
fix redundant checkpointing in example training scripts (#33131)
eminorhan Aug 27, 2024
7ee4363
update torch req for 4-bit optimizer (#33144)
SunMarc Aug 27, 2024
6101d93
🌐 [i18n-KO] Translated `conversations.md` to Korean (#32468)
newfull5 Aug 27, 2024
27903de
Very small change to one of the function parameters (#32548)
alisalamatian1 Aug 27, 2024
7591ca5
🚨 Add Blip2ForImageTextRetrieval (#29261)
jpizarrom Aug 27, 2024
c35d2cc
Granite language models (#31502)
mayank31398 Aug 27, 2024
386931d
fix model name and copyright (#33152)
mayank31398 Aug 28, 2024
3bfd3e4
Fix: Jamba batched generation (#32914)
vasqu Aug 28, 2024
e0b87b0
[whisper] pass attention_mask to generate_with_fallback() (#33145)
benniekiss Aug 28, 2024
f1a385b
[RoBERTa-based] Add support for sdpa (#30510)
hackyon Aug 28, 2024
f9ed05d
Fix import paths for test_module (#32888)
rasmi Aug 28, 2024
f4c86d0
Zero-shot pipelines: minor doc changes (#33127)
pcuenca Aug 28, 2024
5c84682
Customise the separator used for splicing in DataCollatorWithFlatteni…
beep-bebop Aug 28, 2024
74e19e8
Fix spell mistakes (#33149)
matsuo1234567 Aug 28, 2024
3d79dcb
update push CI workflow files for security (#33142)
ydshieh Aug 28, 2024
5c1027b
added quick clarification (#33166)
DuyguA Aug 28, 2024
39bfb2f
pass module to Params4bit.from_prequantized to ensure quant_state (#3…
winglian Aug 29, 2024
92a75ff
Mamba2 conversion script for original models (#32580)
vasqu Aug 29, 2024
5129671
Add a static cache that offloads to the CPU or other device (#32161)
gerbenvv Aug 29, 2024
c409cd8
use a single for loop (#33148)
ArthurZucker Aug 29, 2024
b127fb8
Pipeline: fix bad generation kwargs docs (#33205)
gante Aug 30, 2024
4987463
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/codepa…
dependabot[bot] Aug 30, 2024
9a6956b
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/decisi…
dependabot[bot] Aug 30, 2024
e259d6d
Add missing quotes in modeling_llava_next_video.py (#33214)
juliendenize Aug 30, 2024
fbff276
Add warning for stop string edge case (#33169)
Rocketknight1 Aug 30, 2024
38d58a4
Fix local repos with remote code not registering for pipelines (#33100)
Rocketknight1 Aug 30, 2024
b017a9e
Refactor CI: more explicit (#30674)
ArthurZucker Aug 30, 2024
c79bfc7
Create local Transformers Engine (#33218)
aymeric-roucher Aug 30, 2024
db70426
🌐 [i18n-KO] Translated `llm_optims.md` to Korean (#32325)
yijun-lee Aug 30, 2024
51e6526
Fix red amin (#33220)
ArthurZucker Aug 30, 2024
ea9e927
run_compressed compatability
Aug 30, 2024
746104b
Test fetcher: missing return on filtered tests; don't write empty fil…
gante Aug 30, 2024
eb5b968
Generate: throw warning when `return_dict_in_generate` is False but s…
gante Aug 31, 2024
2e3f8f7
Add video text to text docs (#33164)
merveenoyan Sep 1, 2024
b9bc691
Add GraniteRMSNorm (#33177)
NielsRogge Sep 2, 2024
1ca9ff5
Add duckduckgo search tool (#32882)
aymeric-roucher Sep 2, 2024
409fcfd
Fix: Suppressed 'use_reentrant=False' warning (#33208)
ankush13r Sep 2, 2024
963ed98
docs: Replace package abbreviations with full name(`bitsandbytes`) in…
rapsealk Sep 2, 2024
2d37085
Bump opencv-python from 4.4.0.42 to 4.8.1.78 in /examples/research_pr…
dependabot[bot] Sep 2, 2024
52a0213
Add assistant prefill for chat templates and TextGenerationPipeline (…
Rocketknight1 Sep 2, 2024
97c0f45
Generate: fix assistant in different device (#33257)
gante Sep 2, 2024
9ea1eac
remove to restriction for 4-bit model (#33122)
SunMarc Sep 2, 2024
2895224
Fixed typo repeated word in DETR docs (#33250)
sergiopaniego Sep 2, 2024
cff06aa
Fix: use `torch.from_numpy()` to create tensors for np.ndarrays (#33201)
shinyano Sep 2, 2024
5663026
remove torch input dependant control flow (#33245)
ArthurZucker Sep 3, 2024
7ed9789
Fix: `num_logits_to_keep` in composite models (#33168)
zucchini-nlp Sep 3, 2024
979f477
Fix Bark saving (#33266)
ylacombe Sep 3, 2024
edeca43
🚨 Support dequantization for most GGML types (#32625)
Isotr0py Sep 3, 2024
0d86727
Update chat template docs to remove Blenderbot (#33254)
Rocketknight1 Sep 3, 2024
e969d88
Bump opencv-python from 4.4.0.42 to 4.8.1.78 in /examples/research_pr…
dependabot[bot] Sep 3, 2024
03c12d0
Add sdpa support for Albert (#32092)
OmarManzoor Sep 3, 2024
6b7d64a
Only disallow DeepSpeed Zero-3 for auto bs finder (#31731)
muellerzr Sep 3, 2024
979d24e
fix the parallel number of CI nodes when it is smaller than number of…
ArthurZucker Sep 3, 2024
d6534f9
Repo checks: check documented methods exist (#32320)
gante Sep 3, 2024
ecd61c6
Add OLMoE (#32406)
Muennighoff Sep 3, 2024
1c3ad5c
revert changes not needed for compression
Sep 3, 2024
aa1a4f9
no longer need unexpected keys fn
Sep 3, 2024
81a13dd
unexpected keys not needed either
Sep 3, 2024
35f72eb
Fix: multigpu training (#33271)
zucchini-nlp Sep 4, 2024
ebbe8d8
Cache docs: update (#32929)
zucchini-nlp Sep 4, 2024
d750b50
Config: unified logic to retrieve text config (#33219)
gante Sep 4, 2024
d703477
[fix] LlavaNextProcessor '_get_unpadded_features' method (#33263)
laurentd-lunit Sep 4, 2024
178cb6b
wait 15m before SSH into runner workflow stops (#33300)
ydshieh Sep 4, 2024
122ded0
Bugfix/alexsherstinsky/fix none check for attention factor in rope sc…
alexsherstinsky Sep 4, 2024
5731dc8
Bump cryptography from 42.0.0 to 43.0.1 in /examples/research_project…
dependabot[bot] Sep 4, 2024
d2dcff9
[InstructBLIP] qformer_tokenizer is required input (#33222)
amyeroberts Sep 4, 2024
2cb543d
Multi agents with manager (#32687)
aymeric-roucher Sep 4, 2024
01c8c6c
Add a warning to the chat template docs about the tool_calls format (…
Rocketknight1 Sep 4, 2024
cfd92c6
Add new documentation page for advanced agent usage (#33265)
aymeric-roucher Sep 4, 2024
a1faf22
[BUG] fix upper nltk version (#33301)
ylacombe Sep 4, 2024
b390998
Fix excessive CPU memory usage with FSDP and cpu_ram_efficient_loadin…
matthewdouglas Sep 4, 2024
9230d78
Add validate images and text inputs order util for processors and tes…
yonigozlan Sep 4, 2024
43df47d
Llava Onevision: add model (#32673)
zucchini-nlp Sep 5, 2024
47b0964
Fix: Fix `FalconMamba` training issues due to incompatible kernels (#…
younesbelkada Sep 5, 2024
03164ba
Add paper link (#33305)
Muennighoff Sep 5, 2024
c6d2848
🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision …
xenova Sep 5, 2024
132e875
Update SECURITY.md (#32680)
Michellehbn Sep 5, 2024
5d11de4
Add Qwen2Moe GGUF loading support (#33264)
VladOS95-cyber Sep 5, 2024
21fac7a
simple align qwen2vl kv_seq_len calculation with qwen2 (#33161)
simonJJJ Sep 5, 2024
5792c45
Add a community notebook for fine-tuning with QLoRA, PEFT, and MLflow…
daniellok-db Sep 6, 2024
1759bb9
Fix: StaticCache & `inputs_embeds` (#32932)
zucchini-nlp Sep 6, 2024
2b789f2
Docs: add more cross-references to the KV cache docs (#33323)
gante Sep 6, 2024
51d15eb
[whisper] alternative fix for long-form timestamps (#32131)
sanchit-gandhi Sep 6, 2024
1bd9d1c
fix qwen2vl vision eager-attention (#33213)
simonJJJ Sep 6, 2024
e1c2b69
Load dynamic module (remote code) only once if code isn't change (#33…
XuehaiPan Sep 6, 2024
363301f
support loading model without config.json file (#32356)
itazap Sep 6, 2024
3314fe1
Add validation for maximum sequence length in modeling_whisper.py (#3…
AmirMohammadFakhimi Sep 6, 2024
2b18354
add self.head_dim for VisionAttention in Qwen2-VL (#33211)
GeLee-Q Sep 6, 2024
342e800
support 3D attention mask in bert (#32105)
gathierry Sep 6, 2024
e48e5f1
Support reading tiktoken tokenizer.model file (#31656)
itazap Sep 6, 2024
2d75700
red-ci on main, fix copies (#33356)
ArthurZucker Sep 6, 2024
6ff6069
RoPE: fix BC warning (#33331)
gante Sep 6, 2024
d7b04ea
Fix Prefill docs (#33352)
Rocketknight1 Sep 6, 2024
a70286f
Update author for QLorA/PEFT community notebook (#33338)
daniellok-db Sep 6, 2024
66bc4de
add sdpa mbart (#32033)
nbroad1881 Sep 7, 2024
60226fd
Fix quantized cache tests (#33351)
zucchini-nlp Sep 9, 2024
62aecd8
schedulefree optimizers (#30079)
winglian Sep 9, 2024
489cbfd
Add visit webpage tool (#33353)
aymeric-roucher Sep 9, 2024
eedd21b
Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)
nnilayy Sep 9, 2024
65bb284
Compile compatibilty for decoder-only models (#32617)
zucchini-nlp Sep 9, 2024
0574fa6
Adjust templates (#33384)
LysandreJik Sep 9, 2024
f745e7d
Remove repeated prepare_images in processor tests (#33163)
amyeroberts Sep 9, 2024
f53d7b9
Apply suggestions from code review
Satrat Sep 9, 2024
d8f7073
add to_diff_dict
Sep 9, 2024
7f112ca
Fix import of `FalconMambaForCausalLM` (#33381)
younesbelkada Sep 10, 2024
f24f084
Import structure & first three model refactors (#31329)
LysandreJik Sep 10, 2024
7d2d6ce
VLM: fixes after refactor (#32907)
zucchini-nlp Sep 10, 2024
8e8e7d8
fixed Mask2Former image processor segmentation maps handling (#33364)
maciej-adamiak Sep 10, 2024
96429e7
Add support for GGUF Phi-3 (#31844)
a8nova Sep 10, 2024
6ed2b10
Bug Fix: Update hub.py to fix NoneType error (#33315)
rishiraj Sep 10, 2024
dfee4f2
Update WhisperTokenizer Doc: Timestamps and Previous Tokens Behaviour…
bruno-hays Sep 10, 2024
f38590d
Make StaticCache configurable at model construct time (#32830)
guangy10 Sep 10, 2024
781bbc4
use diff internal model in tests (#33387)
itazap Sep 11, 2024
e719b65
Fix `FbgemmFp8Linear` not preserving tensor shape (#33239)
vgel Sep 11, 2024
91f19a5
Fix failing windows (#33436)
LysandreJik Sep 11, 2024
42babe8
Remove deprecated task in load_dataset (#33433)
albertvillanova Sep 11, 2024
7a51cbc
Dynamic number of speculative tokens in order to accelerate speculati…
jmamou Sep 11, 2024
ecf7024
Fix: Cast prefetch_bucket_size to integer for deepspeed >= 0.15 (#33402)
kiddj Sep 11, 2024
c403441
[docs] add the missing huggingface hub username (#33431)
faaany Sep 11, 2024
cea9ec0
[docs] add the missing tokenizer when pushing models to huggingface h…
faaany Sep 11, 2024
c4fbf70
update docs and expand testing
Sep 11, 2024
d7a553b
Update stale.yml (#33434)
LysandreJik Sep 12, 2024
e0ff432
Docs - update formatting of llama3 model card (#33438)
MichaelCurrin Sep 12, 2024
516ee6a
Fix incomplete sentence in `Zero-shot object detection` documentation…
sergiopaniego Sep 12, 2024
8ed6352
Fix flax whisper tokenizer bug (#33151)
hannan72 Sep 12, 2024
c8ea675
Clean-up deprecated code (#33446)
zucchini-nlp Sep 12, 2024
d71d6cb
Fix default revision for pipelines (#33395)
ankane Sep 12, 2024
5334b61
Revive AMD scheduled CI (#33448)
ydshieh Sep 12, 2024
e688996
Allow send `SSH into runner` info. to DM (#33346)
ydshieh Sep 12, 2024
8f8af0f
Correct Whisper's beam search scores computation (#32336)
ylacombe Sep 12, 2024
2f611d3
Qwen2-VL: clean-up and add more tests (#33354)
zucchini-nlp Sep 12, 2024
5c6257d
[whisper] Clarify error message when setting max_new_tokens (#33324)
benniekiss Sep 12, 2024
a05ce55
[docs] refine the doc for `train with a script` (#33423)
faaany Sep 12, 2024
9c4639b
Return image hidden states (#33426)
zucchini-nlp Sep 13, 2024
1027a53
add a callback hook right before the optimizer step (#33444)
winglian Sep 13, 2024
4b0418d
Enable `padding_side` as call time kwargs (#33385)
zucchini-nlp Sep 13, 2024
7a56598
Mitigate a conflict when using sentencepiece (#33327)
tengomucho Sep 13, 2024
dfd3115
[Phi-3] Bug on stale kv cache (#33129)
garg-amit Sep 13, 2024
6cc4dfe
Fix the initialization of the cache when we have multi gpu (#33303)
SunMarc Sep 13, 2024
0963229
Enable finetuning with torchao quantized model (#33361)
SunMarc Sep 13, 2024
e39b6c1
Corrected `Agents and tools` documentation links typos (#33471)
sergiopaniego Sep 13, 2024
7bb1c99
chore: fix typo in comment in tokenization_utils_base.py (#33466)
DavidLemayian Sep 13, 2024
8bd2b1e
Add support for Pixtral (#33449)
ArthurZucker Sep 14, 2024
95e816f
Cohere: update RoPE structure (#33408)
gante Sep 16, 2024
5ce0a11
Fix SSH workflow (#33451)
ydshieh Sep 16, 2024
ce62a41
Add keypoint-detection task guide (#33274)
merveenoyan Sep 16, 2024
2f62146
Uniformize kwargs for LLaVa processor and update docs (#32858)
yonigozlan Sep 16, 2024
c7a91f5
`Agents, supercharged - Multi-agents, External tools, and more` docs …
sergiopaniego Sep 16, 2024
c2d0589
[i18n-ar] Add File : `docs/source/ar/_toctree.yml` (#32696)
AhmedAlmaghz Sep 16, 2024
98adf24
[Whisper test] Fix some failing tests (#33450)
ylacombe Sep 16, 2024
4ba531c
Fix: Qwen2-VL training on video datasets (#33307)
hiyouga Sep 17, 2024
ba1f1dc
Updated Trainer's liger-kernel integration to call correct patching A…
shimizust Sep 17, 2024
9f196ef
Replace `accelerator.use_fp16` in examples (#33513)
hlky Sep 17, 2024
18e1a9c
Fix parametrization-based weight norm (#33275)
ylacombe Sep 17, 2024
bcf8946
Fix number of patch check for different vision feature select strateg…
insujang Sep 17, 2024
642256d
chore: migrate coverage cfg to pyproject.toml (#32650)
SauravMaheshkar Sep 17, 2024
74026b4
idefics2 enable_input_require_grads not aligned with disable_input_re…
sywangyi Sep 17, 2024
ac5a055
Update chameleon.md — fix runtime type error (#33494)
maxwbuckley Sep 17, 2024
7635484
Add explicit example for RAG chat templating (#33503)
A-Duss Sep 17, 2024
3476c19
CI Build image - move runners (#33530)
glegendre01 Sep 17, 2024
46c2757
fix to jamba config, asserting attention and expert offset (#33316)
ErezSC42 Sep 17, 2024
c29a869
Fix missing `sequences_scores` in the Whisper beam search output (#3…
Nik-Kras Sep 17, 2024
d8500cd
Uniformize kwargs for Pixtral processor (#33521)
yonigozlan Sep 17, 2024
6c051b4
Add revision to trainer push_to_hub (#33482)
teamclouday Sep 17, 2024
1992a88
Merge remote-tracking branch 'upstream/main' into compressed-tensors-…
Sep 17, 2024
454a0f2
fix patch_attention_mask incorrect setting which leads to the differe…
sywangyi Sep 17, 2024
fee8651
Support LLaVa-OV-Chat (#33532)
zucchini-nlp Sep 18, 2024
e6d9f39
Decorator for easier tool building (#33439)
aymeric-roucher Sep 18, 2024
52e22cb
Fix for slow the bug tokenizer adding spaces to single id decodes (#3…
DuyguA Sep 18, 2024
db72894
Chat template: save and load correctly for processors (#33462)
zucchini-nlp Sep 18, 2024
298a638
Update _toctree.yml with compressed-tensors
Satrat Sep 18, 2024
9f2b8cc
Fix missing head_dim in llama config from gguf model (#33526)
Isotr0py Sep 18, 2024
5427eaa
[i18n-ur] Added README_ur.md file (#33461)
akkefa Sep 18, 2024
4f1e9ba
fix the wandb logging issue (#33464)
ZIYU-DEEP Sep 18, 2024
f883827
Fix tests in ASR pipeline (#33545)
ylacombe Sep 18, 2024
fc83a4d
Added support for bfloat16 to zero-shot classification pipeline (#33554)
umarbutler Sep 18, 2024
7542fac
Pipeline: no side-effects on `model.config` and `model.generation_con…
gante Sep 18, 2024
8efc06e
Return attention mask in ASR pipeline to avoid warnings (#33509)
Rocketknight1 Sep 18, 2024
9db963a
enforce original size to be a list (#33564)
dom-dziela Sep 18, 2024
7b1ce63
Improve compiled RT-DETR inference speed (#33412)
yonigozlan Sep 18, 2024
6019f3f
Fix bnb dequantization (#33546)
SunMarc Sep 18, 2024
5af7d41
Codec integration (#33565)
ylacombe Sep 18, 2024
e40bb48
Load and save video-processor from separate folder (#33562)
zucchini-nlp Sep 19, 2024
d7975a5
VLMs: enable generation tests (#33533)
zucchini-nlp Sep 19, 2024
f3b3810
rag: fix CI (#33578)
gante Sep 19, 2024
80b774e
Cache: don't show warning in forward passes when `past_key_values` is…
gante Sep 19, 2024
4f0246e
fix tests with main revision and read token (#33560)
molbap Sep 19, 2024
413008c
add uniform processors for altclip + chinese_clip (#31198)
molbap Sep 19, 2024
d9d59e7
Generate: check that `attention_mask` is 2D (#33575)
gante Sep 19, 2024
162056a
change sequence_bias type of SequenceBiasLogitsProcessor to list, add…
VladOS95-cyber Sep 19, 2024
b50ff59
[`Mamba2`] Move dt calculations to kernel (#33520)
vasqu Sep 19, 2024
52920b5
Cache: don't throw warnings on `gemma2` when instantiating a new cach…
gante Sep 19, 2024
f111d5b
Uniformize kwargs for Paligemma processor and update docs (#33571)
yonigozlan Sep 19, 2024
b87755a
[tests] skip tests for xpu (#33553)
faaany Sep 19, 2024
4d8908d
[tests] enable GemmaIntegrationTest on XPU (#33555)
faaany Sep 19, 2024
0c718f1
Fix Llama 3 TikToken conversion (#33538)
pcuenca Sep 19, 2024
bdf4649
Docs: add the ability to manually trigger jobs (#33598)
gante Sep 20, 2024
6dc3646
Fix CircleCI nightly run (#33558)
ydshieh Sep 20, 2024
31650a5
Allow CI could be run on private forked repositories (e.g. new model …
ydshieh Sep 20, 2024
8bd1f2f
[tests] make more tests device-agnostic (#33580)
faaany Sep 20, 2024
ec1424c
Update modeling_mamba2.py, fix pad size (#32599)
klae01 Sep 20, 2024
266d0a6
Generate: remove flakyness in `test_generate_from_inputs_embeds_decod…
gante Sep 20, 2024
f9b4409
Remove unnecessary CPM model tests (#33621)
amyeroberts Sep 20, 2024
653eb40
Add sdpa for BioGpt (#33592)
OmarManzoor Sep 20, 2024
2fdb5e7
VLM generate: tests can't generate image/video tokens (#33623)
gante Sep 20, 2024
31caf0b
Fix missing test in `torch_job` (#33593)
ydshieh Sep 20, 2024
c0c6815
Add support for args to ProcessorMixin for backward compatibility (#3…
yonigozlan Sep 20, 2024
dc8b6ea
Fix contrastive search to correctly handle input with padding (#33507)
ducviet00 Sep 20, 2024
77c5d59
Generate: assistant should sample when the main model samples (#33534)
gante Sep 20, 2024
077b552
Fix some missing tests in circleci (#33559)
ydshieh Sep 20, 2024
75c878d
Update daily ci to use new cluster (#33627)
ydshieh Sep 20, 2024
e9356a4
Fix qwen2vl float16 inference bug (#33312)
GeLee-Q Sep 20, 2024
7b2b536
Fix typos (#33583)
litianjian Sep 20, 2024
49a0bef
enable low-precision pipeline (#31625)
jiqing-feng Sep 20, 2024
e472e07
Granitemoe (#33207)
mayank31398 Sep 20, 2024
e71bf70
Pixtral update example checkpoint (#33633)
amyeroberts Sep 21, 2024
78b2929
Sdpa dino v2 (#33403)
avishaiElmakies Sep 21, 2024
3cb4415
Update src/transformers/utils/quantization_config.py
Satrat Sep 23, 2024
9eb9385
Clean up Unpack imports (#33631)
molbap Sep 23, 2024
b7c381f
Fix DPT /Dinov2 sdpa regression on main (#33660)
molbap Sep 23, 2024
6d02968
handle dependency errors in check_imports (#33622)
molbap Sep 23, 2024
214db9e
add back self.max_position_embeddings = config.max_position_embedding…
chengchengpei Sep 23, 2024
be9cf07
Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower…
Isotr0py Sep 23, 2024
1456120
Uniformize kwargs for Udop processor and update docs (#33628)
yonigozlan Sep 23, 2024
e15687f
Generation: deprecate `PreTrainedModel` inheriting from `GenerationMi…
gante Sep 23, 2024
11c27dd
Enable BNB multi-backend support (#31098)
jiqing-feng Sep 24, 2024
01aec8c
Fix error string after refactoring into get_chat_template (#33652)
tibor-reiss Sep 24, 2024
75b7485
uniformize git processor (#33668)
yonigozlan Sep 24, 2024
a943157
Merge branch 'main' into compressed-tensors-quantizer
dsikka Sep 24, 2024
64f475a
update doc
dsikka Sep 24, 2024
fabe8a3
add note about saving a loaded model
dsikka Sep 24, 2024
2 changes: 1 addition & 1 deletion src/transformers/modeling_utils.py
@@ -3826,7 +3826,7 @@ def from_pretrained(
dispatch_model(model, **device_map_kwargs)

if hf_quantizer is not None:
-            hf_quantizer.postprocess_model(model)
+            hf_quantizer.postprocess_model(model, resolved_archive_file=resolved_archive_file)
model.hf_quantizer = hf_quantizer

if _adapter_model_path is not None:
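For context, these are the only quantizer hooks that from_pretrained invokes around weight loading. The sketch below is a heavily simplified illustration of that flow; the load_quantized helper is invented here, and only the two hook calls mirror the real code:

def load_quantized(model, hf_quantizer, resolved_archive_file):
    if hf_quantizer is not None:
        # before the state dict is materialized: e.g. swap modules in for
        # the quantized scheme
        hf_quantizer.preprocess_model(model)

    # ... checkpoint weights are loaded into the (transformed) model here ...

    if hf_quantizer is not None:
        # after loading: compressed-tensors needs the checkpoint path
        # (resolved_archive_file) so it can decompress weights from disk
        hf_quantizer.postprocess_model(model, resolved_archive_file=resolved_archive_file)
        model.hf_quantizer = hf_quantizer
    return model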
4 changes: 4 additions & 0 deletions src/transformers/quantizers/auto.py
@@ -19,6 +19,7 @@
AqlmConfig,
AwqConfig,
BitsAndBytesConfig,
CompressedTensorsConfig,
EetqConfig,
GPTQConfig,
HqqConfig,
@@ -30,6 +31,7 @@
from .quantizer_awq import AwqQuantizer
from .quantizer_bnb_4bit import Bnb4BitHfQuantizer
from .quantizer_bnb_8bit import Bnb8BitHfQuantizer
from .quantizer_compressed_tensors import CompressedTensorsHfQuantizer
from .quantizer_eetq import EetqHfQuantizer
from .quantizer_gptq import GptqHfQuantizer
from .quantizer_hqq import HqqHfQuantizer
@@ -45,6 +47,7 @@
"quanto": QuantoHfQuantizer,
"eetq": EetqHfQuantizer,
"hqq": HqqHfQuantizer,
"compressed-tensors": CompressedTensorsHfQuantizer,
}

AUTO_QUANTIZATION_CONFIG_MAPPING = {
@@ -56,6 +59,7 @@
"aqlm": AqlmConfig,
"quanto": QuantoConfig,
"hqq": HqqConfig,
"compressed-tensors": CompressedTensorsConfig,
}


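With both registries extended, the "compressed-tensors" key in a checkpoint's quantization_config now resolves to the new classes. A minimal sketch of the lookup these mappings enable, mirroring what the auto classes do internally:

from transformers.quantizers.auto import (
    AUTO_QUANTIZATION_CONFIG_MAPPING,
    AUTO_QUANTIZER_MAPPING,
)

# the same lookups AutoQuantizationConfig / AutoHfQuantizer perform
config_cls = AUTO_QUANTIZATION_CONFIG_MAPPING["compressed-tensors"]
quantizer_cls = AUTO_QUANTIZER_MAPPING["compressed-tensors"]
print(config_cls.__name__)     # CompressedTensorsConfig
print(quantizer_cls.__name__)  # CompressedTensorsHfQuantizer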
71 changes: 71 additions & 0 deletions src/transformers/quantizers/quantizer_compressed_tensors.py
@@ -0,0 +1,71 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ..utils import is_torch_available, logging
from ..utils.quantization_config import QuantizationConfigMixin
from .base import HfQuantizer


if is_torch_available():
import torch

logger = logging.get_logger(__name__)


class CompressedTensorsHfQuantizer(HfQuantizer):
"""
    Quantizer for the compressed_tensors package. Loads and restores models to a
    quantized state with compressed_tensors.
"""

requires_calibration = False
# requires_parameters_quantization = True
required_packages = ["compressed_tensors"]

def __init__(self, quantization_config: QuantizationConfigMixin, **kwargs):
super().__init__(quantization_config, **kwargs)

from compressed_tensors.compressors import ModelCompressor

self.compressor = ModelCompressor.from_compression_config(quantization_config)

def validate_environment(self, *args, **kwargs):
# check torch and compressed_tensors are available, let ImportError raise otherwise
import torch # noqa
from compressed_tensors.compressors import ModelCompressor # noqa

def update_torch_dtype(self, torch_dtype: "torch.dtype") -> "torch.dtype":
if torch_dtype is None:
torch_dtype = torch.float16
elif torch_dtype != torch.float16:
logger.info(
"We suggest you to set `torch_dtype=torch.float16` for better efficiency with compressed_tensors."
)
return torch_dtype

def _process_model_before_weight_loading(self, model, **kwargs):
if self.quantization_config.quantization_config is not None:
from compressed_tensors.quantization import apply_quantization_config

apply_quantization_config(model, self.quantization_config.quantization_config)

def _process_model_after_weight_loading(self, model, resolved_archive_file, **kwargs):
self.compressor.decompress(model_path=resolved_archive_file, model=model)

@property
def is_trainable(self):
return False

@property
def is_serializable(self):
return False
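Taken together with the auto registration above, loading a compressed-tensors checkpoint is a normal from_pretrained call. A usage sketch, assuming compressed_tensors is installed; the model id is the test checkpoint used in the new tests below, and the rest is standard API, so treat this as illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# the checkpoint's config.json declares quant_method="compressed-tensors", so
# from_pretrained builds a CompressedTensorsHfQuantizer behind the scenes:
# apply_quantization_config() runs before weights load, decompress() right after
model_id = "nm-testing/tinyllama-oneshot-w8a8-test-static-shape-change-v3"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Paris is the capital of which country?", return_tensors="pt")
generated_ids = model.generate(**inputs, max_length=50)
print(tokenizer.batch_decode(generated_ids)[0])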
77 changes: 75 additions & 2 deletions src/transformers/utils/quantization_config.py
@@ -18,11 +18,14 @@
import importlib.metadata
import json
import os
-from dataclasses import dataclass
+from dataclasses import asdict, dataclass, is_dataclass
from enum import Enum
from typing import Any, Dict, List, Optional, Union

from compressed_tensors.quantization.quant_config import QuantizationStatus
from compressed_tensors.quantization.quant_scheme import QuantizationScheme
from packaging import version
from pydantic import BaseModel

from ..utils import is_auto_awq_available, is_hqq_available, is_torch_available, logging

@@ -42,6 +45,7 @@ class QuantizationMethod(str, Enum):
QUANTO = "quanto"
EETQ = "eetq"
HQQ = "hqq"
COMPRESSED_TENSORS = "compressed-tensors"


class AWQLinearVersion(str, Enum):
@@ -67,6 +71,23 @@ class AwqBackendPackingMethod(str, Enum):
LLMAWQ = "llm-awq"


def convert_to_dict(obj):
if is_dataclass(obj):
return asdict(obj)
elif isinstance(obj, BaseModel):
return obj.dict()
elif isinstance(obj, Enum):
return obj.value
elif isinstance(obj, dict):
return {k: convert_to_dict(v) for k, v in obj.items()}
elif isinstance(obj, list):
return [convert_to_dict(i) for i in obj]
elif isinstance(obj, tuple):
return tuple(convert_to_dict(i) for i in obj)
else:
return obj


@dataclass
class QuantizationConfigMixin:
"""
@@ -130,7 +151,7 @@ def to_dict(self) -> Dict[str, Any]:
Serializes this instance to a Python dictionary. Returns:
`Dict[str, Any]`: Dictionary of all the attributes that make up this configuration instance.
"""
-        return copy.deepcopy(self.__dict__)
+        return convert_to_dict(copy.deepcopy(self.__dict__))

def __iter__(self):
"""allows `dict(obj)` for situations where obj may be a dict or QuantizationConfigMixin"""
@@ -1038,3 +1059,55 @@ def post_init(self):
accepted_weights = ["int8"]
if self.weights not in accepted_weights:
raise ValueError(f"Only support weights in {accepted_weights} but found {self.weights}")


@dataclass
class CompressedTensorsConfig(QuantizationConfigMixin):
"""
    This is a wrapper class that handles compressed-tensors quantization config options.
    It wraps `compressed_tensors.QuantizationConfig`.

    Args:
        config_groups (`Dict[str, Union[QuantizationScheme, List[str]]]`, *optional*):
            Dictionary of quantization scheme groups, mapping each group name to a `QuantizationScheme`
            or to the list of module names the group's scheme applies to.
        quant_method (`str`, *optional*, defaults to `"compressed-tensors"`):
            Name of the quantization method.
        format (`str`, *optional*, defaults to `"dense"`):
            Format in which the compressed model is serialized.
        quantization_status (`QuantizationStatus`, *optional*, defaults to `"initialized"`):
            Status of the model in the quantization lifecycle.
        global_compression_ratio (`float`, *optional*):
            Overall compression ratio of the model, if known.
        ignore (`List[str]`, *optional*):
            List of module names to leave unquantized.
        sparsity_config (`Dict[str, Any]`, *optional*):
            Configuration for the model's sparsity compression, loaded through
            `SparsityCompressionConfig.load_from_registry`.
"""

def __init__(
self,
config_groups: Dict[str, Union["QuantizationScheme", List[str]]] = None,
quant_method: str = "compressed-tensors",
format: str = "dense", # "fakequant" not in CompressionFormat
quantization_status: "QuantizationStatus" = "initialized",
global_compression_ratio: Optional[float] = None,
ignore: Optional[List[str]] = None,
sparsity_config: Dict[str, Any] = None,
**kwargs,
):
from compressed_tensors import QuantizationConfig
from compressed_tensors.config import SparsityCompressionConfig

self.quantization_config = None
self.sparsity_config = None

# parse from dict to load nested QuantizationScheme objects
if config_groups:
self.quantization_config = QuantizationConfig.parse_obj(
{
"config_groups": config_groups,
"quant_method": quant_method,
"format": format,
"quantization_status": quantization_status,
"global_compression_ratio": global_compression_ratio,
"ignore": ignore,
}
)

if sparsity_config:
self.sparsity_config = SparsityCompressionConfig.load_from_registry(
sparsity_config.get("format"), **sparsity_config
)

super().__init__(quant_method=QuantizationMethod.COMPRESSED_TENSORS)
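The convert_to_dict helper above is what lets to_dict serialize the nested pydantic and dataclass objects this config now carries. A small self-contained illustration; the Point dataclass and Color enum are invented for the example:

from dataclasses import dataclass
from enum import Enum

from transformers.utils.quantization_config import convert_to_dict

@dataclass
class Point:
    x: int
    y: int

class Color(Enum):
    RED = "red"

# recurses through dataclasses, pydantic models, Enums, dicts, lists and tuples
nested = {"origin": Point(0, 1), "color": Color.RED, "tags": [Point(2, 3)]}
print(convert_to_dict(nested))
# {'origin': {'x': 0, 'y': 1}, 'color': 'red', 'tags': [{'x': 2, 'y': 3}]}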
Empty file.
76 changes: 76 additions & 0 deletions tests/quantization/compressed_tensor/test_compressed_tensors.py
@@ -0,0 +1,76 @@
# from transformers.quantizers.quantizer_compressed_tensors import CompressedTensorsHfQuantizer

import gc
import unittest

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, CompressedTensorsConfig


class CompressedTensorsTest(unittest.TestCase):
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
source_quantized_model_name = "nm-testing/tinyllama-oneshot-w8a8-test-static-shape-change-v3"

prompt = "Paris is the capital of which country?"
# ['<s> Paris is the capital of which country?\n\nA. London\n\nB. New York\n\nC. Paris\n\nD. Tokyo\n\n4. Which country is the capital of the European Union?\n\nA. France\n']
expected_response = ""

def tear_down(self):
gc.collect()
torch.cuda.empty_cache()
gc.collect()

    @classmethod
    def setUpClass(cls):
        """
        Set up the quantized model once for all tests
        """
        cls.tokenizer = AutoTokenizer.from_pretrained(cls.source_quantized_model_name)
        cls.source_quantized_model = AutoModelForCausalLM.from_pretrained(cls.source_quantized_model_name)

        cls.device = cls.source_quantized_model.device
        compression_config = cls.source_quantized_model.config.quantization_config.quantization_config.config_groups

        cls.config = CompressedTensorsConfig(
            config_groups=compression_config,
            sparsity_config=cls.source_quantized_model.config.quantization_config.sparsity_config.dict(),
        )

        # unittest assert helpers are instance methods, so use plain asserts here
        assert cls.config.sparsity_config is not None, "sparsity_config should not be None"
        assert cls.config.quantization_config is not None, "quantization_config should not be None"

    @unittest.skip("scales not populated")
    def test_apply_quantization(self):
        # fails because state_dict_scale = state_dict[f"{module_name}.{scale_name}"]
        # raises KeyError: 'model.layers.0.self_attn.q_proj.weight_scale'
        self.quantization_model = AutoModelForCausalLM.from_pretrained(
            self.model_name, quantization_config=self.config
        )
        # check that the input layers of self.source_quantized_model and self.quantization_model are the same

    def test_quantized_model(self):
        # generate with the quantized model and compare against the recorded output
        inputs = self.tokenizer(self.prompt, return_tensors="pt").to(self.device)
        generated_ids = self.source_quantized_model.generate(**inputs, max_length=50)
        outputs = self.tokenizer.batch_decode(generated_ids)

        self.assertEqual(outputs, [self.expected_response])
        self.tear_down()

    def test_forward(self):
        batch_size = 1
        context_size = 2048
        # torch.rand(...).long() truncates to all zeros, so sample real token ids instead
        tensor1 = torch.randint(0, self.tokenizer.vocab_size, (1024,))
        tensor2 = torch.randint(0, self.tokenizer.vocab_size, (1024,))

        input_tensor = torch.cat((tensor1, tensor2), dim=0)
        input_tensor = input_tensor.unsqueeze(0)
        with torch.no_grad():
            out = self.source_quantized_model(input_tensor)
        # logits should cover the full batch and the concatenated context
        self.assertEqual(out.logits.shape[0], batch_size)
        self.assertEqual(out.logits.shape[1], context_size)

        self.tear_down()