forked from huggingface/transformers
[Temporary] Add compressed-tensors HFQuantizer implementation #101
Open
bfineran wants to merge 936 commits into upstream-a564d10af from compressed-tensors-quantizer
Conversation
Satrat reviewed Jun 11, 2024
horheynm reviewed Jun 13, 2024
tests/quantization/compressed_tensor/test_compressed_tensors.py
* support-qwen2-vl * tidy * tidy * tidy * tidy * tidy * tidy * tidy * hyphen->underscore * make style * add-flash2-tipd * delete-tokenize=False * remove-image_processor-in-init-file * add-qwen2_vl-in-MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES * format-doct * support-Qwen2VLVisionConfig * remove-standardize_cache_format * fix-letter-varaibles * remove-torch-in-image-processor * remove-useless-docstring * fix-one-letter-varaible-name * change-block-name * default-quick-gelu-in-vision * remove-useless-doc * use-preimplemented-flash-forward * fix-doc * fix-image-processing-doc * fix-apply-rotary-embed * fix-flash-attn-sliding-window * refactor * remove-default_template * remove-reorder_cache * simple-get-rope_deltas * update-prepare_inputs_for_generation * update-attention-mask * update-rotary_seq_len * remove-state * kv_seq_length * remove-warning * _supports_static_cache * remove-legacy-cache * refactor * fix-replace * mrope-section-doc * code-quality * code-quality * polish-doc * fix-image-processing-test * update readme * Update qwen2_vl.md * fix-test * Update qwen2_vl.md * nit * processor-kwargs * hard-code-norm_layer * code-quality * discard-pixel-values-in-gen * fix-inconsistent-error-msg * unify-image-video * hidden_act * add-docstring * vision-encode-as-PreTrainedModel * pixel-to-target-dtype * update doc and low memoryvit * format * format * channel-foramt * fix vit_flashatt * format * inherit-Qwen2VLPreTrainedModel * simplify * format-test * remove-one-line-func-in-image-processing * avoid-one-line-reshape * simplify-rotary_seq_len * avoid-single-letter-variable * no-for-loop-sdpa * avoid-single-letter-variable * remove-one-line-reshape * remove-one-line-reshape * remove-no-rope-in-vit-logic * default-mrope * add-copied-from * more-docs-for-mrope * polish-doc * comment-and-link * polish-doc * single-letter-variables * simplify-image-processing * video->images * kv_seq_len-update * vision-rope-on-the-fly * vision-eager-attention * change-processor-order --------- 
Co-authored-by: baishuai <baishuai.bs@alibaba-inc.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
…gface#32404) * Add changes for uroman package to handle non-Roman characters * Update docs for uroman changes * Modifying error message to warning, for backward compatibility * Update instruction for user to install uroman * Update docs for uroman python version dependency and backward compatibility * Update warning message for python version compatibility with uroman * Refine docs
…atible with DeepSpeed (huggingface#33105) Fixed pydantic required version in dockerfiles.
* fix documentation * update config
…gingface#32850) * Fixed failing CodeGenTokenizationTest::test_truncation. * [run_slow] Codegen * [run_slow] codegen
…e#32079) * fix: multilingual midel convert to tflite get wrong token * fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min --------- Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com> Co-authored-by: Aya <[kent831217@gmail.com]>
Disable scheduled daily CI temporarily Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…sues due to large image size (huggingface#33123) * fix param not being passed in tested; add exceptions * better source of model name * Update utils/create_dummy_models.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…ojects/hybrid_clip (huggingface#33137) Bump torch in /examples/research_projects/jax-projects/hybrid_clip Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0. - [Release notes](https://github.com/pytorch/pytorch/releases) - [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md) - [Commits](pytorch/pytorch@v1.13.1...v2.2.0) --- updated-dependencies: - dependency-name: torch dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Log additional test metrics with the CometCallback. Also follow the same metric naming convention as other callbacks * Merge 2 subsequent if-statements * Trigger Build --------- Co-authored-by: Aliaksandr Kuzmik <alexander.kuzmik99@gmail.com>
* [docs] add quick usage snippet to Whisper. * Apply suggestions from review. * 💉 Fix the device for pipeline.
…#32115) * update ExportableState callbacks state before saving trainer_state on save_checkpoint * run make fixup and fix format * manage multiple stateful callbacks of same class
* fix Idefics2VisionConfig type annotation * Update modeling_idefics2.py * Update modeling_idefics2.py add ignore copy * Update modeling_idefics2.py * Update modeling_idefics2.py
* Add a fix for the case when tokenizers are passed as a string * Support image processors and feature extractors as well * Reverting load_feature_extractor and load_image_processor * Add test * Test is torch-only * Add tests for preprocessors and feature extractors and move test * Extremely experimental fix * Revert that change, wrong branch! * Typo! * Split tests
…33131) * fix redundant checkpointing in example scripts * Update examples/pytorch/image-classification/run_image_classification_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/translation/run_translation_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/token-classification/run_ner_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/text-classification/run_glue_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/summarization/run_summarization_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/semantic-segmentation/run_semantic_segmentation_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/language-modeling/run_mlm_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/language-modeling/run_fim_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/language-modeling/run_clm_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/image-pretraining/run_mim_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/multiple-choice/run_swag_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/question-answering/run_qa_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/object-detection/run_object_detection_no_trainer.py Co-authored-by: Marc Sun 
<57196510+SunMarc@users.noreply.github.com> * Update examples/pytorch/question-answering/run_qa_beam_search_no_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* docs: ko: conversations.md * feat: hand-crafted translate docs * fix: modify typo after Grammar Check * Update docs/source/ko/conversations.md 감사합니다 Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> * fix: accept suggestions about anchor and spacing * Update docs/source/ko/conversations.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/conversations.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/conversations.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/conversations.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/conversations.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/conversations.md Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> * Update docs/source/ko/conversations.md Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com> * Update 
docs/source/ko/conversations.md Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com> * Update docs/source/ko/conversations.md Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com> * fix: anchor 'what happened inside piepeline?' be removed question mark * fix: translate the comments in the code block --------- Co-authored-by: SeungAhSon <gongsoonyee@gmail.com> Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com> Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
A very small change to one of the parameters: the second argument of np.random.randint is an exclusive upper bound, so it is not included in the possible options. The upper range should therefore be 2, so that the classification labels include some 1s as well.
…er_only` (huggingface#33602) almost zero is not zero
Remove model tests
* Add sdpa for BioGpt * Updates * Add the docs * [run_slow] biogpt * Use the copy mechanism to ensure consistency * [run_slow] biogpt
fix missing tests Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…ggingface#33479) * add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin * change size and crop_size in processor kwargs tests to do_rescale and rescale_factor * remove unnecessary llava processor kwargs test overwrite * nit * change data_arg_name to input_name * Remove unnecessary test override * Remove unnecessary tests Paligemma * Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring
…gface#33507) * fix: handle padding in contrastive search for decoder-only models * fix: handle padding in contrastive search for encoder-decoder models * tests: move padding contrastive test to test_util, add t5 test * fix: handle if model_kwargs["decoder_attention_mask"] is None * refactor: improve padding input contrastive search generation tests * chore: _ranking_fast to use LongTensor for cosine_matrix_mask
* fix * fix * fix * fix * skip * skip more --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* update * re-enable daily CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix qwen2vl float16 inference bug * [run-slow] qwen2_vl
Co-authored-by: litianjian <litianjian@bytedance.com>
* enable low-precision pipeline * fix parameter for ASR * reformat * fix asr bug * fix bug for zero-shot * add dtype check * rm useless comments * add np.float16 check * Update src/transformers/pipelines/image_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/pipelines/token_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix comments * fix asr check * make fixup * No more need for is_torch_available() --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Matt <rocketknight1@gmail.com>
* first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * add granitemoe * add decoration * remove moe from sequenceclassification * fix test * fix * fix * fix * move rope? * merge * drop bias * drop bias * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * drop * drop * fix * fix * cleanup * cleanup * fix * fix granite tests * fp32 test * fix * drop jitter * fix * rename * rename * fix config * add gen test --------- Co-authored-by: Yikang Shen <yikang.shn@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update pixtral example checkpoint * Fix typo
* add sdpa to dinov2 * fixup * add dinov2 to sdpa doc * update doc order * [run-slow] dinov2 * common to eager * [run-slow] dinov2 * update attn implementation in common * update test_modeling_dinov2 to have mask_ration, num_masks and mask_length similar to vit * [run-slow] dinov2 --------- Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
clean up Unpack imports
* fallback to eager if output attentions. * fix copies
* handle dependency errors in check_imports * change log level to warning
huggingface#33550) * add back self.max_position_embeddings = config.max_position_embeddings * fix-copies
…huggingface#33613) fix llavaqwen2 model conversion
* Add optional kwargs and uniformize udop * cleanup Unpack * nit Udop
* enable cpu bnb path * fix style * fix code style * fix 4 bit path * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * add multi backend refactor tests * fix style * tweak 4bit quantizer + fix corresponding tests * tweak 8bit quantizer + *try* fixing corresponding tests * fix dequant bnb 8bit * account for Intel CPU in variability of expected outputs * enable cpu and xpu device map * further tweaks to account for Intel CPU * fix autocast to work with both cpu + cuda * fix comments * fix comments * switch to testing_utils.torch_device * allow for xpu in multi-gpu tests * fix tests 4bit for CPU NF4 * fix bug with is_torch_xpu_available needing to be called as func * avoid issue where test reports attr err due to other failure * fix formatting * fix typo from resolving of merge conflict * polish based on last PR review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix CI * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/integration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix error log * fix error msg * add \n in error log * make quality * rm bnb cuda restriction in doc * cpu model don't need dispatch * fix doc * fix style * check cuda avaliable in testing * fix tests * Update docs/source/en/model_doc/chameleon.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * Update tests/quantization/bnb/test_4bit.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix doc * fix check multibackends * fix import sort * remove check torch in bnb * docs: update bitsandbytes references with multi-backend info * docs: fix small mistakes in bnb paragraph * run 
formatting * reveret bnb check * move bnb multi-backend check to import_utils * Update src/transformers/utils/import_utils.py Co-authored-by: Aarni Koskela <akx@iki.fi> * fix bnb check * minor fix for bnb * check lib first * fix code style * Revert "run formatting" This reverts commit ac108c6. * fix format * give warning when bnb version is low and no cuda found] * fix device assignment check to be multi-device capable * address akx feedback on get_avlbl_dev fn * revert partially, as we don't want the function that public, as docs would be too much (enforced) --------- Co-authored-by: Aarni Koskela <akx@iki.fi> Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
…e#33652) * Fix error string after refactoring into get_chat_template * Take suggestion from CR Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* uniformize git processor * update doctring
For internal review.
Use on the compressed-tensors branch: neuralmagic/compressed-tensors#79
For the tests to pass, the quantized model (the base model with the quantization config applied) needs to carry scale and zero-point (zp) values.
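As a rough illustration of what "scale and zp" means here — this is not the compressed-tensors API, just a minimal sketch of symmetric per-tensor int8 quantization, where these two values are exactly what a quantized checkpoint must store alongside the integer weights:

```python
import numpy as np

def quantize_symmetric_int8(weight: np.ndarray):
    """Toy symmetric per-tensor int8 quantization (illustrative only).

    Returns the quantized weights plus the scale and zero-point ("zp")
    values a quantized model is expected to carry for each weight tensor.
    """
    qmax = 127
    scale = np.abs(weight).max() / qmax   # one float scale per tensor
    zp = np.int8(0)                       # symmetric scheme -> zero-point is 0
    q = np.clip(np.round(weight / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale, zp

def dequantize(q: np.ndarray, scale: float, zp: np.int8) -> np.ndarray:
    # Reconstruct a float approximation of the original weights.
    return (q.astype(np.float32) - np.float32(zp)) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 4)).astype(np.float32)
    q, scale, zp = quantize_symmetric_int8(w)
    w_hat = dequantize(q, scale, zp)
    # Round-trip error is bounded by half a quantization step.
    assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Without the scale (and zero-point for asymmetric schemes), the integer weights cannot be dequantized, which is why the tests require both to be present on the quantized model.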