fix use cache issue with OPT #3

younesbelkada · 2023-09-26T11:16:02Z

What does this PR do?

Correctly deals with the case users use OPT+FA2+ use_cache

to reproduce:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    use_flash_attention_2=True,
    torch_dtype=torch.float16
).to(0)

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)

out = model.generate(**inputs, max_new_tokens=30, use_cache=True)
print(tokenizer.batch_decode(out, skip_special_tokens=True))

susnato · 2023-09-26T11:20:07Z

Hi @younesbelkada thanks for the fix here!

* added flash attention for opt * added to list * fix use cache (#3) * style fix * fix text * test fix2 * reverted until 689f599 * torch fx tests are working now! * small fix * added TODO docstring * changes * comments and .md file modification --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add a convenience method for building in your own name scope * Second attempt at auto layer building * Revert "Second attempt at auto layer building" This reverts commit e03a3aa. * Attempt #3 * Revert "Attempt #3" This reverts commit b9df7a0. * Add missing attributes that we're going to need later * Add some attributes we're going to need later * A fourth attempt! Feel the power flow through you! * Revert "A fourth attempt! Feel the power flow through you!" This reverts commit 6bf4aaf. * Add more values we'll need later * TF refactor that we'll need later * Revert "TF refactor that we'll need later" This reverts commit ca07202. * Revert "Revert "TF refactor that we'll need later"" This reverts commit 1beb0f3. * make fixup * Attempt five! * Revert "Attempt five!" This reverts commit 3302207. * Attempt six - this time don't add empty methods * Revert "Attempt six - this time don't add empty methods" This reverts commit 67d6012. * Attempt seven - better base model class detection! * Revert "Attempt seven - better base model class detection!" This reverts commit 5f14845. * Another attribute we'll need later * Try again with the missing attribute! * Revert "Try again with the missing attribute!" This reverts commit 760c6f3. * This is the attempt that will pierce the heavens! * Revert "This is the attempt that will pierce the heavens!" This reverts commit c868bb6. * Attempt seven - snag list is steadily decreasing * Revert "Attempt seven - snag list is steadily decreasing" This reverts commit 46fbd97. * Attempt eight - will an empty snag list do it? * Revert "Attempt eight - will an empty snag list do it?" This reverts commit 7c8a3c2. * Fixes to Hubert issues that cause problems later * Trying again with Conv1D/SeparableConv fixes * Revert "Trying again with Conv1D/SeparableConv fixes" This reverts commit 55092bc. * Apply the build shape fixes to Wav2Vec2 as well * One more attempt! * Revert "One more attempt!" This reverts commit 5ac3e4c. * Another attempt! * Revert "Another attempt!" This reverts commit ea16d89. * Let's see how many failures we get without the internal build method * Fix OpenAI * Fix MobileBERT * (Mostly) fix GroupVIT * Fix BLIP * One more BLIP fix * One more BLIP fix! * Fix Regnet * Finally fully fix GroupViT * Fix Data2Vec and add the new AdaptivePool * Fix Segformer * Fix Albert * Fix Deberta/DebertaV2 * Fix XLM * Actually fix XLM * Fix Flaubert * Fix lxmert * Fix Resnet * Fix ConvBERT * Fix ESM * Fix Convnext / ConvnextV2 * Fix SAM * Fix Efficientformer * Fix LayoutLMv3 * Fix speech_to_text * Fix mpnet and mobilevit * Fix Swin * Fix CTRL * Fix CVT * Fix DPR * Fix Wav2Vec2 * Fix T5 * Fix Hubert * Fix GPT2 * Fix Whisper * Fix DeiT * Fix the encoder-decoder / dual-encoder classes * make fix-copies * build in name scope * Fix summarization test * Fix tied weight names for BART + Blenderbot * Fix tied weight name building * Fix to TFESM weight building * Update TF SAM * Expand all the shapes out into Big Boy Shapes

* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing 🤗 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (huggingface#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by: Lysandre Debut <hi@lysand.re> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs 🚀 * update all readmes --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: sanchit-gandhi <sanchit@huggingface.co> Co-authored-by: Lysandre Debut <hi@lysand.re>

* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (huggingface#5) * Pr fixes (huggingface#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (huggingface#8) * tokenizer test * format fix * Adding Docs and other minor changes (huggingface#7) * Add modeling tests (huggingface#9) * Smol Fix (huggingface#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (huggingface#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (huggingface#14) * Update chat templates to use the new API (huggingface#15) --------- Co-authored-by: ahmetustun <ahmetustun89@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

fix use cache

aaefeed

younesbelkada mentioned this pull request Sep 26, 2023

[FA2] Add flash attention for opt huggingface/transformers#26414

Merged

5 tasks

susnato merged commit 689f599 into susnato:flash_attn_opt Sep 26, 2023

younesbelkada deleted the fix-opt-fa-cache branch September 26, 2023 11:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix use cache issue with OPT #3

fix use cache issue with OPT #3

younesbelkada commented Sep 26, 2023

susnato commented Sep 26, 2023

fix use cache issue with OPT #3

fix use cache issue with OPT #3

Conversation

younesbelkada commented Sep 26, 2023

What does this PR do?

susnato commented Sep 26, 2023