Add MLLama #33703
Merged
Conversation
…dition-mllama into refactor-mlamma
LysandreJik approved these changes on Sep 25, 2024
Let's get it in!
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request on Sep 27, 2024:

After transformers merged this PR (huggingface/transformers#33703), the bool of past_key_values (a Cache instance) would change from False to True in one of our checks. Use the get_seq_length() method instead, which is consistent before and after that commit. I checked the tests with the new change against transformers both before and after that commit, and they passed, so this change should be backwards compatible.
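The fix described above swaps a truthiness check on the cache for an explicit length check. A minimal sketch of the pattern, assuming a transformers Cache instance; the helper name is hypothetical:

```python
from transformers.cache_utils import DynamicCache

def is_prefill(past_key_values) -> bool:
    """True if no tokens have been cached yet (i.e. the first forward pass)."""
    # Fragile: bool(past_key_values) on an empty Cache flipped from False
    # to True after huggingface/transformers#33703, so truthiness is
    # version-dependent.
    # Robust: an empty cache reports a sequence length of 0 both before
    # and after that commit.
    return past_key_values is None or past_key_values.get_seq_length() == 0

cache = DynamicCache()
print(is_prefill(cache))  # True: nothing has been cached yet
```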
BenjaminBossan added a commit to huggingface/peft that referenced this pull request on Sep 27, 2024:

After transformers merged this PR (huggingface/transformers#33703), the bool of past_key_values (a Cache instance) would change from False to True in one of our checks. Use the get_seq_length() method instead, which is consistent before and after that commit. I checked the tests with the new change against transformers both before and after that commit, and they passed, so this change should be backwards compatible. Unrelated change: mark the X-LoRA scaling test as xfail for now; this should be addressed in a separate PR. Marking it xfail gets the original fix through CI.
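Marking a known failure as xfail keeps it visible in CI without blocking unrelated fixes. A minimal sketch with pytest; the test name and reason string are illustrative, not the actual PEFT test:

```python
import pytest

@pytest.mark.xfail(
    reason="X-LoRA scaling broken after transformers#33703; fix tracked in a separate PR",
    strict=False,  # test still runs; an unexpected pass is reported rather than failed
)
def test_xlora_scaling():
    raise AssertionError("placeholder for the currently failing assertion")
```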
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request on Oct 1, 2024:

After transformers merged this PR (huggingface/transformers#33703), the bool of past_key_values (a Cache instance) would change from False to True in one of our checks. Use the get_seq_length() method instead, which is consistent before and after that commit. I checked the tests with the new change against transformers both before and after that commit, and they passed, so this change should be backwards compatible. Unrelated change: mark the X-LoRA scaling test as xfail for now; this should be addressed in a separate PR. Marking it xfail gets the original fix through CI.
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request on Oct 2, 2024:
* current changes
* nit
* Add cross_attenttion_mask to processor
* multi-image fixed
* Add cross_attenttion_mask to processor
* cross attn works in all cases
* WIP refactoring function for image processor
* WIP refactoring image processor functions
* Refactor preprocess to use global loops instead of list nested list comps
* Docstrings
* Add channels unification
* fix dtype issues
* Update docsrings and format
* Consistent max_image_tiles
* current script
* updates
* Add convert to rgb
* Add image processor tests
* updates!
* update
* god damn it I am dumb sometimes
* Precompute aspect ratios
* now this works, full match
* fix 😉
* nits
* style
* fix model and conversion
* nit
* nit
* kinda works
* hack for sdpa non-contiguous bias
* nits here and there
* latest c hanges
* merge?
* run forward
* Add aspect_ratio_mask
* vision attention mask
* update script and config variable names
* nit
* nits
* be able to load
* style
* nits
* there
* nits
* make forward run
* small update
* enable generation multi-turn
* nit
* nit
* Clean up a bit for errors and typos
* A bit more constant fixes
* 90B keys and shapes match
* Fix for 11B model
* Fixup, remove debug part
* Docs
* Make max_aspect_ratio_id to be minimal
* Update image processing code to match new implementation
* Adjust conversion for final checkpoint state
* Change dim in repeat_interleave (accordig to meta code)
* tmp fix for num_tiles
* Fix for conversion (gate<->up, q/k_proj rope permute)
* nits
* codestyle
* Vision encoder fixes
* pass cross attn mask further
* Refactor aspect ratio mask
* Disable text-only generation
* Fix cross attention layers order, remove q/k norm rotation for cross atention layers
* Refactor gated position embeddings
* fix bugs but needs test with new weights
* rope scaling should be llama3
* Fix rope scaling name
* Remove debug for linear layer
* fix copies
* Make mask prepare private func
* Remove linear patch embed
* Make precomputed embeddings as nn.Embedding module
* MllamaPrecomputedAspectRatioEmbedding with config init
* Remove unused self.output_dim
* nit, intermediate layers
* Rename ln and pos_embed
* vision_chunk_size -> image_size
* return_intermediate -> intermediate_layers_indices
* vision_input_dim -> hidden_size
* Fix copied from statements
* fix most tests
* Fix more copied from
* layer_id->layer_idx
* Comment
* Fix tests for processor
* Copied from for _prepare_4d_causal_attention_mask_with_cache_position
* Style fix
* Add MllamaForCausalLM
* WIP fixing tests
* Remove duplicated layers
* Remove dummy file
* Fix style
* Fix consistency
* Fix some TODOs
* fix language_model instantiation, add docstring
* Move docstring, remove todos for precomputed embeds (we cannot init them properly)
* Add initial docstrings
* Fix
* fix some tests
* lets skip these
* nits, remove print, style
* Add one more copied from
* Improve test message
* Make validate func private
* Fix dummy objects
* Refactor `data_format` a bit + add comment
* typos/nits
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix dummy objects and imports
* Add chat template config json
* remove num_kv_heads from vision attention
* fix
* move some commits and add more tests
* fix test
* Remove `update_key_name` from modeling utils
* remove num-kv-heads again
* some prelimiary docs
* Update chat template + tests
* nit, conversion script max_num_tiles from params
* Fix warning for text-only generation
* Update conversion script for instruct models
* Update chat template in converstion + test
* add tests for CausalLM model
* model_max_length, avoid null chat_template
* Refactor conversion script
* Fix forward
* Fix integration tests
* Refactor vision config + docs
* Fix default
* Refactor text config
* Doc fixes
* Remove unused args, fix docs example
* Squashed commit of the following:
  commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
  Author: qubvel <qubvel@gmail.com>
  Date: Wed Sep 18 13:39:15 2024 +0000
  Move model + add output hidden states and output attentions
* Fix num_channels
* Add mllama text and mllama vision models
* Fixing repo consistency
* Style fix
* Fixing repo consistency
* Fixing unused config params
* Fix failed tests after refactoring
* hidden_activation -> hidden_act for text mlp
* Remove from_pretrained from sub-configs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Reuse lambda in conversion script
* Remove run.py
* Update docs/source/en/model_doc/mllama.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/mllama/processing_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused LlamaTokenizerFast
* Fix logging
* Refactor gating
* Remove cycle for collecting intermediate states
* Refactor text-only check, add integration test for text-only
* Revert from pretrained to configs
* Fix example
* Add auto `bos_token` adding in processor
* Fix tips
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Enable supports_gradient_checkpointing model flag
* add eager/sdpa options
* don't skip attn tests and bring back GC skips (did i really remove those?)
* Fix signature, but get error with None gradient
* Fix output attention tests
* Disable GC back
* Change no split modules
* Fix dropout
* Style
* Add Mllama to sdpa list
* Add post init for vision model
* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model
* if skipped, say it, don't pass
* Clean vision tester config
* Doc for args
* Update tests/models/mllama/test_modeling_mllama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add cross_attention_mask to test
* typehint
* Remove todo
* Enable gradient checkpointing
* Docstring
* Style
* Fixing and skipping some tests for new cache
* Mark flaky test
* Skip `test_sdpa_can_compile_dynamic` test
* Fixing some offload tests
* Add direct GenerationMixin inheritance
* Remove unused code
* Add initializer_range to vision config
* update the test to make sure we show if split
* fix gc?
* Fix repo consistency
* Undo modeling utils debug changes
* Fix link
* mllama -> Mllama
* [mllama] -> [Mllama]
* Enable compile test for CausalLM model (text-only)
* Fix TextModel prefix
* Update doc
* Docs for forward, type hints, and vision model prefix
* make sure to reset
* fix init
* small script refactor and styling
* nit
* updates!
* some nits
* Interpolate embeddings for 560 size and update integration tests
* nit
* does not suppor static cache!
* update
* fix
* nit2
* this?
* Fix conversion
* Style
* 4x memory improvement with image cache AFAIK
* Token decorator for tests
* Skip failing tests
* update processor errors
* fix split issues
* style
* weird
* style
* fix failing tests
* update
* nit fixing the whisper tests
* fix path
* update

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
What does this PR do?
Adds the Mllama model.
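For orientation, a minimal usage sketch of the model this PR adds, following the multimodal chat pattern the PR's docs and commit log describe; the checkpoint id, image URL, and generation settings are illustrative assumptions, not part of the PR itself:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed checkpoint id; any converted Mllama checkpoint should work.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# The processor produces pixel_values, the aspect-ratio ids/mask for the
# tiled vision encoder, and the cross_attention_mask that gates image
# tokens in the text stream (all added by this PR).
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```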