Granite language models #31502

mayank31398 · 2024-06-19T18:05:56Z

What does this PR do?

This PR adds support for IBM's upcoming 3B and 8B LLMs.

text models: @ArthurZucker and @younesbelkada

amyeroberts · 2024-06-20T09:30:04Z

younesbelkada

Thanks a lot!
Isn't Granite support already added in #30031? If not, we could leverage the diff tool that was recently added; see #31211 for an example. I'll let @ArthurZucker comment on this.

mayank31398 · 2024-06-20T18:15:17Z

hey @younesbelkada
This is for our upcoming open model releases.
3B and 8B language models (lots of tokens :D)

Let's just leave this PR for now.
I will get back to this in a few days.

mayank31398 · 2024-08-27T10:51:00Z

@ArthurZucker this is ready for merge

ArthurZucker

LGTM, 2 small nits and let's merge

docs/source/en/model_doc/granite.md

ArthurZucker · 2024-08-27T11:34:19Z

tests/models/granite/test_modeling_granite.py

-    @slow
-    @require_torch_gpu
-    @require_read_token
-    def test_compile_static_cache(self):


Does this not work? (Or why was it removed?)

mayank31398

Yeah, it does not work.

The models use mup, and the error is way too high to compare generated outputs.


mayank31398 · 2024-08-27T15:35:29Z

I have addressed the changes

ArthurZucker · 2024-08-27T18:00:13Z

Thanks for bearing with me 🤗

mayank31398 · 2024-08-27T18:56:29Z

passed docs 🥳

mayank31398 · 2024-08-27T19:42:05Z

thanks Arthur :)

ArthurZucker · 2024-08-28T06:38:14Z

Thank you as well! 🤗

Jintao-Huang · 2024-08-28T08:51:50Z

hello!

module 'torch.nn' has no attribute 'RMSNorm'

Any torch version below 2.4.0 will report this error.

Jintao-Huang · 2024-08-28T08:54:49Z

src/transformers/pytorch_utils.py

@@ -22,7 +22,7 @@
 from .utils import is_torch_xla_available, logging


-ALL_LAYERNORM_LAYERS = [nn.LayerNorm]
+ALL_LAYERNORM_LAYERS = [nn.LayerNorm, nn.RMSNorm]


NielsRogge

Encountered the same issue; opened #33177 to fix it.
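For anyone hitting this on an older torch, here is a minimal sketch of one way to avoid the `AttributeError`: only collect the norm classes that the installed `nn` namespace actually exposes. This is not necessarily what #33177 does (that PR is titled "Add GraniteRMSNorm"); the helper name `available_norm_layers` and the `Fake*` stand-in classes are hypothetical and exist only so the snippet runs without PyTorch installed.

```python
from types import SimpleNamespace


def available_norm_layers(nn_namespace, names=("LayerNorm", "RMSNorm")):
    """Return the classes from `names` that exist on `nn_namespace`, in order."""
    return [getattr(nn_namespace, name) for name in names if hasattr(nn_namespace, name)]


# Hypothetical stand-ins so the sketch runs without PyTorch installed.
class FakeLayerNorm:
    pass


class FakeRMSNorm:
    pass


# Mimics torch.nn before 2.4.0: there is no RMSNorm attribute.
old_nn = SimpleNamespace(LayerNorm=FakeLayerNorm)
# Mimics torch.nn from 2.4.0 on: RMSNorm is present.
new_nn = SimpleNamespace(LayerNorm=FakeLayerNorm, RMSNorm=FakeRMSNorm)

old_layers = available_norm_layers(old_nn)  # only LayerNorm, no AttributeError
new_layers = available_norm_layers(new_nn)  # LayerNorm and RMSNorm
```

With a real install, the same helper could be called as `available_norm_layers(torch.nn)`, so the list is built without assuming `nn.RMSNorm` exists.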

* first commit * drop tokenizer * drop tokenizer * drop tokenizer * drop convert * granite * drop tokenization test * mup * fix * reformat * reformat * reformat * fix docs * stop checking for checkpoint * update support * attention multiplier * update model * tiny drop * saibo drop * skip test * fix test * fix test * drop * drop useless imports * update docs * drop flash function * copied from * drop pretraining tp * drop pretraining tp * drop pretraining tp * drop unused import * drop code path * change name * softmax scale * head dim * drop legacy cache * rename params * cleanup * fix copies * comments * add back legacy cache * multipliers * multipliers * multipliers * text fix * fix copies * merge * multipliers * attention multiplier * drop unused imports * fix * fix * fix * move rope? * Update src/transformers/models/granite/configuration_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * Update src/transformers/models/granite/modeling_granite.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix * fix * fix * fix * fix-copies * torch rmsnorm * add authors * change model path * fix * test * drop static cache test * uupdate readme * drop non-causal * readme * drop useless imports * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

mayank31398 added 4 commits June 19, 2024 13:48

first commit

750ca7f

drop tokenizer

3b26730

drop tokenizer

9c017b0

drop tokenizer

876f4b5

younesbelkada reviewed Jun 20, 2024


mayank31398 added 22 commits June 28, 2024 14:22

Merge branch 'main' into granite

0f716ec

drop convert

e3cdcaf

granite

3e4391e

drop tokenization test

6f0cf35

mup

2d1a58c

fix

ac560ae

reformat

78c81a0

reformat

3b6c755

reformat

f46bf82

fix docs

272af5c

stop checking for checkpoint

c9b2288

update support

19ec830

attention multiplier

a9dba03

update model

df90fbd

tiny drop

c3369a0

saibo drop

6a7c814

skip test

dad1e4a

fix test

5cba841

fix test

e8f5886

drop

1678792

drop useless imports

9498556

update docs

039b377

mayank31398 marked this pull request as ready for review July 1, 2024 22:59

mayank31398 added 3 commits August 26, 2024 20:40

drop non-causal

dc9faaa

readme

545449c

drop useless imports

5e5cad9

mayank31398 requested a review from ArthurZucker August 27, 2024 10:51

ArthurZucker approved these changes Aug 27, 2024


mayank31398 and others added 3 commits August 27, 2024 11:33

Update docs/source/en/model_doc/granite.md

24029b2

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update docs/source/en/model_doc/granite.md

eaeff2a

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update docs/source/en/model_doc/granite.md

ee9c0f6

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

ArthurZucker added the "New model" and "run-slow" labels Aug 27, 2024

ArthurZucker merged commit c35d2cc into huggingface:main Aug 27, 2024
25 checks passed

mayank31398 deleted the granite branch August 27, 2024 19:42

mayank31398 mentioned this pull request Aug 27, 2024

fix model name and copyright #33152

Merged

Jintao-Huang reviewed Aug 28, 2024


NielsRogge mentioned this pull request Aug 28, 2024

Add GraniteRMSNorm #33177

Merged

gheinrich mentioned this pull request Aug 29, 2024

PR #31502 bumps up PyTorch requirement to >=2.4 #33197

Closed

4 tasks

kq-chen mentioned this pull request Aug 31, 2024

AWQ performance on RTX3090, with flash_attn2 QwenLM/Qwen2-VL#8

Open

gabe-l-hart mentioned this pull request Sep 10, 2024

IBM Granite Architecture ggerganov/llama.cpp#9412

Merged

4 tasks
