
Granite language models #31502

Merged on Aug 27, 2024 (87 commits)

Commits
750ca7f first commit (mayank31398, Jun 19, 2024)
3b26730 drop tokenizer (mayank31398, Jun 19, 2024)
9c017b0 drop tokenizer (mayank31398, Jun 19, 2024)
876f4b5 drop tokenizer (mayank31398, Jun 19, 2024)
0f716ec Merge branch 'main' into granite (mayank31398, Jun 28, 2024)
e3cdcaf drop convert (mayank31398, Jun 28, 2024)
3e4391e granite (mayank31398, Jun 28, 2024)
6f0cf35 drop tokenization test (mayank31398, Jun 28, 2024)
2d1a58c mup (mayank31398, Jun 30, 2024)
ac560ae fix (mayank31398, Jun 30, 2024)
78c81a0 reformat (mayank31398, Jun 30, 2024)
3b6c755 reformat (mayank31398, Jun 30, 2024)
f46bf82 reformat (mayank31398, Jun 30, 2024)
272af5c fix docs (mayank31398, Jun 30, 2024)
c9b2288 stop checking for checkpoint (mayank31398, Jun 30, 2024)
19ec830 update support (mayank31398, Jun 30, 2024)
a9dba03 attention multiplier (mayank31398, Jun 30, 2024)
df90fbd update model (mayank31398, Jul 1, 2024)
c3369a0 tiny drop (mayank31398, Jul 1, 2024)
6a7c814 saibo drop (mayank31398, Jul 1, 2024)
dad1e4a skip test (mayank31398, Jul 1, 2024)
5cba841 fix test (mayank31398, Jul 1, 2024)
e8f5886 fix test (mayank31398, Jul 1, 2024)
1678792 drop (mayank31398, Jul 1, 2024)
9498556 drop useless imports (mayank31398, Jul 1, 2024)
039b377 update docs (mayank31398, Jul 1, 2024)
1bea763 Merge branch 'main' into granite (mayank31398, Jul 2, 2024)
2a9d734 Merge branch 'main' into granite (mayank31398, Jul 11, 2024)
2442492 drop flash function (mayank31398, Jul 11, 2024)
2efe0a6 copied from (mayank31398, Jul 11, 2024)
8da50b5 drop pretraining tp (mayank31398, Jul 11, 2024)
73d4f2d drop pretraining tp (mayank31398, Jul 11, 2024)
5f02075 drop pretraining tp (mayank31398, Jul 11, 2024)
de33d60 drop unused import (mayank31398, Jul 11, 2024)
42035c4 drop code path (mayank31398, Jul 12, 2024)
f833ca6 change name (mayank31398, Jul 12, 2024)
5ca5b08 softmax scale (mayank31398, Jul 12, 2024)
abb359d head dim (mayank31398, Jul 12, 2024)
cfa8210 drop legacy cache (mayank31398, Jul 12, 2024)
b1dad99 rename params (mayank31398, Jul 12, 2024)
79bdf6b cleanup (mayank31398, Jul 22, 2024)
91a0253 Merge branch 'main' into granite (mayank31398, Jul 22, 2024)
7df943f fix copies (mayank31398, Jul 22, 2024)
18d577d comments (mayank31398, Jul 22, 2024)
90c7906 add back legacy cache (mayank31398, Jul 22, 2024)
a765b89 multipliers (mayank31398, Jul 22, 2024)
ce070ad multipliers (mayank31398, Jul 22, 2024)
8b1b7e0 multipliers (mayank31398, Jul 22, 2024)
fcc7bf7 text fix (mayank31398, Jul 22, 2024)
bd6bee6 Merge branch 'main' into granite (mayank31398, Jul 23, 2024)
37eb40f fix copies (mayank31398, Jul 23, 2024)
d743ff7 Merge branch 'main' into granite (mayank31398, Jul 23, 2024)
1142dbb merge (mayank31398, Jul 23, 2024)
c3185de multipliers (mayank31398, Jul 23, 2024)
6ccf5b5 attention multiplier (mayank31398, Jul 23, 2024)
52440ad drop unused imports (mayank31398, Jul 24, 2024)
b39cb7d Merge branch 'main' into granite (mayank31398, Jul 26, 2024)
6fa1774 Merge branch 'main' into granite (mayank31398, Jul 29, 2024)
46524c7 Merge branch 'main' into granite (mayank31398, Jul 30, 2024)
71c2cde fix (mayank31398, Jul 30, 2024)
fe64841 fix (mayank31398, Jul 30, 2024)
559204d fix (mayank31398, Jul 30, 2024)
b64c16d move rope? (mayank31398, Jul 30, 2024)
cd9a911 Update src/transformers/models/granite/configuration_granite.py (mayank31398, Aug 1, 2024)
124065c fix (mayank31398, Aug 1, 2024)
02c1073 Update src/transformers/models/granite/modeling_granite.py (mayank31398, Aug 1, 2024)
8c9112f fix (mayank31398, Aug 1, 2024)
9493a86 fix (mayank31398, Aug 1, 2024)
4932097 fix (mayank31398, Aug 1, 2024)
b7fe0d3 fix (mayank31398, Aug 1, 2024)
fa5550a Merge branch 'main' into granite (mayank31398, Aug 11, 2024)
1eeab96 fix-copies (mayank31398, Aug 11, 2024)
6a13285 Merge branch 'main' into granite (mayank31398, Aug 14, 2024)
7d68382 torch rmsnorm (mayank31398, Aug 14, 2024)
4a4a581 add authors (mayank31398, Aug 19, 2024)
57a5b9e Merge branch 'main' into granite (mayank31398, Aug 24, 2024)
c03e395 change model path (mayank31398, Aug 26, 2024)
ffed3d1 fix (mayank31398, Aug 26, 2024)
a7fbd30 test (mayank31398, Aug 27, 2024)
8f00c1b drop static cache test (mayank31398, Aug 27, 2024)
586fdf1 uupdate readme (mayank31398, Aug 27, 2024)
dc9faaa drop non-causal (mayank31398, Aug 27, 2024)
545449c readme (mayank31398, Aug 27, 2024)
5e5cad9 drop useless imports (mayank31398, Aug 27, 2024)
24029b2 Update docs/source/en/model_doc/granite.md (mayank31398, Aug 27, 2024)
eaeff2a Update docs/source/en/model_doc/granite.md (mayank31398, Aug 27, 2024)
ee9c0f6 Update docs/source/en/model_doc/granite.md (mayank31398, Aug 27, 2024)

2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -400,6 +400,8 @@
      title: GPTSAN Japanese
    - local: model_doc/gpt-sw3
      title: GPTSw3
    - local: model_doc/granite
      title: Granite
    - local: model_doc/herbert
      title: HerBERT
    - local: model_doc/ibert
63 changes: 63 additions & 0 deletions docs/source/en/model_doc/granite.md
@@ -0,0 +1,63 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Granite

## Overview

The Granite model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
<INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

*<INSERT PAPER ABSTRACT HERE>*

Tips:

<INSERT TIPS ABOUT MODEL HERE>

Review comment (Collaborator): This needs to be filled.

Reply (Contributor Author): Currently, the paper is not out, @ArthurZucker. Should we put the old Granite Code paper as a placeholder, even though that model uses the Llama class?

Reply (Collaborator): As you prefer: we can wait until the paper is out, or, if you want to release the checkpoints first, we can say it's a model release for now!

This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).


## GraniteConfig

[[autodoc]] GraniteConfig

## GraniteModel

[[autodoc]] GraniteModel
    - forward

## GraniteForCausalLM

[[autodoc]] GraniteForCausalLM
    - forward

## GraniteForSequenceClassification

[[autodoc]] GraniteForSequenceClassification
    - forward

## GraniteForQuestionAnswering

[[autodoc]] GraniteForQuestionAnswering
    - forward

## GraniteForTokenClassification

[[autodoc]] GraniteForTokenClassification
    - forward
20 changes: 20 additions & 0 deletions src/transformers/__init__.py
@@ -502,6 +502,7 @@
    "models.levit": ["LevitConfig"],
    "models.lilt": ["LiltConfig"],
    "models.llama": ["LlamaConfig"],
    "models.granite": ["GraniteConfig"],
    "models.llava": [
        "LlavaConfig",
        "LlavaProcessor",
@@ -2403,6 +2404,16 @@
            "LlamaPreTrainedModel",
        ]
    )
    _import_structure["models.granite"].extend(
        [
            "GraniteForCausalLM",
            "GraniteForQuestionAnswering",
            "GraniteForSequenceClassification",
            "GraniteForTokenClassification",
            "GraniteModel",
            "GranitePreTrainedModel",
        ]
    )
    _import_structure["models.llava"].extend(
        [
            "LlavaForConditionalGeneration",
@@ -5097,6 +5108,7 @@
    from .models.levit import LevitConfig
    from .models.lilt import LiltConfig
    from .models.llama import LlamaConfig
    from .models.granite import GraniteConfig
    from .models.llava import (
        LlavaConfig,
        LlavaProcessor,
@@ -6822,6 +6834,14 @@
            LlamaModel,
            LlamaPreTrainedModel,
        )
        from .models.granite import (
            GraniteForCausalLM,
            GraniteForQuestionAnswering,
            GraniteForSequenceClassification,
            GraniteForTokenClassification,
            GraniteModel,
            GranitePreTrainedModel,
        )
        from .models.llava import (
            LlavaForConditionalGeneration,
            LlavaPreTrainedModel,
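The top-level `__init__.py` registers every Granite name twice: once in `_import_structure`, which the lazy module uses at runtime, and once under `if TYPE_CHECKING:`, which only type checkers and IDEs see (`TYPE_CHECKING` is `False` at runtime). A minimal sketch, assuming plain dictionaries stand in for both paths, of how a name resolves to its submodule and how drift between the two lists could be detected; `build_reverse_index`, `find_mismatches` and the `torch_available` flag are hypothetical and are not transformers functions.

```python
from typing import Dict, List, Set

GRANITE_MODEL_NAMES = [
    "GraniteForCausalLM",
    "GraniteForQuestionAnswering",
    "GraniteForSequenceClassification",
    "GraniteForTokenClassification",
    "GraniteModel",
    "GranitePreTrainedModel",
]

# Path 1 (runtime): built like the diff, config first, torch-only classes appended.
lazy_structure: Dict[str, List[str]] = {"models.granite": ["GraniteConfig"]}
torch_available = True  # hypothetical stand-in for is_torch_available()
if torch_available:
    lazy_structure["models.granite"].extend(GRANITE_MODEL_NAMES)

# Path 2 (static): the names imported under `if TYPE_CHECKING:` in the diff.
type_checking_imports: Dict[str, List[str]] = {
    "models.granite": ["GraniteConfig", *GRANITE_MODEL_NAMES],
}


def build_reverse_index(structure: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: map each exported name to the submodule that provides it."""
    index: Dict[str, str] = {}
    for submodule, names in structure.items():
        for name in names:
            if name in index and index[name] != submodule:
                raise ValueError(f"{name!r} exported by both {index[name]!r} and {submodule!r}")
            index[name] = submodule
    return index


def find_mismatches(
    lazy: Dict[str, List[str]], static: Dict[str, List[str]]
) -> Dict[str, Set[str]]:
    """Hypothetical helper: names registered in only one of the two paths."""
    lazy_names = set(build_reverse_index(lazy))
    static_names = set(build_reverse_index(static))
    return {
        "missing_from_import_structure": static_names - lazy_names,
        "missing_from_type_checking": lazy_names - static_names,
    }
```

If the `.extend([...])` block were dropped, `find_mismatches` would report the six model classes as missing from the import structure: a type checker would still resolve them, but `from transformers import GraniteModel` would fail at runtime.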
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -122,6 +122,7 @@
    levit,
    lilt,
    llama,
    granite,
    llava,
    llava_next,
    longformer,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -139,6 +139,7 @@
        ("levit", "LevitConfig"),
        ("lilt", "LiltConfig"),
        ("llama", "LlamaConfig"),
        ("granite", "GraniteConfig"),
        ("llava", "LlavaConfig"),
        ("llava_next", "LlavaNextConfig"),
        ("longformer", "LongformerConfig"),
@@ -414,6 +415,7 @@
        ("levit", "LeViT"),
        ("lilt", "LiLT"),
        ("llama", "LLaMA"),
        ("granite", "Granite"),
        ("llama2", "Llama2"),
        ("llama3", "Llama3"),
        ("llava", "LLaVa"),
5 changes: 5 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -135,6 +135,7 @@
        ("levit", "LevitModel"),
        ("lilt", "LiltModel"),
        ("llama", "LlamaModel"),
        ("granite", "GraniteModel"),
        ("longformer", "LongformerModel"),
        ("longt5", "LongT5Model"),
        ("luke", "LukeModel"),
@@ -463,6 +464,7 @@
        ("jamba", "JambaForCausalLM"),
        ("jetmoe", "JetMoeForCausalLM"),
        ("llama", "LlamaForCausalLM"),
        ("granite", "GraniteForCausalLM"),
        ("mamba", "MambaForCausalLM"),
        ("marian", "MarianForCausalLM"),
        ("mbart", "MBartForCausalLM"),
@@ -873,6 +875,7 @@
        ("led", "LEDForSequenceClassification"),
        ("lilt", "LiltForSequenceClassification"),
        ("llama", "LlamaForSequenceClassification"),
        ("granite", "GraniteForSequenceClassification"),
        ("longformer", "LongformerForSequenceClassification"),
        ("luke", "LukeForSequenceClassification"),
        ("markuplm", "MarkupLMForSequenceClassification"),
@@ -955,6 +958,7 @@
        ("led", "LEDForQuestionAnswering"),
        ("lilt", "LiltForQuestionAnswering"),
        ("llama", "LlamaForQuestionAnswering"),
        ("granite", "GraniteForQuestionAnswering"),
        ("longformer", "LongformerForQuestionAnswering"),
        ("luke", "LukeForQuestionAnswering"),
        ("lxmert", "LxmertForQuestionAnswering"),
@@ -1050,6 +1054,7 @@
        ("layoutlmv3", "LayoutLMv3ForTokenClassification"),
        ("lilt", "LiltForTokenClassification"),
        ("llama", "LlamaForTokenClassification"),
        ("granite", "GraniteForTokenClassification"),
        ("longformer", "LongformerForTokenClassification"),
        ("luke", "LukeForTokenClassification"),
        ("markuplm", "MarkupLMForTokenClassification"),
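Each entry added to the auto modules maps the model-type string `"granite"` to a class *name* (a string), once for the config table and once per task. A minimal sketch of only that lookup step, assuming plain dictionaries; `CONFIG_NAMES`, `TASK_TO_CLASS_NAMES`, `resolve_config_name` and `resolve_model_name` are hypothetical and merely mirror the entries shown in the hunks. The real auto classes also import the resolved class, which this sketch leaves out.

```python
from collections import OrderedDict
from typing import Dict

# Hypothetical excerpt of the config table: model type -> config class name.
CONFIG_NAMES: Dict[str, str] = OrderedDict(
    [
        ("llama", "LlamaConfig"),
        ("granite", "GraniteConfig"),
    ]
)

# Hypothetical per-task tables: model type -> model class name.
TASK_TO_CLASS_NAMES: Dict[str, Dict[str, str]] = {
    "base": OrderedDict([("llama", "LlamaModel"), ("granite", "GraniteModel")]),
    "causal-lm": OrderedDict([("llama", "LlamaForCausalLM"), ("granite", "GraniteForCausalLM")]),
    "sequence-classification": OrderedDict(
        [("llama", "LlamaForSequenceClassification"), ("granite", "GraniteForSequenceClassification")]
    ),
    "question-answering": OrderedDict(
        [("llama", "LlamaForQuestionAnswering"), ("granite", "GraniteForQuestionAnswering")]
    ),
    "token-classification": OrderedDict(
        [("llama", "LlamaForTokenClassification"), ("granite", "GraniteForTokenClassification")]
    ),
}


def resolve_config_name(model_type: str) -> str:
    """Return the config class name registered for `model_type`."""
    if model_type not in CONFIG_NAMES:
        raise ValueError(f"Unrecognized model type {model_type!r}")
    return CONFIG_NAMES[model_type]


def resolve_model_name(model_type: str, task: str) -> str:
    """Return the model class name registered for `model_type` under `task`."""
    mapping = TASK_TO_CLASS_NAMES[task]  # raises KeyError for an unknown task
    if model_type not in mapping:
        raise ValueError(
            f"Model type {model_type!r} has no class for task {task!r}; "
            f"registered: {sorted(mapping)}"
        )
    return mapping[model_type]
```

Storing names rather than class objects lets a single table list every supported model without importing any modeling code; only the class that is actually requested has to be loaded.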
63 changes: 63 additions & 0 deletions src/transformers/models/granite/__init__.py
@@ -0,0 +1,63 @@
# Copyright 2024 EleutherAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
)


_import_structure = {
    "configuration_granite": ["GraniteConfig"],
}

try:
    if not is_torch_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_granite"] = [
        "GraniteForCausalLM",
        "GraniteModel",
        "GranitePreTrainedModel",
        "GraniteForSequenceClassification",
        "GraniteForQuestionAnswering",
        "GraniteForTokenClassification",
    ]

if TYPE_CHECKING:
    from .configuration_granite import GraniteConfig

    try:
        if not is_torch_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_granite import (
            GraniteForCausalLM,
            GraniteForQuestionAnswering,
            GraniteForSequenceClassification,
            GraniteForTokenClassification,
            GraniteModel,
            GranitePreTrainedModel,
        )

else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
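The new `granite/__init__.py` follows the library's lazy-import pattern: the torch-dependent names enter `_import_structure` only when `is_torch_available()` is true, and `_LazyModule` replaces the package in `sys.modules` so submodules are imported on first attribute access. The sketch below imitates that behaviour in plain Python under stated assumptions: `LazyModule` and `is_available` are hypothetical stand-ins (not transformers' `_LazyModule` or `is_torch_available`), and the stdlib modules `json` and `math` play the role of the configuration and modeling submodules.

```python
import importlib
import importlib.util
import sys
import types
from typing import Any, Dict, List


def is_available(package: str) -> bool:
    """Hypothetical stand-in for is_torch_available(): True if `package` is importable."""
    # find_spec locates a top-level package without importing it; None means not installed.
    return importlib.util.find_spec(package) is not None


class LazyModule(types.ModuleType):
    """Hypothetical minimal lazy module; exported names are imported on first access."""

    def __init__(self, name: str, import_structure: Dict[str, List[str]]) -> None:
        super().__init__(name)
        # Reverse index: exported name -> module that provides it.
        self._name_to_module = {
            attr: module for module, attrs in import_structure.items() for attr in attrs
        }
        self.__all__ = sorted(self._name_to_module)
        self.imported: List[str] = []  # records which names were actually loaded

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for names not loaded yet.
        if name not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        module = importlib.import_module(self._name_to_module[name])
        value = getattr(module, name)
        setattr(self, name, value)  # cache so later lookups bypass __getattr__
        self.imported.append(name)
        return value


# Always-available names first, optional ones only behind an availability gate.
import_structure: Dict[str, List[str]] = {"json": ["dumps", "loads"]}
if is_available("math"):  # plays the role of is_torch_available()
    import_structure["math"] = ["sqrt"]
if is_available("not_a_real_package_for_this_demo"):
    import_structure["not_a_real_package_for_this_demo"] = ["Thing"]

lazy = LazyModule("demo_lazy_package", import_structure)
sys.modules["demo_lazy_package"] = lazy  # analogous to sys.modules[__name__] = _LazyModule(...)
```

Nothing is imported until `lazy.sqrt` or `lazy.dumps` is first accessed, and a name whose availability check failed never enters the mapping, so accessing it raises `AttributeError`. The library's `_LazyModule` also receives the package file and `__spec__` (as in the call above) and imports submodules relative to the package; this sketch skips those details.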