
Adding adapter support for NeoX #523

Open · wants to merge 13 commits into base: legacy

Conversation

ajesujoba (Author):

Added adapter support for GPTNeoX with tests. At the moment, however, training the adapter module also trains the CLM head, which is not the expected behavior. I have already raised this here.
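
For context, a minimal sketch (not part of the PR; the import path and the Pythia checkpoint are assumptions/examples) of how the unexpected behavior can be observed:

# Minimal sketch: list which parameters remain trainable after train_adapter().
# The expectation is that only the adapter weights stay unfrozen; the reported
# issue is that the CLM head weights also appear in this list.
from transformers import GPTNeoXAdapterModel  # flex-head class added by this PR; import path assumed

model = GPTNeoXAdapterModel.from_pretrained("EleutherAI/pythia-70m")  # example GPT-NeoX checkpoint
model.add_adapter("demo")
model.train_adapter("demo")

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)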

calpt linked an issue Mar 27, 2023 that may be closed by this pull request
ajesujoba (Author) commented Apr 19, 2023:

@calpt, sorry to bother you. Did you by chance check this?

calpt (Member) left a comment:

@ajesujoba Sorry for the delay in getting back to you. I've done an initial review of your changes and left some comments. Overall, this looks very promising, thanks again for your efforts.

Before this can be merged, please additionally make sure:

  • all checks and tests in the CI pipeline pass
  • all points in our contributing guide are addressed
  • in tests_adapters/models, a new module for GPT-NeoX is added (this is currently not correctly documented in the contributing guide; we'll update it)

Additionally, I've provided a fix for the issue regarding output embeddings you've raised and we've discussed in #537.

Comment on lines 2947 to 2955
_import_structure["models.gpt_neox"].extend(
    [
        "TFGPTNeoXForCausalLM",
        "TFGPTNeoXForQuestionAnswering",
        "TFGPTNeoXForSequenceClassification",
        "TFGPTNeoXModel",
        "TFGPTNeoXPreTrainedModel",
    ]
)
calpt (Member):

Is the addition of these imports related to the changes in this PR?

ajesujoba (Author):

No, I removed them

ajesujoba (Author) commented Apr 22, 2023:

Regarding "all points in our contributing guide are addressed": it is marked as optional in the documentation, and I tried it but ran into some dimension mismatches. Is it still optional?

calpt (Member):

Could you clarify what exactly you're referring to from the contributing guide? The Parallel inference and static head conversion points are still optional (although highly recommended). If Parallel support is not implemented, please make sure to remove the test mixin classes starting with "Parallel..." from the model test class.

ajesujoba (Author):

Yes, I was referring to the Parallel inference and static head conversion points that are still optional (although highly recommended). As you recommended, I will remove the test mixin classes starting with "Parallel...". I will also remove some of the other tests such as IA3TestMixin, LoRATestMixin, PrefixTuningTestMixin, and UniPELTTestMixin, as they require adding a classification head via 'add_classification_head', and the GPTNeoX model in this version does not have that.

head = QuestionAnsweringHead(self, head_name, num_labels, layers, activation_function, id2label)
self.add_prediction_head(head, overwrite_ok)

class GPTNeoXModelWithHeads(GPTNeoXAdapterModel):
calpt (Member):

As XModelWithHeads classes are deprecated, we wouldn't want to add this class for newly supported models.
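
For reference, the flex-head pattern that replaces the deprecated ...WithHeads classes would look roughly as follows (a sketch assuming the usual add_*_head API; head, adapter, and checkpoint names are illustrative):

# Sketch: GPTNeoXAdapterModel with a dynamically added head instead of a
# dedicated GPTNeoXModelWithHeads class.
from transformers import GPTNeoXAdapterModel  # import path assumed

model = GPTNeoXAdapterModel.from_pretrained("EleutherAI/pythia-70m")
model.add_causal_lm_head("lm")  # assumes the causal LM head type is registered for this model
model.add_adapter("bottleneck")
model.train_adapter("bottleneck")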

ajesujoba (Author):

Sure, we can remove them.

ajesujoba (Author) commented Apr 20, 2023:

@calpt, one more thing to point out: the GPTNeoX configuration does not include 'hidden_dropout_prob' and 'attention_probs_dropout_prob', and running tests_adapters with it results in an error. What do you suggest? Similarly, GPTNeoXTokenizer does not exist either, and the current tests_adapters setup does not support that.

calpt (Member):

Regarding the tokenizer, I think we should allow using fast tokenizers anyway. Could you check whether, and which, issues occur when you switch to loading the fast tokenizer for GPTNeoX?

Regarding the dropout issue, we would probably add some default dropout values if the keys are not present in the model config. I can look into this. You could just add the missing keys to the model config temporarily to make the tests pass, and we can change it later on.
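
For illustration, the temporary workaround could look roughly like this (a sketch only; the checkpoint name and the 0.1 value are placeholders, and proper defaults would later live in the library itself):

from transformers import AutoTokenizer, GPTNeoXConfig

# GPT-NeoX only ships a fast (Rust-based) tokenizer, so request that explicitly.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m", use_fast=True)

# Temporarily provide the dropout attributes the adapter tests expect.
config = GPTNeoXConfig.from_pretrained("EleutherAI/pythia-70m")
for key in ("hidden_dropout_prob", "attention_probs_dropout_prob"):
    if not hasattr(config, key):
        setattr(config, key, 0.1)  # placeholder value until library defaults exist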

ajesujoba (Author):

Ok! Sounds good

@@ -95,7 +105,7 @@ def __init__(self, config):
self.rotary_ndims, config.max_position_embeddings, base=config.rotary_emb_base
)
self.norm_factor = torch.sqrt(torch.tensor(self.head_size, dtype=torch.float32)).to(torch.get_default_dtype())
self.query_key_value = nn.Linear(config.hidden_size, 3 * config.hidden_size)
self.query_key_value = LoRALinear(config.hidden_size, 3 * config.hidden_size, "selfattn", config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As GPT-NeoX groups all attention matrices together into one linear layer, similar to GPT-2, this should probably use LoRAMergedLinear instead of LoRALinear if I'm not mistaken (see the GPT-2 implementation for reference).
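
For reference, a sketch of what the suggested replacement for the line above could look like (the exact constructor arguments of LoRAMergedLinear should be copied from the GPT-2 implementation in this repo; treat this signature as an assumption):

# Sketch: mirror the GPT-2 pattern of wrapping the fused query/key/value
# projection so LoRA can target the q/k/v slices individually.
self.query_key_value = LoRAMergedLinear(config.hidden_size, 3 * config.hidden_size, "selfattn", config)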

ajesujoba (Author):

I fixed this already!

calpt (Member) commented Sep 9, 2023:

Hey, thanks again for your efforts in contributing new model architectures to adapter-transformers and sorry for the silence on our side.

In the last few weeks, we've been working on a large refactoring of our project, which will ultimately result in the release of Adapters, the next-generation adapters library. See #584.

As a consequence, we plan to merge any new model integrations directly into the new codebase, which can currently be found on this branch. Unfortunately, this necessitates some changes in the model integration code (detailed here; see already integrated models such as BERT, BART, etc. for reference).

If you'd be willing to update your model integration to target the new library yourself, we'd be super happy to help you on this. Otherwise, we might look into upgrading and merging some of the open model integration PRs ourselves in the future. For more details, again see #584.

Successfully merging this pull request may close these issues:

  • Adapter support for GPTNeoX

2 participants