
Adding adapter support for NeoX #523

Open · wants to merge 13 commits into base: legacy

Conversation

ajesujoba (Author):

Added adapter support for GPTNeoX with tests. At the moment, however, training the adapter module also trains the CLM head, which is not the expected behavior. I have already raised this here.
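
For context, a minimal sketch (not part of the PR; the import path and the Pythia checkpoint are assumptions/examples) of how the unexpected behavior can be observed:

# Minimal sketch: list which parameters remain trainable after train_adapter().
# The expectation is that only the adapter weights stay unfrozen; the reported
# issue is that the CLM head weights also appear in this list.
from transformers import GPTNeoXAdapterModel  # flex-head class added by this PR; import path assumed

model = GPTNeoXAdapterModel.from_pretrained("EleutherAI/pythia-70m")  # example GPT-NeoX checkpoint
model.add_adapter("demo")
model.train_adapter("demo")

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)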

calpt linked an issue Mar 27, 2023 that may be closed by this pull request
ajesujoba (Author) commented Apr 19, 2023:

@calpt, sorry to bother you. Did you by chance check this?

calpt (Member) left a comment:

@ajesujoba Sorry for the delay in getting back to you. I've done an initial review of your changes and left some comments. Overall, this looks very promising, thanks again for your efforts.

Before this can be merged, please additionally make sure:

  • all checks and tests in the CI pipeline pass
  • all points in our contributing guide are addressed
  • in tests_adapters/models, a new module for GPT-NeoX is added (this is currently not correctly documented in the contributing guide; we'll update it)

Additionally, I've provided a fix for the issue regarding output embeddings you've raised and we've discussed in #537.

Comment on lines 2947 to 2955
_import_structure["models.gpt_neox"].extend(
    [
        "TFGPTNeoXForCausalLM",
        "TFGPTNeoXForQuestionAnswering",
        "TFGPTNeoXForSequenceClassification",
        "TFGPTNeoXModel",
        "TFGPTNeoXPreTrainedModel",
    ]
)
calpt (Member):

Is the addition of these imports related to the changes in this PR?

ajesujoba (Author):

No, I removed them

ajesujoba (Author) commented Apr 22, 2023:

Regarding "all points in our contributing guide are addressed": it is marked as optional in the documentation, and I tried it but ran into some dimension mismatches. Is it still optional?

calpt (Member):

Could you clarify what exactly you're referring to from the contributing guide? The Parallel inference and static head conversion points are still optional (although highly recommended). If Parallel support is not implemented, please make sure to remove the test mixin classes starting with "Parallel..." from the model test class.

ajesujoba (Author):

Yes, I was referring to the Parallel inference and static head conversion points that are still optional (although highly recommended). As you recommended, I will remove the test mixin classes starting with "Parallel...". I will also remove some of the other tests such as IA3TestMixin, LoRATestMixin, PrefixTuningTestMixin, and UniPELTTestMixin, as they require adding a classification head via 'add_classification_head', and the GPTNeoX model in this version does not have that.

head = QuestionAnsweringHead(self, head_name, num_labels, layers, activation_function, id2label)
self.add_prediction_head(head, overwrite_ok)

class GPTNeoXModelWithHeads(GPTNeoXAdapterModel):
calpt (Member):

As XModelWithHeads classes are deprecated, we wouldn't want to add this class for newly supported models.
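
For reference, the flex-head pattern that replaces the deprecated ...WithHeads classes would look roughly as follows (a sketch assuming the usual add_*_head API; head, adapter, and checkpoint names are illustrative):

# Sketch: GPTNeoXAdapterModel with a dynamically added head instead of a
# dedicated GPTNeoXModelWithHeads class.
from transformers import GPTNeoXAdapterModel  # import path assumed

model = GPTNeoXAdapterModel.from_pretrained("EleutherAI/pythia-70m")
model.add_causal_lm_head("lm")  # assumes the causal LM head type is registered for this model
model.add_adapter("bottleneck")
model.train_adapter("bottleneck")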

ajesujoba (Author):

Sure, we can remove them.

ajesujoba (Author) commented Apr 20, 2023:

@calpt, one more thing to point out: the GPTNeoX configuration does not include 'hidden_dropout_prob' and 'attention_probs_dropout_prob', and running tests_adapters with it results in an error. What do you suggest? Similarly, GPTNeoXTokenizer does not exist either, and the current tests_adapters setup does not support that.

calpt (Member):

Regarding the tokenizer, I think we should allow using fast tokenizers anyway. Could you check whether, and which, issues occur when you switch to loading the fast tokenizer for GPTNeoX?

Regarding the dropout issue, we would probably add some default dropout values if the keys are not present in the model config. I can look into this. You could just add the missing keys to the model config temporarily to make the tests pass, and we can change it later on.
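
For illustration, the temporary workaround could look roughly like this (a sketch only; the checkpoint name and the 0.1 value are placeholders, and proper defaults would later live in the library itself):

from transformers import AutoTokenizer, GPTNeoXConfig

# GPT-NeoX only ships a fast (Rust-based) tokenizer, so request that explicitly.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m", use_fast=True)

# Temporarily provide the dropout attributes the adapter tests expect.
config = GPTNeoXConfig.from_pretrained("EleutherAI/pythia-70m")
for key in ("hidden_dropout_prob", "attention_probs_dropout_prob"):
    if not hasattr(config, key):
        setattr(config, key, 0.1)  # placeholder value until library defaults exist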

ajesujoba (Author):

Ok! Sounds good

@@ -95,7 +105,7 @@ def __init__(self, config):
self.rotary_ndims, config.max_position_embeddings, base=config.rotary_emb_base
)
self.norm_factor = torch.sqrt(torch.tensor(self.head_size, dtype=torch.float32)).to(torch.get_default_dtype())
self.query_key_value = nn.Linear(config.hidden_size, 3 * config.hidden_size)
self.query_key_value = LoRALinear(config.hidden_size, 3 * config.hidden_size, "selfattn", config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As GPT-NeoX groups all attention matrices together into one linear layer, similar to GPT-2, this should probably use LoRAMergedLinear instead of LoRALinear if I'm not mistaken (see the GPT-2 implementation for reference).
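
For reference, a sketch of what the suggested replacement for the line above could look like (the exact constructor arguments of LoRAMergedLinear should be copied from the GPT-2 implementation in this repo; treat this signature as an assumption):

# Sketch: mirror the GPT-2 pattern of wrapping the fused query/key/value
# projection so LoRA can target the q/k/v slices individually.
self.query_key_value = LoRAMergedLinear(config.hidden_size, 3 * config.hidden_size, "selfattn", config)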

ajesujoba (Author):

I fixed this already!

calpt (Member) commented Sep 9, 2023:

Hey, thanks again for your efforts in contributing new model architectures to adapter-transformers and sorry for the silence on our side.

In the last few weeks, we've been working on a large refactoring of our project, which will ultimately result in the release of Adapters, the next-generation adapters library. See #584.

As a consequence, we plan to merge any new model integrations directly into the new codebase, which can currently be found on this branch. Unfortunately, this necessitates some changes in the model integration code (detailed here; see already integrated models such as BERT, BART, etc. for reference).

If you'd be willing to update your model integration to target the new library yourself, we'd be super happy to help you on this. Otherwise, we might look into upgrading and merging some of the open model integration PRs ourselves in the future. For more details, again see #584.

Successfully merging this pull request may close these issues:

  • Adapter support for GPTNeoX

2 participants