[Deberta/Deberta-v2] Refactor code base to support compile, export, and fix LLM #22105

Merged (43 commits into huggingface:main) Nov 25, 2024

Conversation

@ArthurZucker (Collaborator) commented Mar 11, 2023

What does this PR do?

Refactor both Deberta and DebertaV2 to make them more compatible with the overall transformers library (see the usage sketch after the issue list below).

Should fix a bunch of issues related to torch-scripting with Deberta:
fixes #15216
fixes #15673
fixes #16456
fixes #18659
fixes #21300
fixes #20815
fixes #12436
fixes #18674
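
For illustration only (not part of the original PR description), here is a minimal sketch of the kind of usage this refactor is meant to enable, i.e. running the model under torch.compile; the checkpoint name and inputs are placeholders:

# Illustrative sketch: exercising a DeBERTa checkpoint under torch.compile.
# The checkpoint and example inputs are placeholders, not prescribed by this PR.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
model.eval()

compiled_model = torch.compile(model)

inputs = tokenizer("DeBERTa refactor smoke test", return_tensors="pt")
with torch.no_grad():
    outputs = compiled_model(**inputs)
print(outputs.last_hidden_state.shape)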

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@ArthurZucker changed the title from "[WIP] Refactor" to "[WIP] Refactor Deberta/Deberta-v2" Mar 13, 2023
@ArthurZucker self-assigned this Mar 13, 2023
@huggingface deleted a comment from the github-actions bot Apr 11, 2023
@github-actions bot closed this May 13, 2023
@hriaz17 commented May 22, 2023

Hey @ArthurZucker, any updates on this? Is there an ETA for when it will be merged into main?

@ArthurZucker reopened this May 23, 2023
@ArthurZucker (Collaborator, Author) commented:

Hey! Just got back from holidays, this should be my main focus in the coming days!

@ArthurZucker (Collaborator, Author) commented:

Sorry! Seems like I had to postpone this! If anyone wants to take over, feel free to do it; otherwise it will be my priority once #23909 is merged!

@zynaa commented Jun 29, 2023

Regarding the z_steps in DebertaV2Model: I think this code is relevant for the enhanced mask decoder of the generator model:

# Expand a 2D padding mask to a 4D self-attention mask
if attention_mask.dim() <= 2:
    extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
    att_mask = extended_attention_mask.byte()
    attention_mask = att_mask * att_mask.squeeze(-2).unsqueeze(-1)
elif attention_mask.dim() == 3:
    attention_mask = attention_mask.unsqueeze(1)
target_mask = target_ids > 0
hidden_states = encoder_layers[-2]
if not self.position_biased_input:
    # Enhanced mask decoder: rerun the last encoder layer twice (z_steps hardcoded to 2),
    # seeding the query states with the position embeddings (z_states)
    layers = [encoder.layer[-1] for _ in range(2)]
    z_states += hidden_states
    query_states = z_states
    query_mask = attention_mask
    outputs = []
    rel_embeddings = encoder.get_rel_embedding()

    for layer in layers:
        # TODO: pass relative pos ids
        output = layer(hidden_states, query_mask, return_att=False, query_states=query_states,
                       relative_pos=relative_pos, rel_embeddings=rel_embeddings)
        query_states = output
        outputs.append(query_states)
else:
    outputs = [encoder_layers[-1]]
As far as I can tell, z_steps is hardcoded to 2 here, although it should still be left at 0 for the discriminator. Adding z_steps to the config seems like a good idea.

z_states represents the position embeddings, which are non-zero if position_biased_input is set to True. They are passed from the embedding layer. So in order to properly implement this, I think we need to return the position embeddings here:

class DebertaV2Embeddings(nn.Module):
    def forward(self, input_ids=None, token_type_ids=None, position_ids=None, mask=None, inputs_embeds=None):
        ...

        # also return the position embeddings so DebertaV2Model can reuse them for the z_steps decoding
        return embeddings, position_embeddings

and implement the z_steps like this:

class DebertaV2Model(DebertaV2PreTrainedModel):
    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, BaseModelOutput]:
        ...

        embedding_output, position_embedding_output = self.embeddings(
            input_ids=input_ids,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )
        ...

        if self.z_steps > 0:
            # Enhanced mask decoder: rerun the last encoder layer z_steps times,
            # seeding the query states with the absolute position embeddings.
            hidden_states = encoded_layers[-2]
            layers = [self.encoder.layer[-1] for _ in range(self.z_steps)]
            position_embedding_output += hidden_states
            query_states = position_embedding_output
            query_mask = self.encoder.get_attention_mask(attention_mask)
            rel_embeddings = self.encoder.get_rel_embedding()
            rel_pos = self.encoder.get_rel_pos(embedding_output)
            for layer in layers:
                query_states = layer(
                    hidden_states,
                    query_mask,
                    output_attentions=False,
                    query_states=query_states,
                    relative_pos=rel_pos,
                    rel_embeddings=rel_embeddings,
                )
                encoded_layers = encoded_layers + (query_states,)
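
To make the config suggestion above concrete, here is a minimal, illustrative sketch of exposing z_steps as an optional config attribute; the default of 0 (discriminator behavior) and the exact way the attribute is read are assumptions, not something settled in this thread:

# Illustrative sketch: exposing z_steps through the config, defaulting to 0 so
# discriminator-style checkpoints keep their current behavior.
from transformers import DebertaV2Config, DebertaV2Model

config = DebertaV2Config.from_pretrained("microsoft/deberta-v2-xlarge")
config.z_steps = 2  # hypothetical attribute for generator-style checkpoints
model = DebertaV2Model(config)

# Inside the model, the attribute would be read with a safe default, e.g.:
# self.z_steps = getattr(config, "z_steps", 0)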

@huggingface deleted a comment from the github-actions bot Jul 24, 2023
@huggingface deleted a comment from the github-actions bot Aug 17, 2023
@huggingface deleted a comment from the github-actions bot Sep 13, 2023
@huggingface deleted a comment from the github-actions bot Oct 13, 2023
@huggingface deleted a comment from the github-actions bot Nov 8, 2023
@huggingface deleted a comment from the github-actions bot Dec 4, 2023
@ArthurZucker added the WIP label Jan 3, 2024
@huggingface deleted a comment from the github-actions bot Jan 3, 2024
@Bachstelze:

What is the status? The logs of the CI checks have expired.

@ArthurZucker (Collaborator, Author) commented:

#27734 should help with some of the issues in the meantime.

@serenachou:

Hi there, just checking in with @ArthurZucker on whether there's any progress here, please?

@ArthurZucker (Collaborator, Author) commented Sep 6, 2024

Hey hey! Sorry, I ended up dropping this; let me get back to you next week!

@ArthurZucker (Collaborator, Author) commented:

I reviewed #27734 and will take it over this weekend if possible.

@ArthurZucker (Collaborator, Author) commented:

Anyone up for the task is welcome to take it on as well 🤗

@LysandreJik (Member) left a comment

Ok! Can you document, on DeBERTa's documentation page, the evolution that the integration has had? I think it's important that users have easily accessible information about the initial contribution and how this refactor contributes to improving every aspect of DeBERTa.

Thanks!

@ArthurZucker (Collaborator, Author) commented:

Had to skip some PT-TF equivalence tests. The slow tests ran for me and are passing.
If anyone has a problem, it will be quick to fix!

@ArthurZucker merged commit 857d46c into huggingface:main Nov 25, 2024
26 checks passed
@ArthurZucker deleted the refactor-deberta branch November 25, 2024 at 09:43
@joshdevins (Contributor):

Nice one, thanks @ArthurZucker!

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Dec 4, 2024
Description

After a recent change in
transformers (huggingface/transformers#22105),
PEFT could no longer determine the word embeddings from Deberta. This PR
provides a very minimal fix that correctly determines the word
embeddings again.

Details

Previously, the word embeddings were determined in the following manner:

1. Find the transformers_backbone by checking the base model's children
for PreTrainedModel instances
2. If not found, the model itself is considered the transformers
backbone.
3. On the backbone, check for modules whose weight has the same size as
the vocab size. This module is now assumed to be the word embeddings.

Before the mentioned transformers PR, step 1 did not find anything, so step 2
was applied. After the PR, however, DebertaEncoder is now an instance of
PreTrainedModel (asked internally; this is intended). Therefore, the encoder
is now considered the transformers backbone. But the encoder does not have the
word embeddings attribute, so step 3 fails.

The fix of this PR is to first explicitly check for
model.embeddings.word_embeddings and, if this attribute is found, use it
as the word embeddings. Only when it's not found do we use the other
method described above. This way, we can successfully determine the word
embeddings on models like Deberta.
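
As an aside, the fallback described above might look roughly like the following; this is only an illustrative sketch, not the actual PEFT code, and the helper name _find_word_embeddings is made up:

# Illustrative sketch of the described fallback, not the actual PEFT implementation.
import torch.nn as nn

def _find_word_embeddings(model, vocab_size):
    # 1) Prefer the explicit attribute if the model exposes it (e.g. Deberta).
    word_embeddings = getattr(getattr(model, "embeddings", None), "word_embeddings", None)
    if isinstance(word_embeddings, nn.Embedding):
        return word_embeddings
    # 2) Otherwise fall back to scanning modules whose weight matches the vocab size.
    for module in model.modules():
        weight = getattr(module, "weight", None)
        if weight is not None and weight.shape[0] == vocab_size:
            return module
    return None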

This whole code is a bit messy and could probably be improved. However,
changing the logic too much could inadvertently break some existing
models that are not covered by the tests. Therefore, I chose this
approach, which leaves the existing logic mostly intact.
BenjaminBossan added a commit to huggingface/peft that referenced this pull request Dec 4, 2024
(same commit message as above)
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
… and fix LLM (huggingface#22105)

* some modification for roadmap

* revert some changes

* yups

* weird

* make it work

* sttling

* fix-copies

* fixup

* renaming

* more fix-copies

* move stuff around

* remove torch script warnings

* ignore copies

* revert bad changes

* woops

* just styling

* nit

* revert

* style fixup

* nits configuration style

* fixup

* nits

* will this fix the tf pt issue?

* style

* ???????

* update

* eval?

* update error message

* updates

* style

* grumble grumble

* update

* style

* nit

* skip torch fx tests that were failing

* style

* skip the failing tests

* skip another test and make style
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024
… and fix LLM (huggingface#22105) (same commit message as above)