Add missing tokenizer tests - Longformer #17677

tgadeliya · 2022-06-11T17:40:56Z

What does this PR do?

This PR adds tests for the Longformer tokenizer by copying the tests from the Roberta tokenizer's test suite, because the two tokenizers are identical.

Fixes #16627

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@SaulLu @LysandreJik

HuggingFaceDocBuilderDev · 2022-06-11T17:59:19Z

The documentation is not available anymore as the PR was closed or merged.

tgadeliya · 2022-06-11T18:15:25Z

I read the discussion in the merged tokenizer test PRs and the post *~~Don't~~ Repeat Yourself* on the HF blog, and I manually added "the copying mechanism". But I don't understand how it works, so I tried not to change the test code copied from the Roberta tokenizer tests. If modifying the code is not a problem, I would like to make some minor changes, e.g. delete commented-out code and split the big test into smaller ones.
Could you describe how the "copying mechanism" works in more detail?

SaulLu · 2022-07-22T15:36:32Z

Thanks a lot for working on this @tgadeliya!!

As far as I know, there are no identified "practices" for this case (cc @LysandreJik in case you have another opinion). Nevertheless, if changes are relevant, they are obviously welcome. For example, it is possible to indicate the changes made as here:

transformers/src/transformers/models/deberta/modeling_deberta.py

Lines 308 to 309 in d95a32c

    
    # Copied from transformers.models.bert.modeling_bert.BertIntermediate with Bert->Deberta
    class DebertaIntermediate(nn.Module):

If the differences are too long to list, perhaps the message can just explain why the code diverged from the original it was copied from.

Does this help you?
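As a rough illustration of the marker quoted in the comment above, the sketch below parses a `# Copied from <path> with Old->New` comment and checks whether a copied block equals the original once the stated replacements are applied. This is a simplified stand-in and not the repository's actual checker: the function names `parse_copied_from` and `is_faithful_copy` are hypothetical, and the assumption that several replacements may be separated by commas is made only for this example.

```python
import re

# Hypothetical, simplified sketch of a "# Copied from" consistency check.
# It is NOT the real implementation used by the transformers repository.
COPIED_FROM = re.compile(r"#\s*Copied from\s+(\S+)(?:\s+with\s+(.+))?")


def parse_copied_from(comment):
    """Return (source_path, [(old, new), ...]), or None if there is no marker."""
    match = COPIED_FROM.search(comment)
    if match is None:
        return None
    source, patterns = match.group(1), match.group(2)
    replacements = []
    if patterns:
        # Assumption for this sketch: patterns look like "A->B, C->D".
        for pattern in patterns.split(","):
            old, new = pattern.strip().split("->")
            replacements.append((old.strip(), new.strip()))
    return source, replacements


def is_faithful_copy(original_code, copied_code, replacements):
    """Apply the replacements to the original and compare with the copy."""
    expected = original_code
    for old, new in replacements:
        expected = expected.replace(old, new)
    return expected.strip() == copied_code.strip()


marker = (
    "# Copied from transformers.models.bert.modeling_bert.BertIntermediate "
    "with Bert->Deberta"
)
source, replacements = parse_copied_from(marker)
original = "class BertIntermediate(nn.Module):\n    pass"
copy = "class DebertaIntermediate(nn.Module):\n    pass"
print(source, replacements, is_faithful_copy(original, copy, replacements))
```

If the copy is edited by hand so that it no longer matches, `is_faithful_copy` returns `False`, which is the situation where the marker would need to be removed or reworded to explain the divergence.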

tgadeliya · 2022-08-17T00:34:49Z

@SaulLu, Sorry for the late reply. Summer is ending :)

Thanks for your comment. Now it is clear to me. Actually, after weighing the pros and cons, I came to the conclusion that cleaning up the code is not really necessary. So this PR can be reviewed and merged.

SaulLu

Sounds great to me! Can you just merge/rebase on main so we can merge your PR?

tgadeliya · 2022-08-19T15:27:51Z

@SaulLu I refreshed this PR, so now it is ready to merge

SaulLu · 2022-08-22T10:14:04Z

Thanks @tgadeliya 🤗

tgadeliya added 6 commits June 11, 2022 19:25

Add tests for Longformer tokenizer (copied from Roberta tokenizer's tests)

a13daf2

Add changes after linting

d0b3923

Rename test copied from Roberta

3519af0

Fix path to tokenizer in @slow test

576dfb9

move tests to proper place

ab38715

Fix import path of TokenizerTesterMixin

1047dc1

LysandreJik requested a review from SaulLu June 13, 2022 07:24

github-actions bot closed this Jul 21, 2022

huggingface deleted a comment from github-actions bot Jul 22, 2022

SaulLu reopened this Jul 22, 2022

huggingface deleted a comment from github-actions bot Aug 16, 2022

SaulLu changed the title ~~[WIP] Add missing tokenizer tests - Longformer~~ Add missing tokenizer tests - Longformer Aug 19, 2022

SaulLu approved these changes Aug 19, 2022

Merge branch 'huggingface:main' into add-missing-tokenizer-tests-longformer

d8cb2e0

SaulLu merged commit 0f257a8 into huggingface:main Aug 22, 2022

oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022

Add missing tokenizer tests - Longformer (huggingface#17677)

7cc2032

SaulLu mentioned this pull request Sep 28, 2022

Add missing tokenizer test files [:building_construction: in progress] #16627

Closed

9 tasks
