DeBERTa Fast Tokenizer #10498

Closed
brandenchan opened this issue Mar 3, 2021 · 9 comments · Fixed by #11387
Labels: Good First Issue, Good Second Issue (issues that are more difficult to do than "Good First" issues - give it a try if you want!)


brandenchan (Contributor) commented Mar 3, 2021

Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM so that it can also be used in open-domain QA settings through Haystack.

I'm wondering why only a slow tokenizer has been implemented for DeBERTa, and whether there are plans to create a fast tokenizer too. Thanks in advance!

Hi @stefan-it! Do you have any insight on this?

stefan-it (Collaborator) commented:

Hi @brandenchan ,

I think it should be easier with version 2 of DeBERTa, because they now use a "normal" SentencePiece model:

#10018

So having a fast alternative would be great.

(The new 128k vocab size should really boost performance on QA tasks!)

LysandreJik (Member) commented:

Indeed, this would be a very nice addition and way easier to implement than for the first DeBERTa. I'm adding the Good Second Issue label so that a community member may work on it. @brandenchan or @stefan-it feel free to take it too if you feel like it!

@LysandreJik added the Good First Issue and Good Second Issue labels on Mar 3, 2021
ShubhamSanghvi (Contributor) commented:

Hi, I am looking for my first open-source contribution. May I take this if it's still available?

LysandreJik (Member) commented:

Yes, of course! Thank you!

cronoik (Contributor) commented Mar 14, 2021

@ShubhamSanghvi Maybe wait until #10703 is merged.

ShubhamSanghvi (Contributor) commented:

Hi, as far as I understand, I will have to add tokenizer files for deberta_v2 to implement the fast tokenizer?

May I know how I could get the tokenizer files for deberta_v2 models, and how to upload them to the intended destination, which I believe should be (for deberta-v2-xlarge):

https://huggingface.co/microsoft/deberta-v2-xlarge/resolve/main/

Thanks, Shubham

cronoik (Contributor) commented Mar 31, 2021

@ShubhamSanghvi Do you only want to implement the fast tokenizer for DebertaV2, or also for Deberta?

> May I know how I could get the tokenizer files for deberta_v2 models

I think this is what you have to figure out. I would check the other models that have a slow SentencePiece tokenizer.

> how to upload them to the intended destination, which I believe should be (for deberta-v2-xlarge)

You cannot upload them there. Upload them to some kind of public cloud and request an upload.
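For illustration, here is a toy sketch (not the actual transformers converter) of the building blocks such a fast tokenizer assembles, using the Hugging Face `tokenizers` library directly. The vocab here is hand-made; a real DebertaV2 converter would read the (piece, score) pairs from the SentencePiece `spm.model` file instead:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, processors

# Toy (piece, log-probability) vocab; index 0 is the unknown token.
# "▁" is SentencePiece's word-boundary marker.
vocab = [
    ("[UNK]", 0.0),
    ("[CLS]", 0.0),
    ("[SEP]", 0.0),
    ("\u2581hello", -1.0),
    ("\u2581world", -1.5),
]
tokenizer = Tokenizer(models.Unigram(vocab, unk_id=0))

# Metaspace reproduces SentencePiece's whitespace handling ("▁" prefix)
tokenizer.pre_tokenizer = pre_tokenizers.Metaspace()

# Wrap sequences in [CLS] ... [SEP], as DeBERTa's slow tokenizer does
tokenizer.post_processor = processors.TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B [SEP]",
    special_tokens=[("[CLS]", 1), ("[SEP]", 2)],
)

enc = tokenizer.encode("hello world")
print(enc.tokens)   # e.g. ['[CLS]', '▁hello', '▁world', '[SEP]']
print(enc.offsets)  # fast tokenizers also track character offsets
```

The character offsets in the last line are the main payoff for QA use cases: they let you map predicted answer spans back to positions in the original text, which the slow tokenizer cannot do.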

mansimane (Contributor) commented:

@ShubhamSanghvi Are you planning to create a PR for this issue soon?

ShubhamSanghvi (Contributor) commented:

Hi @mansimane, I am currently working on it. I am hoping to get it done by next week.
