DeBERTa Fast Tokenizer #10498
Comments
Hi @brandenchan, I think it should be easier with version 2 of DeBERTa, because they use a "normal" SentencePiece model now, so having a fast alternative would be great. (The new 128k vocab size should really boost performance on QA tasks!)
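(For context, a minimal sketch of the conversion path this refers to, using ALBERT as a stand-in because it already ships a SentencePiece unigram model with a registered converter; DeBERTa v2 has no converter yet, which is exactly the gap this issue is about:)

```python
# Sketch: how transformers builds a fast (Rust-backed) tokenizer out of a
# SentencePiece-based slow tokenizer. ALBERT is used purely as an example of
# a model where this path already works.
from transformers import AlbertTokenizer
from transformers.convert_slow_tokenizer import convert_slow_tokenizer

slow = AlbertTokenizer.from_pretrained("albert-base-v2")
fast_backend = convert_slow_tokenizer(slow)  # returns a `tokenizers.Tokenizer`
print(fast_backend.encode("A 128k vocab should boost QA performance!").tokens)
```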
Indeed, this would be a very nice addition and way easier to implement than for the first DeBERTa. I'm adding the
Hi, I am looking for my first open-source contribution. May I take this if it's still available?
Yes, of course! Thank you!
@ShubhamSanghvi Maybe wait until #10703 is merged.
Hi, as far as I understand, I will have to add tokenizer files for deberta_v2 to implement the fast tokenizer? May I ask how I could get the tokenizer files for deberta_v2 models, and how to upload them to the intended destinations, which I believe should be (for deberta-v2-xlarge): https://huggingface.co/microsoft/deberta-v2-xlarge/resolve/main/ Thanks, Shubham
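(For anyone following along, a hedged sketch of how to pull the existing slow-tokenizer files locally to inspect them; the exact file names, e.g. `spm.model`, are assumptions based on how other SentencePiece checkpoints are laid out:)

```python
from transformers import AutoTokenizer

# Download the slow tokenizer from the hub and dump its files to disk.
slow = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge", use_fast=False)
slow.save_pretrained("./deberta-v2-xlarge-tokenizer")
# The directory should now hold the SentencePiece model and tokenizer config;
# a fast tokenizer would add a serialized tokenizer.json alongside them.
```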
@ShubhamSanghvi Do you only want to implement the fast tokenizer for DebertaV2 or also for Deberta?
I think this is what you have to figure out. I would check the other models that have a slow sentencepiece tokenizer.
You cannot upload them there. Upload them to some kind of public cloud storage and request an upload.
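(A hypothetical skeleton of what "checking the other models" suggests: the SentencePiece-backed fast tokenizers are wired up through `SpmConverter` subclasses in `convert_slow_tokenizer.py`. The class name and template below are assumptions modeled on the existing ALBERT converter, not the eventual implementation:)

```python
from tokenizers import processors
from transformers.convert_slow_tokenizer import SpmConverter


class DebertaV2Converter(SpmConverter):  # hypothetical name
    def post_processor(self):
        # DeBERTa v2 wraps inputs in BERT-style special tokens.
        return processors.TemplateProcessing(
            single="[CLS]:0 $A:0 [SEP]:0",
            pair="[CLS]:0 $A:0 [SEP]:0 $B:1 [SEP]:1",
            special_tokens=[
                ("[CLS]", self.original_tokenizer.convert_tokens_to_ids("[CLS]")),
                ("[SEP]", self.original_tokenizer.convert_tokens_to_ids("[SEP]")),
            ],
        )
```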
@ShubhamSanghvi Are you planning to create a PR for this issue soon?
Hi @mansimane, I am currently working on it. I am hoping to get it done by next week.
Hi, I am interested in using the DeBERTa model that was recently implemented here and incorporating it into FARM so that it can also be used in open-domain QA settings through Haystack.
Just wondering why only a slow tokenizer is implemented for DeBERTa, and whether there are plans to create the fast tokenizer too. Thanks in advance!
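(To make the motivation concrete: extractive QA pipelines like FARM/Haystack need per-token character offsets to map predicted answer spans back into the document, and only fast tokenizers provide them. A small illustration with an existing fast tokenizer, since a DeBERTa one is exactly what's missing:)

```python
from transformers import AutoTokenizer

fast = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
enc = fast("Who wrote it?", "It was written by Ada.", return_offsets_mapping=True)
print(enc["offset_mapping"])  # per-token (start, end) character spans

# The slow DeBERTa tokenizer cannot do this: passing
# return_offsets_mapping=True to a Python-only tokenizer raises
# NotImplementedError.
```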
Hi @stefan-it! Wondering if you might have any insight on this?