
Will luke support fast tokenizer #170

Open
TrickyyH opened this issue Nov 28, 2022 · 3 comments

Comments

@TrickyyH

Hello everyone, I am trying to use luke-large for question answering.
I ran into several issues when fine-tuning the model on SQuAD-like data, most of which come from the lack of fast tokenizer support.
So I am wondering whether LUKE will support a fast tokenizer in the future, or whether there is any way to work around these issues.
Thank you so much!

@abebe9849

Hi!
According to the blog post below, it seems that offset_mapping can be used with LUKE. However, I cannot confirm that misalignment never occurs, sorry.

https://srad.jp/~yasuoka/journal/651897/

@tealgreen0503

I have the same question as @TrickyyH. Apart from offset_mapping, the behaviour of return_overflowing_tokens, for instance, differs between slow and fast tokenisers. As a result, it becomes difficult to handle long texts in tasks like NER and QA, which LUKE excels at. I would be pleased if you could support the fast tokeniser.
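
For reference, here is a minimal sketch of the fast-tokenizer behaviour I mean, using a RoBERTa fast tokenizer as a stand-in for LUKE's (the checkpoint name, lengths, and stride are just illustrative assumptions):

```python
from transformers import AutoTokenizer

# RoBERTa fast tokenizer as a stand-in; LUKE's word vocabulary comes from RoBERTa.
tokenizer = AutoTokenizer.from_pretrained("roberta-large", use_fast=True)

question = "Who wrote the paper?"
long_context = "LUKE is an entity-aware language model. " * 200  # placeholder long document

encoding = tokenizer(
    question,
    long_context,
    truncation="only_second",        # truncate only the context, never the question
    max_length=384,
    stride=128,                      # overlap between consecutive chunks
    return_overflowing_tokens=True,  # with a fast tokenizer: one feature per chunk
    return_offsets_mapping=True,     # not available with the slow LukeTokenizer
    padding="max_length",
)

# One entry per chunk; overflow_to_sample_mapping ties each chunk back to its source example.
print(len(encoding["input_ids"]), encoding["overflow_to_sample_mapping"])
```

With the slow tokenizer, this kind of stride-based chunking is not produced, which is what makes long-text QA/NER preprocessing awkward.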

@ryokan0123
Contributor

One possible workaround is to use the fast version of the base tokenizer, i.e. RobertaTokenizerFast, since LukeTokenizer is built on RobertaTokenizer (they share the same subword vocabulary).

However, this approach may not support entity-related outputs, which would require additional code to be written.
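
A minimal sketch of this workaround (the checkpoint names and the plain-text usage are just assumptions for illustration; entity inputs are not covered):

```python
from transformers import LukeModel, RobertaTokenizerFast

# Assumption: roberta-large provides the same subword vocabulary as LukeTokenizer,
# so its fast tokenizer can encode plain text for LUKE (entity-related inputs are not handled).
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
model = LukeModel.from_pretrained("studio-ousia/luke-large")

question = "Who developed LUKE?"
context = "LUKE was developed by Studio Ousia and uses entity-aware self-attention."

inputs = tokenizer(
    question,
    context,
    return_offsets_mapping=True,  # only available with the fast tokenizer
    return_tensors="pt",
)
offset_mapping = inputs.pop("offset_mapping")  # keep offsets to map predictions back to text

outputs = model(**inputs)  # no entity_ids / entity_attention_mask are passed here
print(outputs.last_hidden_state.shape, offset_mapping.shape)
```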
