I ran into a problem: the encode and decode methods of GPT2Tokenizer are not mutually inverse.
For example, Alice's tokens ["\u0120compar": 4616, "isons": 9886] may be decoded as ["\u0120comparisons": 17909] by Bob, so Bob can't recover the message correctly.
(The mapping table can be viewed from https://huggingface.co/gpt2/raw/main/vocab.json.)
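To make the mismatch concrete, here is a minimal sketch using a toy tokenizer. It is not the real GPT-2 BPE: `toy_encode` and `toy_decode` are hypothetical helpers, and greedy longest-match stands in for the actual merge rules. Only the three vocabulary entries and their IDs are taken from the vocab.json above.

```python
# Three entries copied from GPT-2's vocab.json; "\u0120" is the "Ġ" space marker.
VOCAB = {"\u0120compar": 4616, "isons": 9886, "\u0120comparisons": 17909}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def toy_decode(ids):
    """Concatenate the token strings for the given IDs (hypothetical helper)."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


def toy_encode(text):
    """Greedy longest-match encoding; an assumed stand-in for real BPE merges."""
    ids = []
    pos = 0
    while pos < len(text):
        candidates = [t for t in VOCAB if text.startswith(t, pos)]
        if not candidates:
            raise ValueError(f"no token matches at position {pos}")
        match = max(candidates, key=len)  # prefer the longest token
        ids.append(VOCAB[match])
        pos += len(match)
    return ids


alice_ids = [4616, 9886]          # what Alice generated
text = toy_decode(alice_ids)      # the text Bob receives
bob_ids = toy_encode(text)        # what Bob gets when re-encoding

print(alice_ids, "->", repr(text), "->", bob_ids)
```

Both ID sequences decode to the same string, but re-encoding that string does not give back Alice's sequence, which is the behavior I am seeing.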
There are more examples:
In my environment, the package versions match the ones you specified.
So, I'd like to ask you how to solve this problem.
Besides, I noticed that you rewrote GPT2Tokenizer.decode and GPT2Tokenizer._convert_token_to_id. Is that related to the problem?
Thank you!