ai.djl.engine.EngineException: Out of range: Invalid id at SentencePieceLibrary.decode() problem #3553
JeeDevUser started this conversation in General
Replies: 1 comment
-
Can you use |
-
Hi all,
I am using the ai.djl.sentencepiece.SpProcessor.encode() method to generate the tokenized input from the source text.
This is the input to google-t5/t5-large, which should generate text that continues a given beginning.
- This is what I am using as the input text, to be completed by the model:
String text = "generate: Once upon a time, in a land far away,";
- During inference, the model returns the logits, which are unnormalized scores for each token in the vocabulary.
- Using a simple greedy search, I pick the highest-scoring token at each step and build a sequence of token ids to decode, something like this:
[0, 32099, 3, 9, 1322, 623, 550, 6, 16, ...]
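The greedy step described above can be sketched in plain Java as follows. This is only an illustration, not the actual inference code: the class name GreedySearch, the helper names, and the tiny three-entry logits rows are all made up for the example, and it assumes the per-step logits are already available as float arrays (in a real decoder each step's logits depend on the tokens chosen so far).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of DJL.
class GreedySearch {

    /** Returns the index of the largest value in one row of logits. */
    public static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Picks the highest-scoring token id for each decoding step. */
    public static List<Integer> greedy(float[][] stepLogits) {
        List<Integer> ids = new ArrayList<>();
        for (float[] row : stepLogits) {
            ids.add(argmax(row));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Toy logits: 2 decoding steps over a 3-token vocabulary.
        float[][] logits = {
            {0.1f, 2.5f, -1.0f},
            {3.0f, 0.2f, 0.4f}
        };
        System.out.println(greedy(logits)); // prints [1, 0]
    }
}
```

Because argmax ranges over the full logits width, it can return any id below the model's output size, whether or not the tokenizer can map that id back to text.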
- The vocabulary size for T5 large is 32128.
- But when I try to decode this output array with ai.djl.sentencepiece.SpProcessor.decode(), I get the exception from the title: ai.djl.engine.EngineException: Out of range: Invalid id
What puzzles me is why token 32099 can't be decoded.
It is smaller than the vocabulary size (32128), so what is the problem?
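One possible explanation, stated as an assumption rather than a confirmed diagnosis: the model's output size (32128) need not equal the number of pieces stored in the SentencePiece model file. If the file holds fewer pieces (for example 32000, with sentinel tokens such as <extra_id_0> added on top by a tokenizer wrapper), then id 32099 is within the logits range but outside what SentencePiece itself can decode. The sketch below drops such ids before decoding; the class name DecodableIds, the helper keepDecodable, and the piece count of 32000 are all hypothetical and should be checked against the actual model file.

```java
import java.util.Arrays;

// Hypothetical helper class; not part of DJL.
class DecodableIds {

    /** Keeps only ids in [0, pieceCount), i.e. ids the tokenizer model can map to a piece. */
    public static int[] keepDecodable(int[] ids, int pieceCount) {
        return Arrays.stream(ids)
                .filter(id -> id >= 0 && id < pieceCount)
                .toArray();
    }

    public static void main(String[] args) {
        int[] generated = {0, 32099, 3, 9, 1322, 623, 550, 6, 16};
        // ASSUMPTION: the SentencePiece file has 32000 pieces; verify for your model.
        int assumedPieceCount = 32000;
        int[] safe = keepDecodable(generated, assumedPieceCount);
        System.out.println(Arrays.toString(safe));
        // prints [0, 3, 9, 1322, 623, 550, 6, 16]
    }
}
```

The filtered array could then be passed to the decoder. Whether dropping these ids is acceptable depends on the task, since sentinel tokens may carry meaning in the model's output.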