ai.djl.engine.EngineException: Out of range: Invalid id at SentencePieceLibrary.decode() problem #3553

JeeDevUser · 2024-12-05T09:41:55Z

Hi all,

I am using the
ai.djl.sentencepiece.SpProcessor.encode()

method to generate the tokenized input from the source text.
This is the input to google-t5/t5-large, which should generate text that continues a given beginning.

-This is what I am using as the input text, to be completed by the model:

String text = "generate: Once upon a time, in a land far away,";

-During inference, the model returns the logits, which score every token in the vocabulary.
-Using simple greedy search, I pick the highest-scoring token at each step and build a sequence of tokens to be decoded, something like this:
[0, 32099, 3, 9, 1322, 623, 550, 6, 16, ...]
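
For readers following along, the greedy step described above can be sketched roughly as below. This is a minimal illustration and not the poster's actual code: the class name GreedyDecode, the helper names, and the tiny logits array are all hypothetical, and it assumes the per-step logits are already available as plain float arrays. In real generation each step would feed the previously chosen token back into the decoder.

```java
import java.util.Arrays;

// Hypothetical sketch of greedy token selection over precomputed logits.
public class GreedyDecode {

    // Returns the index of the largest logit, i.e. the highest-scoring token id.
    public static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) {
                best = i;
            }
        }
        return best;
    }

    // Picks one token id per decoding step; stepLogits[t] holds the scores for step t.
    public static int[] greedy(float[][] stepLogits) {
        int[] ids = new int[stepLogits.length];
        for (int t = 0; t < stepLogits.length; t++) {
            ids[t] = argmax(stepLogits[t]);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Toy example with a 3-token vocabulary and 2 decoding steps.
        float[][] logits = {
            {0.1f, 2.5f, -1.0f},
            {3.0f, 0.2f, 0.4f}
        };
        System.out.println(Arrays.toString(greedy(logits))); // prints [1, 0]
    }
}
```

Note that the argmax ranges over the full width of the logits vector, so it can return any id up to the model's vocabulary size minus one, whether or not the tokenizer knows that id.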

-The vocabulary size for T5 large is 32128.
-But when I try to decode this output array using

ai.djl.sentencepiece.SpProcessor.decode()

I am getting:

ai.djl.engine.EngineException: Out of range: Invalid id: 32099
	at ai.djl.sentencepiece.jni.SentencePieceLibrary.decode(Native Method)
	at ai.djl.sentencepiece.SpProcessor.decode(SpProcessor.java:121)
	at tof.T5_Summary.main(T5_Summary.java:227)

What puzzles me is why token 32099 can't be decoded.
It is smaller than the vocabulary size (32128), so what's the problem?

frankfliu · 2024-12-06T03:51:25Z

@JeeDevUser

Can you use HuggingFaceTokenizer instead? It supports SentencePiece internally.
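
A possible explanation, offered as an assumption rather than something confirmed in this thread: the model's vocabulary size (32128) is the width of its output layer, while the raw SentencePiece model file is commonly reported to hold only 32000 pieces. Under that assumption, ids 32000 to 32099 would be sentinel tokens added on the Hugging Face tokenizer side and the remainder padding, so a plain SentencePiece decoder would reject them. If switching tokenizers is not an option, one workaround sketch is to drop ids outside the SentencePiece range before decoding. The class name DecodableIds, the helper keepDecodable, and the piece count of 32000 are hypothetical and should be checked against the actual model file.

```java
import java.util.Arrays;

// Hypothetical sketch: filter out ids a raw SentencePiece model cannot decode.
public class DecodableIds {

    // Keeps only ids in [0, pieceCount); pieceCount is the number of pieces
    // in the SentencePiece model (assumed here, not read from the model).
    public static int[] keepDecodable(int[] ids, int pieceCount) {
        return Arrays.stream(ids)
                .filter(id -> id >= 0 && id < pieceCount)
                .toArray();
    }

    public static void main(String[] args) {
        // Token ids taken from the question above.
        int[] generated = {0, 32099, 3, 9, 1322, 623, 550, 6, 16};
        // ASSUMPTION: 32000 pieces in the SentencePiece model.
        int[] safe = keepDecodable(generated, 32000);
        System.out.println(Arrays.toString(safe)); // prints [0, 3, 9, 1322, 623, 550, 6, 16]
        // The filtered array could then be passed to the decode call from the question.
    }
}
```

Filtering loses whatever the sentinel token meant in the output, so this is only a way to avoid the exception, not a replacement for a tokenizer that knows the added tokens.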
