
InferenceClient: TypeError: 'NoneType' object is not subscriptable if max_tokens is too big #2514

Closed
lhoestq opened this issue Sep 5, 2024 · 10 comments · Fixed by #2558

@lhoestq
Member

lhoestq commented Sep 5, 2024

to reproduce:

from huggingface_hub import InferenceClient


model_id = "microsoft/Phi-3-mini-4k-instruct"
client = InferenceClient(model_id)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "Hello there !"}],
    stream=True,
    max_tokens=4091,  # values lower or equal to 4090 work
):
    print(message.choices[0].delta.content, end="")

raises

Traceback (most recent call last):
  File "/Users/quentinlhoest/hf/dataset-rewriter/ttest.py", line 12, in <module>
    print(message.choices[0].delta.content, end="")
TypeError: 'NoneType' object is not subscriptable

cc @Wauplin if you have any idea why it fails

@lhoestq
Member Author

lhoestq commented Sep 5, 2024

Additionally, the maximum allowed value for max_tokens seems to depend on the input, and it decreases as the input gets larger. Maybe it's related to the model's maximum context size?

@lhoestq
Member Author

lhoestq commented Sep 5, 2024

Ah yes, this model only has a 4k context length... I switched to using another one.
Anyway, not a big deal, but we can surely show a better error message in this case.
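
For reference, here is a rough way to pick a safe max_tokens given the context limit. This is just a sketch, not something InferenceClient does for you: the 4096 value comes from the server's validation error, and the local token count from the tokenizer may differ slightly from the server-side count, hence the small margin.

from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
context_length = 4096  # assumption: taken from the server's validation error

messages = [{"role": "user", "content": "Hello there !"}]

# Count the prompt tokens the chat template will produce,
# then leave the rest of the context window for generation.
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
safe_max_tokens = context_length - len(prompt_ids) - 8  # small safety margin for special tokens

client = InferenceClient(model_id)
for message in client.chat_completion(messages=messages, stream=True, max_tokens=safe_max_tokens):
    print(message.choices[0].delta.content, end="")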

@Wauplin
Contributor

Wauplin commented Sep 10, 2024

I've checked the raw response from the server and I think this is an inconsistency in the TGI API.

When "max_tokens: 4091" and "stream": false` is passed, I'm getting a HTTP 422 with a correct error telling me the correct an issue in the input:

~ curl -X POST https://api-inference.huggingface.co:443/models/microsoft/Phi-3-mini-4k-instruct/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -H 'authorization: Bearer hf_****' \
     -d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello there !"}], "max_tokens": 4091, "stream": false}' \
     -i
HTTP/2 422 
date: Tue, 10 Sep 2024 16:27:33 GMT
content-type: application/json
(...)

{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 6 `inputs` tokens and 4091 `max_new_tokens`","error_type":"validation"}%

Now if I make the same query with `"stream": true`, I'm getting an HTTP 200 and the error message is passed in the first event sent to the client:

~ curl -X POST https://api-inference.huggingface.co:443/models/microsoft/Phi-3-mini-4k-instruct/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -H 'authorization: Bearer hf_****' \
     -d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello there !"}], "max_tokens": 4091, "stream": true}' \
     -i
HTTP/2 200 
date: Tue, 10 Sep 2024 16:27:39 GMT
content-type: text/event-stream
(...)

data: {"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 6 `inputs` tokens and 4091 `max_new_tokens`","error_type":"validation"}

data: [DONE]

I would expect an HTTP 422 to be returned in both cases by TGI. WDYT @drbh @OlivierDehaene?

@Wauplin
Contributor

Wauplin commented Sep 10, 2024

Regardless of TGI behavior, I do think InferenceClient should raise an error when a streamed message has an error (I'll work on a fix) but I don't think TGI should be returning an HTTP 200 in the first place here.
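
For illustration, this is the kind of check a client-side fix could do (a sketch only, not the actual implementation in the linked PR; the exception type and helper name are made up): parse each SSE data: payload and raise as soon as it carries an error field.

import json

class StreamedInferenceError(Exception):
    """Raised when an SSE payload contains an error instead of a completion chunk."""

def iter_chat_completion_events(lines):
    # `lines` is an iterable of decoded SSE lines such as 'data: {...}'.
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if isinstance(event, dict) and "error" in event:
            # e.g. {"error": "Input validation error: ...", "error_type": "validation"}
            raise StreamedInferenceError(event["error"])
        yield event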

@Agrawalchitranshu

Hi, I am getting the error below while using the tools parameter in TGI. Has anyone faced a similar issue?
UnprocessableEntityError: Error code: 422 - {'error': 'expected value at line 1 column 1', 'error_type': 'Input validation error'}

@Wauplin
Contributor

Wauplin commented Sep 12, 2024

I think this is a separate issue @Agrawalchitranshu. Could you open a new ticket and give more context about your setup, your failing script and the exact error and traceback? Thanks in advance.

@lhoestq
Member Author

lhoestq commented Sep 19, 2024

So I just checked with @OlivierDehaene and actually it makes sense for the API to return 200: the response is a stream, so if it starts successfully it returns 200 and the client has to parse errors as they arrive in the response stream. @OlivierDehaene confirmed this is the expected behavior.

@OlivierDehaene
Member

@Wauplin,

Yes, there is not much we can do here on TGI's side.
This is the same behavior as for the other TGI route, /generate. Can you re-use the error parsing from there?

@Wauplin
Contributor

Wauplin commented Sep 19, 2024

Yes, we will fix the client so that if an error is returned in the stream we raise an exception. But I still find it weird to get an HTTP 200 if the problem is an input validation error. I understand that with a stream, if a problem happens while generating, it has to be sent in the stream, but here we know before starting the stream that the input is invalid.

No big deal anyway for the Python client, as we can do a quick fix; just mentioning it for other libraries/usages.

@OlivierDehaene
Member

I can see it both ways tbh, that's just the standard we picked when we created TGI.
It divides errors into 2 categories: status codes != 200 are issues with the SSE stream itself, whereas errors in the stream are TGI errors.

I don't think there are any real standards on SSE.
