
InferenceClient: TypeError: 'NoneType' object is not subscriptable if max_tokens is too big #2514

Closed
lhoestq opened this issue Sep 5, 2024 · 10 comments · Fixed by #2558

@lhoestq
Member

lhoestq commented Sep 5, 2024

to reproduce:

from huggingface_hub import InferenceClient


model_id = "microsoft/Phi-3-mini-4k-instruct"
client = InferenceClient(model_id)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "Hello there !"}],
    stream=True,
    max_tokens=4091,  # values lower or equal to 4090 work
):
    print(message.choices[0].delta.content, end="")

raises

Traceback (most recent call last):
  File "/Users/quentinlhoest/hf/dataset-rewriter/ttest.py", line 12, in <module>
    print(message.choices[0].delta.content, end="")
TypeError: 'NoneType' object is not subscriptable

cc @Wauplin if you have any idea why it fails

@lhoestq
Member Author

lhoestq commented Sep 5, 2024

Additionally, the maximum allowed value for max_tokens seems to depend on the input, and it decreases as the input gets larger. Maybe it's related to the model's maximum context size?

@lhoestq
Member Author

lhoestq commented Sep 5, 2024

Ah yes, this model only has a 4k context length... I switched to using another one.
Anyway, not a big deal, but we can surely show a better error message in this case.
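
For reference, here is a rough way to pick a safe max_tokens given the context limit. This is just a sketch, not something InferenceClient does for you: the 4096 value comes from the server's validation error, and the local token count from the tokenizer may differ slightly from the server-side count, hence the small margin.

from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
context_length = 4096  # assumption: taken from the server's validation error

messages = [{"role": "user", "content": "Hello there !"}]

# Count the prompt tokens the chat template will produce,
# then leave the rest of the context window for generation.
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
safe_max_tokens = context_length - len(prompt_ids) - 8  # small safety margin for special tokens

client = InferenceClient(model_id)
for message in client.chat_completion(messages=messages, stream=True, max_tokens=safe_max_tokens):
    print(message.choices[0].delta.content, end="")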

@Wauplin
Contributor

Wauplin commented Sep 10, 2024

I've checked the raw response from the server and I think this is an inconsistency in the TGI API.

When "max_tokens: 4091" and "stream": false` is passed, I'm getting a HTTP 422 with a correct error telling me the correct an issue in the input:

~ curl -X POST https://api-inference.huggingface.co:443/models/microsoft/Phi-3-mini-4k-instruct/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -H 'authorization: Bearer hf_****' \
     -d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello there !"}], "max_tokens": 4091, "stream": false}' \
     -i
HTTP/2 422 
date: Tue, 10 Sep 2024 16:27:33 GMT
content-type: application/json
(...)

{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 6 `inputs` tokens and 4091 `max_new_tokens`","error_type":"validation"}%

Now if I make the same query with `"stream": true`, I'm getting an HTTP 200 and the error message is passed in the first event sent to the client:

~ curl -X POST https://api-inference.huggingface.co:443/models/microsoft/Phi-3-mini-4k-instruct/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -H 'authorization: Bearer hf_****' \
     -d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello there !"}], "max_tokens": 4091, "stream": true}' \
     -i
HTTP/2 200 
date: Tue, 10 Sep 2024 16:27:39 GMT
content-type: text/event-stream
(...)

data: {"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 6 `inputs` tokens and 4091 `max_new_tokens`","error_type":"validation"}

data: [DONE]

I would expect an HTTP 422 to be returned in both cases by TGI. WDYT @drbh @OlivierDehaene?

@Wauplin
Contributor

Wauplin commented Sep 10, 2024

Regardless of TGI behavior, I do think InferenceClient should raise an error when a streamed message has an error (I'll work on a fix) but I don't think TGI should be returning an HTTP 200 in the first place here.
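
For illustration, this is the kind of check a client-side fix could do (a sketch only, not the actual implementation in the linked PR; the exception type and helper name are made up): parse each SSE data: payload and raise as soon as it carries an error field.

import json

class StreamedInferenceError(Exception):
    """Raised when an SSE payload contains an error instead of a completion chunk."""

def iter_chat_completion_events(lines):
    # `lines` is an iterable of decoded SSE lines such as 'data: {...}'.
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if isinstance(event, dict) and "error" in event:
            # e.g. {"error": "Input validation error: ...", "error_type": "validation"}
            raise StreamedInferenceError(event["error"])
        yield event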

@Agrawalchitranshu

Hi, I am getting the error below while using the tools parameter in TGI. Has anyone faced a similar issue?
UnprocessableEntityError: Error code: 422 - {'error': 'expected value at line 1 column 1', 'error_type': 'Input validation error'}

@Wauplin
Contributor

Wauplin commented Sep 12, 2024

I think this is a separate issue @Agrawalchitranshu. Could you open a new ticket and give more context about your setup, your failing script and the exact error and traceback? Thanks in advance.

@lhoestq
Member Author

lhoestq commented Sep 19, 2024

So I just checked with @OlivierDehaene and actually it makes sense for the API to return 200: the response is a stream, so if it starts successfully it returns 200 and the client has to parse errors as they arrive in the response stream. @OlivierDehaene confirmed this is the expected behavior.

@OlivierDehaene
Member

@Wauplin,

Yes, there is not much we can do here on TGI's side.
This is the same behavior as for the other TGI route, /generate. Can you re-use the error parsing from there?

@Wauplin
Contributor

Wauplin commented Sep 19, 2024

Yes, we will fix the client so that if an error is returned in the stream we raise an exception. But I still find it weird to get an HTTP 200 if the problem is an input validation error. I understand that with a stream, if a problem happens while generating, it has to be sent in the stream, but here we know before starting the stream that the input is invalid.

No big deal anyway for the Python client, as we can do a quick fix; just mentioning it for other libraries/usages.

@OlivierDehaene
Member

I can see it both ways tbh, that's just the standard we picked when we created TGI.
It divides errors into 2 categories: status codes != 200 are issues with the SSE stream itself, whereas errors in the stream are TGI errors.

I don't think there are any real standards on SSE.
