InferenceClient: TypeError: 'NoneType' object is not subscriptable if max_tokens is too big #2514
Comments
Additionally, it seems the maximum allowed value for max_tokens depends on the input and decreases as the input gets longer. Maybe it's related to the maximum context size of the model?
Ah yes, this model only has a 4k context length... I switched to another one.
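If so, the constraint would be `input_tokens + max_tokens <= context_length`, which matches the validation errors in the server responses below. A quick sketch of the arithmetic, with a hypothetical helper:

```python
def safe_max_tokens(num_input_tokens: int, context_length: int = 4096) -> int:
    """Hypothetical helper: largest `max_tokens` the server should accept."""
    return max(context_length - num_input_tokens, 0)

# With the 6 input tokens from the responses below and a 4k context,
# anything above 4090 gets rejected, which is why max_tokens=4091 fails.
assert safe_max_tokens(6, 4096) == 4090
```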
I've checked the raw response from the server and I think this is an inconsistency in the TGI API. When `stream` is set to `false`, the validation error comes back as an HTTP 422:

```
➜ ~ curl -X POST https://api-inference.huggingface.co:443/models/microsoft/Phi-3-mini-4k-instruct/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'authorization: Bearer hf_****' \
-d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello there !"}], "max_tokens": 4091, "stream": false}' \
-i

HTTP/2 422
date: Tue, 10 Sep 2024 16:27:33 GMT
content-type: application/json
(...)

{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 6 `inputs` tokens and 4091 `max_new_tokens`","error_type":"validation"}
```

Now if I make the same query with `stream` set to `true`, I get an HTTP 200 and the error only arrives inside the event stream:

```
➜ ~ curl -X POST https://api-inference.huggingface.co:443/models/microsoft/Phi-3-mini-4k-instruct/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'authorization: Bearer hf_****' \
-d '{"model": "microsoft/Phi-3-mini-4k-instruct", "messages": [{"role": "user", "content": "Hello there !"}], "max_tokens": 4091, "stream": true}' \
-i

HTTP/2 200
date: Tue, 10 Sep 2024 16:27:39 GMT
content-type: text/event-stream
(...)

data: {"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 6 `inputs` tokens and 4091 `max_new_tokens`","error_type":"validation"}

data: [DONE]
```

I would expect an HTTP 422 to be returned in both cases by TGI. WDYT @drbh @OlivierDehaene ?
Regardless of TGI behavior, I do think the client should raise a proper error here rather than failing with a cryptic TypeError.
Hi, I am getting an error while using the `tools` parameter in TGI. Has anyone faced a similar issue?
I think this is a separate issue @Agrawalchitranshu. Could you open a new ticket and give more context about your setup, your failing script, and the exact error and traceback? Thanks in advance.
So I just checked with @OlivierDehaene, and actually it makes sense for the API to return 200: the response is a stream, so if it starts successfully the server returns 200 and the client has to parse errors as they arrive in the response stream. He confirmed this is the expected behavior.
Yes, there is not much we can do here on TGI's side.
Yes, we will fix the client so that if an error is returned in the stream we raise an exception. But I still find it weird to get an HTTP 200 when the problem is an input validation error. I understand that with a stream, if a problem happens while generating it has to be sent in the stream; but here we know before the stream even starts that the input is invalid. No big deal for the Python client, as we can do a quick fix; just saying for other libraries/usages.
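For the record, here is a minimal sketch of that kind of client-side fix, assuming raw SSE lines prefixed with `data:`; the `StreamError` name and `iter_sse_events` helper are illustrative, not the actual huggingface_hub implementation:

```python
import json

class StreamError(Exception):
    """Hypothetical exception raised when the server reports an error mid-stream."""

def iter_sse_events(lines):
    """Parse `data:` lines from a text/event-stream response, raising on error payloads."""
    for raw in lines:
        if not raw.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = raw[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel sent by TGI
        event = json.loads(payload)
        # TGI sends validation errors as {"error": ..., "error_type": ...} events:
        if isinstance(event, dict) and "error" in event:
            raise StreamError(event["error"])
        yield event
```

With something like this, the caller would see a `StreamError` carrying the server's validation message instead of crashing while indexing a malformed chunk.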
I can see it both ways tbh, that's just the standard we picked when we created TGI. I don't think there are any real standards on SSE.
To reproduce:
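A minimal sketch of such a reproduction, assuming the streaming `chat_completion` call and the model used in the curl examples above:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("microsoft/Phi-3-mini-4k-instruct")

# max_tokens exceeds what the 4k context window allows for this prompt,
# so the server answers 200 plus an error event instead of generated chunks.
for chunk in client.chat_completion(
    messages=[{"role": "user", "content": "Hello there !"}],
    max_tokens=4091,
    stream=True,
):
    print(chunk.choices[0].delta.content, end="")
```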
raises `TypeError: 'NoneType' object is not subscriptable`.
cc @Wauplin if you have any idea why it fails