[BUG] Used token not calculating when streaming - Llamaindex #5729

Closed
MarouaneZhani opened this issue Dec 13, 2024 · 5 comments · Fixed by Arize-ai/openinference#1174
Labels: bug (Something isn't working)

@MarouaneZhani

Describe the bug
Token calculation is not done/shown when using astream_chat in LlamaIndex.
To Reproduce

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine  # import path assumes a recent llama-index

# index_doc (a LlamaIndex index) and groqLLM (a Groq LLM instance) are defined elsewhere
query_str = "Hello, Tell me a joke!"
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index_doc.as_retriever(),
    llm=groqLLM,
)

responses = []
result = await chat_engine.astream_chat(query_str)

async for response in result.achat_stream:
    responses.append(response.delta)
```
Token calculation works correctly when using chat_engine.achat or chat_engine.chat, so the problem only appears when streaming (see the comparison sketch below).
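For reference, a minimal comparison sketch of the non-streaming call where the token counts do show up; `index_doc`, `groqLLM`, and `query_str` are assumed to be the same objects as in the snippet above.

```python
# Non-streaming variant for comparison: token usage is reported for achat/chat.
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index_doc.as_retriever(),
    llm=groqLLM,
)

result = await chat_engine.achat(query_str)  # token counts are recorded for this call
print(result.response)
```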
Expected behavior
Token calculation should also be reported when streaming.
Screenshots
Here it's not working; as you can see, the token information is missing:
[screenshot: trace without token counts]

And here, when using chat or achat, everything works as expected:
[screenshot: trace with token counts]

Environment (please complete the following information):

  • Version [e.g. 7.1.0]
@MarouaneZhani MarouaneZhani added bug Something isn't working triage issues that need triage labels Dec 13, 2024
@MarouaneZhani
Author

I have just noticed that this only happens when using Groq as the LLM, and not with Ollama, for example!

@RogerHYang RogerHYang self-assigned this Dec 13, 2024
@RogerHYang RogerHYang removed the triage issues that need triage label Dec 13, 2024
@RogerHYang
Contributor

This is probably an issue with Groq itself not returning token counts when streaming.

For comparison, OpenAI exhibits the same missing-token-count behavior when streaming, but it can be fixed by passing additional kwargs as shown below.

```python
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index_doc.as_retriever(),
    llm=OpenAI(
        model="gpt-4o-mini",
        additional_kwargs={"stream_options": {"include_usage": True}},
    ),
)
```

@RogerHYang
Contributor

OK, it looks like the usage can be found in a different location in the streamed response.
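To illustrate what "a different location" might mean, here is an exploratory sketch for inspecting the raw streamed chunks. It assumes each streamed chat response exposes the provider payload on `.raw`; the exact field holding the usage block (e.g. a `usage` or `x_groq` entry on the final chunk) is provider-specific and is an assumption here, not part of the instrumentation fix.

```python
# Exploratory: dump the raw payload of the last streamed chunk and look for usage there.
result = await chat_engine.astream_chat(query_str)

last_raw = None
async for response in result.achat_stream:
    last_raw = response.raw  # provider-specific payload for this chunk

print(last_raw)  # token usage, if present, typically rides on the final chunk
```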

@RogerHYang
Contributor

@MarouaneZhani We released an update in openinference-instrumentation-llama-index 3.1.1. Please give it a try and let us know if you have further questions. Thank you!
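For anyone landing here later, a minimal sketch of picking up the fix; the package name and version come from this thread, while the instrumentor entry point follows the usual OpenInference pattern and may differ in your setup.

```python
# pip install --upgrade "openinference-instrumentation-llama-index>=3.1.1"
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Re-instrument LlamaIndex, then re-run the astream_chat example to verify
# that token counts now appear on streaming spans.
LlamaIndexInstrumentor().instrument()
```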

@MarouaneZhani
Author

@RogerHYang I have tested it and it's now working as expected! Thanks for the quick fix!
