[BUG] Used token not calculating when streaming - Llamaindex #5729

Closed
MarouaneZhani opened this issue Dec 13, 2024 · 5 comments · Fixed by Arize-ai/openinference#1174
Labels: bug (Something isn't working)

@MarouaneZhani

Describe the bug
Token calculation is not done/shown when using astream_chat in LlamaIndex.
To Reproduce

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine  # import path assumes a recent llama-index

# index_doc (a LlamaIndex index) and groqLLM (a Groq LLM instance) are defined elsewhere
query_str = "Hello, Tell me a joke!"
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index_doc.as_retriever(),
    llm=groqLLM,
)

responses = []
result = await chat_engine.astream_chat(query_str)

async for response in result.achat_stream:
    responses.append(response.delta)
```
Token calculation works correctly when using chat_engine.achat or chat_engine.chat, so the problem only appears when streaming (see the comparison sketch below).
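For reference, a minimal comparison sketch of the non-streaming call where the token counts do show up; `index_doc`, `groqLLM`, and `query_str` are assumed to be the same objects as in the snippet above.

```python
# Non-streaming variant for comparison: token usage is reported for achat/chat.
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index_doc.as_retriever(),
    llm=groqLLM,
)

result = await chat_engine.achat(query_str)  # token counts are recorded for this call
print(result.response)
```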
Expected behavior
Token calculation should also be reported when streaming.
Screenshots
Here it's not working; as you can see, the token information is missing:
[screenshot: trace without token counts]

And here, when using chat or achat, everything works as expected:
[screenshot: trace with token counts]

Environment (please complete the following information):

  • Version [e.g. 7.1.0]
@MarouaneZhani MarouaneZhani added bug Something isn't working triage issues that need triage labels Dec 13, 2024
@MarouaneZhani
Author

I have just noticed that this only happens when using Groq as the LLM, and not with Ollama, for example!

@RogerHYang RogerHYang self-assigned this Dec 13, 2024
@RogerHYang RogerHYang removed the triage issues that need triage label Dec 13, 2024
@RogerHYang
Contributor

This is probably an issue with Groq itself not returning token counts when streaming.

For comparison, OpenAI exhibits the same missing-token-count behavior when streaming, but it can be fixed by passing additional kwargs as shown below.

```python
chat_engine = CondensePlusContextChatEngine.from_defaults(
    index_doc.as_retriever(),
    llm=OpenAI(
        model="gpt-4o-mini",
        additional_kwargs={"stream_options": {"include_usage": True}},
    ),
)
```

@RogerHYang
Contributor

OK, it looks like the usage can be found in a different location in the streamed response.
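To illustrate what "a different location" might mean, here is an exploratory sketch for inspecting the raw streamed chunks. It assumes each streamed chat response exposes the provider payload on `.raw`; the exact field holding the usage block (e.g. a `usage` or `x_groq` entry on the final chunk) is provider-specific and is an assumption here, not part of the instrumentation fix.

```python
# Exploratory: dump the raw payload of the last streamed chunk and look for usage there.
result = await chat_engine.astream_chat(query_str)

last_raw = None
async for response in result.achat_stream:
    last_raw = response.raw  # provider-specific payload for this chunk

print(last_raw)  # token usage, if present, typically rides on the final chunk
```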

@RogerHYang
Contributor

@MarouaneZhani We released an update in openinference-instrumentation-llama-index 3.1.1. Please give it a try and let us know if you have further questions. Thank you!
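For anyone landing here later, a minimal sketch of picking up the fix; the package name and version come from this thread, while the instrumentor entry point follows the usual OpenInference pattern and may differ in your setup.

```python
# pip install --upgrade "openinference-instrumentation-llama-index>=3.1.1"
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Re-instrument LlamaIndex, then re-run the astream_chat example to verify
# that token counts now appear on streaming spans.
LlamaIndexInstrumentor().instrument()
```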

@MarouaneZhani
Author

@RogerHYang I have tested it and it's now working as expected! Thanks for the quick fix!
