Ensure non-streaming usage data from function calling is in history #5676

stephentoub · 2024-11-20T02:03:54Z

It's already yielded during streaming, but it's not being surfaced for non-streaming. Do so by manufacturing a new UsageContent for the UsageDetails and adding that to the response message that's added to the history.

Fixes #5668
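The approach in the description can be sketched as follows. This is a minimal sketch that uses hypothetical Python stand-ins for UsageDetails, UsageContent, and ChatMessage, not the Microsoft.Extensions.AI API. `invoke_with_functions`, `get_completion`, and `invoke_function` are illustrative names. Each intermediate response that requested function calls gets a UsageContent carrying that request's own usage before it is added to the history. The final response's usage stays on the returned completion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical Python stand-ins for the .NET types discussed in this PR.
@dataclass
class UsageDetails:
    input_token_count: Optional[int] = None
    output_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    additional_properties: dict = field(default_factory=dict)


@dataclass
class UsageContent:
    details: UsageDetails


@dataclass
class ChatMessage:
    role: str
    contents: list


@dataclass
class Completion:
    message: ChatMessage
    usage: Optional[UsageDetails]
    function_calls: List[str]  # names of functions the model asked to call


def invoke_with_functions(
    history: List[ChatMessage],
    get_completion: Callable[[List[ChatMessage]], Completion],
    invoke_function: Callable[[str], str],
    max_iterations: int = 10,
) -> Completion:
    """Non-streaming function-calling loop (hypothetical sketch)."""
    for _ in range(max_iterations):
        completion = get_completion(history)
        if not completion.function_calls:
            # Final answer: its usage is reported on the returned completion.
            return completion
        message = completion.message
        if completion.usage is not None:
            # Wrap this request's usage in a UsageContent so it is not lost
            # when only the final completion is returned to the caller.
            message.contents.append(UsageContent(completion.usage))
        history.append(message)
        for name in completion.function_calls:
            history.append(ChatMessage("tool", [invoke_function(name)]))
    raise RuntimeError("exceeded max_iterations without a final answer")
```

This mirrors the streaming case, where each update's usage is already yielded as it arrives.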



test/Libraries/Microsoft.Extensions.AI.Integration.Tests/ChatClientIntegrationTests.cs

SteveSandersonMS · 2024-11-20T02:53:37Z

While this is a valid solution, I suspect it's not what #5668 had in mind. It sounds like they were not expecting to have to set KeepFunctionCallingMessages = true or to have to sum over a set of auto-added messages (somehow working out which ones to sum over), and just expected the response object to contain the sum of the usage details across all sub-requests.

I'm fairly sure you've done it this way because there could be usage information that we can't sum automatically, i.e., the UsageDetails.AdditionalProperties entries.

Even with this, we're still discarding other intermediate state, such as the AdditionalProperties and RawRepresentation on the original ChatCompletion objects. But FunctionInvokingChatClient is meant as a convenient but lossy process that primarily gives you the final post-function-call results and doesn't try to preserve and expose all the intermediate states. People who need every intermediate state verbatim should avoid FunctionInvokingChatClient.

Given the lossiness of FunctionInvokingChatClient I'd also be open to the alternative of auto-summing known properties on UsageDetails and ignoring its AdditionalProperties. I know the casualty there would be gpt4-o1 reasoning tokens - at least in the short term, since we could potentially add that as a first-class thing to UsageDetails in the long term.

Overall I'm fine with whatever you prefer here.
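The auto-summing alternative can be sketched as follows. This is a minimal sketch that uses a hypothetical Python stand-in for UsageDetails, not the .NET type. The strongly typed counts are added across sub-requests, and the additional properties are dropped. That dropped dictionary is exactly where vendor-specific entries would be lost, such as the `reasoning_tokens` key used in the example, which is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class UsageDetails:
    """Hypothetical stand-in for UsageDetails (not the real .NET type)."""
    input_token_count: Optional[int] = None
    output_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    additional_properties: dict = field(default_factory=dict)


def _add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # None means "not reported"; the sum stays None only if neither side reported a value.
    if a is None:
        return b
    if b is None:
        return a
    return a + b


def sum_known_usage(details: Iterable[UsageDetails]) -> UsageDetails:
    """Sum only the strongly typed counts; vendor-specific extras are dropped."""
    total = UsageDetails()
    for d in details:
        total.input_token_count = _add(total.input_token_count, d.input_token_count)
        total.output_token_count = _add(total.output_token_count, d.output_token_count)
        total.total_token_count = _add(total.total_token_count, d.total_token_count)
    # additional_properties is intentionally left empty: entries such as reasoning
    # or cached token counts cannot be combined without vendor knowledge.
    return total
```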

stephentoub · 2024-11-20T03:28:03Z

It sounds like they were not expecting to have to set KeepFunctionCallingMessages = true

That's the default, fwiw

I'm fairly sure you've done it this way because there could be usage information that we can't sum automatically, i.e., the UsageDetails.AdditionalProperties entries.

Yes

I know the casualty there would be gpt4-o1 reasoning tokens

And cached tokens. And predicted tokens. All of those have emerged in just the last few months. Presumably there will be more in the future.

I'd also be open to the alternative of auto-summing known properties on UsageDetails and ignoring its AdditionalProperties

That makes me nervous. The mitigating factor is that at least all the ones we're aware of are included in the totals. But they also all have different pricing structures.

I liked this PR's approach because it logically matches the streaming case, but I understand the hesitancy and the rationale that the resulting ChatCompletion should represent the totality of the operation. I'm just concerned about the summing due to the financial aspect of tokens.

Another approach would be to include all of the additional properties in the final one, and assume that anything that's a property with an int value is summable. That's not necessarily a valid assumption, but we could do it anyway. Or find another way to include everything, like putting in lists instead of individual values, or having a slot in additional properties that's a list of all the individual usage details.
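The int-summing and list-collecting ideas can be sketched with plain dictionaries, along with the reason summing every int is risky. This is a hypothetical Python helper, and the key names (`cached_tokens`, `model_context_limit`, `tier`) are invented for illustration. Integer values are summed under the assumption flagged above as not necessarily valid, and all other values are collected into lists.

```python
from typing import Any, Dict, Iterable, List, Tuple


def merge_additional_properties(
    per_request: Iterable[Dict[str, Any]],
) -> Tuple[Dict[str, int], Dict[str, List[Any]]]:
    """Hypothetical merge: sum every int value, collect everything else in lists."""
    sums: Dict[str, int] = {}
    others: Dict[str, List[Any]] = {}
    for props in per_request:
        for key, value in props.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, int) and not isinstance(value, bool):
                # Assumption under discussion: any int is a summable count.
                sums[key] = sums.get(key, 0) + value
            else:
                others.setdefault(key, []).append(value)
    return sums, others
```

The `model_context_limit` entry in the usage below is an int that is not a count. It still gets summed into a meaningless value, which is the failure mode of this approach.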

SteveSandersonMS · 2024-11-20T14:25:12Z

That's the default, fwiw

Thanks for the reminder. I usually expect bool flags to default to false (in ASP.NET Core that's pretty much a rule, even if it makes naming harder) but it does make sense that we have this behavior as default.

That adds extra weight to the validity of this design.

Another approach would be to include all of the additional properties in the final one, and assume that anything that's a property with an int value is summable. That's not necessarily a valid assumption, but we could do it anyway. Or find another way to include everything, like putting in lists instead of individual values, or having a slot in additional properties that's a list of all the individual usage details.

Summing all ints seems very sketchy. Tracking all the UsageDetails in a list in AdditionalProperties would technically preserve the information but it's likely harder to use that information as a consumer because (1) it's not accessible in a strongly-typed way and (2) even if you find the list, it's hard to match up entries with their original sources.

I think the approach you've taken here is good, at least as good as we can do. It retains precision at the cost of a bit of ease-of-use. As long as it's relatively uncommon for people to need to track token usage, there's not much drawback to what this PR does. If it does turn into a common requirement we could look at adding some helper method that does the counting given an IEnumerable<ChatMessage> or something like that (or encourage IChatClient implementations to provide one, since they know about their own custom usage details extension data).
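The helper floated above could look roughly like the following hypothetical Python sketch. `total_usage_from_history`, `merge_extra`, and `sum_reasoning_tokens` are illustrative names, not library APIs. The helper sums the strongly typed counts from every UsageContent in the history. It delegates vendor-specific extras to an optional callback, in the spirit of letting IChatClient implementations supply that logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


# Hypothetical stand-ins, not the Microsoft.Extensions.AI types.
@dataclass
class UsageDetails:
    input_token_count: Optional[int] = None
    output_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    additional_properties: dict = field(default_factory=dict)


@dataclass
class UsageContent:
    details: UsageDetails


@dataclass
class ChatMessage:
    role: str
    contents: list


def _add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    if a is None:
        return b
    if b is None:
        return a
    return a + b


def total_usage_from_history(
    messages: Iterable[ChatMessage],
    merge_extra: Optional[Callable[[dict, dict], None]] = None,
) -> UsageDetails:
    """Sum the usage recorded in history; extras are merged only if a callback is given."""
    total = UsageDetails()
    for message in messages:
        for content in message.contents:
            if not isinstance(content, UsageContent):
                continue
            d = content.details
            total.input_token_count = _add(total.input_token_count, d.input_token_count)
            total.output_token_count = _add(total.output_token_count, d.output_token_count)
            total.total_token_count = _add(total.total_token_count, d.total_token_count)
            if merge_extra is not None:
                # Vendor-supplied logic decides how its own extra entries combine.
                merge_extra(total.additional_properties, d.additional_properties)
    return total


def sum_reasoning_tokens(into: dict, extra: dict) -> None:
    """Example vendor-specific merge for a hypothetical 'reasoning_tokens' key."""
    if "reasoning_tokens" in extra:
        into["reasoning_tokens"] = into.get("reasoning_tokens", 0) + extra["reasoning_tokens"]
```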


lucasmeijer · 2024-11-28T07:45:31Z

I liked this PR's approach because it logically matches the streaming case, but I understand the hesitancy and the rationale that the resulting ChatCompletion should represent the totality of the operation. I'm just concerned about the summing due to the financial aspect of tokens.

Original issue submitter here. As a user of this library, I use it to abstract away the small and big differences between the different LLM providers. Being able to determine cost, or at least the relevant token counts, seems like a good idea from a user's perspective, implementation challenges aside.

I have some sympathy for the hesitancy due to financial impacts of tokens, but right now you're just punting this problem to the user, who is probably more likely to get the logic wrong.

Side note: when working with raw LLMs over HTTP, especially in streaming mode, I've found semantic differences between providers. IIRC, Anthropic's usage data looked like I was supposed to take only the last value because they did the summing on their end, whereas, IIRC, OpenAI's looked like the summing was my responsibility.

Tricky situation. Ideally there would be a nice abstraction library that shields me from this problem :-)
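The two interpretations described above can be sketched as follows. This is a hypothetical Python helper. Which interpretation applies to a given provider is something to verify against that provider's documentation, and this sketch does not establish it.

```python
from typing import Iterable, Optional


def aggregate_stream_usage(
    updates: Iterable[Optional[int]], cumulative: bool
) -> Optional[int]:
    """Combine per-update token counts from a stream (hypothetical helper).

    cumulative=True:  each reported value is a running total, so keep the last.
    cumulative=False: each reported value is a delta, so add them up.
    Updates that carry no usage are represented as None and skipped.
    """
    reported = [u for u in updates if u is not None]
    if not reported:
        return None
    if cumulative:
        return reported[-1]
    return sum(reported)
```

For the same stream `[None, 5, 12, 20]`, the cumulative reading gives 20 and the incremental reading gives 37. Choosing the wrong interpretation therefore misreports usage silently.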

stephentoub · 2024-11-28T08:39:22Z

but right now you're just punting this problem to the user, who is probably more likely to get the logic wrong.

How so? This makes all the data available, and in a standard way. All the intermediate content from function calling, such as the function call requests and responses, are included in the history for subsequent examination if the dev wants them. Those intermediate token counts are now all there, too, should they be desired. The strongly typed counts are easily accessed, and if a dev cares about the other ones, like reasoning token counts (which are part of the output token counts), then they know how to interpret that vendor-specific data, and better than we do.

Further, most systems I've seen that care about tracking token counts do it via telemetry, and the built-in AddOpenTelemetry includes the token counts per the otel spec. If that's inserted in the pipeline after function invocation, all the intermediate counts will be logged automatically.
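The following sketch shows why pipeline position matters for the telemetry point above. It uses hypothetical Python classes (`FakeModel`, `UsageRecorder`, `FunctionInvoker`), which are not the AddOpenTelemetry or FunctionInvokingChatClient implementations. A recorder placed between the function-invocation layer and the model sees every intermediate request. A recorder placed outside the function-invocation layer sees only the final response.

```python
class FakeModel:
    """Hypothetical model client: the first call requests a function, the second answers."""

    def __init__(self):
        self.calls = 0

    def complete(self, history):
        self.calls += 1
        if self.calls == 1:
            return {"text": None, "function_call": "get_time", "output_tokens": 4}
        return {"text": "It's noon.", "function_call": None, "output_tokens": 6}


class UsageRecorder:
    """Stand-in for a telemetry layer: logs output tokens of each response it sees."""

    def __init__(self, inner, log):
        self.inner = inner
        self.log = log

    def complete(self, history):
        response = self.inner.complete(history)
        self.log.append(response["output_tokens"])
        return response


class FunctionInvoker:
    """Hypothetical function-invocation layer: re-calls the inner client until no function call."""

    def __init__(self, inner):
        self.inner = inner

    def complete(self, history):
        while True:
            response = self.inner.complete(history)
            if response["function_call"] is None:
                return response
            history.append(("tool", response["function_call"] + " -> 12:00"))
```

With the recorder inside, the log holds both the intermediate and the final counts. With it outside, only the final count is logged.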

SteveSandersonMS · 2024-11-28T10:29:20Z

How so? This makes all the data available, and in a standard way.

My guess is that @lucasmeijer means there's some nontrivial, possibly error-prone step of converting that raw data into the desired interpretation, which means interpreting and aggregating vendor-specific data in a useful way that ends up exactly matching real-world costs. @lucasmeijer would prefer this library to take on the burden of handling that so that individual apps don't have to. If we don't, we're "punting this problem to the user".

I can totally sympathise with that. However I think it would have to be some kind of additional abstraction within this library rather than being something we just do automatically. This library can't just innately know how to aggregate cost data for all LLM backends; we'd need some abstraction that LLM backends can optionally implement in order to do that aggregation based on their vendor-specific data. This is what I meant above with "encourage IChatClient implementations to provide [counting]".

stephentoub · 2024-11-28T14:49:30Z

I can totally sympathise with that. However I think it would have to be some kind of additional abstraction within this library rather than being something we just do automatically.

Alternatively, automatically sum input/output/total counts and

Explicitly drop UsageDetails.AdditionalProperties on those sum instances, or
Declare by design that numeric properties in UsageDetails.AdditionalProperties are considered summable, or
Add a separate AdditionalCounts dictionary that's summable, or
Put a collection of the original UsageDetails into AdditionalProperties, or
...

SteveSandersonMS · 2024-11-28T15:28:31Z

Of those, the AdditionalCounts concept seems most interesting as it avoids the pitfalls of the others (either discarding info because we don't understand it, making assumptions we can't be sure of, or failing to do the aggregation so it remains the app's problem).

I like the idea of UsageDetails containing an AdditionalCounts dictionary that we declare must only contain summable values. That seems a lot more useful than AdditionalProperties, to such an extent I wonder if UsageDetails should even have AdditionalProperties anyway (anything you can put there could also be stored at the ChatCompletion level).
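The AdditionalCounts idea can be sketched as follows. This is a minimal sketch that assumes a hypothetical Python UsageDetails whose `additional_counts` dictionary may only hold summable integers. It models the proposal only and does not describe what #5707 actually implemented. Because the contract guarantees that every entry is a count, aggregation needs no vendor knowledge.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, Optional


@dataclass
class UsageDetails:
    """Hypothetical stand-in, not the real .NET type."""
    input_token_count: Optional[int] = None
    output_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    # Proposed contract: every value here is an integer count that is safe to sum.
    additional_counts: Dict[str, int] = field(default_factory=dict)


def _add(a: Optional[int], b: Optional[int]) -> Optional[int]:
    if a is None:
        return b
    if b is None:
        return a
    return a + b


def add_usage(a: UsageDetails, b: UsageDetails) -> UsageDetails:
    """Combine two usage records, summing typed counts and every additional count."""
    counts = dict(a.additional_counts)
    for key, value in b.additional_counts.items():
        counts[key] = counts.get(key, 0) + value
    return UsageDetails(
        _add(a.input_token_count, b.input_token_count),
        _add(a.output_token_count, b.output_token_count),
        _add(a.total_token_count, b.total_token_count),
        counts,
    )
```

Folding with `reduce(add_usage, per_request, UsageDetails())` yields one record for the whole function-calling operation.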

stephentoub · 2024-11-28T18:56:12Z

I like the idea of UsageDetails containing an AdditionalCounts dictionary that we declare must only contain summable values. That seems a lot more useful than AdditionalProperties, to such an extent I wonder if UsageDetails should even have AdditionalProperties anyway (anything you can put there could also be stored at the ChatCompletion level).

@SteveSandersonMS, want to submit a pr?

SteveSandersonMS · 2024-11-29T19:34:12Z

want to submit a pr?

#5707

Ensure non-streaming usage data from function calling is in history

0e6a043


stephentoub requested a review from a team as a code owner November 20, 2024 02:03

dotnet-policy-service bot assigned stephentoub Nov 20, 2024

stephentoub mentioned this pull request Nov 20, 2024

FunctionCallingChatClient reports incorrect UsageData #5668

Closed

SteveSandersonMS reviewed Nov 20, 2024


test/Libraries/Microsoft.Extensions.AI.Integration.Tests/ChatClientIntegrationTests.cs

SteveSandersonMS approved these changes Nov 20, 2024


stephentoub merged commit 7ebb34d into dotnet:main Nov 20, 2024
6 checks passed

stephentoub deleted the addusagetohistory branch November 20, 2024 15:27

github-actions bot locked and limited conversation to collaborators Dec 30, 2024
