Description
Context
@Kludex suggested opening an issue based on this Slack thread.
I believe that access to the request–response duration for a model call should be a built-in metric, on par with what is provided in `usage`.
Currently, you have to manually `iter()` through the agent graph to track inner request–response timings, which is cumbersome.
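To make the current pain concrete, here is a minimal sketch of the manual-timing pattern (the model calls are simulated with `asyncio.sleep`; this is not the actual pydantic-ai API, just the wrap-every-call-with-a-clock shape you end up writing):

```python
import asyncio
import time


async def fake_model_call(delay: float) -> str:
    """Stand-in for one inner model request/response round trip."""
    await asyncio.sleep(delay)
    return "response"


async def run_with_timings() -> list[float]:
    # Manually wrap each inner model call with a monotonic clock,
    # the way one currently has to when iterating the graph by hand.
    durations: list[float] = []
    for delay in (0.01, 0.02):  # pretend these are model-call nodes
        start = time.monotonic()
        await fake_model_call(delay)
        durations.append(time.monotonic() - start)
    return durations


durations = asyncio.run(run_with_timings())
```

With built-in timestamps, this boilerplate would disappear.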
Problem
`ModelResponse` holds a `timestamp`, but it is somewhat unreliable, as it is documented as:

> The timestamp of the response. If the model provides a timestamp in the response (as OpenAI does), that will be used.
Conversely, `ModelRequest` does not have a `timestamp`. Its individual parts do, presumably set at build time, but the request as a whole is not timestamped when it is actually sent.
`ModelResponse` does have a `timestamp`, presumably set when the response is received, but its parts do not, which makes sense since all parts arrive in the same response.
This leads to ambiguity and unnecessary work for anyone wanting to measure or log latency.
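To illustrate the ambiguity: when the response timestamp may come from the provider's clock while the request time comes from the local clock, naive subtraction mixes two clocks and can produce nonsense. The values below are hypothetical, chosen only to show the skew problem:

```python
from datetime import datetime, timedelta, timezone

# Local clock: when we actually sent the request.
request_sent_local = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)

# Provider clock running a few seconds behind ours stamped the response at:
response_ts_provider = datetime(2024, 1, 1, 12, 0, 2, tzinfo=timezone.utc)

# Naive "duration" mixes the two clocks and comes out negative.
naive_duration = response_ts_provider - request_sent_local
print(naive_duration < timedelta(0))  # True: a negative latency, clearly wrong
```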
Proposal
- Add a `timestamp` field (local, UTC) to `ModelRequest`, marking the actual send time.
- Clarify the `timestamp` in `ModelResponse` as the local time the response was received.
- Add a `provider_timestamp` field to `ModelResponse` (if available, otherwise `None`).
- Add a `duration` field (`timedelta` or `float` in seconds) to `ModelResponse`, computed as `response.timestamp - request.timestamp`.
- (Optional) Add a `duration` field to agent runs, to capture total run time.
Note also that the graph persistence API already tracks both `ts` (timestamp) and `duration` for `NodeSnapshot` and `EndSnapshot`.