6 changes: 3 additions & 3 deletions docs/ag-ui.md
@@ -35,7 +35,7 @@ There are three ways to run a Pydantic AI agent based on AG-UI run input with st

1. [`run_ag_ui()`][pydantic_ai.ag_ui.run_ag_ui] takes an agent and an AG-UI [`RunAgentInput`](https://docs.ag-ui.com/sdk/python/core/types#runagentinput) object, and returns a stream of AG-UI events encoded as strings. It also takes optional [`Agent.iter()`][pydantic_ai.Agent.iter] arguments including `deps`. Use this if you're using a web framework not based on Starlette (e.g. Django or Flask) or want to modify the input or output in some way.
2. [`handle_ag_ui_request()`][pydantic_ai.ag_ui.handle_ag_ui_request] takes an agent and a Starlette request (e.g. from FastAPI) coming from an AG-UI frontend, and returns a streaming Starlette response of AG-UI events that you can return directly from your endpoint. It also takes optional [`Agent.iter()`][pydantic_ai.Agent.iter] arguments including `deps`, that you can vary for each request (e.g. based on the authenticated user).
3. [`Agent.to_ag_ui()`][pydantic_ai.Agent.to_ag_ui] returns an ASGI application that handles every AG-UI request by running the agent. It also takes optional [`Agent.iter()`][pydantic_ai.Agent.iter] arguments including `deps`, but these will be the same for each request, with the exception of the AG-UI state that's injected as described under [state management](#state-management). This ASGI app can be [mounted](https://fastapi.tiangolo.com/advanced/sub-applications/) at a given path in an existing FastAPI app.
3. [`Agent.to_ag_ui()`][pydantic_ai.agent.AbstractAgent.to_ag_ui] returns an ASGI application that handles every AG-UI request by running the agent. It also takes optional [`Agent.iter()`][pydantic_ai.Agent.iter] arguments including `deps`, but these will be the same for each request, with the exception of the AG-UI state that's injected as described under [state management](#state-management). This ASGI app can be [mounted](https://fastapi.tiangolo.com/advanced/sub-applications/) at a given path in an existing FastAPI app.
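
For orientation, here's a minimal sketch of option 2 with FastAPI; the model name, route path, and omission of `deps` are illustrative assumptions, not taken from the diff above:

```python
from fastapi import FastAPI, Request

from pydantic_ai import Agent
from pydantic_ai.ag_ui import handle_ag_ui_request

agent = Agent('openai:gpt-4o')  # model choice is illustrative
app = FastAPI()


@app.post('/agent')
async def run_agent(request: Request):
    # Streams AG-UI events back to the frontend as a Starlette response.
    return await handle_ag_ui_request(agent, request)
```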

### Handle run input and output directly

@@ -117,7 +117,7 @@ This will expose the agent as an AG-UI server, and your frontend can start sendi

### Stand-alone ASGI app

This example uses [`Agent.to_ag_ui()`][pydantic_ai.Agent.to_ag_ui] to turn the agent into a stand-alone ASGI application:
This example uses [`Agent.to_ag_ui()`][pydantic_ai.agent.AbstractAgent.to_ag_ui] to turn the agent into a stand-alone ASGI application:

```py {title="agent_to_ag_ui.py" py="3.10" hl_lines="4"}
from pydantic_ai import Agent
@@ -265,7 +265,7 @@ uvicorn ag_ui_tool_events:app --host 0.0.0.0 --port 9000

## Examples

For more examples of how to use [`to_ag_ui()`][pydantic_ai.Agent.to_ag_ui] see
For more examples of how to use [`to_ag_ui()`][pydantic_ai.agent.AbstractAgent.to_ag_ui] see
[`pydantic_ai_examples.ag_ui`](https://github.com/pydantic/pydantic-ai/tree/main/examples/pydantic_ai_examples/ag_ui),
which includes a server for use with the
[AG-UI Dojo](https://docs.ag-ui.com/tutorials/debugging#the-ag-ui-dojo).
285 changes: 227 additions & 58 deletions docs/agents.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/api/agent.md
@@ -4,6 +4,8 @@
options:
members:
- Agent
- AbstractAgent
- WrapperAgent
- AgentRun
- AgentRunResult
- EndStrategy
8 changes: 8 additions & 0 deletions docs/changelog.md
@@ -12,6 +12,14 @@ Pydantic AI is still pre-version 1, so breaking changes will occur, however:
!!! note
Here's a filtered list of the breaking changes for each version to help you upgrade Pydantic AI.

### v0.7.0 (2025-08-08)

See [#2458](https://github.com/pydantic/pydantic-ai/pull/2458) - `pydantic_ai.models.StreamedResponse` now yields a `FinalResultEvent` along with the existing `PartStartEvent` and `PartDeltaEvent`. If you're using `pydantic_ai.direct.model_request_stream` or `pydantic_ai.direct.model_request_stream_sync`, you may need to update your code to account for this.

See [#2458](https://github.com/pydantic/pydantic-ai/pull/2458) - `pydantic_ai.models.Model.request_stream` now receives a `run_context` argument. If you've implemented a custom `Model` subclass, you will need to account for this.

See [#2458](https://github.com/pydantic/pydantic-ai/pull/2458) - `pydantic_ai.models.StreamedResponse` now requires a `model_request_parameters` field and constructor argument. If you've implemented a custom `Model` subclass and implemented `request_stream`, you will need to account for this.
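
As a rough sketch of what the first change looks like in practice (assuming the event classes are importable from `pydantic_ai.messages`; the model and prompt are illustrative):

```python
from pydantic_ai.direct import model_request_stream
from pydantic_ai.messages import FinalResultEvent, ModelRequest, PartDeltaEvent, PartStartEvent


async def main():
    async with model_request_stream(
        'openai:gpt-4o', [ModelRequest.user_text_prompt('What is the capital of France?')]
    ) as stream:
        async for event in stream:
            if isinstance(event, FinalResultEvent):
                print('final result started')  # new event yielded as of v0.7.0
            elif isinstance(event, (PartStartEvent, PartDeltaEvent)):
                print(event)
```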

### v0.6.0 (2025-08-06)

This release was meant to clean some old deprecated code, so we can get a step closer to V1.
8 changes: 4 additions & 4 deletions docs/direct.md
@@ -26,9 +26,9 @@ model_response = model_request_sync(
)

print(model_response.parts[0].content)
#> Paris
#> The capital of France is Paris.
print(model_response.usage)
#> Usage(requests=1, request_tokens=56, response_tokens=1, total_tokens=57)
#> Usage(requests=1, request_tokens=56, response_tokens=7, total_tokens=63)
```

_(This example is complete, it can be run "as is")_
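
A hedged sketch of the async counterpart, assuming `model_request` mirrors `model_request_sync` (model name and prompt are illustrative):

```python
import asyncio

from pydantic_ai.direct import model_request
from pydantic_ai.messages import ModelRequest


async def main():
    model_response = await model_request(
        'openai:gpt-4o',
        [ModelRequest.user_text_prompt('What is the capital of France?')],
    )
    print(model_response.parts[0].content)


asyncio.run(main())
```
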
@@ -122,7 +122,7 @@ model_response = model_request_sync(
)

print(model_response.parts[0].content)
#> Paris
#> The capital of France is Paris.
```

_(This example is complete, it can be run "as is")_
@@ -145,7 +145,7 @@ model_response = model_request_sync(
)

print(model_response.parts[0].content)
#> Paris
#> The capital of France is Paris.
```

See [Debugging and Monitoring](logfire.md) for more details, including how to instrument with plain OpenTelemetry without Logfire.
6 changes: 3 additions & 3 deletions docs/logfire.md
@@ -119,7 +119,7 @@ We can also query data with SQL in Logfire to monitor the performance of an appl
agent = Agent('openai:gpt-4o')
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> Paris
#> The capital of France is Paris.
```

1. See the [`logfire.instrument_httpx` docs][logfire.Logfire.instrument_httpx] for more details; `capture_all=True` means both headers and body are captured for both the request and response.
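
For reference, a minimal sketch of the setup that footnote describes, assuming the standard Logfire instrumentation calls:

```python
import logfire

logfire.configure()
logfire.instrument_pydantic_ai()
# capture_all=True captures headers and bodies for both requests and responses
logfire.instrument_httpx(capture_all=True)
```
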
@@ -139,7 +139,7 @@ We can also query data with SQL in Logfire to monitor the performance of an appl
agent = Agent('openai:gpt-4o')
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> Paris
#> The capital of France is Paris.
```

![Logfire without HTTPX instrumentation](img/logfire-without-httpx.png)
@@ -272,7 +272,7 @@ logfire.instrument_pydantic_ai(event_mode='logs')
agent = Agent('openai:gpt-4o')
result = agent.run_sync('What is the capital of France?')
print(result.output)
#> Paris
#> The capital of France is Paris.
```

For now, this won't look as good in the Logfire UI, but we're working on it.
8 changes: 4 additions & 4 deletions docs/message-history.md
@@ -7,8 +7,8 @@ Pydantic AI provides access to messages exchanged during an agent run. These mes
After running an agent, you can access the messages exchanged during that run from the `result` object.

Both [`RunResult`][pydantic_ai.agent.AgentRunResult]
(returned by [`Agent.run`][pydantic_ai.Agent.run], [`Agent.run_sync`][pydantic_ai.Agent.run_sync])
and [`StreamedRunResult`][pydantic_ai.result.StreamedRunResult] (returned by [`Agent.run_stream`][pydantic_ai.Agent.run_stream]) have the following methods:
(returned by [`Agent.run`][pydantic_ai.agent.AbstractAgent.run], [`Agent.run_sync`][pydantic_ai.agent.AbstractAgent.run_sync])
and [`StreamedRunResult`][pydantic_ai.result.StreamedRunResult] (returned by [`Agent.run_stream`][pydantic_ai.agent.AbstractAgent.run_stream]) have the following methods:

- [`all_messages()`][pydantic_ai.agent.AgentRunResult.all_messages]: returns all messages, including messages from prior runs. There's also a variant that returns JSON bytes, [`all_messages_json()`][pydantic_ai.agent.AgentRunResult.all_messages_json].
- [`new_messages()`][pydantic_ai.agent.AgentRunResult.new_messages]: returns only the messages from the current run. There's also a variant that returns JSON bytes, [`new_messages_json()`][pydantic_ai.agent.AgentRunResult.new_messages_json].
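
A minimal sketch of these accessors in use (model and prompt are illustrative):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')
result = agent.run_sync('Tell me a joke.')

print(result.new_messages())  # only the messages from this run
print(result.all_messages())  # also includes messages from prior runs, if any were passed in
```
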
@@ -141,8 +141,8 @@ _(This example is complete, it can be run "as is" — you'll need to add `asynci
The primary use of message histories in Pydantic AI is to maintain context across multiple agent runs.

To use existing messages in a run, pass them to the `message_history` parameter of
[`Agent.run`][pydantic_ai.Agent.run], [`Agent.run_sync`][pydantic_ai.Agent.run_sync] or
[`Agent.run_stream`][pydantic_ai.Agent.run_stream].
[`Agent.run`][pydantic_ai.agent.AbstractAgent.run], [`Agent.run_sync`][pydantic_ai.agent.AbstractAgent.run_sync] or
[`Agent.run_stream`][pydantic_ai.agent.AbstractAgent.run_stream].

If `message_history` is set and not empty, a new system prompt is not generated — we assume the existing message history includes a system prompt.
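
A hedged sketch of carrying messages into a follow-up run (model and prompts are illustrative):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')

first = agent.run_sync('Tell me a joke.')
# Pass the earlier messages so the model sees the prior exchange; no new system prompt is generated.
second = agent.run_sync('Explain the joke.', message_history=first.new_messages())
print(second.output)
```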

4 changes: 2 additions & 2 deletions docs/models/google.md
@@ -115,14 +115,14 @@ You can supply a custom `GoogleProvider` instance using the `provider` argument
This is useful if you're using a custom-compatible endpoint with the Google Generative Language API.

```python
from google import genai
from google.genai import Client
from google.genai.types import HttpOptions

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

client = genai.Client(
client = Client(
api_key='gemini-custom-api-key',
http_options=HttpOptions(base_url='gemini-custom-base-url'),
)
2 changes: 1 addition & 1 deletion docs/multi-agent-applications.md
@@ -16,7 +16,7 @@ If you want to hand off control to another agent completely, without coming back

Since agents are stateless and designed to be global, you do not need to include the agent itself in agent [dependencies](dependencies.md).

You'll generally want to pass [`ctx.usage`][pydantic_ai.RunContext.usage] to the [`usage`][pydantic_ai.Agent.run] keyword argument of the delegate agent run so usage within that run counts towards the total usage of the parent agent run.
You'll generally want to pass [`ctx.usage`][pydantic_ai.RunContext.usage] to the [`usage`][pydantic_ai.agent.AbstractAgent.run] keyword argument of the delegate agent run so usage within that run counts towards the total usage of the parent agent run.
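
A hedged sketch of what that looks like; the agents, tool name, and models are illustrative, not taken from the diff:

```python
from pydantic_ai import Agent, RunContext

parent_agent = Agent('openai:gpt-4o')
idea_agent = Agent('openai:gpt-4o-mini', output_type=list[str])


@parent_agent.tool
async def generate_ideas(ctx: RunContext[None], topic: str) -> list[str]:
    result = await idea_agent.run(
        f'Give me three ideas about {topic}.',
        usage=ctx.usage,  # delegate usage counts towards the parent run's total
    )
    return result.output
```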

!!! note "Multiple models"
Agent delegation doesn't need to use the same model for each agent. If you choose to use different models within a run, calculating the monetary cost from the final [`result.usage()`][pydantic_ai.agent.AgentRunResult.usage] of the run will not be possible, but you can still use [`UsageLimits`][pydantic_ai.usage.UsageLimits] to avoid unexpected costs.
23 changes: 14 additions & 9 deletions docs/output.md
@@ -482,6 +482,13 @@ There are two main challenges with streamed results:
1. Validating structured responses before they're complete; this is achieved by "partial validation", which was recently added to Pydantic in [pydantic/pydantic#10748](https://github.com/pydantic/pydantic/pull/10748).
2. When receiving a response, we don't know if it's the final response without starting to stream it and peeking at the content. Pydantic AI streams just enough of the response to sniff out if it's a tool call or an output, then streams the whole thing and calls tools, or returns the stream as a [`StreamedRunResult`][pydantic_ai.result.StreamedRunResult].

!!! note
As the `run_stream()` method will consider the first output matching the `output_type` to be the final output,
it will stop running the agent graph and will not execute any tool calls made by the model after this "final" output.

If you want to always run the agent graph to completion and stream all events from the model's streaming response and the agent's execution of tools,
use [`agent.run()`][pydantic_ai.agent.AbstractAgent.run] with an `event_stream_handler` ([docs](agents.md#streaming-all-events)) or [`agent.iter()`][pydantic_ai.agent.AbstractAgent.iter] ([docs](agents.md#streaming-all-events-and-output)) instead.
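
A hedged sketch of the `agent.iter()` alternative mentioned in the note; the node-inspection helpers are assumed from the agents docs linked above, and the model and prompt are illustrative:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')


async def main():
    async with agent.iter('What is the capital of France?') as run:
        async for node in run:
            if Agent.is_model_request_node(node):
                async with node.stream(run.ctx) as request_stream:
                    async for event in request_stream:
                        print(event)  # streamed model events, including any tool calls
    print(run.result.output)
```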

### Streaming Text

Example of streamed text output:
@@ -505,7 +512,7 @@ async def main():
```

1. Streaming works with the standard [`Agent`][pydantic_ai.Agent] class, and doesn't require any special setup, just a model that supports streaming (currently all models support streaming).
2. The [`Agent.run_stream()`][pydantic_ai.Agent.run_stream] method is used to start a streamed run, this method returns a context manager so the connection can be closed when the stream completes.
2. The [`Agent.run_stream()`][pydantic_ai.agent.AbstractAgent.run_stream] method is used to start a streamed run, this method returns a context manager so the connection can be closed when the stream completes.
3. Each item yielded by [`StreamedRunResult.stream_text()`][pydantic_ai.result.StreamedRunResult.stream_text] is the complete text response, extended as new data is received.

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_
@@ -540,22 +547,20 @@ _(This example is complete, it can be run "as is" — you'll need to add `asynci

### Streaming Structured Output

Not all types are supported with partial validation in Pydantic, see [pydantic/pydantic#10748](https://github.com/pydantic/pydantic/pull/10748), generally for model-like structures it's currently best to use `TypeDict`.

Here's an example of streaming a use profile as it's built:
Here's an example of streaming a user profile as it's built:

```python {title="streamed_user_profile.py" line_length="120"}
from datetime import date

from typing_extensions import TypedDict
from typing_extensions import TypedDict, NotRequired

from pydantic_ai import Agent


class UserProfile(TypedDict, total=False):
class UserProfile(TypedDict):
name: str
dob: date
bio: str
dob: NotRequired[date]
bio: NotRequired[str]


agent = Agent(
@@ -581,7 +586,7 @@ async def main():

_(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

If you want fine-grained control of validation, particularly catching validation errors, you can use the following pattern:
If you want fine-grained control of validation, you can use the following pattern to get the entire partial [`ModelResponse`][pydantic_ai.messages.ModelResponse]:

```python {title="streamed_user_profile.py" line_length="120"}
from datetime import date
1 change: 1 addition & 0 deletions docs/retries.md
@@ -86,6 +86,7 @@ wait_strategy_2 = wait_retry_after(
```

This wait strategy:

- Automatically parses `Retry-After` headers from HTTP 429 responses
- Supports both seconds format (`"30"`) and HTTP date format (`"Wed, 21 Oct 2015 07:28:00 GMT"`)
- Falls back to your chosen strategy when no header is present
2 changes: 1 addition & 1 deletion docs/tools.md
@@ -616,7 +616,7 @@ In addition to per-tool `prepare` methods, you can also define an agent-wide `pr
The `prepare_tools` function should be of type [`ToolsPrepareFunc`][pydantic_ai.tools.ToolsPrepareFunc], which takes the [`RunContext`][pydantic_ai.tools.RunContext] and a list of [`ToolDefinition`][pydantic_ai.tools.ToolDefinition], and returns a new list of tool definitions (or `None` to disable all tools for that step).

!!! note
The list of tool definitions passed to `prepare_tools` includes both regular function tools and tools from any [toolsets](toolsets.md) registered to the agent, but not [output tools](output.md#tool-output).
The list of tool definitions passed to `prepare_tools` includes both regular function tools and tools from any [toolsets](toolsets.md) registered on the agent, but not [output tools](output.md#tool-output).
To modify output tools, you can set a `prepare_output_tools` function instead.
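
A minimal sketch of the shape such a function takes; the filtering logic and tool names are illustrative only:

```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.tools import ToolDefinition


async def only_safe_tools(
    ctx: RunContext[None], tool_defs: list[ToolDefinition]
) -> list[ToolDefinition] | None:
    # Drop any tool whose name marks it as unsafe; return None to disable all tools for the step.
    return [t for t in tool_defs if not t.name.startswith('unsafe_')]


agent = Agent('openai:gpt-4o', prepare_tools=only_safe_tools)
```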

Here's an example that makes all tools strict if the model is an OpenAI model: