@ajac-zero
Contributor

@ajac-zero ajac-zero commented Oct 5, 2025

Hi! This pull request takes a shot at implementing a dedicated OpenRouterModel. Closes #2936.

What sets this PR apart is that it minimizes code duplication by delegating the main logic to OpenAIChatModel, so the new model class serves as a convenience layer for OpenRouter-specific features.

The main thinking behind this solution is that as long as the OpenRouter API is still fully accessible via the openai package, it would be inefficient to reimplement the internal logic using this same package again. We can instead use hooks to achieve the requested features.
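The delegation idea can be sketched with stand-in classes. `BaseChatModel` and `RouterModel` below are hypothetical placeholders, not the actual pydantic_ai classes; only the hook name `_process_provider_details` and the `downstream_provider` key are taken from this PR, and the dict-shaped response is an assumption made to keep the sketch self-contained.

```python
from typing import Any


class BaseChatModel:
    """Hypothetical stand-in for a base chat model that owns the main logic."""

    def process_response(self, raw: dict[str, Any]) -> dict[str, Any]:
        # Shared logic lives in the base class and calls a small hook.
        return {
            'text': raw['choices'][0]['message']['content'],
            'provider_details': self._process_provider_details(raw),
        }

    def _process_provider_details(self, raw: dict[str, Any]) -> dict[str, Any]:
        # Default hook: only the generic finish reason.
        return {'finish_reason': raw['choices'][0].get('finish_reason')}


class RouterModel(BaseChatModel):
    """Hypothetical subclass that only adds router-specific fields."""

    def _process_provider_details(self, raw: dict[str, Any]) -> dict[str, Any]:
        details = super()._process_provider_details(raw)
        details['downstream_provider'] = raw.get('provider')
        return details


raw_response = {
    'provider': 'AtlasCloud',
    'choices': [{'message': {'content': 'Hi'}, 'finish_reason': 'stop'}],
}
result = RouterModel().process_response(raw_response)
print(result['provider_details'])
```

The subclass never touches `process_response`, so fixes to the shared logic would reach both classes.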

I would like to get some thoughts on this implementation before starting to update the docs.

Addressed issues

  1. Closes #1849 (Store OpenRouter provider metadata in ModelResponse vendor details)

Provider metadata can now be accessed via the 'downstream_provider' key in ModelResponse.provider_details:

from pydantic_ai import ModelRequest
from pydantic_ai.direct import model_request_sync
from pydantic_ai.models.openrouter import OpenRouterModel

model = OpenRouterModel('moonshotai/kimi-k2-0905')

response = model_request_sync(model, [ModelRequest.user_text_prompt('Who are you')])

assert response.provider_details is not None
print(response.provider_details['downstream_provider'])  # <-- Final provider that was routed to
# Output: AtlasCloud
  1. Closes #2999 (Can I get thinking part from openrouter provider using google/gemini-2.5-pro?)

The new OpenRouterModelSettings supports OpenRouter's reasoning parameter. The thinking can then be accessed as a ThinkingPart in the model response:

from pydantic_ai import ModelRequest
from pydantic_ai.direct import model_request_sync
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings

model = OpenRouterModel('google/gemini-2.5-pro')

settings = OpenRouterModelSettings(openrouter_reasoning={'effort': 'high'})

response = model_request_sync(model, [ModelRequest.user_text_prompt('Who are you')], model_settings=settings)

print(response.parts[0])
# Output: ThinkingPart(content='**Identifying the Core Inquiry**\n\nI\'m grappling with the core question: "Who am I?" Initially, I\'m identifying the root of the query. The user wants a fundamental identity explained, and I\'ve begun by pinpointing the key words and associations. AI, specifically. Next step, I\'ll move onto broadening this.\n\n\n**Clarifying My Nature**\n\nI\'m now dissecting the definition of "language model," focusing on what that *means* in practical terms. I\'ve moved past simply stating the term and am now delving into how my functions—answering, generating, translating—are executed. This requires explaining my training on vast datasets and my lack of personal experience, which is key to the identity question. I am trying to find the right framing for this complex process.\n\n\n**Formulating a Direct Response**\n\nI\'m now trying to directly answer the question, avoiding technical jargon where possible. I\'m organizing my response. The essential elements have been identified: My nature, my capabilities, and what I *cannot* do. I\'m thinking of ways to explain these facts in a concise, accessible format, focusing on clarity for the user.\n\n\n**Constructing a Detailed Answer**\n\nI\'m now translating the structured plan into actual sentences. I\'m working on the opening, the "I am..." statement, and aiming for a direct, clear tone. Then, I am carefully crafting the explanation of my capabilities and limitations to avoid misunderstandings. I\'m actively searching for concise and impactful language.\n\n\n**Drafting the Final Response**\n\n\\n\\n\n\nI\'m now integrating all the elements I\'ve identified. I\'m beginning the final draft. I\'m focusing on flow and readability, weaving the key points—my nature, my origin, my abilities, and my constraints—into a cohesive narrative. The goal is a concise and informative self-description, tailored to the user\'s inquiry.\n\n\n', id='reasoning', provider_name='openrouter')
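As a rough sketch of how prefixed settings like `openrouter_reasoning` might be turned into request-body fields: the mapping table and the `build_extra_body` helper below are assumptions for illustration, not code from this PR. In particular, mapping `openrouter_preferences` to a `provider` field is a guess based on OpenRouter's request schema.

```python
from typing import Any

# Assumed mapping from prefixed settings keys to request-body fields.
_SETTINGS_TO_BODY = {
    'openrouter_reasoning': 'reasoning',
    'openrouter_preferences': 'provider',
}


def build_extra_body(settings: dict[str, Any]) -> dict[str, Any]:
    """Collect router-specific settings into an extra request body."""
    return {
        body_key: settings[settings_key]
        for settings_key, body_key in _SETTINGS_TO_BODY.items()
        if settings_key in settings
    }


extra_body = build_extra_body({'openrouter_reasoning': {'effort': 'high'}, 'temperature': 0.2})
print(extra_body)
```

Settings without the prefix (such as `temperature` here) are left for the base model to handle.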
  1. Closes #2323 (Handle error response from OpenRouter as exception instead of validation failure) and #2844 (OpenRouter uses non-compatible finish reason)

These fixes depend on downstream behavior from OpenRouter or its own downstream providers (namely, that a response of type 'error' carries a status code >= 400), but in most cases I would say it works as one would expect:

from pydantic_ai import ModelHTTPError, ModelRequest
from pydantic_ai.direct import model_request_sync
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings

model = OpenRouterModel('google/gemini-2.5-pro')

settings = OpenRouterModelSettings(
    openrouter_preferences={'only': ['azure']}  # Gemini is not available in Azure; Guaranteed failure.
)

try:
    response = model_request_sync(model, [ModelRequest.user_text_prompt('Who are you')], model_settings=settings)
except ModelHTTPError as e:
    print(e)
# status_code: 404, model_name: google/gemini-2.5-pro, body: {'message': 'No allowed providers are available for the selected model.', 'code': 404}
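A minimal sketch of the idea of turning an error payload into an exception rather than a validation failure. `UpstreamHTTPError` and `raise_for_error_payload` are hypothetical names, not pydantic_ai APIs, and the payload shape (an `error` object with `message` and `code`) is assumed from the body printed above.

```python
from typing import Any


class UpstreamHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error raised by a model client."""

    def __init__(self, status_code: int, model_name: str, body: Any) -> None:
        self.status_code = status_code
        self.model_name = model_name
        self.body = body
        super().__init__(f'status_code: {status_code}, model_name: {model_name}, body: {body}')


def raise_for_error_payload(payload: dict[str, Any], model_name: str) -> dict[str, Any]:
    """Raise if the payload carries an `error` object with a code >= 400."""
    error = payload.get('error')
    if isinstance(error, dict):
        code = error.get('code')
        if isinstance(code, int) and code >= 400:
            raise UpstreamHTTPError(code, model_name, error)
    return payload


error_payload = {'error': {'message': 'No allowed providers are available for the selected model.', 'code': 404}}
try:
    raise_for_error_payload(error_payload, 'google/gemini-2.5-pro')
except UpstreamHTTPError as exc:
    caught = exc
    print(exc)
```

Payloads without an `error` object pass through unchanged, so the check can sit in front of normal response validation.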
  1. Add OpenRouterModel #1870 (comment)

Adds typed support for setting OpenRouter's provider routing options:

from pydantic_ai import ModelRequest
from pydantic_ai.direct import model_request_sync
from pydantic_ai.models.openrouter import OpenRouterModel, OpenRouterModelSettings

model = OpenRouterModel('moonshotai/kimi-k2-0905')

settings = OpenRouterModelSettings(
    openrouter_preferences={
        'order': ['moonshotai', 'deepinfra', 'fireworks', 'novita'],
        'allow_fallbacks': True,
        'require_parameters': True,
        'data_collection': 'allow',
        'zdr': True,
        'only': ['moonshotai', 'fireworks'],
        'ignore': ['deepinfra'],
        'quantizations': ['fp8'],
        'sort': 'throughput',
        'max_price': {'prompt': 1},
    }
)

response = model_request_sync(model, [ModelRequest.user_text_prompt('Who are you')], model_settings=settings)
assert response.provider_details is not None
print(response.provider_details['downstream_provider'])
# Output: Fireworks
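One way such typing could look is a `TypedDict` with all keys optional. This is a sketch only: the class names are hypothetical, the field names are copied from the example above, and the `Literal` values and the `completion` price key are assumptions about OpenRouter's schema rather than something this PR states.

```python
from typing import Literal, TypedDict


class MaxPrice(TypedDict, total=False):
    prompt: float
    completion: float  # assumed counterpart to `prompt`


class ProviderPreferences(TypedDict, total=False):
    """Hypothetical typed view of the routing options shown above."""

    order: list[str]
    allow_fallbacks: bool
    require_parameters: bool
    data_collection: Literal['allow', 'deny']  # values assumed
    zdr: bool
    only: list[str]
    ignore: list[str]
    quantizations: list[str]
    sort: Literal['price', 'throughput', 'latency']  # values assumed
    max_price: MaxPrice


prefs: ProviderPreferences = {'only': ['moonshotai', 'fireworks'], 'sort': 'throughput'}
print(sorted(prefs))
```

With `total=False`, callers can pass any subset of keys while a type checker still flags misspelled names.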

@DouweM DouweM self-assigned this Oct 7, 2025
Collaborator

@DouweM DouweM left a comment


@ajac-zero Thank you very much, Anibal!

@ajac-zero
Contributor Author

Good morning @DouweM, can you take a look when you get the chance?

Collaborator

@DouweM DouweM left a comment


Thanks!

It'd be interesting to add support for the WebSearchTool built-in tool as well. It shouldn't be too complicated, I think: https://openrouter.ai/docs/features/web-search

@DouweM
Collaborator

DouweM commented Oct 21, 2025

@ajac-zero We can also remove this comment from openai.py:

# NOTE: We don't currently handle OpenRouter `reasoning_details`:
# - https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks
# If you need this, please file an issue.

@xcpky
Contributor

xcpky commented Oct 26, 2025

Hi, I just found this useful PR. I think top_k and the other missing model config options should be added to align with the request schema documented here: https://openrouter.ai/docs/api-reference/overview.

@DouweM
Collaborator

DouweM commented Oct 28, 2025

@ajac-zero Please have a look at the failing linting & coverage!

@ajac-zero
Contributor Author

@DouweM This part from OpenAIChatModel is causing some unexpected behavior with the thinking content, because it appends it to the message content, which OpenRouter doesn't want.

elif isinstance(item, ThinkingPart):
    # NOTE: DeepSeek `reasoning_content` field should NOT be sent back per https://api-docs.deepseek.com/guides/reasoning_model,
    # but we currently just send it in `<think>` tags anyway as we don't want DeepSeek-specific checks here.
    # If you need this changed, please file an issue.
    start_tag, end_tag = self.profile.thinking_tags
    texts.append('\n'.join([start_tag, item.content, end_tag]))

I fixed it by regexing the content afterward, but I don't like this solution very much.

if openai_message['role'] == 'assistant' and isinstance(
    contents := openai_message.get('content'), str
):  # pragma: lax no cover
    openai_message['content'] = re.sub(r'<think>.*?</think>\s*', '', contents, flags=re.DOTALL).strip()

I am hesitant to change OpenAIChatModel at all, but maybe we could wrap the ThinkingPart logic in a method and then override it in OpenRouterModel and any future models?
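For reference, the regex workaround above can be exercised on its own. This is a standalone sketch using the same pattern; the `strip_think_blocks` helper name and the sample message are made up for illustration.

```python
import re

# Same pattern as the workaround: non-greedy match across newlines.
THINK_BLOCK = re.compile(r'<think>.*?</think>\s*', flags=re.DOTALL)


def strip_think_blocks(content: str) -> str:
    """Remove every `<think>...</think>` block and trim leftover whitespace."""
    return THINK_BLOCK.sub('', content).strip()


message = '<think>\nfirst pass\n</think>\n\nHello! <think>second</think>How can I help?'
cleaned = strip_think_blocks(message)
print(cleaned)
# Output: Hello! How can I help?
```

One drawback of this approach is that it would also strip a literal `<think>` block that legitimately appears in the assistant's text, which is part of why overriding the mapping step looks more attractive.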

@DouweM
Collaborator

DouweM commented Oct 29, 2025

@ajac-zero Yep, I think pulling this into a method that can be overridden in a subclass is a good idea. It shouldn't be specific to ThinkingPart but cover the entire ModelResponsePart, so that the method contains the isinstance(...) branching and the overriding method can short-circuit by handling ThinkingPart itself.
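A minimal sketch of that suggestion, under stated assumptions: `TextPart` and `ThinkingPart` here are local stand-in dataclasses, and `BaseMapper`, `RouterMapper`, and `_map_response_part` are hypothetical names, not the real pydantic_ai classes or methods.

```python
from dataclasses import dataclass


@dataclass
class TextPart:  # local stand-in
    content: str


@dataclass
class ThinkingPart:  # local stand-in
    content: str


class BaseMapper:
    """Hypothetical stand-in for the base model's message mapping."""

    thinking_tags = ('<think>', '</think>')

    def map_parts(self, parts: list) -> str:
        texts: list[str] = []
        for part in parts:
            self._map_response_part(part, texts)
        return '\n\n'.join(texts)

    def _map_response_part(self, part: object, texts: list[str]) -> None:
        # One overridable method holds the whole isinstance branching.
        if isinstance(part, TextPart):
            texts.append(part.content)
        elif isinstance(part, ThinkingPart):
            start_tag, end_tag = self.thinking_tags
            texts.append('\n'.join([start_tag, part.content, end_tag]))


class RouterMapper(BaseMapper):
    def _map_response_part(self, part: object, texts: list[str]) -> None:
        if isinstance(part, ThinkingPart):
            return  # short-circuit: keep thinking out of the message content
        super()._map_response_part(part, texts)


parts = [ThinkingPart('plan'), TextPart('Hello')]
print(RouterMapper().map_parts(parts))
# Output: Hello
```

Because the subclass handles only the case it cares about and defers everything else to `super()`, no post-hoc regex cleanup is needed.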

maybe_event.part.id = 'content'
maybe_event.part.provider_name = self.provider_name
yield maybe_event
async def _validate_response(self):
Collaborator

This and the other new methods need a return type hint

    By default, this is a no-op since `ChatCompletionChunk` is already validated.
    """
    async for chunk in self._response:
        yield chunk
Collaborator

Could this just return self._response?

Contributor Author

I tested it, but no :( If we use return, the method returns a coroutine instead; we have to use yield for it to behave as an async iterable.

Collaborator

We could make the method sync, though, and have it return self._response directly; I think that would work.
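The three variants discussed in this thread can be compared in isolation. The `Stream` class and its method names below are hypothetical; the sketch only illustrates standard Python behavior for `async def` with `return`, `async def` with `yield`, and a plain `def`.

```python
import asyncio
import inspect
from collections.abc import AsyncIterator


async def chunks() -> AsyncIterator[int]:
    for i in range(3):
        yield i


class Stream:
    def __init__(self) -> None:
        self._response = chunks()

    async def validate_with_return(self) -> AsyncIterator[int]:
        # `async def` without `yield`: calling this gives a coroutine.
        return self._response

    async def validate_with_yield(self) -> AsyncIterator[int]:
        # `async def` with `yield`: calling this gives an async generator.
        async for chunk in self._response:
            yield chunk

    def validate_sync(self) -> AsyncIterator[int]:
        # Plain `def`: hands back the underlying async iterator unchanged.
        return self._response


async def collect(stream: AsyncIterator[int]) -> list[int]:
    return [chunk async for chunk in stream]


coro = Stream().validate_with_return()
returns_coroutine = inspect.iscoroutine(coro)
coro.close()  # avoid a "never awaited" warning
from_yield = asyncio.run(collect(Stream().validate_with_yield()))
from_sync = asyncio.run(collect(Stream().validate_sync()))
print(returns_coroutine, from_yield, from_sync)
# Output: True [0, 1, 2] [0, 1, 2]
```

The sync variant avoids an extra generator layer, but a subclass that wants to validate each chunk would still need the `yield` form.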

@override
def _process_reasoning(self, response: chat.ChatCompletion) -> list[ThinkingPart]:
    # We can cast with confidence because response was validated in `_validate_completion`
    response = cast(OpenRouterChatCompletion, response)
Collaborator

Can we do an assert isinstance instead to raise an error if we somehow get here with an unexpected type?
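The difference between the two approaches can be shown with stand-in classes. `ChatCompletion` and `RouterChatCompletion` below are hypothetical placeholders for the real response types.

```python
from typing import cast


class ChatCompletion:
    """Hypothetical stand-in for the base response type."""


class RouterChatCompletion(ChatCompletion):
    """Hypothetical stand-in for the validated subclass."""

    provider = 'Fireworks'


def with_cast(response: ChatCompletion) -> RouterChatCompletion:
    # `cast` only informs the type checker; nothing is checked at runtime.
    return cast(RouterChatCompletion, response)


def with_assert(response: ChatCompletion) -> RouterChatCompletion:
    # Fails immediately if an unexpected type reaches this point.
    assert isinstance(response, RouterChatCompletion), type(response)
    return response


plain = ChatCompletion()
cast_result = with_cast(plain)  # silently returns the wrong type
try:
    with_assert(plain)
    assert_failed = False
except AssertionError:
    assert_failed = True
print(type(cast_result).__name__, assert_failed)
# Output: ChatCompletion True
```

Note that `assert` statements are skipped when Python runs with `-O`, so an explicit `if not isinstance(...): raise` may be preferable where that matters.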

message = response.choices[0].message
items: list[ThinkingPart] = []

if reasoning_details := message.reasoning_details:
Collaborator

Are we sure we can entirely drop the .reasoning field processing we have in the superclass? Will OpenRouter always have reasoning_details as well?
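If it turns out that `reasoning_details` is not always present, one possible shape for a fallback is sketched below. The `Message` dataclass and `extract_thinking` helper are hypothetical, and the assumption that each detail carries its content under a `text` key is for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a response message."""

    reasoning: Optional[str] = None
    reasoning_details: list[dict] = field(default_factory=list)


def extract_thinking(message: Message) -> list[str]:
    """Prefer structured `reasoning_details`; fall back to plain `reasoning`."""
    if message.reasoning_details:
        # Assumed shape: each detail may carry its content under `text`.
        return [d['text'] for d in message.reasoning_details if d.get('text')]
    if message.reasoning:
        return [message.reasoning]
    return []


print(extract_thinking(Message(reasoning='plain text only')))
# Output: ['plain text only']
```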

provider_details = super()._process_provider_details(response)

provider_details['downstream_provider'] = response.provider
provider_details['native_finish_reason'] = response.choices[0].native_finish_reason
Collaborator

Are we 100% sure there's at least 1 choice at this point?
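A defensive version of the `choices[0]` lookup could look like the sketch below. The `native_finish_reason` helper and the dict-shaped response are assumptions made to keep the example self-contained.

```python
from typing import Any, Optional


def native_finish_reason(response: dict[str, Any]) -> Optional[str]:
    """Return the first choice's `native_finish_reason`, or None if there are no choices."""
    choices = response.get('choices') or []
    if not choices:
        return None
    return choices[0].get('native_finish_reason')


print(native_finish_reason({'choices': []}))
# Output: None
print(native_finish_reason({'choices': [{'native_finish_reason': 'STOP'}]}))
# Output: STOP
```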

if isinstance(item, TextPart):
    texts.append(item.content)
elif isinstance(item, ThinkingPart):
    if item.provider_name == self.system and isinstance(item, OpenRouterThinkingPart):
Collaborator

See above, this unfortunately won't work right :(
