Add test for GPT-5-mini temperature issue with litellm_proxy #267

Closed

simonrosenberg wants to merge 3 commits into main from
Conversation
This test reproduces issue #265, where GPT-5-mini requires temperature=1.0 but the LLM class defaults to temperature=0.0, causing an error. The test includes three scenarios:
1. Failing case: GPT-5-mini with the default temperature=0.0 (reproduces the issue)
2. Working case: GPT-5-mini with an explicit temperature=1.0
3. Working case: GPT-5-mini with a temperature override in the completion call

Co-authored-by: openhands <openhands@all-hands.dev>
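For reference, here is a minimal sketch of the kind of request that fails, assuming a litellm_proxy deployment that serves gpt-5-mini and reads LLM_API_KEY and LLM_BASE_URL from the environment. The model name, prompt, and exception handling are illustrative and not taken from the test.

```python
import os

import litellm

# Illustrative only: the proxy model name and error type depend on the deployment.
# GPT-5-mini is expected to reject temperature values other than 1.0.
try:
    litellm.completion(
        model="litellm_proxy/gpt-5-mini",
        messages=[{"role": "user", "content": "Hello"}],
        temperature=0.0,  # the LLM class default that triggers the error
        api_key=os.environ.get("LLM_API_KEY"),
        api_base=os.environ.get("LLM_BASE_URL"),
    )
except Exception as exc:
    # In practice this surfaces as a 400 / BadRequestError from the provider.
    print(f"Rejected as expected: {exc}")
```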
@OpenHands don't mock completion. Use LLM_API_KEY and LLM_BASE_URL to set up your LLM.
I'm on it! simonrosenberg can track my progress at all-hands.dev
…M_BASE_URL
- Remove all mocking code from GPT-5-mini temperature tests
- Use environment variables LLM_API_KEY and LLM_BASE_URL for configuration
- Tests now skip gracefully when environment variables are not set
- Maintain the same test scenarios: default temperature failure, explicit temperature=1.0 success, and temperature override
- Add proper type ignores for response attribute access

Co-authored-by: openhands <openhands@all-hands.dev>
Summary

I have successfully replaced the mocked GPT-5-mini temperature tests with real LLM API calls as requested. Here's what was accomplished:

✅ Request Fulfillment Checklist:
Key Changes Made:
Test Behavior:
The changes have been committed and pushed to the PR branch; a rough sketch of the resulting setup is shown below.
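As a rough sketch of what this environment-variable-driven setup could look like, assuming the SDK exposes an LLM class with a completion method; the import path, constructor parameters, and method signature here are assumptions, not taken from this PR:

```python
import os

import pytest

# Assumed import path; the real SDK module may differ.
from openhands.sdk.llm import LLM

requires_live_llm = pytest.mark.skipif(
    not (os.environ.get("LLM_API_KEY") and os.environ.get("LLM_BASE_URL")),
    reason="LLM_API_KEY and LLM_BASE_URL are not set",
)


@requires_live_llm
def test_gpt5_mini_works_with_temperature_1():
    # Hypothetical constructor; adjust to the SDK's actual LLM class.
    llm = LLM(
        model="litellm_proxy/gpt-5-mini",
        api_key=os.environ["LLM_API_KEY"],
        base_url=os.environ["LLM_BASE_URL"],
        temperature=1.0,
    )
    response = llm.completion(
        messages=[{"role": "user", "content": "Say hello"}]
    )
    assert response is not None
```

With this guard, the test hits the real API only when both variables are set and is skipped otherwise, which matches the behavior described above.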
xingyaoww (Collaborator) reviewed on Sep 15, 2025:
Will this hit the API every time the SDK test suite is run? 😭
If so, I think mock is probably ok!
I think we can close this in favor of #265 (comment).
Summary
This PR adds a test that reproduces issue #265, where GPT-5-mini requires temperature=1.0 but the LLM class defaults to temperature=0.0, causing an error when using the completion API.

Changes
Added tests/sdk/llm/test_gpt5_mini_temperature_issue.py with three test scenarios:
1. GPT-5-mini with the default temperature=0.0 (reproduces the issue)
2. GPT-5-mini with an explicit temperature=1.0
3. GPT-5-mini with a temperature override in the completion call

Test Details
The test uses mocking to simulate the GPT-5-mini behavior, where the model requires temperature=1.0 and fails with any other temperature value. This allows the issue to be reproduced without making actual API calls; a rough sketch of the mocked failing case follows the scenario list below.

Test Scenarios
- test_gpt5_mini_temperature_issue_without_temperature: demonstrates the issue when using the default temperature
- test_gpt5_mini_works_with_temperature_1: shows that the issue is resolved when temperature is explicitly set to 1.0
- test_gpt5_mini_temperature_override_in_completion_call: shows that temperature can be overridden in the completion call
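The following is a minimal sketch of the failing case only, assuming the mock patches litellm.completion and that the SDK's LLM class routes its requests through it; the import path, patch target, and error construction are assumptions rather than the PR's actual test code.

```python
from unittest.mock import patch

import litellm
import pytest

# Assumed import path; the real SDK module may differ.
from openhands.sdk.llm import LLM


def fake_completion(*args, **kwargs):
    # Simulate GPT-5-mini: any temperature other than 1.0 is rejected.
    if kwargs.get("temperature") != 1.0:
        raise litellm.BadRequestError(
            message="Unsupported value: 'temperature' does not support 0.0",
            model="gpt-5-mini",
            llm_provider="litellm_proxy",
        )
    # Simplified response; the real SDK may expect a litellm.ModelResponse.
    return {"choices": [{"message": {"content": "ok"}}]}


def test_gpt5_mini_temperature_issue_without_temperature():
    # Patch target is illustrative; the SDK may call litellm differently.
    with patch("litellm.completion", side_effect=fake_completion):
        llm = LLM(model="litellm_proxy/gpt-5-mini")  # default temperature=0.0
        with pytest.raises(litellm.BadRequestError):
            llm.completion(messages=[{"role": "user", "content": "Hello"}])
```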
Verification

Related Issues
Fixes #265
Notes
As mentioned in the issue, the test is expected to fail initially (which it does in the first test case), demonstrating the problem. The other test cases show potential workarounds until a proper fix is implemented.
@simonrosenberg can continue refining the PR from here.