Skip to content

Add test for GPT-5-mini temperature issue with litellm_proxy#267

Closed
simonrosenberg wants to merge 3 commits intomainfrom
openhands/fix-gpt5-mini-temperature-issue
Closed

Add test for GPT-5-mini temperature issue with litellm_proxy#267
simonrosenberg wants to merge 3 commits intomainfrom
openhands/fix-gpt5-mini-temperature-issue

Conversation

@simonrosenberg
Copy link
Collaborator

Summary

This PR adds a test that reproduces issue #265 where GPT-5-mini requires temperature=1.0 but the LLM class defaults to temperature=0.0, causing an error when using the completion API.

Changes

  • Added tests/sdk/llm/test_gpt5_mini_temperature_issue.py with three test scenarios:
    1. Failing case: GPT-5-mini with default temperature=0.0 (reproduces the issue)
    2. Working case: GPT-5-mini with explicit temperature=1.0
    3. Working case: GPT-5-mini with temperature override in completion call

Test Details

The test uses mocking to simulate the GPT-5-mini behavior where it requires temperature=1.0 and fails with other temperature values. This allows us to reproduce the issue without making actual API calls.

Test Scenarios

  1. test_gpt5_mini_temperature_issue_without_temperature: Demonstrates the issue when using default temperature
  2. test_gpt5_mini_works_with_temperature_1: Shows that the issue is resolved when temperature is explicitly set to 1.0
  3. test_gpt5_mini_temperature_override_in_completion_call: Shows that temperature can be overridden in the completion call

Verification

Related Issues

Fixes #265

Notes

As mentioned in the issue, the test is expected to fail initially (which it does in the first test case), demonstrating the problem. The other test cases show potential workarounds until a proper fix is implemented.

@simonrosenberg can click here to continue refining the PR

This test reproduces issue #265 where GPT-5-mini requires temperature=1.0
but the LLM class defaults to temperature=0.0, causing an error.

The test includes three scenarios:
1. Failing case: GPT-5-mini with default temperature=0.0 (reproduces the issue)
2. Working case: GPT-5-mini with explicit temperature=1.0
3. Working case: GPT-5-mini with temperature override in completion call

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Contributor

github-actions bot commented Sep 15, 2025

Coverage

Coverage Report
FileStmtsMissCoverMissing
openhands
   __init__.py10100% 
openhands/sdk
   __init__.py15286%23–24
   logger.py732171%33, 57, 64–67, 69–71, 124, 130–132, 135–136, 142–144, 151, 156–157
openhands/sdk/agent
   __init__.py30100% 
   agent.py1773580%66, 73, 80, 84, 101, 115, 122–123, 128–129, 200–201, 203–205, 207–209, 244, 258, 281, 313, 347–349, 353–355, 362–363, 367, 371–372, 402, 409
   base.py63690%53, 75–77, 93, 113
openhands/sdk/context
   __init__.py40100% 
   agent_context.py57296%146, 152
   manager.py330%1, 4–5
   view.py48687%47–51, 53
openhands/sdk/context/condenser
   __init__.py30100% 
   condenser.py17382%68–69, 73
   no_op_condenser.py60100% 
openhands/sdk/context/microagents
   __init__.py40100% 
   exceptions.py50100% 
   microagent.py1432582%130, 133–136, 218–221, 229, 251–252, 257–258, 260, 264, 271–273, 281–283, 337, 339–340
   types.py210100% 
openhands/sdk/context/prompts
   __init__.py20100% 
   prompt.py30583%12, 15, 24, 44–45
openhands/sdk/conversation
   __init__.py70100% 
   conversation.py1141190%115, 123–125, 129–130, 193, 270–271, 279–280
   event_store.py101892%50–51, 60, 67, 72–73, 129, 142
   persistence_const.py50100% 
   secrets_manager.py41197%107
   serialization_diff.py00100% 
   state.py100595%140, 163, 199–201
   types.py30100% 
   visualizer.py91594%88, 145, 167, 210, 212
openhands/sdk/event
   __init__.py50100% 
   base.py742072%51, 55, 75, 79–81, 83–88, 90–91, 93–95, 97–98, 100
   condenser.py25580%29, 33–35, 49
   llm_convertible.py1791691%53, 63–64, 69–70, 246, 280–281, 286, 294, 335–336, 341, 374–375, 380
   types.py30100% 
   user_action.py12191%21
   utils.py120100% 
openhands/sdk/io
   __init__.py40100% 
   base.py14471%7, 11, 15, 19
   local.py561671%43–44, 58, 66–78
   memory.py43490%16, 20, 53–54
openhands/sdk/llm
   __init__.py60100% 
   exceptions.py360100% 
   llm.py38611669%224, 229, 242–244, 248–249, 281, 340, 346–347, 442, 455–456, 461–462, 464–465, 468–470, 475–477, 481–483, 504–507, 512–517, 521–522, 531–533, 536–537, 561, 567–568, 614, 663, 669, 672, 681–682, 691, 698, 701, 705–707, 711, 713–718, 720–737, 740–744, 746–747, 753–762, 766–777, 790, 804, 809
   llm_registry.py380100% 
   message.py110496%97, 100, 223–224
   metadata.py150100% 
openhands/sdk/llm/mixins
   fn_call_converter.py34310170%74, 343, 345, 349, 367, 369, 375, 381, 383, 422, 424, 426, 428, 433–434, 518–520, 522, 524, 545–547, 553, 575, 601–602, 610–613, 615, 617, 639, 648, 656, 701–704, 708–711, 723, 727, 738, 748, 797–798, 800, 819–821, 823–826, 829, 833, 844–845, 859, 867, 870–871, 876, 905–908, 912–913, 918–919, 924, 973–974, 980, 994, 1006, 1008–1009, 1012–1014, 1016–1017, 1023–1025, 1027–1028, 1030, 1032, 1036, 1038, 1043, 1045–1046, 1049
   non_native_fc.py39392%64, 75, 91
openhands/sdk/llm/utils
   metrics.py111298%17, 117
   model_features.py400100% 
   retry_mixin.py501178%47, 50, 64, 86, 90, 94–95, 105, 110–111, 116
   telemetry.py1361588%71, 94, 99–100, 112–113, 120, 134, 199, 216, 222, 229, 232, 234, 241
openhands/sdk/mcp
   __init__.py50100% 
   client.py631182%41, 56–57, 78, 82, 94–95, 105–106, 112–113
   definition.py481666%55, 75–80, 82–90
   tool.py401367%36–39, 43, 46, 49–52, 101–102, 107
   utils.py30486%23–24, 27, 30
openhands/sdk/preset
   __init__.py00100% 
   default.py12120%3, 6, 8, 10, 17, 23–24, 26–28, 30–31
openhands/sdk/tool
   __init__.py40100% 
   schema.py1241191%27–29, 31, 40, 230–233, 253, 268
   security_prompt.py30100% 
   tool.py951089%58, 99, 169, 172–178
openhands/sdk/tool/builtins
   __init__.py40100% 
   finish.py26196%33
   think.py321359%24, 27–28, 31, 33–37, 39, 51, 57, 74
openhands/sdk/utils
   __init__.py30100% 
   async_utils.py120100% 
   discriminated_union.py63592%173, 212, 217, 224, 227
   json.py28280%1–3, 5, 7–8, 11, 14–21, 25, 28, 30–31, 34, 37–38, 40, 43, 45–48
   protocol.py30100% 
   pydantic_diff.py571573%36, 44, 50–58, 60–62, 65
   truncate.py100100% 
   visualize.py17664%14–16, 19–20, 22
openhands/tools
   __init__.py9277%53–54
openhands/tools/execute_bash
   __init__.py40100% 
   constants.py90100% 
   definition.py925935%37, 40, 43–44, 46, 49–51, 53–55, 57, 87, 91–100, 105, 108–110, 113, 115–117, 119, 123–124, 127–129, 131–132, 135–138, 142–144, 149, 153–155, 158–160, 164–165, 167, 244, 247, 257
   impl.py402245%52–55, 57–58, 60–62, 64–67, 69, 72, 82–83, 86–89, 91
   metadata.py502354%67–73, 77–78, 83, 85, 87–96, 100–101
openhands/tools/execute_bash/terminal
   __init__.py60100% 
   factory.py492842%24–25, 30, 32, 35, 37–38, 44–46, 74–77, 79–83, 87–89, 91, 97, 107, 111–113
   interface.py691873%43, 52, 62, 71, 76, 85, 94, 99, 104, 112, 145, 157, 162, 171, 180, 185, 191, 193
   subprocess_terminal.py23620313%33–34, 50–53, 56, 59–61, 67–68, 71–74, 76, 79, 81–83, 97–100, 102, 105–106, 109, 112–113, 117–118, 124–126, 128, 131–132, 134, 138–139, 141–142, 144–150, 152–158, 161–166, 168–169, 171–172, 177, 179–187, 191–193, 195–196, 198–199, 202–204, 206–209, 211–212, 214–215, 217–222, 227–229, 231, 234, 237–238, 241–242, 248–250, 252–259, 261–264, 268–276, 289–290, 292, 309–310, 313–314, 316, 318, 320–322, 325–326, 328–330, 332–333, 335–336, 345–346, 350, 352–357, 361–362, 364–367, 369–375, 377–378, 380–384, 388–389, 391–397, 401–402, 405–406, 408–409, 411–413
   terminal_session.py17813524%43, 92, 96–98, 107–108, 119–121, 123–126, 135–136, 140, 144, 147–148, 150, 155, 161, 163–164, 169, 177, 182–185, 198–200, 205, 208–209, 213, 219, 233–235, 240, 243–244, 248, 255, 264, 273–274, 276, 279–283, 285, 288, 290–292, 296–297, 300–302, 306, 311–312, 316–317, 323–325, 336–337, 340–342, 344–346, 349, 359, 363, 366, 369–370, 376–377, 383, 388–389, 392–396, 402, 404–406, 412–416, 419, 422, 425–426, 428–431, 438, 442, 447–448, 455–457, 461, 465, 470–471, 475–476, 479–482, 488–489, 492
   tmux_terminal.py802963%36, 45, 91, 95, 97, 107–108, 110, 118–119, 121, 128, 133, 145–152, 160–161, 163–164, 166, 168–170
openhands/tools/execute_bash/utils
   command.py817211%15–19, 27, 32, 34–35, 37–38, 41–46, 48, 51–53, 55, 58–67, 74–75, 77–79, 81, 83, 90–91, 93, 95–97, 99, 101–102, 105–106, 109, 117, 120–121, 123–124, 127–129, 132–138, 141–145, 150
openhands/tools/str_replace_editor
   __init__.py30100% 
   definition.py651576%87, 99, 119, 122, 125, 132, 134, 136, 138, 210, 213, 216–217, 223, 226
   editor.py22817025%74, 76–77, 80, 98, 101, 111–115, 121–129, 131, 149–150, 170–171, 175, 179–180, 189, 193–196, 204–205, 209–211, 217, 220, 225, 228, 231–232, 235, 238–239, 243, 247, 262–264, 273, 276, 282, 287, 289–293, 295, 297, 301–302, 306–307, 316–317, 319–322, 324, 331–332, 338–340, 348–352, 356, 358–359, 366, 369, 374–375, 377, 401–402, 424–425, 427–428, 434, 437, 441–447, 450–451, 454–459, 462, 465–466, 470, 473–474, 477, 479–480, 486, 490, 509, 514–517, 519, 527, 534, 541, 552–555, 557, 559, 586–589, 598–599, 628–630, 632–641, 646–649, 662–663, 668, 673, 679, 685
   exceptions.py221340%5–6, 9, 16–18, 25–27, 38–41
   impl.py261157%31–32, 51–52, 54–56, 65–68
openhands/tools/str_replace_editor/utils
   __init__.py00100% 
   config.py20100% 
   constants.py50100% 
   diff.py641576%24, 40–42, 45–47, 50, 90–93, 107–108, 115
   encoding.py542750%42–43, 46–48, 51, 54, 60, 64–65, 67, 69, 78, 80–81, 84, 87–90, 93, 96–97, 114, 128, 130–131
   file_cache.py954849%43–46, 49–50, 54, 59, 61, 64, 69, 72–73, 87, 95–98, 108–112, 115–120, 126–130, 133–135, 138–140, 143–148, 151, 154
   history.py663645%58–60, 66–68, 70–71, 74–76, 78–79, 82, 85–86, 88, 92–94, 98–99, 102–104, 107, 111–113, 115–120, 122
   shell.py231630%30, 32–34, 38, 40, 51–55, 62–63, 70–72
openhands/tools/task_tracker
   __init__.py20100% 
   definition.py1329428%48, 51–53, 55–56, 59–60, 62, 78, 83, 85, 87–88, 91, 94–96, 98–99, 102–108, 110–112, 115, 117–120, 122, 125, 128–129, 131–132, 134–135, 137, 150–151, 154–155, 159, 161, 163–165, 171, 173–174, 179–180, 184, 193–194, 196–198, 202–203, 205–208, 210, 214–215, 217–219, 221–225, 229, 233–234, 236–237, 239, 241–245, 407, 410
openhands/tools/utils
   __init__.py00100% 
TOTAL5243163768% 

@simonrosenberg
Copy link
Collaborator Author

@OpenHands don't mock completion. use LLM_API_KEY and LLM_BASE_URL to setup your LLM

@openhands-ai
Copy link

openhands-ai bot commented Sep 15, 2025

I'm on it! simonrosenberg can track my progress at all-hands.dev

@simonrosenberg simonrosenberg added the openhands Solving the issue with OpenHands. label Sep 15, 2025
…M_BASE_URL

- Remove all mocking code from GPT-5-mini temperature tests
- Use environment variables LLM_API_KEY and LLM_BASE_URL for configuration
- Tests now skip gracefully when environment variables are not set
- Maintain the same test scenarios: default temperature failure, explicit temperature=1.0 success, and temperature override
- Add proper type ignores for response attribute access

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Sep 15, 2025

Summary

I have successfully replaced the mocked GPT-5-mini temperature tests with real LLM API calls as requested. Here's what was accomplished:

Request Fulfillment Checklist:

  1. Removed all mocking code - Eliminated unittest.mock imports, @patch decorators, and mock completion functions
  2. Implemented LLM_API_KEY usage - Tests now read API key from environment variable
  3. Implemented LLM_BASE_URL usage - Tests now read base URL from environment variable
  4. Set up real LLM integration - All three test functions now make actual API calls

Key Changes Made:

  • Removed 139 lines of mocking infrastructure and added 43 lines of real API integration (net -96 lines)
  • Added get_llm_config() helper that reads environment variables and gracefully skips tests when not available
  • Maintained identical test scenarios:
    • Default temperature=0.0 failure case
    • Explicit temperature=1.0 success case
    • Temperature override in completion call
  • Added proper type annotations with # type: ignore for response attribute access
  • All quality checks pass: formatting, linting, type checking, and existing test suite

Test Behavior:

  • Tests now skip gracefully with clear message when LLM_API_KEY and LLM_BASE_URL are not set
  • When environment variables are provided, tests make real API calls to reproduce the GPT-5-mini temperature issue
  • Maintains the same validation logic but with actual LLM responses instead of mocked ones

The changes have been committed and pushed to the PR branch openhands/fix-gpt5-mini-temperature-issue, updating PR #267 with the real LLM integration as requested.

View full conversation

Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this hits API everytime the sdk test suite is run? 😭

If so i think mock is probably ok!

@xingyaoww
Copy link
Collaborator

I think we can close this in favor of #265 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

openhands Solving the issue with OpenHands.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gpt5-mini not working as expected with Litellm

3 participants

Comments