Clean up default temperature handling and Kimi top_p override #1994

Open
enyst wants to merge 8 commits into fix-default-temperature-to-none from cleanup-default-temperature-models

@enyst
Collaborator

@enyst enyst commented Feb 11, 2026

Summary

  • Remove get_default_temperature and always defer to the provider's default temperature (temperature stays None unless explicitly set).
  • Adjust top_p defaults for Moonshot Kimi-K2.5 (requires 0.95) in the LLM initializer.
  • Update tests to reflect the new temperature behavior.
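
As a rough illustration of what "temperature stays None unless explicitly set" means for a request, the sketch below forwards only the sampling parameters that carry a value. The helper name build_sampling_kwargs is hypothetical and not taken from the SDK; the real LLM class may assemble its request differently.

```python
from typing import Any, Optional


def build_sampling_kwargs(
    temperature: Optional[float], top_p: Optional[float]
) -> dict[str, Any]:
    """Hypothetical helper: forward only the parameters that have a value.

    A value of None is omitted from the request entirely, so the provider
    applies its own default for that parameter.
    """
    kwargs: dict[str, Any] = {}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if top_p is not None:
        kwargs["top_p"] = top_p
    return kwargs
```

Note that the checks use `is not None` rather than truthiness, so an explicit temperature of 0.0 is still sent.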

Testing

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/utils/model_features.py openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_model_features.py
  • Manual example run: uv run python examples/01_standalone_sdk/29_llm_streaming.py with LLM_BASE_URL=https://llm-proxy.eval.all-hands.dev, LLM_API_KEY=$LITELLM_API_KEY, LLM_MODEL=moonshot/kimi-k2.5 (top_p auto-overridden to 0.95)

Real-world tests

  • bedrock/moonshot.kimi-k2-thinking via examples/01_standalone_sdk/22_anthropic_thinking.py: Failed (404 Not Found from Bedrock converse endpoint).
  • moonshot/kimi-k2.5 via examples/01_standalone_sdk/29_llm_streaming.py with LLM_BASE_URL=https://llm-proxy.eval.all-hands.dev and LLM_API_KEY=$LITELLM_API_KEY: Succeeded (story file created and then deleted by the example).
  • moonshot/kimi-k2-thinking via examples/01_standalone_sdk/22_anthropic_thinking.py: Failed (no healthy deployments for the model).

Behavior changes

  • Kimi models no longer receive an implicit temperature override; temperature remains None unless set by the caller, letting the provider default apply.
  • top_p defaults are now centralized via get_default_top_p when the caller leaves top_p at 1.0 (e.g., Moonshot Kimi-K2.5 defaults to 0.95).
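
The top_p rule described above could look roughly like the following. Only the name get_default_top_p and the 0.95 value come from this PR; its signature, the substring match, and the resolve_top_p wrapper are assumptions made for illustration.

```python
from typing import Optional


def get_default_top_p(model: str) -> Optional[float]:
    """Return a model-specific top_p default, or None if there is none.

    Assumption: models are matched by a case-insensitive substring.
    """
    if "kimi-k2.5" in model.lower():
        return 0.95
    return None


def resolve_top_p(model: str, top_p: float = 1.0) -> float:
    """Hypothetical initializer step: override only an untouched top_p."""
    if top_p == 1.0:
        default = get_default_top_p(model)
        if default is not None:
            return default
    return top_p
```

One consequence of keying on the value 1.0: a caller who explicitly passes top_p=1.0 cannot be told apart from one who left the default, so both would receive the model-specific value in this sketch.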

enyst and others added 2 commits February 11, 2026 03:10
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Feb 11, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-sdk/openhands/sdk/llm/llm.py | 459 | 84 | 81% | 405, 463, 637, 738, 740–741, 769, 819, 830–832, 836–840, 848–850, 860–862, 865–866, 870, 872–873, 875, 898–903, 1026, 1031–1032, 1229–1230, 1239, 1252, 1254–1259, 1261–1278, 1281–1285, 1287–1288, 1294–1303, 1354, 1356 |
| openhands-sdk/openhands/sdk/llm/utils/model_features.py | 46 | 1 | 97% | 32 |
| TOTAL | 17196 | 5039 | 70% | |

@enyst enyst marked this pull request as ready for review February 11, 2026 03:17
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Works but violates "good taste" by accumulating special cases instead of using a data-driven approach.

Verdict: ❌ Needs rework - Inconsistent approach and missing test coverage

Key Insight: You removed a data structure (DEFAULT_TEMPERATURE_MODELS) but added hardcoded special cases (kimi-k2.5 top_p). This creates technical debt and will lead to a growing pile of if-statements.
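
One way to read the data-driven approach the reviewer asks for is a single lookup table of per-model sampling defaults instead of per-model if-statements. The sketch below is an assumption about what that could look like; SamplingDefaults, MODEL_SAMPLING_DEFAULTS, and lookup_sampling_defaults are hypothetical names, not identifiers from the SDK.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class SamplingDefaults:
    """Per-model defaults; None means 'leave it to the provider'."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None


# Hypothetical table: glob pattern -> defaults. The first match wins.
MODEL_SAMPLING_DEFAULTS: list[tuple[str, SamplingDefaults]] = [
    ("*kimi-k2.5*", SamplingDefaults(top_p=0.95)),
]


def lookup_sampling_defaults(model: str) -> SamplingDefaults:
    """Return the defaults for a model, or empty defaults if none match."""
    name = model.lower()
    for pattern, defaults in MODEL_SAMPLING_DEFAULTS:
        if fnmatchcase(name, pattern):
            return defaults
    return SamplingDefaults()
```

With this shape, supporting another model with special requirements means adding one row to the table rather than another branch in the initializer.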

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst requested a review from all-hands-bot February 11, 2026 03:35
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Breaking Change Risk

Taste Rating: 🟡 Acceptable - Works but violates "never break userspace"

This PR simplifies temperature handling but introduces a breaking change for existing users. The removal of default temperature settings for kimi models changes their runtime behavior without a clear deprecation path.

KEY INSIGHT: Removing default temperature while adding model-specific top_p overrides creates inconsistent patterns that will multiply as more models need special handling.

See inline comments for specific issues.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst requested a review from all-hands-bot February 11, 2026 04:31
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Clean refactoring with solid simplification, but test coverage regression and undocumented breaking change need attention.

Key Insight: Good taste in eliminating special cases, but temperature removal is a breaking change hidden in a "cleanup" PR.

@enyst
Collaborator Author

enyst commented Feb 11, 2026

@OpenHands Address the review comments and inline review threads. Think deeply before doing so, it's an AI reviewer so it could be wrong; decide first if the comment is valid, and fix it then, or if not, reply to the thread with your assessment.

Note that you should be careful and use gh correctly when you reply, do not replace their comment, reply to it in the same thread.

@openhands-ai

openhands-ai bot commented Feb 11, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Feb 11, 2026

Final summary (new work since last update)

What changed

  • No further code changes were needed after the previous update; git diff against origin/cleanup-default-temperature-models...HEAD is empty and the working tree is clean.

Checklist against your request

  • Addressed inline review threads by fixing valid comments (expanded get_default_top_p test coverage).
  • For the “breaking change” concern, responded in-thread with an assessment (intentional behavior change; provider defaults should not be encoded; set temperature=1.0 explicitly to preserve old behavior).
  • Replied as replies in the same threads (did not replace/edit the reviewer’s comments).
  • Changes were concise (only expanded the relevant test parametrization).
  • Changes were committed and already pushed to cleanup-default-temperature-models (commit cd9dc3c9).

No additional push is required.
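
For callers who relied on the old implicit value, the suggestion to set temperature=1.0 explicitly can be sketched as follows. LLMConfig is a hypothetical stand-in used only to keep the example self-contained; it is not the SDK's LLM class, and the real constructor may take other arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMConfig:
    """Hypothetical stand-in for an LLM configuration object."""

    model: str
    temperature: Optional[float] = None  # None: the provider default applies


# New behavior: nothing is set, so the provider decides the temperature.
provider_default = LLMConfig(model="moonshot/kimi-k2.5")

# Opting back in to the previous behavior by pinning the value explicitly.
pinned = LLMConfig(model="moonshot/kimi-k2.5", temperature=1.0)
```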

@enyst enyst requested a review from all-hands-bot February 11, 2026 18:54
Collaborator

@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Clean refactor that simplifies temperature/top_p handling

The core logic is sound: removing special-case temperature overrides and generalizing top_p defaults is pragmatic. The behavior change (temperature stays None) lets providers handle their own defaults, which is the right approach. Tests actually verify behavior rather than just mocking calls.

One improvement needed: stale comment that no longer reflects the generalized logic.

Co-authored-by: OpenHands Bot <contact@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 11, 2026
@neubig neubig self-requested a review February 12, 2026 00:17
Contributor

neubig commented Feb 12, 2026

[automated message] @neubig assigned for review according to git blame

@enyst enyst added and removed the behavior-initiative label (related to the system prompt sections and LLM steering) Feb 14, 2026
3 participants