Clean up default temperature handling and Kimi top_p override#1994
Clean up default temperature handling and Kimi top_p override#1994enyst wants to merge 8 commits intofix-default-temperature-to-nonefrom
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Coverage Report •
|
||||||||||||||||||||||||||||||
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Works but violates "good taste" by accumulating special cases instead of using a data-driven approach.
Verdict: ❌ Needs rework - Inconsistent approach and missing test coverage
Key Insight: You removed a data structure (DEFAULT_TEMPERATURE_MODELS) but added hardcoded special cases (kimi-k2.5 top_p). This creates technical debt and will lead to a growing pile of if-statements.
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Breaking Change Risk
Taste Rating: 🟡 Acceptable - Works but violates "never break userspace"
This PR simplifies temperature handling but introduces a breaking change for existing users. The removal of default temperature settings for kimi models changes their runtime behavior without a clear deprecation path.
KEY INSIGHT: Removing default temperature while adding model-specific top_p overrides creates inconsistent patterns that will multiply as more models need special handling.
See inline comments for specific issues.
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Clean refactoring with solid simplification, but test coverage regression and undocumented breaking change need attention.
Key Insight: Good taste in eliminating special cases, but temperature removal is a breaking change hidden in a "cleanup" PR.
|
@OpenHands Address the review comments and inline review threads. Think deeply before doing so, it's an AI reviewer so it could be wrong; decide first if the comment is valid, and fix it then, or if not, reply to the thread with your assessment. Note that you should be careful and use gh correctly when you reply, do not replace their comment, reply to it in the same thread. |
|
I'm on it! enyst can track my progress at all-hands.dev |
Final summary (new work since last update)What changed
Checklist against your request
No additional push is required. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Clean refactor that simplifies temperature/top_p handling
The core logic is sound: removing special-case temperature overrides and generalizing top_p defaults is pragmatic. The behavior change (temperature stays None) lets providers handle their own defaults, which is the right approach. Tests actually verify behavior rather than just mocking calls.
One improvement needed: stale comment that no longer reflects the generalized logic.
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
|
[automated message] @neubig assigned for review according to git blame |
Summary
get_default_temperatureand always default to provider temperature (temperature staysNoneunless explicitly set).Testing
uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/utils/model_features.py openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/llm/test_model_features.pyuv run python examples/01_standalone_sdk/29_llm_streaming.pywithLLM_BASE_URL=https://llm-proxy.eval.all-hands.dev,LLM_API_KEY=$LITELLM_API_KEY,LLM_MODEL=moonshot/kimi-k2.5(top_p auto-overridden to 0.95)@enyst can click here to continue refining the PR
Real-world tests
bedrock/moonshot.kimi-k2-thinkingviaexamples/01_standalone_sdk/22_anthropic_thinking.py→ Failed (404 Not Found from Bedrock converse endpoint).moonshot/kimi-k2.5viaexamples/01_standalone_sdk/29_llm_streaming.pywithLLM_BASE_URL=https://llm-proxy.eval.all-hands.devandLLM_API_KEY=$LITELLM_API_KEY→ Succeeded (story file created and then deleted by the example).moonshot/kimi-k2-thinkingviaexamples/01_standalone_sdk/22_anthropic_thinking.py→ Failed (no healthy deployments for the model).Behavior changes
Noneunless set by the caller, letting the provider default apply.get_default_top_pwhen the caller leaves top_p at 1.0 (e.g., Moonshot Kimi-K2.5 defaults to 0.95).