Conversation
Change get_default_temperature to return None by default instead of 0.0. This allows models to use their provider's default temperature setting rather than forcing temperature=0.0 for all models. Models with specific temperature requirements (kimi-k2-thinking, kimi-k2.5) still explicitly return 1.0. Fixes #1913 Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands check the failing CI, merge main, reproduce locally, and fix any issues.
I'm on it! neubig can track my progress at all-hands.dev |
Update test to expect temperature=None instead of 0.0, aligning with the PR's change to get_default_temperature() returning None by default. Co-authored-by: openhands <openhands@all-hands.dev>
Summary

I've successfully addressed the failing CI for PR #1989.

Change Made:

```diff
- assert config.temperature == 0.0
+ assert config.temperature is None  # None to use provider defaults
```

The fix has been pushed to the PR branch.
all-hands-bot
left a comment
🟡 Taste Rating: Acceptable
Pragmatic fix to a real problem, but breaks existing behavior without proper user migration guidance.
The core change is simple and solves the claude-4.6 incompatibility issue. However, this is a significant behavior change that moves from deterministic outputs (temp=0.0) to provider-dependent defaults. See inline comments for specific concerns.
VERDICT: ✅ Worth merging after addressing migration documentation
KEY INSIGHT: This is the right technical fix, but users who relied on deterministic behavior need explicit guidance on setting temperature=0.0 in their configs.
enyst
left a comment
Thank you for this! I think maybe we can then remove the DEFAULT_TEMPERATURE_MODELS too? That’s there because Kimi is a reasoning model that doesn’t accept effort, if I recall correctly, but it needs temperature 1 like reasoning models.
I had the agent test Kimi:
- for 2.5, it worked fine with `None`
- for 2-thinking, no deployments found (maybe it’s legacy now; we may want to remove it from eval proxy, since it doesn’t appear to work anymore)
Please see PR:
Addresses review comment asking for more explicit documentation about: - What None means (provider default) - How to get deterministic behavior (explicitly set 0.0) - The typical temperature range Co-authored-by: openhands <openhands@all-hands.dev>
As per enyst's suggestion on PR #1989: - Remove DEFAULT_TEMPERATURE_MODELS entirely - get_default_temperature() now returns None for all models - This simplifies the codebase and lets all models use their provider's default temperature - Previously Kimi 2.5 was tested to work fine with None Co-authored-by: openhands <openhands@all-hands.dev>
FWIW, PR 1994 is a PR to this PR. 😅
all-hands-bot
left a comment
🟡 Taste Rating: Acceptable - Solves a real problem with elegant simplification, but introduces a breaking change that needs careful consideration.
Verdict: Worth merging with minor improvements to address backward compatibility concerns and design consistency.
Key Insight: This trades deterministic defaults for provider flexibility—a pragmatic choice, but one that changes behavior for all existing users.
all-hands-bot
left a comment
🔴 Taste Rating: Needs Rework
This PR simplifies the code (good!) but introduces a breaking change that violates the "never break userspace" principle. Users relying on deterministic outputs (temperature=0.0) will now get unpredictable provider defaults.
[CRITICAL ISSUES] - Must Fix
1. Breaking Change Without Documentation
File: openhands-sdk/openhands/sdk/llm/llm.py (lines 180-185)
🔴 Critical: This description doesn't mention that this is a BREAKING CHANGE. Users who relied on temperature=0.0 for deterministic outputs will now get provider-specific defaults (which vary: OpenAI defaults to 1.0, Anthropic to 1.0, etc.).
You've broken userspace without documenting it. At minimum:
- Clear migration guide: "To preserve deterministic behavior, explicitly set temperature=0.0"
- Deprecation warning in release notes
- Consider a compatibility flag for one release cycle
Suggested fix:
```python
temperature: float | None = Field(
    default=None,
    ge=0,
    description=(
        "Sampling temperature for response generation. "
        "Defaults to None (uses provider default, typically 1.0 for most models). "
        "BREAKING CHANGE: Previously defaulted to 0.0 for deterministic outputs. "
        "To restore deterministic behavior, explicitly set temperature=0.0. "
        "Higher values (0.7-1.0) increase creativity and randomness."
    ),
)
```

2. Unused Parameter - Code Smell
File: openhands-sdk/openhands/sdk/llm/utils/model_features.py (line 183)
🔴 Critical: The _model parameter is now completely ignored (indicated by the underscore prefix). This is a code smell that screams "delete this function entirely."
If the model parameter is never used, why does this function exist? Either:
- Make it a module-level constant: `DEFAULT_TEMPERATURE = None`
- Delete it and use `None` directly where needed
- Or explain why we need the indirection

The function signature promises model-specific logic but delivers nothing. That's broken by design.
[IMPROVEMENT OPPORTUNITIES] - Should Fix
3. Misleading Documentation
File: openhands-sdk/openhands/sdk/llm/utils/model_features.py (lines 184-189)
🟠 Important: The docstring says "Return the default temperature for a given model pattern" but the model is completely ignored. This is a lie.
Suggested fix:
```python
def get_default_temperature(_model: str) -> float | None:
    """Return None to allow providers to use their default temperature.

    This function exists for backward compatibility and may be removed in
    future versions. To use provider defaults, pass temperature=None when
    creating LLM instances.
    """
    return None
```

4. Missing Integration Tests
File: tests/sdk/llm/test_model_features.py (lines 342-346)
🟠 Important: These tests only verify the function returns None. They don't test what matters:
- Does passing `temperature=None` to a provider actually work?
- Does it fix the claude-4.6 issue mentioned in #1913?
- What temperature does each provider actually use when `None` is passed?
You're testing the implementation, not the behavior. Add tests that:
- Verify LiteLLM/providers accept temperature=None
- Confirm claude-4.6 no longer errors
- Document what "provider default" means for major providers
VERDICT
❌ Needs rework: The simplification is good in principle, but the breaking change and unused parameter issues must be addressed first.
KEY INSIGHT
Simplifying code by removing special cases is "good taste" - BUT you cannot break existing user expectations in the process. Either preserve backward compatibility with explicit opt-in to new behavior, or clearly document the breaking change and provide a migration path.
As per review comment on PR #1989: - Remove the unused _model parameter since DEFAULT_TEMPERATURE_MODELS was removed - Function now returns None without needing model info - Updated tests to use new function signature Co-authored-by: openhands <openhands@all-hands.dev>
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
Summary

Change `get_default_temperature` to return `None` by default instead of `0.0`. This allows models to use their provider's default temperature setting rather than forcing `temperature=0.0` for all models.

Motivation

As discussed in #1913, always setting `temperature=0.0` causes issues with some models (like claude-4.6) that don't accept this value. By returning `None`, we let the provider use its own default temperature.

Changes

- `get_default_temperature()` now returns `float | None` instead of `float`
- The default changed from `0.0` to `None`
- Models with specific temperature requirements (`kimi-k2-thinking`, `kimi-k2.5`) still explicitly return `1.0`
- Updated docstrings in the `LLM` class to reflect the new behavior
- Updated tests to expect `None` instead of `0.0`

Testing

All 583 LLM tests pass. Pre-commit hooks pass on all modified files.

Fixes #1913
@neubig can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images

- `eclipse-temurin:17-jdk`
- `nikolaik/python-nodejs:python3.12-nodejs22`
- `golang:1.21-bookworm`

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:887118f-python
```

All tags pushed for this build

About Multi-Architecture Support

- The default tag (`887118f-python`) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. `887118f-python-amd64`) are also available if needed