-
Notifications
You must be signed in to change notification settings - Fork 862
fix(MBT): update auto gen mbt evals #3530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe evaluator module refactors input validation and configuration handling to use nested model structures. Instead of unpacking input fields directly, the validation now passes input as a single field to the request model, while configuration extraction adapts to resolve and use nested config models rather than top-level fields. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed everything up to 72ad88e in 2 minutes and 11 seconds. Click for details.
- Reviewed
749lines of code in5files - Skipped
0files when reviewing. - Skipped posting
12draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py:27
- Draft comment:
Docstring now details the nested 'input' and optional 'config' structure. Ensure callers pass a flat dict as intended. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is asking the PR author to ensure that callers pass a flat dictionary as intended. It seems to be a request for confirmation of intended behavior, which violates the rule against asking the author to confirm their intention. Therefore, this comment should be removed.
2. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py:39
- Draft comment:
Changed to 'request_model(input=input)' to enforce the nested payload structure. Verify that legacy clients are updated accordingly. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the PR author to verify that legacy clients are updated, which is against the rules. It does not provide a specific code suggestion or ask for a specific test to be written. Therefore, it should be removed.
3. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py:31
- Draft comment:
List comprehension correctly extracts required fields from the nested 'input' model. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that something is done correctly, which is not necessary in a code review context.
4. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py:53
- Draft comment:
Handling Optional config types using origin is well implemented to extract the inner model. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply praises the implementation without offering any specific guidance or questions.
5. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/__init__.py:132
- Draft comment:
The all export now includes updated evaluator models. Ensure auto-generation remains consistent with the API spec. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:15
- Draft comment:
AgentEfficiencyInput now includes a 'trajectory_prompts' field. Confirm that producers supply this required value. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:87
- Draft comment:
New field 'tool_output' added in AgentToolErrorDetectorInput; ensure that input providers include this field. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%The comment is asking the author to ensure that input providers include a new field. This falls under the category of asking the author to ensure something is done, which is against the rules. The comment does not provide a specific suggestion or ask for a specific test to be written, so it should be removed.
8. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:210
- Draft comment:
InstructionAdherenceInput now requires a 'response' field. Validate that tasks provide this new field accordingly. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is asking the PR author to validate that tasks provide a new required field. It is not making a specific code suggestion or asking for a test to be written. It is more of a general request for validation, which violates the rule against asking the author to ensure behavior is intended or tested.
9. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/response.py:88
- Draft comment:
Using 'pass_' alias in IntentChangeResponse is a smart workaround to avoid reserved keyword conflicts. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/response.py:115
- Draft comment:
In PromptInjectionResponse, consider using a boolean for 'has_injection' if it represents a binary state. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:151
- Draft comment:
Typo suggestion: The example value 'gpt-4o' in ContextRelevanceConfigRequest might be a typo. Did you mean 'gpt-4'? - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% 'gpt-4o' is actually the correct name for OpenAI's GPT-4 Omni model, which was released in 2024. This is not a typo - it's a legitimate model identifier. The comment is factually incorrect because it assumes 'gpt-4o' is a mistake when it's actually a valid model name. The automated tool that generated this comment appears to be unaware of this model's existence. This comment should be deleted as it's incorrect and would mislead the PR author. I might be wrong if this codebase was written before GPT-4o was released, or if the codebase has specific conventions about which models to support. However, given that this is generated code (as indicated by the comment at the top of the file), it's likely pulling from a schema that intentionally includes 'gpt-4o'. Even if there were historical context, the file header shows this is auto-generated code from a JSON schema file. The example value 'gpt-4o' was intentionally included in the source schema, and 'gpt-4o' is a real, valid OpenAI model name. The comment is factually incorrect. This comment should be deleted because 'gpt-4o' is not a typo - it's the correct name for OpenAI's GPT-4 Omni model. The comment is factually incorrect and would mislead the PR author into making an unnecessary change.
12. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:194
- Draft comment:
The 'context' field in FaithfulnessInput appears to be incomplete. Please ensure the Field is finished with the intended parameters, such as closing parenthesis and examples if needed. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_ZBsLzqhcr9lMb5C7
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
|
|
||
| class AgentGoalAccuracyRequest(BaseModel): | ||
| class AgentFlowQualityRequest(BaseModel): | ||
| config: Optional[AgentFlowQualityConfigRequest] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not optional
|
|
||
| class AgentToolErrorDetectorRequest(BaseModel): | ||
| class AgentGoalCompletenessRequest(BaseModel): | ||
| config: Optional[AgentGoalCompletenessConfigRequest] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed f0587f1 in 2 minutes and 0 seconds. Click for details.
- Reviewed
81lines of code in2files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluator.py:18
- Draft comment:
The detailed docstring explaining the nested input structure was removed. Consider adding a brief note that the flat input is wrapped in an {'input': ...} dict for clarity. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is asking the PR author to add back documentation that was intentionally removed. The rules state "Do NOT comment unless there is clearly a code change required" and "Do NOT make comments that are obvious or unimportant." This is a documentation preference comment, not a code issue. The PR author clearly decided to simplify the docstring, removing the detailed explanation. The comment is essentially asking them to reconsider that decision or add some of it back. This falls under making comments about things that aren't clear code issues. The functionality hasn't changed, only the documentation level of detail. Perhaps the detailed documentation was actually important for understanding how the function works, especially if the nested structure is non-obvious. The comment is trying to preserve useful information that helps future maintainers understand the code. While documentation can be helpful, the rules explicitly state not to comment unless there's a clear code change required. This is a documentation style preference, not a functional issue. The PR author intentionally simplified the docstring, and if they felt the detail was unnecessary, that's their judgment call. The code itself is clear enough at line 23 showingrequest_model(input=input). This comment should be deleted. It's asking the PR author to reconsider a documentation change they intentionally made, which is a style preference rather than a code issue. The rules state comments should only be made when there's clearly a code change required, not for documentation preferences.
2. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py:14
- Draft comment:
The docstring for _get_required_fields was simplified. While brevity is good, consider mentioning that it extracts required fields from the nested 'input' model to aid maintainers. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, suggesting a documentation update. It doesn't provide a specific code suggestion or identify a potential issue with the code itself. According to the rules, purely informative comments should be removed.
3. packages/traceloop-sdk/traceloop/sdk/evaluator/evaluators_made_by_traceloop.py:33
- Draft comment:
The docstring and inline comments in _get_config_fields have been reduced. A brief note on handling Optional[ConfigModel] might be useful for clarity. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative and suggests adding a note for clarity. It doesn't provide a specific code suggestion or point out a potential issue with the code. It violates the rule against making purely informative comments.
Workflow ID: wflow_mtYj9VqQPP2VJhE0
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify all the places the config is optional - if it is really true.
As far as I know only model config is optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed f25cc27 in 1 minute and 32 seconds. Click for details.
- Reviewed
31lines of code in2files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:1
- Draft comment:
Header filename updated to 'tmpp6g09qxu.json'. Verify that this change is intentional and consistent with configuration. - Reason this comment was not posted:
Confidence changes required:30%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/request.py:45
- Draft comment:
Changed 'config' field from optional to required in AgentFlowQualityRequest. Confirm client code and docs reflect this potentially breaking change. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment explicitly asks the author to "Confirm" that client code and docs reflect the change. This directly violates the rule: "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended." The comment doesn't suggest a specific code change - it's purely asking for verification. Additionally, since this is generated code (indicated by the header), the author likely didn't manually make this change but rather regenerated the file from a schema. The comment doesn't provide actionable feedback on what to change in the code itself. However, this is a legitimate breaking change that could cause runtime errors for existing clients. Perhaps the comment is valuable as a warning about the impact, even if it's phrased as a request for confirmation. The comment does identify a real issue that could break existing code. While the breaking change is real, the comment doesn't suggest what code change should be made. It only asks for confirmation about external factors (client code and docs). According to the rules, comments should clearly indicate a code change that's required. If the comment suggested reverting the change or adding a migration path, it would be actionable. As written, it's just asking the author to verify things outside the PR. The comment should be deleted because it asks the author to "confirm" something rather than suggesting a specific code change. It violates the rule against asking authors to verify or ensure things. While it identifies a breaking change, it doesn't provide actionable guidance on what to do about it in the code.
3. packages/traceloop-sdk/traceloop/sdk/generated/evaluators/response.py:1
- Draft comment:
Header filename updated to 'tmpp6g09qxu.json' in response file. Ensure consistency with request file. - Reason this comment was not posted:
Confidence changes required:30%<= threshold50%None
Workflow ID: wflow_2q7lE5BOudvV7AZu
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
feat(instrumentation): ...orfix(instrumentation): ....Important
Refactor evaluator validation and configuration logic to use consolidated input fields and improve error handling.
_validate_evaluator_input()inevaluator.pyto use a single consolidated input field for request models._get_config_fields()inevaluators_made_by_traceloop.pyto derive defaults from nested config models.evaluator.pyandevaluators_made_by_traceloop.py.request.pyto use consolidated input fields.response.pyto align with new request model structure.This description was created by
for f25cc27. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.