Support adding custom metrics to existing evaluations #5436
Conversation
Walkthrough

This pull request introduces enhancements to FiftyOne's model evaluation framework, focusing on expanding custom metrics functionality. The changes allow users to add and apply custom metrics across different model evaluation types (regressions, classifications, detections, segmentations). The documentation was updated accordingly.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant SampleCollection
    participant EvaluationResults
    participant CustomMetrics

    User->>SampleCollection: evaluate_regressions(predictions, gt_field, eval_key, custom_metrics)
    SampleCollection->>EvaluationResults: Compute Evaluation
    EvaluationResults->>CustomMetrics: Add and Process Custom Metrics
    CustomMetrics-->>EvaluationResults: Return Metric Results
    EvaluationResults-->>User: Evaluation Results with Custom Metrics
```
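For reference, the user-facing call in the diagram corresponds to something like the following sketch (the dataset name, field names, and metric URI below are illustrative placeholders, not taken from this PR):

```python
import fiftyone as fo

# Assumes an existing dataset with regression predictions
dataset = fo.load_dataset("my_regression_dataset")

# Attach custom metrics at evaluation time; the URI below is a placeholder
results = dataset.evaluate_regressions(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    custom_metrics=["@voxel51/metric-examples/example_metric"],
)

# Computed custom metric values are stored on the results
print(results.custom_metrics)
```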
Actionable comments posted: 0
🧹 Nitpick comments (3)
fiftyone/utils/eval/base.py (3)
`31-32`: **Add input validation for `custom_metrics` parameter.**

Consider adding type validation to ensure `custom_metrics` is either a list or dict when provided. This would help catch configuration errors early.

```diff
 def __init__(self, custom_metrics=None, **kwargs):
+    if custom_metrics is not None and not isinstance(custom_metrics, (list, dict)):
+        raise ValueError("custom_metrics must be a list or dict")
     self.custom_metrics = custom_metrics
     super().__init__(**kwargs)
```
`Line range hint 63-93`: **Enhance error handling in `compute_custom_metrics`.**

While the method handles errors gracefully, consider:
- Including the operator URI in the error context
- Adding debug logging for successful computations
- Potentially raising an exception if too many metrics fail
```diff
 for metric, kwargs in custom_metrics.items():
+    success = False
     try:
         operator = foo.get_operator(metric)
         value = operator.compute(samples, results, **kwargs or {})
         if value is not None:
             if results.custom_metrics is None:
                 results.custom_metrics = {}

             key = operator.config.aggregate_key
             if key is None:
                 key = operator.config.name

             results.custom_metrics[operator.uri] = {
                 "key": key,
                 "value": value,
                 "label": operator.config.label,
                 "lower_is_better": operator.config.kwargs.get(
                     "lower_is_better", True
                 ),
             }
+        success = True
+        logger.debug("Successfully computed metric '%s'", metric)
     except Exception as e:
         logger.warning(
-            "Failed to compute metric '%s': Reason: %s",
-            metric,
+            "Failed to compute metric '%s' (uri: %s): Reason: %s",
+            operator.config.name if 'operator' in locals() else 'unknown',
+            metric,
             e,
         )
+    if not success:
+        results.failed_metrics = getattr(results, "failed_metrics", 0) + 1
+        if results.failed_metrics > len(custom_metrics) // 2:
+            raise ValueError(
+                f"Too many metrics failed to compute ({results.failed_metrics})"
+            )
```
`164-206`: **LGTM! Well-structured implementation with proper error handling.**

The implementation is thorough and handles edge cases well. The `save_config()` call before computation ensures proper cleanup even if computation fails.

Consider adding a docstring example to demonstrate the usage with both list and dict formats:
"""Computes the given custom metrics and adds them to these results. Args: custom_metrics: a list of custom metrics to compute or a dict mapping metric names to kwargs dicts overwrite (True): whether to recompute any custom metrics that have already been applied + + Example: + # Using a list + results.add_custom_metrics(["metric1", "metric2"]) + + # Using a dict with kwargs + results.add_custom_metrics({ + "metric1": {"threshold": 0.5}, + "metric2": {"min_value": 10} + }) """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/utils/eval/base.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: test / test-app
- GitHub Check: e2e / test-e2e
- GitHub Check: lint / eslint
- GitHub Check: build / build
- GitHub Check: build
🔇 Additional comments (1)
fiftyone/utils/eval/base.py (1)
`47-61`: **LGTM! Clean implementation of metrics filtering.**

The method efficiently handles both list and dict formats while providing URI-based filtering capability.
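The reviewed method body is not included in this excerpt; below is a minimal sketch of what list/dict normalization with URI-based filtering could look like (the name and signature are assumptions, not FiftyOne's actual implementation):

```python
def _parse_custom_metrics(custom_metrics, uris=None):
    # Normalize the list format to a dict mapping metric URIs to kwargs
    if isinstance(custom_metrics, list):
        custom_metrics = {metric: None for metric in custom_metrics}

    # Optionally restrict to the requested metric URIs
    if uris is not None:
        custom_metrics = {
            uri: kwargs
            for uri, kwargs in custom_metrics.items()
            if uri in uris
        }

    return custom_metrics
```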
LGTM! 🚀
Change log

- `results.add_custom_metrics()` method

Example usage
```shell
# Install the example metrics plugin
fiftyone plugins download \
    https://github.com/voxel51/fiftyone-plugins \
    --plugin-names @voxel51/metric-examples
```
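The original Python snippets were not preserved in this scrape; here is a minimal sketch of the initial session, assuming the quickstart zoo dataset:

```python
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
dataset.persistent = True

# Run a standard evaluation; custom metrics can be added to it later
dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)
```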
Later in another session:
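A sketch of adding custom metrics to the existing evaluation (the metric URI and kwargs are placeholders; the list and dict formats follow the `add_custom_metrics()` docstring above):

```python
import fiftyone as fo

dataset = fo.load_dataset("quickstart")

# Load the stored results and add custom metrics after the fact
results = dataset.load_evaluation_results("eval")
results.add_custom_metrics(["@voxel51/metric-examples/example_metric"])

# Or pass kwargs to each metric via the dict format
# results.add_custom_metrics(
#     {"@voxel51/metric-examples/example_metric": {"threshold": 0.5}}
# )

print(results.custom_metrics)
```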
Summary by CodeRabbit
Release Notes
- New Features
- Documentation
- Improvements
These updates provide users with greater control and customization options when evaluating machine learning models in FiftyOne.