-
Notifications
You must be signed in to change notification settings - Fork 420
feat(client): Add client-side metrics for transfer and RPC operations #733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces comprehensive client-side metrics for transfer and RPC operations, adding both Prometheus-style and human-readable metric collection interfaces. The changes enable real-time monitoring of client performance with support for configurable background reporting.
Key changes include:
- Implementation of client-side metrics collection for transfer (read/write bytes, latency) and RPC operations
- Addition of background metrics reporting thread with environment variable configuration
- Integration of metrics tracking throughout the client operation flow
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
client_metric.h |
New header defining metric structures for transfer and RPC operations with summary formatting |
client.cpp |
Integration of metrics collection throughout client operations and background reporting thread |
master_client.cpp |
RPC call tracking with latency measurement and metric collection |
transfer_task.cpp |
Transfer operation byte counting and metrics integration |
client.h |
Client class extension with metrics members and reporting thread management |
master_client.h |
Header updates for metrics parameter passing |
transfer_task.h |
Header updates for metrics integration in transfer operations |
client_metrics_test.cpp |
Comprehensive test suite for all metric functionality |
CMakeLists.txt |
Build configuration for the new test file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good to me. Just two points:
- Is there a way to skip recording metrics when the metrics feature is disabled? This would allow users to avoid the overhead of metrics if they don’t use this feature.
- Perhaps we could also add a section in the documentation explaining how to use client metrics.
This commit introduces a comprehensive metrics system for the client component, tracking transfer byte counts and operation latencies with both human-readable summaries and Prometheus-style serialization. Key features include: - New TransferMetric for tracking read/write bytes and latency histograms - MasterClientMetric for RPC call counting and latency tracking - Environment-controlled metrics reporting (MC_STORE_METRIC_REPORT) - Automatic periodic metrics collection thread - Enhanced test coverage for metrics validation - Unified metrics interface across all client operations The implementation provides detailed latency percentiles (P50/P95) and total byte tracking with automatic unit conversion (B/KB/MB/GB).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
5eed4ec to
a98b0b4
Compare
|
We can now disable metric collection through an environment variable. Regarding the documentation update, I believe we currently lack an appropriate doc for this. For example, we could include it in a "Mooncake Configuration Guide" or "Mooncake Deployment Guide." I'll update the documentation lately. |
In this PR, we introduce client side support for both transfer and RPC-related metrics.
We provide interface support: Prometheus-style and human-readable formats. Additionally, we've implemented a background thread for metric printing (disabled by default).