feat(metrics): add metrics for barrier latency at each stage #3965

Merged
merged 11 commits into from
Aug 1, 2022

xxhZs
Contributor

@xxhZs xxhZs commented Jul 18, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Add barrier_inflight_latency, barrier_sync_latency, and barrier_wait_commit_latency, so that we can see how much time each of these barrier stages takes.
Also add sync_size_every_epoch, so that we can see how much data each epoch writes to S3.
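
To illustrate the idea of per-stage barrier latency, here is a minimal sketch in plain Python rather than the Rust code in this PR. It assumes the three stages run back to back for a single barrier, which the description above does not state. `StageLatencyRecorder`, `fake_clock`, and the tick values are hypothetical and exist only for this example; only the three metric names come from the PR.

```python
import time
from collections import defaultdict


class StageLatencyRecorder:
    """Hypothetical helper: stores the duration of each stage under a metric name."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.samples = defaultdict(list)  # metric name -> list of durations in seconds
        self._last = None

    def start(self):
        # Mark the moment the first stage begins.
        self._last = self._clock()

    def finish_stage(self, metric_name):
        # Record the time since the previous mark, then start timing the next stage.
        now = self._clock()
        elapsed = now - self._last
        self.samples[metric_name].append(elapsed)
        self._last = now
        return elapsed


def fake_clock(ticks):
    """Deterministic clock for the example: returns the given timestamps in order."""
    it = iter(ticks)
    return lambda: next(it)


# Assumed ordering of stages; the timestamps are made up.
recorder = StageLatencyRecorder(clock=fake_clock([0.0, 0.25, 0.75, 1.0]))
recorder.start()
for name in (
    "barrier_inflight_latency",
    "barrier_sync_latency",
    "barrier_wait_commit_latency",
):
    recorder.finish_stage(name)

total = sum(durations[0] for durations in recorder.samples.values())
```

Under that assumption, the three recorded durations add up to the barrier's end-to-end latency, so a slow barrier can be attributed to a specific stage.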

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

@xxhZs xxhZs requested a review from hzxa21 July 18, 2022 08:06
@codecov

codecov bot commented Jul 18, 2022

Codecov Report

Merging #3965 (93eb0cb) into main (fb2b33a) will decrease coverage by 0.00%.
The diff coverage is 68.64%.

@@            Coverage Diff             @@
##             main    #3965      +/-   ##
==========================================
- Coverage   74.33%   74.32%   -0.01%     
==========================================
  Files         844      844              
  Lines      122336   122407      +71     
==========================================
+ Hits        90939    90985      +46     
- Misses      31397    31422      +25     
Flag Coverage Δ
rust 74.32% <68.64%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/storage/src/memory.rs 79.15% <0.00%> (ø)
src/storage/src/monitor/monitored_store.rs 1.73% <0.00%> (-0.04%) ⬇️
src/storage/src/store.rs 60.00% <ø> (ø)
src/stream/src/task/stream_manager.rs 2.86% <0.00%> (-0.12%) ⬇️
src/meta/src/barrier/recovery.rs 62.19% <80.00%> (+0.46%) ⬆️
src/storage/src/hummock/local_version_manager.rs 74.28% <88.88%> (ø)
src/meta/src/barrier/mod.rs 84.21% <100.00%> (+0.16%) ⬆️
src/meta/src/rpc/metrics.rs 97.63% <100.00%> (+0.18%) ⬆️
src/storage/src/hummock/state_store.rs 80.34% <100.00%> (+0.05%) ⬆️
src/storage/src/monitor/state_store_metrics.rs 83.71% <100.00%> (+0.38%) ⬆️
... and 7 more

@skyzh
Contributor

skyzh commented Jul 18, 2022

Would you please take a screenshot of the metrics? I haven't run it myself yet, but I have some ideas for how to display this.

We can have multiple panels, e.g. Barrier In-Flight p50, Barrier In-Flight p99.

In each panel, we stack barrier_inflight_latency, barrier_sync_latency and barrier_wait_commit_latency together.

https://grafana.com/docs/grafana/next/visualizations/time-series/graph-time-series-stacking/

This might help us better understand what's going on.
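
As a rough sketch of what such a stacked panel would plot (plain Python, not the Grafana dashboard code): each stage's percentile becomes a band sitting on top of the previous one. The `percentile` and `stacked_bands` helpers and the sample values in milliseconds are made up for illustration, and the nearest-rank percentile here is an assumption, not necessarily how the dashboard computes quantiles.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def stacked_bands(per_stage_samples, p):
    """Return (stage, lower, upper) bands, stacking each stage's p-th percentile in order."""
    bands = []
    base = 0
    for stage, samples in per_stage_samples.items():
        height = percentile(samples, p)
        bands.append((stage, base, base + height))
        base += height
    return bands


# Illustrative latency samples in milliseconds (not real measurements).
samples_ms = {
    "barrier_inflight_latency": [10, 20, 30, 40],
    "barrier_sync_latency": [5, 5, 10, 20],
    "barrier_wait_commit_latency": [1, 2, 3, 4],
}

p50_panel = stacked_bands(samples_ms, 50)
p99_panel = stacked_bands(samples_ms, 99)
```

Note that the top of the stack is the sum of per-stage percentiles, which is generally not the same as the percentile of the total barrier latency; the stacked view is mainly useful for comparing the relative weight of each stage.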

@xxhZs
Contributor Author

xxhZs commented Jul 18, 2022

[image]
Like this.

@MrCroxx
Contributor

MrCroxx commented Jul 19, 2022

Would you please add some comments about the newly introduced metrics to explain what duration each one measures? Otherwise the names look kind of confusing. 🥰

Resolved review threads:

  • src/meta/src/barrier/mod.rs (outdated)
  • src/meta/src/barrier/mod.rs
  • src/storage/src/monitor/state_store_metrics.rs (outdated)
  • src/stream/src/task/stream_manager.rs

Collaborator

@hzxa21 hzxa21 left a comment

Rest LGTM. I think we can merge this PR after addressing all comments.

Resolved review threads:

  • grafana/risingwave-dashboard.py (outdated)
  • risedev.yml (outdated)

@xxhZs xxhZs requested a review from hzxa21 July 28, 2022 13:03
@hzxa21
Collaborator

hzxa21 commented Aug 1, 2022

@Mergifyio refresh

@mergify
Contributor

mergify bot commented Aug 1, 2022

refresh

✅ Pull request refreshed

@hzxa21
Collaborator

hzxa21 commented Aug 1, 2022

@Mergifyio requeue

@mergify
Contributor

mergify bot commented Aug 1, 2022

requeue

❌ This pull request head commit has not been previously disembarked from queue.

@mergify mergify bot merged commit 161cb22 into main Aug 1, 2022
@mergify mergify bot deleted the xxh/add_some_metrics branch August 1, 2022 06:16
nasnoisaac pushed a commit to nasnoisaac/risingwave that referenced this pull request Aug 9, 2022
…avelabs#3965)

* add metrics

* remove

* add doc

* add docs

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
4 participants