[FLINK-34199] Add tracing for durations of rescaling/restoring RocksDB incremental checkpoints from downloaded and local state #24168
I'm a bit confused by the purpose of this PR:
- from the title and docs, it seems to measure the restore time for local state
- but the implementation measures the restore time for all state
Or am I missing something?
Regardless, I think the latter (measuring restore times for all state) makes sense.
Besides that, measuring the remote and/or local state size would be more useful than measuring the restore time from local state.
WDYT?
...ava/org/apache/flink/contrib/streaming/state/restore/RocksDBIncrementalRestoreOperation.java
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricNames.java
I think the description is correct, depending on your point of view. Recovery today has two phases, and this PR is about measuring the second one, restoring RocksDB from downloaded and local state (we already have a duration for the first).
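The idea of timing each recovery phase separately, so each gets its own duration, can be sketched in plain Java. This is a minimal illustration, not the code in this PR, and it does not use Flink's tracing API. The class, method and phase names (`RestorePhaseTimer`, `timePhase`, `DownloadStateDurationMs`, `RestoreStateDurationMs`) are hypothetical, and treating the first phase as the download of remote state is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: records a separate wall-clock duration per recovery phase.
 * Not Flink's RocksDBIncrementalRestoreOperation; names are illustrative only.
 */
public class RestorePhaseTimer {
    // Insertion-ordered, so phases are reported in the order they ran.
    private final Map<String, Long> durationsMillis = new LinkedHashMap<>();

    /** Runs one phase and records how long it took, in milliseconds. */
    public void timePhase(String phaseName, Runnable phase) {
        long start = System.nanoTime();
        try {
            phase.run();
        } finally {
            // Recorded even if the phase throws, so a failed restore still has a duration.
            long elapsedNanos = System.nanoTime() - start;
            durationsMillis.put(phaseName, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
        }
    }

    public Map<String, Long> getDurationsMillis() {
        return durationsMillis;
    }

    public static void main(String[] args) {
        RestorePhaseTimer timer = new RestorePhaseTimer();
        // Phase 1 (assumed stand-in): download remote state files.
        timer.timePhase("DownloadStateDurationMs", () -> sleep(20));
        // Phase 2 (assumed stand-in): restore/rescale RocksDB from downloaded and local files.
        timer.timePhase("RestoreStateDurationMs", () -> sleep(30));
        timer.getDurationsMillis().forEach((name, ms) -> System.out.println(name + " = " + ms));
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Recording the duration in a `finally` block is a design choice in this sketch, so that a phase that fails still yields a measurement.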
Besides that, I already merged a PR that measures local/remote state sizes. And in our meeting we found that all of the metrics are useful :)
Branch updated from ab7a654 to 268439e.
Right 😅
I reworded the docs slightly; I hope that makes it clearer.
Thanks!
LGTM
Branch updated from a15b23c to c72b27d.
… is available. Then this commit should be reverted.
…local and downloaded remote state).
Branch updated from c72b27d to bc5b4a1.
What is the purpose of the change
This PR extends #24031 with traces for the rescaling/restore times from local state.
Brief change log
Verifying this change
Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation