add basic metrics of multilevel task queue #27
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Why not use rust-prometheus instead? That way, users of the library don't even have to write their own wrappers.
@BusyJay We can't assume the user uses Prometheus.
They can still access the data via Prometheus's APIs. /cc @breeswish, is it easy for other metrics implementations to interoperate with rust-prometheus?
I think it might not be a good idea to use rust-prometheus in a library, because the registry needs to interact with the metrics. If the version of rust-prometheus used by the registry differs from the one used by the library's metrics (which I believe is likely), then using Prometheus in the library doesn't bring any benefit.
Why do we need a registry? What I mean is to use Prometheus's counters and histograms. It should behave like RocksDB's metrics: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/statistics.h
Then what's the benefit of using Prometheus's counters? Aren't they just atomic wrappers?
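For comparison, the "just an atomic wrapper" alternative can be sketched with only the standard library. The names `Counter` and `export_line` are hypothetical and not this library's API; the idea is that the library keeps cumulative values in atomics and the application forwards them to whatever metrics system it uses (rust-prometheus or anything else).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter owned by the library: a thin atomic wrapper with
/// no dependency on any metrics crate.
#[derive(Debug, Default)]
pub struct Counter(AtomicU64);

impl Counter {
    /// Adds `v` to the cumulative value.
    pub fn inc_by(&self, v: u64) {
        self.0.fetch_add(v, Ordering::Relaxed);
    }

    /// Returns the cumulative value. Reading does not reset it.
    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical user-side bridge: the application polls the library's
/// counters and forwards them to its own metrics system.
pub fn export_line(name: &str, counter: &Counter) -> String {
    format!("{} {}", name, counter.get())
}

fn main() {
    let elapsed_us = Counter::default();
    elapsed_us.inc_by(120);
    elapsed_us.inc_by(30);
    println!("{}", export_line("task_elapsed_us", &elapsed_us)); // prints "task_elapsed_us 150"
}
```

The trade-off under discussion is exactly this extra bridging step: with rust-prometheus counters it disappears, at the cost of the library depending on a particular metrics crate.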
It is easy for us to unify the version of rust-prometheus used by TiKV and by this library, but that doesn't change the fact that it's generally not a good idea to use rust-prometheus in a library.
The benefit is that:
I think it's very similar to the log crate. The log crate has a similar layout, but that doesn't prevent it from being widely used by both libraries and binaries. I don't think versioning is a reason to avoid depending on another library.
OK. I can switch to Prometheus if you have a strong opinion. PS: The log crate uses a trick to deal with compatibility: log 0.3 depends on log 0.4, so the types can be used across versions. I think we can apply the same trick to rust-prometheus. /cc @breeswish
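The trick described above (commonly called the "semver trick") boils down to the old major version re-exporting the new version's types instead of defining its own. A minimal sketch, with two modules standing in for two crate versions; the names `metrics_v1`, `metrics_v2`, `Level` and `old_api` are hypothetical. In a real project each "version" is a separate crate release, and the old one lists the new one as a Cargo dependency.

```rust
// Two modules stand in for two major versions of the same crate.
mod metrics_v2 {
    /// The single real definition of the type.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Level {
        Error,
        Info,
    }
}

mod metrics_v1 {
    // The trick: instead of defining its own `Level`, the old version
    // re-exports the new one, so both versions name the same type.
    pub use crate::metrics_v2::Level;
}

/// Code written against the old version...
fn old_api(level: metrics_v1::Level) -> &'static str {
    match level {
        metrics_v1::Level::Error => "error",
        metrics_v1::Level::Info => "info",
    }
}

fn main() {
    // ...accepts values produced by code written against the new version.
    let levels = [metrics_v2::Level::Error, metrics_v2::Level::Info];
    for l in levels.iter().copied() {
        println!("{}", old_api(l));
    }
}
```

If `metrics_v1` defined its own `Level` instead, passing a `metrics_v2::Level` to `old_api` would be a type error, which is the cross-version incompatibility being discussed.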
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@BusyJay PTAL again.
```rust
let total = self.total_elapsed.0.load(SeqCst);
if Duration::from_micros(total) < ADJUST_CHANCE_INTERVAL {
// Another thread just adjusted the chances.
let total = self.total_elapsed_us.get();
```
I think the memory ordering will be `Relaxed` for metrics. Is that OK?
I think it's OK as long as each handling doesn't take very long, so the inaccuracy has little effect.
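It may help to separate two things `Relaxed` could affect. A minimal std-only sketch (the function `accumulate` is hypothetical) shows that a `Relaxed` `fetch_add` is still an atomic read-modify-write, so no increment is lost and the final total is exact. What `Relaxed` gives up is ordering relative to other memory locations, e.g. between a counter and a separately stored "last value", which is where a reader may observe slightly stale or mismatched numbers.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each add `per_task_us` to a shared
/// counter `tasks` times, using only Relaxed ordering.
fn accumulate(threads: usize, tasks: u64, per_task_us: u64) -> u64 {
    let total_elapsed_us = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total_elapsed_us);
            thread::spawn(move || {
                for _ in 0..tasks {
                    // Atomic RMW even with Relaxed: increments are never lost.
                    // Relaxed only drops ordering guarantees with respect to
                    // other memory locations.
                    total.fetch_add(per_task_us, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // join() synchronizes with each worker, so this load sees every add.
    total_elapsed_us.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", accumulate(4, 1000, 5)); // prints "20000"
}
```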
I'm worried that `total_diff` can be zero now that reordering is allowed.
If `total_diff` is zero, the function returns at L297. The calculation here is different, so there is no risk of division by zero as before. Anything else worrying?
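The early-return argument can be sketched as follows. Everything here except the names `total_diff` and `ADJUST_CHANCE_INTERVAL_US` is hypothetical: the interval value, the function `level0_share`, and the assumption that the adjustment divides a per-level time delta by `total_diff`. Because the interval is nonzero, `total_diff == 0` always takes the early return and never reaches the division.

```rust
/// Hypothetical value; the PR's actual interval is not shown in this thread.
const ADJUST_CHANCE_INTERVAL_US: u64 = 1_000_000;

/// Hypothetical: share of the elapsed time spent in level 0 since the last
/// adjustment, or None if not enough time has passed yet.
fn level0_share(total_us: u64, last_total_us: u64, level0_us: u64, last_level0_us: u64) -> Option<f64> {
    // saturating_sub is a defensive choice of this sketch: the two values
    // are read separately, so another thread could advance `last_total_us`
    // in between.
    let total_diff = total_us.saturating_sub(last_total_us);
    if total_diff < ADJUST_CHANCE_INTERVAL_US {
        // Also covers total_diff == 0, since the interval is nonzero.
        return None;
    }
    let level0_diff = level0_us.saturating_sub(last_level0_us);
    // total_diff >= ADJUST_CHANCE_INTERVAL_US > 0 here, so this is safe.
    Some(level0_diff as f64 / total_diff as f64)
}

fn main() {
    println!("{:?}", level0_share(500, 500, 10, 10)); // prints "None"
    println!("{:?}", level0_share(3_000_000, 1_000_000, 1_500_000, 0)); // prints "Some(0.75)"
}
```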
```rust
// Another thread just adjusted the chances.
let total = self.total_elapsed_us.get();
let total_diff = total - self.last_total_elapsed_us.get();
if total_diff < ADJUST_CHANCE_INTERVAL_US {
```
So the metrics reporting interval should be shorter than `ADJUST_CHANCE_INTERVAL_US`, otherwise this condition could always be true.
Sorry, I don't get what you mean. Reporting the metrics shouldn't reset their values, and `last_total_elapsed_us` is only set at L304.
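To make the point concrete: a reporter only loads the cumulative counter, so it cannot influence the adjustment logic, which keeps its own watermark. In this sketch only the field names `total_elapsed_us` and `last_total_elapsed_us` and the constant name come from the diff; the `Stats` type, its methods, and the interval value are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

/// Hypothetical value; the PR's actual interval is not shown here.
const ADJUST_CHANCE_INTERVAL_US: u64 = 1_000;

/// Hypothetical holder for the two counters discussed in the review.
#[derive(Default)]
struct Stats {
    /// Cumulative and only ever increasing; this is what gets exported.
    total_elapsed_us: AtomicU64,
    /// Private watermark, written only when the chances are adjusted.
    last_total_elapsed_us: AtomicU64,
}

impl Stats {
    fn on_task_finished(&self, elapsed_us: u64) {
        self.total_elapsed_us.fetch_add(elapsed_us, Relaxed);
    }

    /// What a metrics reporter does: read a snapshot, reset nothing.
    fn report(&self) -> u64 {
        self.total_elapsed_us.load(Relaxed)
    }

    /// Returns true if the chances were adjusted.
    fn maybe_adjust(&self) -> bool {
        let total = self.total_elapsed_us.load(Relaxed);
        // Defensive: the two loads are not atomic together.
        let total_diff = total.saturating_sub(self.last_total_elapsed_us.load(Relaxed));
        if total_diff < ADJUST_CHANCE_INTERVAL_US {
            return false;
        }
        // ... recompute the per-level chances here ...
        self.last_total_elapsed_us.store(total, Relaxed);
        true
    }
}

fn main() {
    let s = Stats::default();
    s.on_task_finished(600);
    assert_eq!(s.report(), 600);
    s.on_task_finished(600);
    // The report above did not reset anything, so total_diff is 1200 here.
    println!("{}", s.maybe_adjust()); // prints "true"
}
```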
Oh, it turns out I misunderstood how the metrics are processed.
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@AndreMouche @breeswish PTAL
LGTM
```rust
const MIN_LEVEL0_CHANCE: u32 = 1 << 31; // 0.5
const MAX_LEVEL0_CHANCE: u32 = 4_209_067_949; // 0.98
const ADJUST_AMOUNT: u32 = (MAX_LEVEL0_CHANCE - MIN_LEVEL0_CHANCE) / 8; // 0.06
const INIT_LEVEL0_CHANCE: f64 = 0.8;
```
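The trailing comments on these constants suggest a `u32` fixed-point encoding where a value `x` stands for the probability `x / 2^32`: `1 << 31` is 0.5, `4_209_067_949` is about 0.98, and one eighth of the range is about 0.06. The sketch below checks that arithmetic. The constants are copied from the diff; `to_f64`, `from_f64`, `pick_level0` and `adjust` are hypothetical helpers illustrating how such values could be used, not the PR's code.

```rust
// Constants as quoted in the diff.
const MIN_LEVEL0_CHANCE: u32 = 1 << 31; // 0.5
const MAX_LEVEL0_CHANCE: u32 = 4_209_067_949; // 0.98
const ADJUST_AMOUNT: u32 = (MAX_LEVEL0_CHANCE - MIN_LEVEL0_CHANCE) / 8; // 0.06
const INIT_LEVEL0_CHANCE: f64 = 0.8;

/// Hypothetical: fixed-point chance -> probability (x / 2^32).
fn to_f64(chance: u32) -> f64 {
    chance as f64 / (1u64 << 32) as f64
}

/// Hypothetical: probability -> fixed-point chance. Truncates; values
/// >= 1.0 saturate to u32::MAX because Rust float-to-int casts saturate.
fn from_f64(p: f64) -> u32 {
    (p * (1u64 << 32) as f64) as u32
}

/// Hypothetical draw: choose level 0 if a uniform random u32 is below the
/// chance, which happens with probability to_f64(chance).
fn pick_level0(random: u32, chance: u32) -> bool {
    random < chance
}

/// Hypothetical adjustment step, clamped to [MIN, MAX].
fn adjust(chance: u32, increase: bool) -> u32 {
    if increase {
        chance.saturating_add(ADJUST_AMOUNT).min(MAX_LEVEL0_CHANCE)
    } else {
        chance.saturating_sub(ADJUST_AMOUNT).max(MIN_LEVEL0_CHANCE)
    }
}

fn main() {
    let init = from_f64(INIT_LEVEL0_CHANCE);
    println!("min  {:.2}", to_f64(MIN_LEVEL0_CHANCE));
    println!("max  {:.2}", to_f64(MAX_LEVEL0_CHANCE));
    println!("step {:.2}", to_f64(ADJUST_AMOUNT));
    println!("init {:.2}", to_f64(init));
    println!("up   {:.2}", to_f64(adjust(init, true)));
}
```

Storing the chance as a `u32` lets it live in an `AtomicU32` and be compared directly against a random `u32`, with no floating point on the hot path.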
Could we add a description of this value?
I added some comments to it. PTAL
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
LGTM
It's quite useful to know how much time tasks spend and the chance of each level being scheduled when investigating issues related to the effect of multilevel feedback scheduling.