Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add basic metrics of multilevel task queue #27

Merged
merged 6 commits into from
Feb 7, 2020

Conversation

sticnarf
Copy link
Contributor

@sticnarf sticnarf commented Jan 6, 2020

It's quite useful to get how long tasks use and the chance of being scheduled of each level to investigate issues about the effect of multilevel feedback scheduling.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@BusyJay
Copy link
Member

BusyJay commented Jan 6, 2020

Why not use rust-prometheus instead? In that case, users of the library does even have to make their own wrappers.

@sticnarf
Copy link
Contributor Author

sticnarf commented Jan 6, 2020

@BusyJay We can't assume the user uses prometheus.

@BusyJay
Copy link
Member

BusyJay commented Jan 6, 2020

They can still access the data via prometheus's APIs.

/cc @breeswish, is it easy for other metrics implement to collaborate with rust-prometheus?

@sticnarf
Copy link
Contributor Author

sticnarf commented Jan 6, 2020

I think it mightn't be a good idea to use prometheus-rust in a library. Because the registry needs to interact with the metrics. If the version of the registry differs from that of the metrics, I believe it's likely to be true, then using prometheus in the library cannot bring any benefit.

@BusyJay
Copy link
Member

BusyJay commented Jan 6, 2020

Why need to use registry? What I mean is use prometheus' counters and histograms. It should behave like rocksdb's metrics: https://github.com/facebook/rocksdb/blob/master/include/rocksdb/statistics.h.

@sticnarf
Copy link
Contributor Author

sticnarf commented Jan 6, 2020

Then what's the benefit of using prometheus's counters, just an atomic wrapper?
If the versions don't match, the counters and histograms cannot be used with the prometheus registry that the user uses and it doesn't bring any more convenience. Why bother introduce a library only for its counter type?

@sticnarf
Copy link
Contributor Author

sticnarf commented Jan 6, 2020

It is easy for us to unify the version of rust-prometheus used by tikv and this library. But it doesn't change that it's generally not a good idea to use rust-prometheus in a library.

@BusyJay
Copy link
Member

BusyJay commented Jan 6, 2020

Then what's the benefit of using prometheus's counters, just an atomic wrapper?

The benefit is that:

  1. we don't need to define our own types, such as counter, histogram or registry. For example, to check the performance of a benchmark, we can dump the metrics easily, instead of building a struct and customize its output from scratch;
  2. very easy to integrate with other system, all need to be done are just registry.register.

If the versions don't match, the counters and histograms cannot be used with the prometheus registry

I think it's very similar with log crate. log crate also has similar layout but it doesn't prevent it from being widely used by both libraries and binaries. I don't think versions are the defeater of using another library.

@sticnarf
Copy link
Contributor Author

sticnarf commented Jan 6, 2020

Ok. I can change to use prometheus if you have strong opinion.

PS: The log crate uses a trick to deal with compatibility. log 0.3 depends on log 0.4 so the types can be used across versions. We can apply this trick on rust-prometheus too, I think. /cc @breeswish

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@sticnarf
Copy link
Contributor Author

sticnarf commented Jan 7, 2020

@BusyJay PTAL again.
I choose to use global static MetricsVec so if the user uses several thread pools at the same time, it is convenient to register the vec only once and get information of all thread pools.

let total = self.total_elapsed.0.load(SeqCst);
if Duration::from_micros(total) < ADJUST_CHANCE_INTERVAL {
// Another thread just adjusted the chances.
let total = self.total_elapsed_us.get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the order will be Relaxed for metrics, is it OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK as soon as the time of each handling is not very long so that the inaccuracy doesn't have much effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worry that total_diff can be zero as reorder is allowed now.

Copy link
Contributor Author

@sticnarf sticnarf Jan 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If total_diff is zero, the function returns in L297. The calculation is different here and there is no risk of div by 0 like before. Anything else worrying?

// Another thread just adjusted the chances.
let total = self.total_elapsed_us.get();
let total_diff = total - self.last_total_elapsed_us.get();
if total_diff < ADJUST_CHANCE_INTERVAL_US {
Copy link
Member

@BusyJay BusyJay Jan 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the metrics report interval should be less than ADJUST_CHANCE_INTERVAL_US, otherwise it can be always true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't get what you mean. Reporting the metrics shoudn't reset its value and last_total_elapsed_us is only set at L304.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, turns out I misunderstood the processing of metrics.

BusyJay
BusyJay previously approved these changes Jan 19, 2020
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@BusyJay
Copy link
Member

BusyJay commented Feb 6, 2020

@AndreMouche @breeswish PTAL

AndreMouche
AndreMouche previously approved these changes Feb 7, 2020
Copy link
Member

@AndreMouche AndreMouche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

const MIN_LEVEL0_CHANCE: u32 = 1 << 31; // 0.5
const MAX_LEVEL0_CHANCE: u32 = 4_209_067_949; // 0.98
const ADJUST_AMOUNT: u32 = (MAX_LEVEL0_CHANCE - MIN_LEVEL0_CHANCE) / 8; // 0.06
const INIT_LEVEL0_CHANCE: f64 = 0.8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some description about this value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add some comments to it. PTAL

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Copy link
Member

@AndreMouche AndreMouche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sticnarf sticnarf merged commit fe59cbc into tikv:master Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants