Use Elasticsearch types in Cockroachdb module #17736
@@ -731,46 +731,11 @@
"metrics": [
{
"agg_with": "avg",
- "field": "prometheus.metrics.raft_process_logcommit_latency_count",
+ "field": "prometheus.raft_process_logcommit_latency.histogram",
The current dashboard uses sum and count to calculate the average of this value. I think it now makes sense to calculate percentiles instead, but I haven't managed to use histograms in TSVB yet. @exekias do you know if they are already supported?
It works with other visualizations, so I will go with line graphs for now.
Yes, currently only Visualize supports this type
I have replaced the graphs that were using sum and count to calculate averages and they are using 99th percentile now (as the CockroachDB admin UI does). It is quite ok now but the timings are in nanoseconds and I haven't found a way to format them.
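To make the difference between the two approaches concrete, here is a minimal Go sketch. It is not code from this PR or from Kibana: the `histogram` type, `meanFromSumCount`, `percentile` and `formatNanos` are hypothetical names, the sample data is invented, and the percentile uses a simple nearest-rank rule over buckets, whereas Elasticsearch uses its own approximation algorithms. It also shows one way nanosecond timings could be rendered in readable units outside the dashboard.

```go
package main

import (
	"fmt"
	"time"
)

// histogram mirrors the general shape of a pre-aggregated histogram:
// parallel slices of bucket values and the number of samples per bucket.
// (Hypothetical type, for illustration only.)
type histogram struct {
	Values []float64 // bucket values in nanoseconds, sorted ascending
	Counts []uint64  // number of samples in each bucket
}

// meanFromSumCount is the previous approach: average = sum / count.
func meanFromSumCount(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

// percentile returns the value of the first bucket whose cumulative count
// reaches p percent of all samples (nearest-rank, no interpolation).
func percentile(h histogram, p float64) float64 {
	var total uint64
	for _, c := range h.Counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	target := p * float64(total) / 100
	var seen uint64
	for i, c := range h.Counts {
		seen += c
		if float64(seen) >= target {
			return h.Values[i]
		}
	}
	return h.Values[len(h.Values)-1]
}

// formatNanos renders a nanosecond value in a human-readable unit.
func formatNanos(ns float64) string {
	return time.Duration(ns).String()
}

func main() {
	// Invented sample: 90 fast commits, 9 slower ones, 1 outlier.
	h := histogram{
		Values: []float64{500000, 1500000, 20000000},
		Counts: []uint64{90, 9, 1},
	}
	sum := 90*500000.0 + 9*1500000.0 + 1*20000000.0
	fmt.Println("mean:", formatNanos(meanFromSumCount(sum, 100)))
	fmt.Println("p99: ", formatNanos(percentile(h, 99)))
}
```

With this sample the single outlier pulls the mean up, while the 99th percentile reports the latency that 99% of samples stay under (1.5ms here).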
Pinging @elastic/integrations-platforms (Team:Platforms)
I have moved changes for fields validation to #17759
Fix some overflows on Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see #17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if this happens with some service.
Rate calculation methods now also return a boolean to differentiate whether a zero value is returned because it was the first call or because the rate is actually zero.
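The behaviour described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `counterCache`, `rate` and `bucketCounts` are hypothetical names, and clamping to zero is just one plausible way to handle a counter that goes backwards.

```go
package main

import "fmt"

// counterCache remembers the last value seen for each counter so that a
// rate (the increase since the previous call) can be derived.
// (Hypothetical type, for illustration only.)
type counterCache struct {
	last map[string]uint64
}

func newCounterCache() *counterCache {
	return &counterCache{last: make(map[string]uint64)}
}

// rate returns the increase of the named counter since the previous call.
// The boolean is false on the first call for a name (for example, a bucket
// that was just added at runtime), so callers can tell "no data yet" apart
// from a rate that is genuinely zero.
func (c *counterCache) rate(name string, value uint64) (uint64, bool) {
	prev, seen := c.last[name]
	c.last[name] = value
	if !seen {
		return 0, false
	}
	if value < prev {
		// The counter went backwards; clamp instead of letting the
		// unsigned subtraction wrap around to a huge number.
		return 0, true
	}
	return value - prev, true
}

// bucketCounts converts cumulative bucket counters into per-bucket counts.
// If a bucket with a bigger upper limit reports a lower counter, its count
// is clamped to zero rather than underflowing.
func bucketCounts(cumulative []uint64) []uint64 {
	counts := make([]uint64, len(cumulative))
	var prev uint64
	for i, c := range cumulative {
		if c < prev {
			counts[i] = 0
			continue
		}
		counts[i] = c - prev
		prev = c
	}
	return counts
}

func main() {
	cache := newCounterCache()
	for _, v := range []uint64{10, 15, 15, 3} {
		r, ok := cache.rate("bucket_le_1000", v)
		fmt.Println(v, r, ok)
	}
	fmt.Println(bucketCounts([]uint64{5, 3, 8}))
}
```

Returning the boolean alongside the value keeps the zero case unambiguous without needing a sentinel such as a negative number, which an unsigned counter could not represent anyway.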
Fix some overflows on Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see elastic#17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if this happens with some service.
Rate calculation methods now also return a boolean to differentiate whether a zero value is returned because it was the first call or because the rate is actually zero.
(cherry picked from commit 0afffa8)
f34de4f to 55e9891
…17783) Fix some overflows on Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see #17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if this happens with some service.
Rate calculation methods now also return a boolean to differentiate whether a zero value is returned because it was the first call or because the rate is actually zero.
(cherry picked from commit 0afffa8)
…17784) Fix some overflows on Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see #17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if this happens with some service.
Rate calculation methods now also return a boolean to differentiate whether a zero value is returned because it was the first call or because the rate is actually zero.
(cherry picked from commit 0afffa8)
@@ -4,3 +4,16 @@ input:
metricset: collector
defaults:
metrics_path: /_status/vars
use_types: true
processors:
- drop_fields:
Wondering if this could make use of metrics_filters instead?
We could, as all sql_mem_sql... metrics are duplicated in CockroachDB. But only histograms are problematic; the rest are gauges, and in principle Metricbeat doesn't have any problem with duplicated gauges.
Also, removing these metrics at this point ensures that they are never collected, even if users set their own metrics_filters or add their own processors.
This pull request does not have a backport label. Could you fix it @jsoriano? 🙏
This pull request is now in conflicts. Could you fix it? 🙏
@jsoriano - Closing this one as there was no activity for a while.
…lows (elastic#17784) Fix some overflows on Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see elastic#17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if this happens with some service.
Rate calculation methods now also return a boolean to differentiate whether a zero value is returned because it was the first call or because the rate is actually zero.
(cherry picked from commit 2d6e0ca)
What does this PR do?
Use native Elasticsearch types for CockroachDB data so histograms are stored much more efficiently.
Adapt dashboard to use these types and some other fixes (see section about this below).
Why is it important?
Align CockroachDB module with latest Prometheus changes to leverage the use of new histogram type.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
How to test this PR locally
Run the CockroachDB module, check that dashboard works.
Related issues
Dashboard
There are some changes in dashboards: