-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement heartbeat metrics #183
Conversation
How many different heartbeat implementations are there out there? |
pt-heartbeat has been there for a very long time; it is widely used. |
@brian-brazil : Given these 2 new metrics, can we easily get the time difference with them? |
Could be interesting. In what way is |
|
4bd2443
to
956bd82
Compare
Fixed:
|
@grypyrg Awesome, thanks for documenting those issues here. |
Even with second-only precision hearbeats, it's still worth keeping the float data as is valid. Especially if Maybe for compatibility, we should implement a It's really too bad that |
collector/pt_heartbeat.go
Outdated
ptHeartbeat = "pt_heartbeat" | ||
// pt-heartbeat query | ||
// the second column allows gets the server timestamp at the exact same time the query is run | ||
ptHeartbeatQuery = "SELECT UNIX_TIMESTAMP(ts), UNIX_TIMESTAMP(CONCAT(DATE(NOW()), ' ', CURTIME(6))), server_id from %s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified with UNIX_TIMESTAMP(NOW(6))
. I tested on 5.5 and 5.6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@SuperQ pt-heartbeat'i very initial version dates from https://sourceforge.net/p/maatkit/code/857/ 2007-09-05. The latest version is the result of almost 10 years of maturing. Older releases were using timestamps, and that got changed to VARCHAR later. I could not track back the reason. I do agree that a Go version of that tool might be handy, but is out of scope of this pull request. |
collector/pt_heartbeat.go
Outdated
"Timestamp of the current server.", | ||
[]string{"server_id"}, nil, | ||
) | ||
PtHeartbeatDelayDesc = prometheus.NewDesc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metric isn't necessary (and against Prometheus design) since you can do this in PromQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grypyrg That delay can be made at the prometheus level using https://prometheus.io/docs/querying/rules/ . Once this patch gets in, maybe we can document an example recording rule for this on a blogpost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
README.md
Outdated
@@ -66,6 +66,8 @@ collect.perf_schema.indexiowaits | 5.6 | Collect | |||
collect.perf_schema.tableiowaits | 5.6 | Collect metrics from performance_schema.table_io_waits_summary_by_table. | |||
collect.perf_schema.tablelocks | 5.6 | Collect metrics from performance_schema.table_lock_waits_summary_by_table. | |||
collect.slave_status | 5.1 | Collect from SHOW SLAVE STATUS (Enabled by default) | |||
collect.pt-heartbeat | 5.1 | Collect from pt-heartbeat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to call this heartbeat
, as there could be other implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed everywhere
Overall, I like this idea, some comments in-line. |
Should I rename ptHeartbeat to heartbeat everywhere or just in the CLI flags? |
Updated after latest comments. @SuperQ do you want more documentation for this in README? |
@roidelapluie Totally agree, re-implementing is out of scope for this feature. I was mostly just thinking out loud. Yes, please remove |
c5685e4
to
dab4a26
Compare
collector/heartbeat.go
Outdated
// timestamps. %s will be replaced by the table name. | ||
// The second column allows gets the server timestamp at the exact same | ||
// time the query is run. | ||
heartbeatQuery = "SELECT UNIX_TIMESTAMP(ts), UNIX_TIMESTAMP(NOW(6)), server_id from `%s`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SuperQ how can we prevent this to turn into sql injection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently not much golang/go#18478
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split in two: database and table, for better escaping
@roidelapluie, @SuperQ :
|
And maybe fetch another column to precalculate the
Example output (to also show it's subsecond):
|
Updated with all the comments taken in consideration; also split database and table parameters so we can better escape them. |
Maybe make a parameter for the column? |
Two parameters then. Is that complexity needed for an initial release? |
No, not necssary, it was just an idea. I think it's fine to keep the columns hard-coded. |
collector/heartbeat.go
Outdated
} | ||
defer heartbeatRows.Close() | ||
|
||
var now sql.RawBytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
var (
now, ts sql.RawBytes
serverId int
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides one Go nit.
Go nit fixed. |
pt-heartbeat measures actual replication lab on a mysql server. It does not rely on data stored in SHOW SLAVE STATUS, as it can be unreliable. pt-heartbeat website: https://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
Wording fixed in README.md and mysqld_exporter.go, so for me it is good to merge. |
Yup, looks fine, thanks. |
Thanks @roidelapluie ! |
For reference, promsql query: mysql_heartbeat_now_timestamp_seconds - on (server_id) mysql_heartbeat_stored_timestamp_seconds |
Normally Prometheus will be able to match up the labels and you don't need to specify the |
Oh thanks. Nice trick. |
Heartbeat[0] metrics allow for similar alerting symantics to "seconds behind master" metrics. * Add a recording rule to provide a similar gauge of seconds behind. * Add an alert with the same style. [0]: #183
Heartbeat[0] metrics allow for similar alerting symantics to "seconds behind master" metrics. * Add a recording rule to provide a similar gauge of seconds behind. * Add an alert with the same style. [0]: #183
* [FEATURE] Add read/write query response time #166 * [FEATURE] Add Galera gcache size metric #169 * [FEATURE] Add MariaDB multi source replication support #178 * [FEATURE] Implement heartbeat metrics #183 * [FEATURE] Add basic file_summary_by_instance metrics #189 * [BUGFIX] Workaround MySQL bug 79533 #173
* [FEATURE] Add read/write query response time #166 * [FEATURE] Add Galera gcache size metric #169 * [FEATURE] Add MariaDB multi source replication support #178 * [FEATURE] Implement heartbeat metrics #183 * [FEATURE] Add basic file_summary_by_instance metrics #189 * [BUGFIX] Workaround MySQL bug 79533 #173
* [FEATURE] Add read/write query response time #166 * [FEATURE] Add Galera gcache size metric #169 * [FEATURE] Add MariaDB multi source replication support #178 * [FEATURE] Implement heartbeat metrics #183 * [FEATURE] Add basic file_summary_by_instance metrics #189 * [BUGFIX] Workaround MySQL bug 79533 #173
* [FEATURE] Add read/write query response time #166 * [FEATURE] Add Galera gcache size metric #169 * [FEATURE] Add MariaDB multi source replication support #178 * [FEATURE] Implement heartbeat metrics #183 * [FEATURE] Add basic file_summary_by_instance metrics #189 * [BUGFIX] Workaround MySQL bug 79533 #173
pt-heartbeat measures actual replication lab on a mysql server. It does
not rely on data stored in SHOW SLAVE STATUS, as it can be unreliable.
pt-heartbeat website:
https://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.html
Signed-off-by: Julien Pivotto roidelapluie@inuits.eu