-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the frequency at which heartbeats update the _vt.vreplication table configurable #7659
Make the frequency at which heartbeats update the _vt.vreplication table configurable #7659
Conversation
…ble a vttablet configuration Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep an upper bound on that interval. The maximum interval should not exceed one minute. The reason is that we must have some tracking. Specifically, OnlineDDL looks at the update timestamp to say "this is healthy/active".
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Thanks that was a good point. I added an upper bound. I have enforced it in code at the moment. I couldn't see any validations of flags in vttablet: I would ideally like to fail the vttablet launch itself if a higher value is provided so that we are not silently fixing it. |
Mandatory flags are checked in vttablet.go. For example:
|
This are for flags defined in that cmd itself. The ones in vreplicator.go are not in scope here or in tm_init.go. It seems like an overkill to validate this inside vreplication and fail a vreplication stream. Maybe I just add this to the flags documentation for now? |
Let us log a warning if the user provided value is being overridden by the limit - either when starting the VR engine, or when a stream is created. If it turns out that people are being affected and not realizing it we can revisit whether to fail vttablet startup. |
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
added the log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Description
Motivation
For an idle source shard, the source vstreamer sends a heartbeat. Currently that is once per second. On receiving the heartbeat the target vreplication module updates the time_updated column of the relevant row of _vt.vreplication. For some setups this is a problem, for example:
See related issues for more information.
Implementation
This PR provides a tablet level configuration parameter
-vreplication_heartbeat_update_interval
which determines how often the time_updated column is updated if there are no real events on the source and the source. Default is 1 second. Keep this low if you expect high QPS and are monitoring this column to alert about potential outages. Keep this high if you experience any of the issues mentioned above.Notes
This is an initial light-weight implementation to address immediate concerns of affected users.
This also doesn't address the problem of long-running workflows (like materialize or reverse replication) for high qps systems.
To address this, in the long term, we will be refactoring the vreplication table so that the static part (including the source proto) will be separated from the dynamic part (position, timestamps) , so as to minimize the size of binlog entries.
The
VReplicationHeartbeat
metric will continue to be updated at 1 second, so for systems using this metric for monitoring, you will continue to get 1-second granularity for alerts.Signed-off-by: Rohit Nayak rohit@planetscale.com
Related Issue(s)
Checklist
Impacted Areas in Vitess
Components that this PR will affect: