-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: add table for logging stream errors, state changes and key workflow steps #7831
Conversation
Sample output from local example
|
f41805b
to
d865b2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing!! I'm very excited to try this out ❤️
// LogStateChange is used when the state of the stream changes | ||
LogStateChange = "State Changed" | ||
|
||
// TODO: LogError is not used atm. Currently irrecoverable errors, resumable errors and informational messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I'm not sure if you have any ideas about to do this more advanced error detection/categorization, but I'm happy to brainstorm with you if you want!
) | ||
|
||
const ( | ||
createVReplicationLog = `CREATE TABLE IF NOT EXISTS _vt.vreplication_log ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our experiments, we were seeing the _vt.vreplication_log
table created on the tablet but inaccessible from VReplicationExec
:
vitess@abe8006adf7f:/vt/local$ vtctlclient VreplicationExec zone1-0000000400 "select * from _vt.vreplication_log"
VreplicationExec Error: rpc error: code = Unknown desc = TabletManager.VReplicationExec on zone1-0000000400 error: invalid table name: _vt.vreplication_log: invalid table name: _vt.vreplication_log
E0427 15:05:50.761692 9477 main.go:72] remote error: rpc error: code = Unknown desc = TabletManager.VReplicationExec on zone1-0000000400 error: invalid table name: _vt.vreplication_log: invalid table name: _vt.vreplication_log
@ajm188 found that the switch statements in controller_plan.go
return "invalid table name" error as they've not been updated to include the _vt.vreplication_log
table. We were able to patch it like this (and, we only updated one of the four switch
statements for brevity):
diff --git a/go/vt/vttablet/tabletmanager/vreplication/controller_plan.go b/go/vt/vttablet/tabletmanager/vreplication/controller_plan.go
index ed42cf1839..c7ac3d45d4 100644
--- a/go/vt/vttablet/tabletmanager/vreplication/controller_plan.go
+++ b/go/vt/vttablet/tabletmanager/vreplication/controller_plan.go
@@ -133,6 +133,7 @@ func buildUpdatePlan(upd *sqlparser.Update) (*controllerPlan, error) {
}, nil
case vreplicationTableName:
// no-op
+
default:
return nil, fmt.Errorf("invalid table name: %v", sqlparser.String(upd.TableExprs))
}
@@ -222,7 +223,7 @@ func buildDeletePlan(del *sqlparser.Delete) (*controllerPlan, error) {
func buildSelectPlan(sel *sqlparser.Select) (*controllerPlan, error) {
switch sqlparser.String(sel.From) {
- case vreplicationTableName, reshardingJournalTableName, copyStateTableName:
+ case vreplicationTableName, reshardingJournalTableName, copyStateTableName, vreplicationLogTableName:
return &controllerPlan{
opcode: selectQuery,
}, nil
diff --git a/go/vt/vttablet/tabletmanager/vreplication/utils.go b/go/vt/vttablet/tabletmanager/vreplication/utils.go
index dd89ea3bd1..9f542b8d15 100644
--- a/go/vt/vttablet/tabletmanager/vreplication/utils.go
+++ b/go/vt/vttablet/tabletmanager/vreplication/utils.go
@@ -26,7 +26,8 @@ import (
)
const (
- createVReplicationLog = `CREATE TABLE IF NOT EXISTS _vt.vreplication_log (
+ vreplicationLogTableName = "_vt.vreplication_log"
+ createVReplicationLog = `CREATE TABLE IF NOT EXISTS _vt.vreplication_log (
id BIGINT(20) AUTO_INCREMENT,
vrepl_id INT NOT NULL,
type VARBINARY(256) NOT NULL,
... and then see the rows via VReplicationExec
:
vitess@61befc15be72:/vt/local$ vtctlclient VreplicationExec zone1-0000000400 "select count(*) from _vt.vreplication_log"
+----------+
| count(*) |
+----------+
| 6 |
+----------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hadn't thought of VReplicationExec
for this table. I think we should add it only for Select and possibly Delete, right? Insert/Update shouldn't be allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, even the delete is hardcoded in multiple places for the vreplication table and will require a little more time and tests :( I am going to just make the same changes you made so it works with Select for now. Delete can be implemented in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating Select for now sounds great!!! 🎉 This will be amazing to have regardless!
// Enum values for type column of _vt.vreplication_log | ||
|
||
// LogStreamCreate is used when a row in _vt.vreplication is inserted via VReplicationExec | ||
LogStreamCreate = "Stream Created" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible/reasonable/idiomatic to make a proto enum for this? That way we also get a TypeScript enum generated for us. (Otherwise, I'll have to duplicate these strings on the front-end at some point.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had punted on the enum: I think we discussed that this would be implemented as part of the vtadmin API around these logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revisiting at VTAdmin time makes sense to me!
…icationExec and setMessage/State Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
1966c93
to
f57350c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again!!! 🎉 🎉 🎉 Can't wait to show you the end result in the UI!
Description
This PR adds a new table to the _vt sidecar database. The
_vt.vreplication_log
table will record certain types of events. To start with we recordThe value of the message will usually be a string. In the case of stream inserts it records all the parameters including the binlogsource field.
Before inserting a new row we check if the latest message matches the state and message of the new message and if it does then it increments the count. This prevents any repeated errors from spamming the table.
Tests
When I tried adding the queries related to the vreplication_log to expect them during other vreplication tests it lead to a huge number of updates to existing tests and also created race conditions and other flakiness. For now we ignore these queries
during the workflow and test the log insertion code separately.
Checklist
Impacted Areas in Vitess
Components that this PR will affect: