sql: support validating FK constraints #12682
Need to add a couple of test cases for multi-column FKs and cross-DB FKs before this one is ready to go, but wanted some early feedback on the JOIN approach. Initially I was hesitant to go this route (potentially constructing the join plan tree by hand instead) since I was worried about sprintf'ing all these names into a query just to then parse it back out, but in discussion with @knz we concluded that column and table names should be safe, and this is probably easier to understand/maintain/optimize than a hand-written plan literal. |
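For readers following along, here is a rough sketch of what sprintf'ing the names into an outer-join check could look like. The function name `buildFKCheckQuery`, the aliases `s` and `t`, and the exact query shape are assumptions for illustration, not the code in this PR; names are assumed to be already safe to splice in, and NULL handling is deliberately simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFKCheckQuery is a hypothetical helper: it returns a query that yields
// one source row whose FK columns have no matching row in the target table.
// Column and table names are assumed to be already escaped by the caller.
func buildFKCheckQuery(srcTable string, srcCols []string, targetTable string, targetCols []string) (string, error) {
	if len(srcCols) == 0 || len(srcCols) != len(targetCols) {
		return "", fmt.Errorf("need matching, non-empty column lists (got %d and %d)", len(srcCols), len(targetCols))
	}
	sel := make([]string, len(srcCols))
	on := make([]string, len(srcCols))
	notNull := make([]string, len(srcCols))
	for i := range srcCols {
		sel[i] = "s." + srcCols[i]
		on[i] = fmt.Sprintf("s.%s = t.%s", srcCols[i], targetCols[i])
		notNull[i] = "s." + srcCols[i] + " IS NOT NULL"
	}
	// An unmatched source row shows up with NULLs on the target side of the
	// outer join, so testing one target column is enough to detect it.
	return fmt.Sprintf(
		"SELECT %s FROM %s AS s LEFT OUTER JOIN %s AS t ON %s WHERE %s AND t.%s IS NULL LIMIT 1",
		strings.Join(sel, ", "), srcTable, targetTable,
		strings.Join(on, " AND "), strings.Join(notNull, " AND "), targetCols[0],
	), nil
}

func main() {
	q, err := buildFKCheckQuery("orders", []string{"customer"}, "customers", []string{"id"})
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

If the generated query returns a row, the constraint does not hold for existing data; an empty result means no violation was found.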
Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.
pkg/sql/alter_table.go, line 313 at r1 (raw file):
pkg/sql/check.go, line 192 at r1 (raw file):
I think we need to escape srcCols[i] here.
pkg/sql/check.go, line 201 at r1 (raw file):
escape
pkg/sql/check.go, line 207 at r1 (raw file):
more escapes: srcName, targetName
Comments from Reviewable |
Ok yeah the outer join will work but ... this will have terrible, terrible performance (and outright crash the node where the ALTER is issued if the table is large-ish). May I suggest instead to
Reviewed 3 of 3 files at r1.
pkg/sql/check.go, line 192 at r1 (raw file):
Previously, mjibson (Matt Jibson) wrote…
nb: to escape do:
srcColsSelects.WriteString("s.")
parser.Name(srcCols[i]).Format(srcColsSelect, parser.FmtSimple)
pkg/sql/check.go, line 201 at r1 (raw file):
Previously, mjibson (Matt Jibson) wrote…
see above
Comments from Reviewable |
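To illustrate why the escaping matters, here is a minimal standalone sketch. `quoteIdent` and `qualifiedCol` are hypothetical stand-ins for the `parser.Name(...).Format(...)` call suggested above, not the real parser API; the sketch assumes the usual SQL rule that an identifier is wrapped in double quotes with embedded double quotes doubled.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent is a hypothetical stand-in for the parser's identifier
// formatting: wrap the name in double quotes and double any embedded quote.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// qualifiedCol writes alias.column into a buffer, escaping only the column,
// mirroring the WriteString("s.") followed by Format pattern.
func qualifiedCol(alias, col string) string {
	var buf strings.Builder
	buf.WriteString(alias)
	buf.WriteByte('.')
	buf.WriteString(quoteIdent(col))
	return buf.String()
}

func main() {
	fmt.Println(qualifiedCol("s", "order id"))
	fmt.Println(qualifiedCol("s", `weird"name`))
}
```

Without this step, a column named with a space or a quote would produce a query that fails to parse, or parses as something else.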
On second thought, elide the ORDER BY entirely for step 1 (UNION already does this via a map), and change step 2 to add an ORDER BY clause that forces the index selection sufficient to find the same ordering on both sides.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.
Comments from Reviewable |
And another comment, now further out of my league, so it's more of a discussion. I'm not completely fond of mandating the FK validation in all cases. This very much makes it intractable to add a foreign key constraint on a large table: for all intents and purposes the synchronous validation would take forever and the txn would likely abort / connection drop / whatever before the validation completes. From a UX perspective what I'd really like to have is:
|
@knz To your UX concern: maybe it wasn't entirely clear from the logic test, but this does exactly what you described: new constraints are added quickly, in an "unvalidated" state, meaning they're enforced on any new writes but they're not known to hold for existing table data, until a second `validate` step is run on them, that scans for current violations and, if it finds none, marks them as validated.
Currently the two steps are separate SQL commands: add constraint and validate constraint. Once the pieces work individually,
Due to our distributed nature, we'll actually always be in the async case though -- we change the table to add the unvalidated constraint and wait for all nodes to be aware of that so no new violations can creep in, then we start scanning to do the validation. Like other schema changes, we can make this appear to be synchronous to the connection issuing the schema change unless they request otherwise, but it needs to be async under the hood.
As far as using an existing index: the column names here are coming from an index descriptor, so I'd expect this to be trivial for index selection, and since I'm not specifying an ORDER shouldn't it just pick index order? |
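The two-step flow described here can be modeled with a toy in-memory table. Everything in this sketch (the `table` type, `addConstraint`, `insert`, `validate`) is hypothetical and ignores distribution entirely; it only shows that enforcement on new writes starts immediately while the "validated" mark waits for a scan of existing rows.

```go
package main

import (
	"errors"
	"fmt"
)

type validity int

const (
	unvalidated validity = iota // enforced on new writes, not yet checked against old rows
	validated                   // known to hold for all existing rows
)

var errViolation = errors.New("foreign key violation")

// table is a toy model: childRefs are FK values, parentKeys the referenced keys.
type table struct {
	parentKeys map[int]bool
	childRefs  []int
	hasFK      bool
	state      validity
}

// addConstraint returns immediately without scanning existing rows.
func (t *table) addConstraint() {
	t.hasFK = true
	t.state = unvalidated
}

// insert enforces the constraint as soon as it exists, whatever its state.
func (t *table) insert(ref int) error {
	if t.hasFK && !t.parentKeys[ref] {
		return errViolation
	}
	t.childRefs = append(t.childRefs, ref)
	return nil
}

// validate scans existing rows and marks the constraint validated only if
// no row violates it.
func (t *table) validate() error {
	if !t.hasFK {
		return errors.New("no constraint to validate")
	}
	for _, ref := range t.childRefs {
		if !t.parentKeys[ref] {
			return fmt.Errorf("existing row references missing key %d: %w", ref, errViolation)
		}
	}
	t.state = validated
	return nil
}

func main() {
	t := &table{parentKeys: map[int]bool{1: true}, childRefs: []int{1, 7}}
	t.addConstraint()
	fmt.Println("insert 9:", t.insert(9))
	fmt.Println("validate:", t.validate())
}
```

In the sketch, the pre-existing reference to key 7 makes `validate` fail and leaves the constraint unvalidated, while new bad writes are already rejected.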
To your UX concern: maybe it wasn't
entirely clear from the logic test, but this does exactly what you
described: new constraints are added quickly, in an "unvalidated" state,
meaning they're enforced on any new writes but they're not known to hold
for existing table data, until a second |validate| step is run on them,
that scans for current violations and, if it finds none, marks them as
validated.
I had missed that from the diff. Thanks for explaining. It's excellent,
I love this design!
Mind adding an explanatory text to this effect in the commit message too?
As far as using an existing index: the column names here are coming from
an index descriptor, so I'd expect this to be trivial for index
selection, and since I'm not specifying an ORDER shouldn't it just pick
index order?
I'm not sure what is done currently. :)
I know for sure that if you specify ORDER BY and there's a matching
index, then the index is picked and the ORDER BY clause is elided.
Now if you're telling me the columns are always coming from an index
descriptor then please also specify this index in your generated SQL
query with the syntax `tablename@indexname` so that the order is
guaranteed. That will not yet optimize the join/union but it's 50% of
the job to get there, there won't be any toying needed with ORDER BY
this way.
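A small sketch of what pinning the index in the generated SQL might look like. The helper name `indexedScan` and the table and index names are made up for illustration, and the names are assumed to be already escaped; only the `tablename@indexname` form itself comes from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// indexedScan is a hypothetical helper that builds a SELECT which names the
// index explicitly using the tablename@indexname form, so the scan order
// follows that index. All names are assumed to be escaped by the caller.
func indexedScan(table, index string, cols []string) string {
	return fmt.Sprintf("SELECT %s FROM %s@%s", strings.Join(cols, ", "), table, index)
}

func main() {
	fmt.Println(indexedScan("orders", "orders_customer_idx", []string{"customer"}))
}
```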
…--
Raphael 'kena' Poss
|
add multi-column, cross-db and escaped identifier FKs to the existing mock-shopping test data example (and fix a couple issues uncovered by them).
Added more tests for cross-db, escaped identifier, and critically multi-col FKs. Reworked the existing JOIN a bit to add the escaping and handle NULLs a bit better, but NULL comparisons bubbling up are making it a little gross -- ideally for this one, I want null to compare to null as true and to anything else as false. Thinking about the |
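The comparison wished for above (null equals null, null differs from anything else) can be sketched outside SQL as follows. This is only an illustration of that comparison rule with hypothetical names, using `*int` with `nil` standing in for NULL; it is not how the PR implements it and says nothing about which rows a foreign key check should skip.

```go
package main

import "fmt"

// nullSafeEqual treats nil as NULL: two NULLs compare equal, and a NULL
// never equals a non-NULL value.
func nullSafeEqual(a, b *int) bool {
	if a == nil || b == nil {
		return a == nil && b == nil
	}
	return *a == *b
}

// tuplesEqual applies the same rule column by column for multi-column keys.
func tuplesEqual(a, b []*int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if !nullSafeEqual(a[i], b[i]) {
			return false
		}
	}
	return true
}

func main() {
	one, alsoOne := 1, 1
	fmt.Println(nullSafeEqual(&one, &alsoOne))
	fmt.Println(nullSafeEqual(nil, nil))
	fmt.Println(nullSafeEqual(&one, nil))
	fmt.Println(tuplesEqual([]*int{&one, nil}, []*int{&alsoOne, nil}))
}
```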
The proper way forward is to implement a new |
Reviewed 5 of 5 files at r2, 3 of 3 files at r3. Comments from Reviewable |
Hm, would we need to go back to a plan literal here to force the usage of such a node? Also, sadly, unless we force the indexes to be in the same order, O(1) space means doing O(n) point lookups. If in normal operation the common case is single-row insert/update/delete ops, then the flexibility of being able to use existing indexes -- even if they differ in ordering -- is nice, but it certainly makes checking multi-row batches (like this ALTER TABLE or even just a large UPDATE stmt) harder. I'm tempted to say we should require matching ordering of indexes by default and then, maybe, add a 'yes I know the order doesn't match' option to explicitly say you're fine with the O(n) point-lookup validation behavior. |
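To make the space argument concrete: when both sides arrive in the same key order, a single forward pass with one cursor per side finds violations without buffering either input. The sketch below uses sorted `[]int` slices as stand-ins for two index scans; `firstMissing` is a hypothetical name, not a node that exists in the planner.

```go
package main

import "fmt"

// firstMissing walks two ascending-sorted key streams in lockstep and returns
// the first source key that has no equal key on the target side. It keeps
// only one cursor per side, so extra space is constant. Source keys may
// repeat; target keys are expected to be unique.
func firstMissing(src, target []int) (int, bool) {
	j := 0
	for _, k := range src {
		// Skip target keys smaller than the current source key.
		for j < len(target) && target[j] < k {
			j++
		}
		if j == len(target) || target[j] != k {
			return k, true
		}
	}
	return 0, false
}

func main() {
	key, found := firstMissing([]int{1, 2, 2, 5}, []int{1, 2, 3, 4})
	fmt.Println(key, found)
}
```

If the two indexes are ordered differently, this lockstep walk no longer applies, and each source key needs its own lookup on the target side, which is the point-lookup behavior discussed above.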
I believe your analysis is correct and I agree with your suggestions, but this discussion should really be added as a comment on a new follow-up issue with the label "investigation". Cheers! |
Yes! LGTM 👍 |