sql: support validating FK constraints #12682

dt · 2017-01-04T20:53:20Z

This change is

dt · 2017-01-04T20:54:46Z

Need to add a couple test cases for multiple-col FKs and cross-DB FKs before this one is ready to go, but wanted some early feedback on the JOIN approach.

Initially I was hesitant to go this route (potentially construction the join plan tree by hand instead) since I was worried about sprintf'ing all these names into a query just to then parse it back out, but in discussion with @knz we concluded that column and table names should be safe, and this is probably the easier to understand/maintain/optimize than a hand-written plan literal.

maddyblue · 2017-01-04T21:10:18Z

Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/alter_table.go, line 313 at r1 (raw file):

					if idx.ForeignKey.IsSet() && idx.ForeignKey.Name == name {
						found = true
						id = idx.ID

break here?

pkg/sql/check.go, line 192 at r1 (raw file):

			srcColsSelects.WriteString(", ")
		}
		srcColsSelects.WriteString(fmt.Sprintf("s.%s", srcCols[i]))

I think we need to escape scrCols[i] here.

pkg/sql/check.go, line 201 at r1 (raw file):

			where.WriteString(" AND ")
		}
		on.WriteString(fmt.Sprintf("s.%s = t.%s", srcCols[i], targetCols[i]))

escape

pkg/sql/check.go, line 207 at r1 (raw file):

	query := fmt.Sprintf(
		`SELECT %s FROM %s AS s LEFT OUTER JOIN %s AS t ON %s WHERE %s LIMIT 1`,
		srcColsSelects.String(), srcName, targetName, on.String(), where.String(),

more escapes: srcName, targetName

Comments from Reviewable

knz · 2017-01-05T12:55:52Z

Ok yeah the outer join will work but ... this will have terrible, terrible performance (and outright crash the node where the ALTER is issued if the table is large-ish).

May I suggest instead to

in this PR if possible use issue instead ((SELECT a, b, c FROM s ORDER BY a, b, c) EXCEPT (SELECT x, y, z FROM t ORDER BY x, y, z)) LIMIT 1 (i.e. define an ORDER BY clause that orders both tables on the key columns, then EXCEPT one by the other). This will be (only slightly, until next step is done) more efficient than the join-based solution,
file a follow-up issue to "optimize FK validation" where the description is "two steps: A) change the generated ORDER BY clauses to use an existing index when possible then B) optimize UNION+INTERSECT+EXCEPT to a constant-space ordered merge if the input operands have the same ordering". (That's an optimization very much tractable for unionNode, but farther from tractable for joins)

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/check.go, line 192 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

I think we need to escape scrCols[i] here.

nb: to escape do:

srcColsSelects.WriteString("s.")
parser.Name(srcCols[i]).Format(srcColsSelect, parser.FmtSimple)

pkg/sql/check.go, line 201 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

escape

see above

Comments from Reviewable

knz · 2017-01-05T12:57:19Z

On second though, elide the ORDER BY entirely for step 1 (UNION already does this via a map), and change step 2 to add an ORDER BY clause that forces the index selection sufficient to find the same ordering on both sides.

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

Comments from Reviewable

knz · 2017-01-05T13:04:05Z

And another comment, now further from my league so it's more discussion.

I'm not completely fond of mandating the FK validation in all cases. This very much makes it intractable to add a foreign key constraint on a large table: for all matters and purposes the synchronous validation would take forever and the txn would likely abort / connection drop / whatever before the validation completes.

From a UX perspective what I'd really like to have is:

an option to choose between synchronous and asynchronous validation, with asynchronous being the default
with asynchronous validation, have validation done as a background job, and report in the UI the job details, including a nice error sign and message whether/when an FK violation was detected

dt · 2017-01-05T14:30:07Z

@knz To your UX concern: maybe it wasn't entirely clear from the logic test, but this does exactly what you described: new constraints are added quickly, in an "unvalidated" state, meaning they're enforced on any new writes but they're not known to hold for existing table data, until a second validate step is run on them, that scans for current violations and, if it finds none, marks them as validated.

Currently the two steps are separate SQL commands: add constraint and validate constraint. Once the pieces work individually, add constraint will default to attempting to do the second (though the grammar allows skipping it AFAIK).

Due to our distributed nature, we'll actually always be in the async case though -- we change the table to add the unvalidated constraint and wait for all nodes to be aware of that so no new violations can creep in, then we start scanning to do the validation. Like other schema changes, we can make this appear to be synchronous to the connection issuing the schema change unless they request otherwise, but it needs to be async under the hood.

As far as using an existing index: the column names here are coming from an index descriptor, so I'd expect this to be trivial for index selection, and since I'm not specifying an ORDER shouldn't it just pick index order?

knz · 2017-01-05T14:40:20Z

To your UX concern: maybe it wasn't entirely clear from the logic test, but this does exactly what you described: new constraints are added quickly, in an "unvalidated" state, meaning they're enforced on any new writes but they're not known to hold for existing table data, until a second |validate| step is run on them, that scans for current violations and, if it finds none, marks them as validated.

I had missed that from the diff. Thanks for explaining. It's excellent, I love this design! Mind adding an explanatory text to this effect in the commit message too?

As far as using an existing index: the column names here are coming from an index descriptor, so I'd expect this to be trivial for index selection, and since I'm not specifying an ORDER shouldn't it just pick index order?

I'm not sure what is done currently. :) I know for sure that if you specify ORDER BY and there's a matching index, then the index is picked and the ORDER BY clause is elided. Now if you're telling me the columns are always coming from an index descriptor then please also specify this index in your generated SQL query with the syntax `tablename@indexname` so that the order is guaranteed. That will not yet optimize the join/union but it's 50% of the job to get there, there won't be any toying needed with ORDER BY this way.

…

-- Raphael 'kena' Poss

add multicolunm, cross-db and escaped identifier FKs to the existing mock-shopping test data example (and fix a couple issues uncovered by them).

dt · 2017-01-06T15:17:18Z

Added more tests for cross-db, escaped identifier, and critically multi-col FKs.

Reworked the existing JOIN a bit to add the escaping and handle NULLs a bit better, but NULL comparisons bubbling up are making it a little gross -- ideally for this one, I want null to compare to null as true and to anything else as false.

Thinking about the UNION/EXCEPT alternative to JOIN, it isn't forced that both indexes are the same direction, so I'm not sure we can actually do better?

knz · 2017-01-07T09:22:20Z

The proper way forward is to implement a new indexJoinNode which returns rows for every row on the left which does not match on the right, and use that. We don't have this yet but again we must open a follow up issue to find a solution in O(1) space. The current solution would simply ... not work for large tables!

knz · 2017-01-07T09:25:25Z

Reviewed 5 of 5 files at r2, 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

Comments from Reviewable

dt · 2017-01-09T15:21:01Z

Hm, would we need to go back to a plan literal here to force the usage of such a node?

Also, sadly, unless we force the indexes to be in the same order, O(1) space means doing O(n) point lookups.

If in normal operation the common case is single-row insert/update/delete ops, then the flexibility being able to use existing indexes -- even if they differ in ordering -- is nice, but it certainly makes checking multi-row batches (like this ALTER TABLE or even just a large UPDATE stmt) harder.

I'm tempted to say we should require matching ordering of indexes by default and then, maybe, add a 'yes i know the order doesn't match' option to explicitly say you're fine with the o(n) point-lookup validation behavior.

knz · 2017-01-09T15:24:40Z

I believe your analysis is correct and I agree with your suggestions but this discussion should really be added as comment on a new follow-up issue with label "investigation". Cheers!

dt · 2017-01-09T16:13:21Z

@knz with #12780 filed, is this good to go as-is?

knz · 2017-01-09T16:19:02Z

Yes!

LGTM 👍

dt assigned maddyblue and knz Jan 4, 2017

knz mentioned this pull request Jan 5, 2017

sql: make it possible to refer to dbs/tables/indices by number #12715

Closed

dt force-pushed the fk branch from 947a8f4 to b309c22 Compare January 5, 2017 23:13

sql: More FK tests

b3e11b2

add multicolunm, cross-db and escaped identifier FKs to the existing mock-shopping test data example (and fix a couple issues uncovered by them).

dt force-pushed the fk branch from b309c22 to 1045601 Compare January 6, 2017 15:04

dt force-pushed the fk branch from 1045601 to 40f5c2b Compare January 6, 2017 15:18

sql: add support for VALIDATE of FK constraint

38e0068

dt force-pushed the fk branch from 40f5c2b to 38e0068 Compare January 9, 2017 15:32

dt mentioned this pull request Jan 9, 2017

sql: validation of FK relationships in fixed space #12780

Closed

dt merged commit d265531 into cockroachdb:master Jan 9, 2017

dt deleted the fk branch January 9, 2017 21:05

jseldess mentioned this pull request Jan 12, 2017

sql: ALTER TABLE .. VALIDATE CONSTRAINT substatement cockroachdb/docs#990

Closed

jseldess added the docs-todo label Jan 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: support validating FK constraints #12682

sql: support validating FK constraints #12682

dt commented Jan 4, 2017 •

edited by rjnn

Loading

dt commented Jan 4, 2017 •

edited

Loading

maddyblue commented Jan 4, 2017

knz commented Jan 5, 2017

knz commented Jan 5, 2017

knz commented Jan 5, 2017

dt commented Jan 5, 2017

knz commented Jan 5, 2017 via email

dt commented Jan 6, 2017

knz commented Jan 7, 2017

knz commented Jan 7, 2017

dt commented Jan 9, 2017

knz commented Jan 9, 2017

dt commented Jan 9, 2017

knz commented Jan 9, 2017

sql: support validating FK constraints #12682

sql: support validating FK constraints #12682

Conversation

dt commented Jan 4, 2017 • edited by rjnn Loading

dt commented Jan 4, 2017 • edited Loading

maddyblue commented Jan 4, 2017

knz commented Jan 5, 2017

knz commented Jan 5, 2017

knz commented Jan 5, 2017

dt commented Jan 5, 2017

knz commented Jan 5, 2017 via email

dt commented Jan 6, 2017

knz commented Jan 7, 2017

knz commented Jan 7, 2017

dt commented Jan 9, 2017

knz commented Jan 9, 2017

dt commented Jan 9, 2017

knz commented Jan 9, 2017

dt commented Jan 4, 2017 •

edited by rjnn

Loading

dt commented Jan 4, 2017 •

edited

Loading