Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support validating FK constraints #12682

Merged
merged 2 commits into from
Jan 9, 2017
Merged

sql: support validating FK constraints #12682

merged 2 commits into from
Jan 9, 2017

Conversation

dt
Copy link
Member

@dt dt commented Jan 4, 2017

This change is Reviewable

@dt dt assigned maddyblue and knz Jan 4, 2017
@dt
Copy link
Member Author

dt commented Jan 4, 2017

Need to add a couple test cases for multiple-col FKs and cross-DB FKs before this one is ready to go, but wanted some early feedback on the JOIN approach.

Initially I was hesitant to go this route (potentially construction the join plan tree by hand instead) since I was worried about sprintf'ing all these names into a query just to then parse it back out, but in discussion with @knz we concluded that column and table names should be safe, and this is probably the easier to understand/maintain/optimize than a hand-written plan literal.

@maddyblue
Copy link
Contributor

Review status: 0 of 3 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/alter_table.go, line 313 at r1 (raw file):

					if idx.ForeignKey.IsSet() && idx.ForeignKey.Name == name {
						found = true
						id = idx.ID

break here?


pkg/sql/check.go, line 192 at r1 (raw file):

			srcColsSelects.WriteString(", ")
		}
		srcColsSelects.WriteString(fmt.Sprintf("s.%s", srcCols[i]))

I think we need to escape scrCols[i] here.


pkg/sql/check.go, line 201 at r1 (raw file):

			where.WriteString(" AND ")
		}
		on.WriteString(fmt.Sprintf("s.%s = t.%s", srcCols[i], targetCols[i]))

escape


pkg/sql/check.go, line 207 at r1 (raw file):

	query := fmt.Sprintf(
		`SELECT %s FROM %s AS s LEFT OUTER JOIN %s AS t ON %s WHERE %s LIMIT 1`,
		srcColsSelects.String(), srcName, targetName, on.String(), where.String(),

more escapes: srcName, targetName


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jan 5, 2017

Ok yeah the outer join will work but ... this will have terrible, terrible performance (and outright crash the node where the ALTER is issued if the table is large-ish).

May I suggest instead to

  1. in this PR if possible use issue instead ((SELECT a, b, c FROM s ORDER BY a, b, c) EXCEPT (SELECT x, y, z FROM t ORDER BY x, y, z)) LIMIT 1 (i.e. define an ORDER BY clause that orders both tables on the key columns, then EXCEPT one by the other). This will be (only slightly, until next step is done) more efficient than the join-based solution,
  2. file a follow-up issue to "optimize FK validation" where the description is "two steps: A) change the generated ORDER BY clauses to use an existing index when possible then B) optimize UNION+INTERSECT+EXCEPT to a constant-space ordered merge if the input operands have the same ordering". (That's an optimization very much tractable for unionNode, but farther from tractable for joins)

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/check.go, line 192 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

I think we need to escape scrCols[i] here.

nb: to escape do:

srcColsSelects.WriteString("s.")
parser.Name(srcCols[i]).Format(srcColsSelect, parser.FmtSimple)

pkg/sql/check.go, line 201 at r1 (raw file):

Previously, mjibson (Matt Jibson) wrote…

escape

see above


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jan 5, 2017

On second though, elide the ORDER BY entirely for step 1 (UNION already does this via a map), and change step 2 to add an ORDER BY clause that forces the index selection sufficient to find the same ordering on both sides.


Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jan 5, 2017

And another comment, now further from my league so it's more discussion.

I'm not completely fond of mandating the FK validation in all cases. This very much makes it intractable to add a foreign key constraint on a large table: for all matters and purposes the synchronous validation would take forever and the txn would likely abort / connection drop / whatever before the validation completes.

From a UX perspective what I'd really like to have is:

  1. an option to choose between synchronous and asynchronous validation, with asynchronous being the default
  2. with asynchronous validation, have validation done as a background job, and report in the UI the job details, including a nice error sign and message whether/when an FK violation was detected

@dt
Copy link
Member Author

dt commented Jan 5, 2017

@knz To your UX concern: maybe it wasn't entirely clear from the logic test, but this does exactly what you described: new constraints are added quickly, in an "unvalidated" state, meaning they're enforced on any new writes but they're not known to hold for existing table data, until a second validate step is run on them, that scans for current violations and, if it finds none, marks them as validated.

Currently the two steps are separate SQL commands: add constraint and validate constraint. Once the pieces work individually, add constraint will default to attempting to do the second (though the grammar allows skipping it AFAIK).

Due to our distributed nature, we'll actually always be in the async case though -- we change the table to add the unvalidated constraint and wait for all nodes to be aware of that so no new violations can creep in, then we start scanning to do the validation. Like other schema changes, we can make this appear to be synchronous to the connection issuing the schema change unless they request otherwise, but it needs to be async under the hood.

As far as using an existing index: the column names here are coming from an index descriptor, so I'd expect this to be trivial for index selection, and since I'm not specifying an ORDER shouldn't it just pick index order?

@knz
Copy link
Contributor

knz commented Jan 5, 2017 via email

add multicolunm, cross-db and escaped identifier FKs to the existing
mock-shopping test data example (and fix a couple issues uncovered
by them).
@dt
Copy link
Member Author

dt commented Jan 6, 2017

Added more tests for cross-db, escaped identifier, and critically multi-col FKs.

Reworked the existing JOIN a bit to add the escaping and handle NULLs a bit better, but NULL comparisons bubbling up are making it a little gross -- ideally for this one, I want null to compare to null as true and to anything else as false.

Thinking about the UNION/EXCEPT alternative to JOIN, it isn't forced that both indexes are the same direction, so I'm not sure we can actually do better?

@knz
Copy link
Contributor

knz commented Jan 7, 2017

The proper way forward is to implement a new indexJoinNode which returns rows for every row on the left which does not match on the right, and use that. We don't have this yet but again we must open a follow up issue to find a solution in O(1) space. The current solution would simply ... not work for large tables!

@knz
Copy link
Contributor

knz commented Jan 7, 2017

Reviewed 5 of 5 files at r2, 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


Comments from Reviewable

@dt
Copy link
Member Author

dt commented Jan 9, 2017

Hm, would we need to go back to a plan literal here to force the usage of such a node?

Also, sadly, unless we force the indexes to be in the same order, O(1) space means doing O(n) point lookups.

If in normal operation the common case is single-row insert/update/delete ops, then the flexibility being able to use existing indexes -- even if they differ in ordering -- is nice, but it certainly makes checking multi-row batches (like this ALTER TABLE or even just a large UPDATE stmt) harder.

I'm tempted to say we should require matching ordering of indexes by default and then, maybe, add a 'yes i know the order doesn't match' option to explicitly say you're fine with the o(n) point-lookup validation behavior.

@knz
Copy link
Contributor

knz commented Jan 9, 2017

I believe your analysis is correct and I agree with your suggestions but this discussion should really be added as comment on a new follow-up issue with label "investigation". Cheers!

@dt
Copy link
Member Author

dt commented Jan 9, 2017

@knz with #12780 filed, is this good to go as-is?

@knz
Copy link
Contributor

knz commented Jan 9, 2017

Yes!

LGTM 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants