Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: bug when setting a column NOT NULL and then dropping it #47719

Closed
thoszhang opened this issue Apr 20, 2020 · 3 comments · Fixed by #62167
Closed

sql: bug when setting a column NOT NULL and then dropping it #47719

thoszhang opened this issue Apr 20, 2020 · 3 comments · Fixed by #62167
Assignees

Comments

@thoszhang
Copy link
Contributor

This was uncovered from looking at logs for TestRandomSyntaxSchemaChangeColumn. If a column is dropped while a schema change for SET NOT NULL is in progress on the same column, the validation for the non-null column (which is a SQL query) can fail with the error column %q does not exist. The column drop can be queued after SET NOT NULL with a different mutation ID, or it can even happen if the two schema changes happen in the same transaction:

root@127.0.0.1:62055/movr> create table t (a int);
CREATE TABLE

Time: 3.051ms

root@127.0.0.1:62055/movr> begin; alter table t alter column a set not null; alter table t drop column a; commit;
ERROR: transaction committed but schema change aborted with error: (42703): validation of NOT NULL constraint failed: validate check constraint: column "a" does not exist
SQLSTATE: XXA00
HINT: Some of the non-DDL statements may have committed successfully, but some of the DDL statement(s) failed.
Manual inspection may be required to determine the actual state of the database.
--
See: https://github.com/cockroachdb/cockroach/issues/42061

This particular bug is not very severe; failing with an error is what we should be doing in this case, even though it's not a very good error. The larger question is what we can do to avoid bugs related to interactions between in-progress schema changes and schema changes that are later in line. We're currently dealing with these interactions on an ad-hoc basis, which is going to inevitably lead to similar bugs.

@thoszhang
Copy link
Contributor Author

This would be fixed with the approach outlined in #47989 (comment).

@ajwerner
Copy link
Contributor

ajwerner commented Mar 9, 2021

It turns out this is much more severe if the DROP COLUMN is concurrent but in a different transaction and thus has a different mutation ID.

@ajwerner
Copy link
Contributor

ajwerner commented Mar 9, 2021

@fqazi see #61500 (comment).

I think we can do something about this without too too much effort. Namely I think we want to skip over some set of mutations which are made irrelevant by some set of later mutations. Specifically I'm thinking about constraints like not null and check that have to do with a now dropped column.

It's possible that we should just straight up drop those mutations when we're dropping the column. Then, we can augment that by being a bit more careful about the code that relies on those mutations existing. All of this is unsatisfying but I think these bugs are worth doing something about even if it's only for the short term. We'll want to backport whatever we do here I suspect.

fqazi added a commit to fqazi/cockroach that referenced this issue Mar 17, 2021
Fixes: cockroachdb#47719

Previously, if a column was mutated and then dropped in a single
transaction we would still try to validate the dropped column.
Additionally, in general if a drop operation occurred in a
different transaction we ran the danger of doing something
similar. From a behaviour perspective this was bad, since
it can lead to unexpected behaviour for users. To address, this
patch introduces logic to scan later mutations to check if
a drop column occurs. If one does occur then it will skip any
operations made irrelevant by the drop. Additionally, for a
correctness perspective it will wait for the drop to complete
before returning back.

Release note (bug fix): A constraint like not null or check on
a column made irrelevant by a drop in a later concurrent transaction
would lead to incorrect errors / behaviour.
fqazi added a commit to fqazi/cockroach that referenced this issue Mar 18, 2021
Fixes: cockroachdb#47719

Previously, if a column was mutated and then dropped in a single
transaction we would still try to validate the dropped column.
Additionally, in general if a drop operation occurred in a
different transaction we ran the danger of doing something
similar. From a behaviour perspective this was bad, since
it can lead to unexpected behaviour for users. To address, this
patch introduces logic to scan later mutations to check if
a drop column occurs. If one does occur then it will skip any
operations made irrelevant by the drop. Additionally, for a
correctness perspective it will wait for the drop to complete
before returning back.

Release note (bug fix): A constraint like not null or check on
a column made irrelevant by a drop in a later concurrent transaction
would lead to incorrect errors / behaviour.
craig bot pushed a commit that referenced this issue Mar 19, 2021
62076: sql: add interleaved_indexes/interleaved_tables into crdb_internal r=ajwerner a=fqazi

Previous, when we added the crdb_internal.interleaved table that
showed all interleaved indexes, however it was impossible tell
which table the primary key of the parent table was interleaved.
This was inadequate because users could not tell what the parent
table was. To address this, this patch removes
crdb_internal.interleaved and converts it into two virtual tables
interlaved_indexes/interleaved_tables where the one is for
non-primary key indexes and the second is for tables interleaved
on the primary key.

Release note (sql change): Renamed crdb_internal.interleaved to
crdb_internal.interlaved_indexes and added crdb_internal.interlaved_table
for viewing interleaved tables on the primary key.

62167: sql: incorrect behaviour setting NOT NULL on column and dropping it r=ajwerner a=fqazi

Fixes: #47719

Previously, if a column was mutated and then dropped in a single
transaction we would still try to validate the dropped column.
Additionally, in general if a drop operation occurred in a
different transaction we ran the danger of doing something
similar. From a behaviour perspective this was bad, since
it can lead to unexpected behaviour for users. To address, this
patch introduces logic to scan later mutations to check if
a drop column occurs. If one does occur then it will skip any
operations made irrelevant by the drop. Additionally, for a
correctness perspective it will wait for the drop to complete
before returning back.

Release note (bug fix): A constraint like not null or check on
a column made irrelevant by a drop in a later concurrent transaction
would lead to incorrect errors / behaviour.

Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
@craig craig bot closed this as completed in a9aa92c Mar 19, 2021
fqazi added a commit to fqazi/cockroach that referenced this issue Mar 19, 2021
Fixes: cockroachdb#47719

Previously, if a column was mutated and then dropped in a single
transaction we would still try to validate the dropped column.
Additionally, in general if a drop operation occurred in a
different transaction we ran the danger of doing something
similar. From a behaviour perspective this was bad, since
it can lead to unexpected behaviour for users. To address, this
patch introduces logic to scan later mutations to check if
a drop column occurs. If one does occur then it will skip any
operations made irrelevant by the drop. Additionally, for a
correctness perspective it will wait for the drop to complete
before returning back.

Release note (bug fix): A constraint like not null or check on
a column made irrelevant by a drop in a later concurrent transaction
would lead to incorrect errors / behaviour.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants