-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Backfill existing rows when new column is added with default value #4415
Comments
This is expected behavior for now @d-uspenskiy and is being tracked under the umbrella of online schema changes: #4192 |
With D10666 (relaxing conflicts on ALTER), I found a bug with default not null: session A | session B
CREATE TABLE t (i int);
\c
BEGIN;
SELECT 1;
| ALTER TABLE t ADD j int default 44 NOT NULL;
INSERT INTO t VALUES (7);
COMMIT;
SELECT * FROM t;
A possible explanation is that this bug was previously masked by catalog version mismatch. An other explanation is that our transaction model is broken (insert after alter should fail). |
Another way to look at this bug: #9970 |
For the record, SQL Server and Sybase ASE don't fill default values with this statement: alter table t add v int default 5; But they do with this one: alter table t add v int not null default 5; |
A workaround for this is: alter table t add v int not null default 5;
update t set v = default; |
…DD COLUMN ... DEFAULT Summary: Presently, in YB, we disallow this operation for volatile default values: ``` yugabyte=# ALTER TABLE test ADD COLUMN y float DEFAULT random(); ERROR: Rewriting of YB table is not yet implemented ``` Meanwhile, for non-volatile values, we do not backfill existing rows. The new column’s value for the existing rows is null. This departs from PG’s semantics and can also lead to a constraint violation if the new column has a NOT NULL constraint. Note: For new rows, the default value is inserted correctly. ``` yugabyte=# CREATE TABLE test (t int); CREATE TABLE yugabyte=# INSERT INTO test VALUES (1); INSERT 0 1 yugabyte=# ALTER TABLE test ADD COLUMN y int DEFAULT 2 NOT NULL; ALTER TABLE yugabyte=# SELECT * FROM test; t | y ---+--- 1 | -- in PG, y would have value of 2 here. This is also a constraint violation. (1 row) yugabyte=# INSERT INTO test(t) VALUES (2); INSERT 0 1 yugabyte=# SELECT * FROM test; t | y ---+--- 1 | -- in PG, y would have value of 2 here 2 | 2 -- Newly inserted rows correctly use the default value. (2 rows) ``` This diff fixes this issue by introducing a new field (missing_val) which will be stored in the column's metadata (ColumnSchema and for packed rows in ColumnPackingData as well). We do **not** backfill the existing rows. The default value will be filled in on-the-fly if the column value is missing for a row. Implementation details: High-level design: - Evaluate the non-volatile default expression in the PG layer. - Augment the existing Alter Table Add Column flow to take an optional missing value, and pass it down to DocDB. - Store the missing value in the newly added column’s schema (`ColumnSchema`). - When doc_reader reads a row (using the latest schema) and encounters a missing column for the row, it will look at the column schema to see if there is a missing default value for the column. If there is, it will fill in the missing default value (instead of null as it currently does). Regular storage: - For this feature, we need a way to distinguish whether the column entry is missing, or if a null was explicitly inserted by the user. - Presently, null entries are stored as kTombstone records. These tombstone records are compacted away after the retention history period has exceeded. - Therefore, for columns that have a missing value, we will store null entries as kNullLow. - Note: It does not matter what we store the column nulls as. All of kNullLow, kNullHigh, kTombstone are interpreted as null by docdb. - Although we can fill in missing values on a full compaction, it is not required for correctness, and can be handled later as an optimization. Therefore, we will NOT backfill the rows on compaction in this case. We will rely on the stored missing value in ColumnSchema instead. Packed storage: - Explicitly inserted nulls are stored as columns with 0 length in packed storage format. This behavior will not change. - On compaction, we write the packed rows in accordance with the latest schema packing. - Presently, for missing values we write an explicit null. This behavior will be altered to write the missing value instead (if any). - Therefore, packed rows will be backfilled with the missing value on compaction. Note: This feature is under the `Persisted` autoflag, as we are adding a new field to the `ColumnSchema` which is used in backups. The feature will only be turned on after upgrade is complete. Once enabled, it is not safe to downgrade/rollback. Jira: DB-1463 Test Plan: - Imported PG test `yb_pg_fast_default` - Tests in `yb_alter_table`: test different default value types and expressions (UDTs, functions etc.), test explicit null values for the new column, test `SET DEFAULT`, test to verify that we don't set `attmissingval` and `atthasmissing` in YB, test default values for partitioned tables - `YBAddColumnDefaultBackupTest.TestYSQLDefaultMissingValues` -- verify that missing default values are backed up and restored - `YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlDeleteColumnWithMissingDefault` -- verify that missing default values are restored after PITR - `PgDdlAtomicitySanityTest.DropColumnWithMissingDefaultTest` -- verify that we are able to roll back `DROP COLUMN` on a column with a missing default value - `PgAddColumnDefaultTest.AddColumnDefaultConcurrency ` -- verify that concurrently inserted rows use the missing default value, and that the missing default values are read even after compaction. - `PgAddColumnDefaultTest.AddColumnDefaultCompactionAfterUpdate` -- verify that compaction after updates on columns with missing default values works correctly. - `PgAddColumnDefaultTest.AddColumnDefaultCopy` -- verify that `COPY FROM` command works correctly on a table with columns that have missing default values. - `CDCYsqlAddColumnBeforeImageTest.TestAddColumnBeforeImage/0` -- verify that the `BeforeImage` in CDC uses missing default value Reviewers: sergei, timur, dsrinivasan, abharadwaj Reviewed By: sergei, dsrinivasan Subscribers: hsunder, ycdcxcluster, ybase, yql, sergei, rthallam, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D25297
…perations Summary: NOTE: This backport diverges from the original commit. It does not include a `NOTICE` for `ALTER TYPE` since the operation is not supported in 2.18. It also adds a `NOTICE` for `ADD COLUMN ... DEFAULT` to indicate that existing rows are not backfilled with the default value. This was not included in the original master commit/2.20 backport as the issue (#4415) was resolved in 2.20. Original commit's summary: Table rewrite operations (ALTER primary key, ALTER TYPE) currently do not acquire a table lock and are therefore unsafe (concurrent writes may be dropped). This diff adds a `NOTICE` to explicitly indicate the same when executing these commands.' The `NOTICE` can be suppressed by setting the `ysql_suppress_unsafe_alter_notice` tserver gflag to true. Jira: DB-8160 Original commit: 0baeed2 / D29430 Test Plan: Jenkins: java only Reviewers: jason Reviewed By: jason Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31136
Jira Link: DB-1463
expected (vanilla postgresql) result:
The text was updated successfully, but these errors were encountered: