-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: make pg_catalog OIDs stable and refer to the descriptor IDs directly #33697
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi again!
Thank you very much for your change, and the code as usual looks very good.
I'd like to come back to two points.
-
the numbering approach. You have chosen 1, 19, 12 bits for the 3 parts. This raises several questions:
-
how did you choose these 3 values. Did you perform experiments? Did you conduct reasoning? In any case, the motivation for this choice must be explained in the commit message and as a comment close to where this is defined in the code.
-
why do you allocate separate areas in the OID for table IDs and column IDs. If the first bit is set, then I'd expect all remaining 30 bits to become a column ID, and conversely if the first bit is not set, then all remaining 30 bits become a table ID. Why do you need both table and column ID side-by-side?
-
-
the release note: this must explain what problem this is solving. You should explain in the commit message what was the problem before this PR, and what is the new situation is afterwards, (how this is different) and why it is good. For example: "previous to this patch, CockroachDB had problem X. This was inconvenient because it would cause users to experience Y and Z. This patch addresses problem X by doing ABC. Now users cannot experience Y any more because of AB, and Z because of C."
Reviewed 15 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
It turns out that many virtual tables (e.g. in `pg_catalog` schema) have instances for each catalog (and the context differs within each catalog). Unfortunately, these instances all share the same virtual table ID. This violates our assumptions and prevents us from invalidating cached plans properly. This change fixes this by also putting the database ID in the StableID for virtual tables. Note that the situation is in fact even worse: all virtual tables have the *same* ID. This is being addressed separately (cockroachdb#33697, cockroachdb#32963). Fortunately this doesn't currently cause problems in practice because virtual tables have static names and can't be involved in FKs. Release note: None
It turns out that many virtual tables (e.g. in `pg_catalog` schema) have instances for each catalog (and the context differs within each catalog). Unfortunately, these instances all share the same virtual table ID. This violates our assumptions and prevents us from invalidating cached plans properly. This change fixes this by also putting the database ID in the StableID for virtual tables. Note that the situation is in fact even worse: all virtual tables have the *same* ID. This is being addressed separately (cockroachdb#33697, cockroachdb#32963). Fortunately this doesn't currently cause problems in practice because virtual tables have static names and can't be involved in FKs. Release note: None
@knz I would like to hear advice before improving PR. I have to allocate separate areas in the OID for table IDs and column IDs. It is because the column ID in the cockroachdb is not unique.
Which way do you think is more appropriate |
I understand the column IDs in CockroachDB are not unique. They are not unique in PostgreSQL either. Whether that matters is another question -- why do you think it matters? Is there a |
Thank you for your advice. Now I agree that the column ID need not be unique. I have added a new commit, so please check back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this looks better.
Two final remarks:
-
since you now have ensured that all tables have distinct IDs, even virtual tables, it's possible to simplify the
h.IndexOid
method, see my comment below. -
Now, I have double-checked and it appears that postgres does indeed allocate globally-unique OIDs for columns (not column IDs) in very few place, like
pg_attrdef
. For these uses (and these uses only) I encourage you to add a hash function that combines the table ID with the column ID. See my recommendation below.
Reviewed 14 of 14 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @hueypark)
pkg/sql/pg_catalog.go, line 334 at r2 (raw file):
defSrc := tree.NewDString(*column.DefaultExpr) return addRow( columnOid(column.ID), // oid
here use h.ColumnID(table.ID, column.ID)
pkg/sql/pg_catalog.go, line 2429 at r2 (raw file):
func (h oidHasher) writeDB(db *sqlbase.DatabaseDescriptor) { h.writeUInt32(uint32(db.ID)) h.writeStr(db.Name)
I think the use of the database name can be removed here.
pkg/sql/pg_catalog.go, line 2438 at r2 (raw file):
func (h oidHasher) writeTable(table *sqlbase.TableDescriptor) { h.writeUInt32(uint32(table.ID)) h.writeStr(table.Name)
The table name can be removed here, because now all the table IDs are unique.
pkg/sql/pg_catalog.go, line 2471 at r2 (raw file):
h.writeTypeTag(indexTypeTag) h.writeDB(db) h.writeSchema(scName)
you can remove the database and schema from here because now all table IDs are unique.
pkg/sql/pg_catalog.go, line 2505 at r2 (raw file):
} func (h oidHasher) ColumnOid(
I recommend re-introducing this, with only the two arguments TableDescriptor+ColumnDescriptor.
pkg/sql/pg_catalog.go, line 2574 at r2 (raw file):
} func columnOid(columnID sqlbase.ColumnID) *tree.DOid {
prefer h.ColumnOid
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good! thank you!
Reviewed 10 of 10 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained
…ctly This patch changes the definition of vtables to assign a unique table ID to every virtual table. Moreover, it also extends vtable descriptors to assign proper column IDs to virtual table columns. This aims to support the logical planning code, the table and plan caches, and simplify+fix the introspection tables. Incidentally it also makes it possible to use COMMENT ON on virtual tables and their columns too. The table IDs are picked descending from MaxUint32 although this may be refined in future PRs to accommodate the numbering of virtual schema descriptors. Incidentally `pg_catalog.pg_attribute` is fixed to properly use the column ID in column `attnum`, so that `attnum` remains stable across column drops. Release note (sql change): Virtual tables in `pg_catalog`, `information_schema` etc now support `COMMENT ON` like regular tables.
bors r+ |
bors r- |
Canceled |
bors r+ |
33697: sql: make pg_catalog OIDs stable and refer to the descriptor IDs directly r=knz a=hueypark Fixes #32940 Fixes #32963 Fixes #34710 Enables #32964 This patch changes the definition of vtables to assign a unique table ID to every virtual table. Moreover, it also extends vtable descriptors to assign proper column IDs to virtual table columns. This aims to support the logical planning code, the table and plan caches, and simplify+fix the introspection tables. Incidentally it also makes it possible to use COMMENT ON on virtual tables and their columns too. The table IDs are picked descending from MaxUint32 although this may be refined in future PRs to accommodate the numbering of virtual schema descriptors. Incidentally `pg_catalog.pg_attribute` is fixed to properly use the column ID in column `attnum`, so that `attnum` remains stable across column drops. Release note (sql change): Virtual tables in `pg_catalog`, `information_schema` etc now support `COMMENT ON` like regular tables. Co-authored-by: Jaewan Park <jaewan.huey.park@gmail.com>
@knz |
@andy-kimball @RaduBerinde you're going to love this :) |
@hueypark we are very grateful for this change. Even though you were primarily interested in ensuring compatibility with COMMENT ON, this change is very influential for many projects inside CockroachDB. With this change you have unlocked progress in multiple areas at once. This is not only very good work; it has larger-scale value for the future of CockroachDB too. |
This uses column IDs directly for the `attnum` column, so that the value remains stable across column drops. This is a sub-set of the changes in cockroachdb#33697. Required by ORMs, requested for by the TypeORM dev looking at crdb compat in TypeORM. Release note (bug fix): the value of the `attnum` column in `pg_catalog.pg_attribute` now remains stable across column drops.
Build succeeded |
@knz Thank you for your praise. It is a great motivation for me. |
Thanks @hueypark, this is great! |
Fixes #32940
Fixes #32963
Fixes #34710
Enables #32964
This patch changes the definition of vtables to assign a unique table
ID to every virtual table. Moreover, it also extends vtable
descriptors to assign proper column IDs to virtual table columns. This
aims to support the logical planning code, the table and plan caches,
and simplify+fix the introspection tables. Incidentally it also makes
it possible to use COMMENT ON on virtual tables and their columns too.
The table IDs are picked descending from MaxUint32 although this may
be refined in future PRs to accommodate the numbering of virtual
schema descriptors.
Incidentally
pg_catalog.pg_attribute
is fixed to properly use thecolumn ID in column
attnum
, so thatattnum
remains stable acrosscolumn drops.
Release note (sql change): Virtual tables in
pg_catalog
,information_schema
etc now supportCOMMENT ON
like regular tables.