Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: make pg_catalog OIDs stable and refer to the descriptor IDs directly #33697

Merged
merged 1 commit into from
Feb 8, 2019

Conversation

hueypark
Copy link
Contributor

@hueypark hueypark commented Jan 13, 2019

Fixes #32940
Fixes #32963
Fixes #34710
Enables #32964

This patch changes the definition of vtables to assign a unique table
ID to every virtual table. Moreover, it also extends vtable
descriptors to assign proper column IDs to virtual table columns. This
aims to support the logical planning code, the table and plan caches,
and simplify+fix the introspection tables. Incidentally it also makes
it possible to use COMMENT ON on virtual tables and their columns too.

The table IDs are picked descending from MaxUint32 although this may
be refined in future PRs to accommodate the numbering of virtual
schema descriptors.

Incidentally pg_catalog.pg_attribute is fixed to properly use the
column ID in column attnum, so that attnum remains stable across
column drops.

Release note (sql change): Virtual tables in pg_catalog,
information_schema etc now support COMMENT ON like regular tables.

@hueypark hueypark requested review from a team January 13, 2019 11:47
@hueypark hueypark requested a review from a team as a code owner January 13, 2019 11:47
@hueypark hueypark requested review from a team January 13, 2019 11:47
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi again!
Thank you very much for your change, and the code as usual looks very good.

I'd like to come back to two points.

  1. the numbering approach. You have chosen 1, 19, 12 bits for the 3 parts. This raises several questions:

    • how did you choose these 3 values. Did you perform experiments? Did you conduct reasoning? In any case, the motivation for this choice must be explained in the commit message and as a comment close to where this is defined in the code.

    • why do you allocate separate areas in the OID for table IDs and column IDs. If the first bit is set, then I'd expect all remaining 30 bits to become a column ID, and conversely if the first bit is not set, then all remaining 30 bits become a table ID. Why do you need both table and column ID side-by-side?

  2. the release note: this must explain what problem this is solving. You should explain in the commit message what was the problem before this PR, and what is the new situation is afterwards, (how this is different) and why it is good. For example: "previous to this patch, CockroachDB had problem X. This was inconvenient because it would cause users to experience Y and Z. This patch addresses problem X by doing ABC. Now users cannot experience Y any more because of AB, and Z because of C."

Reviewed 15 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Jan 29, 2019
It turns out that many virtual tables (e.g. in `pg_catalog` schema)
have instances for each catalog (and the context differs within each
catalog). Unfortunately, these instances all share the same virtual
table ID. This violates our assumptions and prevents us from
invalidating cached plans properly.

This change fixes this by also putting the database ID in the StableID
for virtual tables.

Note that the situation is in fact even worse: all virtual tables have
the *same* ID. This is being addressed separately (cockroachdb#33697, cockroachdb#32963).
Fortunately this doesn't currently cause problems in practice because
virtual tables have static names and can't be involved in FKs.

Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Jan 29, 2019
It turns out that many virtual tables (e.g. in `pg_catalog` schema)
have instances for each catalog (and the context differs within each
catalog). Unfortunately, these instances all share the same virtual
table ID. This violates our assumptions and prevents us from
invalidating cached plans properly.

This change fixes this by also putting the database ID in the StableID
for virtual tables.

Note that the situation is in fact even worse: all virtual tables have
the *same* ID. This is being addressed separately (cockroachdb#33697, cockroachdb#32963).
Fortunately this doesn't currently cause problems in practice because
virtual tables have static names and can't be involved in FKs.

Release note: None
@hueypark
Copy link
Contributor Author

hueypark commented Feb 3, 2019

@knz
Thank you for your kind review.

I would like to hear advice before improving PR. I have to allocate separate areas in the OID for table IDs and column IDs. It is because the column ID in the cockroachdb is not unique.

  1. Solution 1. Make cockroachdb column ID unique.
    • Pros: The column ID can be easily changed to an OID like a table ID, and the overall OID processing is simplified.
    • Cons: Since the previous column ID determination method are different, separate work for supporting the lower version is needed. Maybe it will be available in 3.x versions.
  2. Solution 2. Assign an appropriate bit for ID like my PR.
    • Pros: Do not need to worry about backward compatibility.
    • Cons: The range of IDs we can use is dramatically reduced.

Which way do you think is more appropriate

@knz
Copy link
Contributor

knz commented Feb 4, 2019

I have to allocate separate areas in the OID for table IDs and column IDs. It is because the column ID in the cockroachdb is not unique.

I understand the column IDs in CockroachDB are not unique. They are not unique in PostgreSQL either.

Whether that matters is another question -- why do you think it matters? Is there a pg_catalog table or elsewhere where all the columns are listed, and their column ID is a primary key or unique column for the table?

@hueypark
Copy link
Contributor Author

hueypark commented Feb 6, 2019

Thank you for your advice. Now I agree that the column ID need not be unique.

I have added a new commit, so please check back.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this looks better.

Two final remarks:

  1. since you now have ensured that all tables have distinct IDs, even virtual tables, it's possible to simplify the h.IndexOid method, see my comment below.

  2. Now, I have double-checked and it appears that postgres does indeed allocate globally-unique OIDs for columns (not column IDs) in very few place, like pg_attrdef. For these uses (and these uses only) I encourage you to add a hash function that combines the table ID with the column ID. See my recommendation below.

Reviewed 14 of 14 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @hueypark)


pkg/sql/pg_catalog.go, line 334 at r2 (raw file):

					defSrc := tree.NewDString(*column.DefaultExpr)
					return addRow(
						columnOid(column.ID),            // oid

here use h.ColumnID(table.ID, column.ID)


pkg/sql/pg_catalog.go, line 2429 at r2 (raw file):

func (h oidHasher) writeDB(db *sqlbase.DatabaseDescriptor) {
	h.writeUInt32(uint32(db.ID))
	h.writeStr(db.Name)

I think the use of the database name can be removed here.


pkg/sql/pg_catalog.go, line 2438 at r2 (raw file):

func (h oidHasher) writeTable(table *sqlbase.TableDescriptor) {
	h.writeUInt32(uint32(table.ID))
	h.writeStr(table.Name)

The table name can be removed here, because now all the table IDs are unique.


pkg/sql/pg_catalog.go, line 2471 at r2 (raw file):

	h.writeTypeTag(indexTypeTag)
	h.writeDB(db)
	h.writeSchema(scName)

you can remove the database and schema from here because now all table IDs are unique.


pkg/sql/pg_catalog.go, line 2505 at r2 (raw file):

}

func (h oidHasher) ColumnOid(

I recommend re-introducing this, with only the two arguments TableDescriptor+ColumnDescriptor.


pkg/sql/pg_catalog.go, line 2574 at r2 (raw file):

}

func columnOid(columnID sqlbase.ColumnID) *tree.DOid {

prefer h.ColumnOid instead.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good! thank you!

Reviewed 10 of 10 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

…ctly

This patch changes the definition of vtables to assign a unique table
ID to every virtual table. Moreover, it also extends vtable
descriptors to assign proper column IDs to virtual table columns. This
aims to support the logical planning code, the table and plan caches,
and simplify+fix the introspection tables. Incidentally it also makes
it possible to use COMMENT ON on virtual tables and their columns too.

The table IDs are picked descending from MaxUint32 although this may
be refined in future PRs to accommodate the numbering of virtual
schema descriptors.

Incidentally `pg_catalog.pg_attribute` is fixed to properly use the
column ID in column `attnum`, so that `attnum` remains stable across
column drops.

Release note (sql change): Virtual tables in `pg_catalog`,
`information_schema` etc now support `COMMENT ON` like regular tables.
@knz
Copy link
Contributor

knz commented Feb 8, 2019

bors r+

@knz
Copy link
Contributor

knz commented Feb 8, 2019

bors r-

@craig
Copy link
Contributor

craig bot commented Feb 8, 2019

Canceled

@knz
Copy link
Contributor

knz commented Feb 8, 2019

bors r+

craig bot pushed a commit that referenced this pull request Feb 8, 2019
33697: sql: make pg_catalog OIDs stable and refer to the descriptor IDs directly r=knz a=hueypark

Fixes #32940
Fixes #32963
Fixes #34710
Enables #32964

This patch changes the definition of vtables to assign a unique table
ID to every virtual table. Moreover, it also extends vtable
descriptors to assign proper column IDs to virtual table columns. This
aims to support the logical planning code, the table and plan caches,
and simplify+fix the introspection tables. Incidentally it also makes
it possible to use COMMENT ON on virtual tables and their columns too.

The table IDs are picked descending from MaxUint32 although this may
be refined in future PRs to accommodate the numbering of virtual
schema descriptors.

Incidentally `pg_catalog.pg_attribute` is fixed to properly use the
column ID in column `attnum`, so that `attnum` remains stable across
column drops.

Release note (sql change): Virtual tables in `pg_catalog`,
`information_schema` etc now support `COMMENT ON` like regular tables.

Co-authored-by: Jaewan Park <jaewan.huey.park@gmail.com>
@hueypark
Copy link
Contributor Author

hueypark commented Feb 8, 2019

@knz
Thank you for your careful and detailed review.

@knz
Copy link
Contributor

knz commented Feb 8, 2019

@andy-kimball @RaduBerinde you're going to love this :)

@knz
Copy link
Contributor

knz commented Feb 8, 2019

@hueypark we are very grateful for this change.

Even though you were primarily interested in ensuring compatibility with COMMENT ON, this change is very influential for many projects inside CockroachDB. With this change you have unlocked progress in multiple areas at once. This is not only very good work; it has larger-scale value for the future of CockroachDB too.

knz added a commit to knz/cockroach that referenced this pull request Feb 8, 2019
This uses column IDs directly for the `attnum` column, so that the
value remains stable across column drops.

This is a sub-set of the changes in cockroachdb#33697.

Required by ORMs, requested for by the TypeORM dev looking at crdb
compat in TypeORM.

Release note (bug fix): the value of the `attnum` column in
`pg_catalog.pg_attribute` now remains stable across column drops.
@craig
Copy link
Contributor

craig bot commented Feb 8, 2019

Build succeeded

@craig craig bot merged commit b1580f2 into cockroachdb:master Feb 8, 2019
@hueypark
Copy link
Contributor Author

hueypark commented Feb 8, 2019

@knz Thank you for your praise. It is a great motivation for me.

@RaduBerinde
Copy link
Member

Thanks @hueypark, this is great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants