-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Virtual schema docs #8909
Virtual schema docs #8909
Conversation
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/d64db96b58b2ab3fef67cd73bd9db721526127b4/ Edited pages: |
@glennfawcett FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I go into the details of the contents, there are three high-level concerns here:
-
I am surprised that
pg_catalog
does not get documented. We are committed to supportingpg_catalog
, and in fact it gives access to more useful data thaninfo_schema
/crdb_internal
. Moreover, it happens to help more with applications already developed for postgres. What is the plan about that? -
there's been recent development, which I hope @piyush-singh can help clarify: that we do not wish to promote
crdb_internal
as a supported SQL API for use by end-users. This means that documentation, if any (we can discuss if it should be present at all) should disclaim loudly that all this information does not come with any guarantee of stability or preservation across versions. Hence the following questions:- is it desirable to have
crdb_internal
documented here at all? - if yes, what is the best way to properly communicate that this is provided with many strings attached?
- is it desirable to have
-
each of the
crdb_internal
vtables come with a CPU, memory and latency cost associated, which in some cases can impact cluster stability or performance upon use. This must absolutely be called out if we decide to document this virtual schema. (the cost functions are displayed in the table comments)
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing these concerns forward, @knz !
Some quick responses to your points:
-
I am more than happy to add docs for
pg_catalog
to this PR. -
I think product should make the call on whether or not we want to encourage our end users to use
crdb_internal
. If we want to discourage/not encourage its use, then we definitely shouldn't document the schemas contents in detail. -
Okay! If we do choose to document the schema, I can port all of the information from those table comments (how fortuitous) into the document.
Some additional thoughts:
Throughout our docs, we use crdb_internal
tables in examples that require certain information about a cluster that is not exposed by SQL statements or other virtual schema tables. There might have been some new statements introduced over the past few releases that make these crdb_internal
tables less useful.
If we want to not document crdb_internal
, then I can open an issue to remove (or replace, where possible) references to crdb_internal
tables. We can then limit its documentation to brief mentions in the existing Name Resolution page (and possibly the proposed Virtual Schemas overview page).
If we do want to document crdb_internal
, then I can modify the documentation introduced in this PR to have no stability distinctions among tables in crdb_internal
. I can also add a warning to the top of the page calling out the lack of stability guarantees. And also additional documentation (a column in the table of tables, and possibly a short paragraph) concerning the costs associated with using crdb_internal
tables.
I'll wait on the final word from @piyush-singh before moving forward on this.
Reviewable status: complete! 0 of 0 LGTMs obtained
@piyush-singh Ping on whether or not we want to document |
d64db96
to
51e027f
Compare
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/51e027f85f7bce6c045c6c89f0d8bc008cd2351d/ Edited pages: |
51e027f
to
bb64665
Compare
@piyush-singh |
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/bb64665a60832d743e2963383b9e85d97e87b4b9/ Edited pages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughout our docs, we use crdb_internal tables in examples that require certain information about a cluster that is not exposed by SQL statements or other virtual schema tables.
This is a strong signal that something is amiss in our product management. If there was a user need to see some information exposed, the proper "next action" was to ensure we created some SHOW statement or did some research to see if pg_catalog
could be used. Adding documentation on how to use crdb_internal
was not the right approach.
I would recommend some investigation by PM about this. @piyush-singh what do you think?
There might have been some new statements introduced over the past few releases that make these crdb_internal tables less useful.
Yes that might be the case too. But then if you have not been informed of this, that is perhaps another defect in our process as we should have had github/jira tickets to address these missing features and you would have been informed via release notes.
I added a page on pg_catalog. I'm not sure how much more information we want to provide on that schema.
I'd say we want to document everything we support, and highlight in particular those columns which exist in our schema but which we have not yet populated (i.e. they contain NULL values), because they may be sources of incompatibilities with postgres.
We also want to remind users to talk to us if they find they would like to use pg_catalog but some aspect of it is still missing.
Reviewed 1 of 3 files at r2, 8 of 8 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)
v20.1/crdb-internal.md, line 15 at r3 (raw file):
{{site.data.alerts.callout_danger}} We do not recommend using `crdb_internal` tables in production environments for the following reasons: - The contents of `crdb_internal` schema are unstable, and subject to change in new releases of CockroachDB.
to change without prior notice
v20.1/crdb-internal.md, line 16 at r3 (raw file):
We do not recommend using `crdb_internal` tables in production environments for the following reasons: - The contents of `crdb_internal` schema are unstable, and subject to change in new releases of CockroachDB. - There are memory and latency costs associated with each table in `crdb_internal`. Accessing the tables in the schema can impact cluster stability and performance.
This particular point is also generally true of SHOW statements and other features. I would rather see callouts about memory usage / performance for each individual feature that incurs them.
@ericharmeling, what's the current state of this PR? |
TL;DR: It has been on hold, pending further direction/prioritization from product management. I just synced with @vy-ton and now have a clearer direction for this work, as a result of this 21.1 product roadmap item. More detailed response: I originally started working on virtual schema docs in 2019, in response to several requests for Despite the longstanding requests for On the topic of virtual schemas, we definitely do want to document So the plan moving forward is:
|
Thanks for this update, @ericharmeling! |
bb64665
to
19ced20
Compare
Updates to See here for a tracking doc (still working on this doc, so let me know if you want to add anything/change anything before you do): https://docs.google.com/spreadsheets/d/1azfgQJQJriUii1dd8_RwhgHBqhlrwkjao1rISR8YAOc/edit#gid=1321389238 |
There will also be subsequent PRs to update |
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/19ced20608541be518ad3cfcd4041d8fe8a0eb1a/ Edited pages: |
19ced20
to
16a6f1e
Compare
Adding you to review |
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/16a6f1e7546c817de64d5709ee101949339c8570/ Edited pages: |
Adding both of you for a general review, following our conversation today (3/15/21). |
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/ac8a611e71d1c84ff71795677a001fe152cbd1b7/ Edited pages: |
|
v21.1/pg-catalog.md
Outdated
-----------------------------|-------------- | ||
`pg_aggregate` | `pg_aggregate` | ||
`pg_am` | `pg_am` | ||
`pg_amop` | `pg_amop` (unimplemented) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but piggybacking on Vy's comment, I think we should clarify that unimplemented means that the table exists but the rows are not populated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @jordanlewis, and @knz)
v21.1/information-schema.md, line 13 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
@RichardJCai Is
information_schema
what powersSHOW
commands?
Based on Richard's answer, I was confused why this table is nested under information_schema
, since I read it as information_schema
tables are what backs the SHOW
commands`.
I think it's worth putting the SHOW
table in its own top-level page (we have individual pages for each SHOW
command but no over) or stick it under the top Virtual schemas
page
Added warning to crdb_internal page
…ailed table to pg_catalog
…of system catalog additions
….1 pg_catalog tables
ac8a611
to
02f99fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR @vy-ton and @RichardJCai !
I think I've addressed all of your comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @jordanlewis, @knz, @RichardJCai, and @vy-ton)
_includes/sidebar-data-v20.2.json, line 2422 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
2 suggestions:
- Put this closer to
Name Resolution
as it's the closest relative- We should get Engineering input but I've been referring to this as
System Catalog
. Developers might not understand "virtual".
I like both of those suggestions.
-
Done
-
I updated the nav title, the title of the overview page ("Virtual Schemas" -> "System Catalogs"), the name of the file (
virtual-schemas.md
->system-catalogs.md
), and most of the instances of "virtual schema" to "system catalog" (where it made sense).
v20.1/crdb-internal.md, line 15 at r3 (raw file):
Previously, knz (kena) wrote…
to change without prior notice
Done.
v20.2/information-schema.md, line 9 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
can therefore be relied on to remain stable over time
We're not quite here yet with
information_schema
. I'm looking into this so will have more details soon
I removed this entire paragraph.
v20.2/pg-catalog.md, line 7 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
They correspond to both system catalog and system views, maybe use https://www.postgresql.org/docs/13/catalogs.html as the link
Done.
v20.2/pg-catalog.md, line 13 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
Looking into the
None
tables here, since I expected to be able to query emptypg_catalog
tables
See response below.
v20.2/pg-catalog.md, line 13 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
We should clarify what
None
here means. We have added tables likepg_amop
as empty tables. So they are unimplemented but they do output inSHOW TABLES
and you can query the empty table
This is a list of the pg_catalog
tables in the official 20.2.6 release. We should only document what is in the latest production release, for production-release docs (i.e., anything with the URL prefix https://www.cockroachlabs.com/docs/stable).
I just built cockroach from the latest commit on the release-20.2
branch, and see that all of the new, unimplemented tables are in pg_catalog
. I'll follow up with an update after 20.2.7 is released (which is scheduled for March 29, but still has a few blockers).
The 21.1 beta (scheduled for Monday 3/22) should include all of the new, unimplemented pg_catalog
tables. But IMO, because 21.1 is a testing release, the docs can include features that aren't in an "official release" in any case.
v21.1/information-schema.md, line 13 at r5 (raw file):
Is information_schema what powers SHOW commands?
crdb_internal
is what powers SHOW
commands.
I was confused why this table is nested under information_schema...
Which table?
I think it's worth putting the SHOW table in its own top-level page (we have individual pages for each SHOW command but no over)
SHOW TABLES
is a SQL statement, with its own page. Unless you are talking about an actual virtual table? I don't see a table called "SHOW
". Or are you talking about crdb_internal.create_statements
?
or stick it under the top Virtual schemas page
I'd prefer to keep just the system catalog schemas at the top level of the Virtual Schemas page (now called the "System Catalogs" page)
v21.1/pg-catalog.md, line 13 at r5 (raw file):
Previously, vy-ton (Vy Ton) wrote…
How are you determining the list for the first column? We might be falling into a trap of duplicating the list of Postgres pg_catalog. Instead of just documenting the ones we do implement.
The list for the first column is exactly what is listed on this page (which includes all of the tables in pg_catalog
, as well as a handful of system views): https://www.postgresql.org/docs/13/catalogs.html
The list for the second column is what is returned when I run SHOW TABLES FROM pg_catalog
from a 3/17 build on the `release-21.1 branch.
The "None" values in the second columns indicate that a table/view in the postgres system catalog (https://www.postgresql.org/docs/13/catalogs.html) is missing from CRDB's pg_catalog
schema in that 3/17 21.1 build.
The "(unimplemented)" indicates that a table is empty (I've changed "unimplemented"->"empty"). These are tables that are not in pg_catalog
in 20.2.0-20.2.6, but are in pg_catalog
in 21.1.beta-1 and 20.2.7. I assume that all of the tables recently added are empty. @RichardJCai Please correct me if I am wrong.
v21.1/pg-catalog.md, line 17 at r5 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
LGTM but piggybacking on Vy's comment, I think we should clarify that unimplemented means that the table exists but the rows are not populated.
I changed "unimplemented" to "empty". "unimplemented" suggests that that will at one point be implemented. "empty" is a little clearer (i.e., they have no contents).
Online preview: http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/02f99fc11e65a173ab1bdebd1bb405ade9932fd7/ Edited pages: |
SGTM! |
TFTR, @ianjevans ! I'm going to go ahead and merge this, so we have the pages live. If we need to make any updates after it's been published, it should be pretty easy to do so. |
Fixes #2957.
Fixes #8706.
Fixes #9569.
Fixes #9578.
Update of #5454.