rfcs: proposal to make schemas more pg-compatible #21456
Thanks! This is an excellent overview of the problem. I like the strategy you propose for fixing the compatibility problem: adding …

I don't like the idea of globally replacing our SQL "database" concept with "catalog". Users aren't getting confused by our use of the word "database" instead of "catalog", and it's already known in the community that "database" and "catalog" are used interchangeably; see the StackOverflow post I linked in an inline comment. Furthermore, I think this naming change goes beyond what's required for compatibility, which is perhaps a stronger reason not to lump it in with the compatibility work. I think the terminology we have right now is completely sufficient and unambiguous. In a world where we did have user schemas, it would still be okay to have …

So I'm suggesting that the scope of this RFC be narrowed to exactly what's necessary to achieve virtual-catalog-level compatibility with ORMs, tools and apps.

docs/RFCS/pg_virtual_namespacing.md, line 39 at r1:
I think this is actually incorrect. I think in Postgres Database == Catalog, and a Postgres installation is also known as a cluster. See the SO post I linked below.

docs/RFCS/pg_virtual_namespacing.md, line 132 at r1:
I don't know if we really need to make this change. Database is often synonymous with catalog in the relational world. I think making this change will add to the confusion rather than diminish it. Check out this answer: https://stackoverflow.com/a/17943883/73632

docs/RFCS/pg_virtual_namespacing.md, line 140 at r1:
👍

docs/RFCS/pg_virtual_namespacing.md, line 145 at r1:
stray tab

docs/RFCS/pg_virtual_namespacing.md, line 151 at r1:
Again -1 on this change. Let's leave the …

docs/RFCS/pg_virtual_namespacing.md, line 171 at r1:
CockroachDB*

docs/RFCS/pg_virtual_namespacing.md, line 174 at r1:
s/tabs/spaces/
Thank you Jordan for the quick review.
I have performed the proposed simplifications and fixes. PTAL.
Raphael 'kena' Poss
docs/RFCS/pg_virtual_namespacing.md, line 237 at r2:
I think this is necessary. It's our current "fully qualified" format and it's used pretty widely. Removing it will need additional planning. For example, we'd need to go through a release cycle with some sort of metrics/logging of this feature's usage so that users can be confident that their apps won't blow up when upgrading to the version which removes it. I'd also consider this a semver-major change, so the release that removes it would need to be (at least) 3.0.
An alternative I just figured out would be to also support the db name as a schema alias for … I do not think that cross-db queries are common (we probably don't even support them correctly). If we "just" support a single alias next to …
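A minimal Go sketch of the alias idea, under stated assumptions: in a two-part name x.tbl, the current database's own name is accepted as an alias for its public schema, alongside public and the virtual schemas. The helper names (resolveSchema, qualify) and the exact list of virtual schema names are illustrative, not CockroachDB's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// virtualSchemas is an illustrative list of schema names assumed to exist in
// every database (assumption for this sketch).
var virtualSchemas = map[string]bool{
	"pg_catalog":         true,
	"information_schema": true,
	"crdb_internal":      true,
}

// resolveSchema (hypothetical) decides which schema "x" denotes in a two-part
// name "x.tbl". The current database's own name is an alias for "public".
func resolveSchema(x, currentDB string) (string, bool) {
	if x == "public" || virtualSchemas[x] {
		return x, true
	}
	if x == currentDB {
		return "public", true // alias: "mydb.tbl" means "mydb.public.tbl"
	}
	return "", false
}

// qualify (hypothetical) expands "x.tbl" into "db.schema.tbl".
func qualify(x, tbl, currentDB string) (string, error) {
	if currentDB == "" {
		// Without a current database there is nothing to alias against.
		return "", fmt.Errorf("no current database: cannot qualify %s.%s", x, tbl)
	}
	schema, ok := resolveSchema(x, currentDB)
	if !ok {
		return "", fmt.Errorf("schema %q not found in database %q", x, currentDB)
	}
	return strings.Join([]string{currentDB, schema, tbl}, "."), nil
}

func main() {
	fmt.Println(qualify("mydb", "kv", "mydb"))
	fmt.Println(qualify("otherdb", "kv", "mydb"))
}
```

In this sketch the alias only helps when a current database is set; a session with no current database gets an error rather than a guess.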
But what if there is no current database? That's the most common case for …

I.e. the user enters …

How can there be no current database? Does that even work?

Hmm, also the "no current db" case is specifically the one where we most likely need this backward compatibility, right? In that case we can have a normalization rule conditional on whether …
It's the default when you run …

That's probably true, although I wouldn't be surprised if some frameworks both set a current database and redundantly use the qualified form.

Why wouldn't we support them correctly? In the absence of multiple schemas per database, people may be using multiple databases as a substitute. I don't think this would be common, but I don't think it's so rare that we can break them.
I haven't fully followed this conversation, but wanted to chime in here. Apologies if this is already understood. Historically, some of our load generators have connected to one database and then accessed another. For example, the query string would connect to the …
Thanks for putting this together @knz!
Postgres has a default catalog called "postgres". Have you given any thought to the creation of a default writable (i.e. not …

docs/RFCS/pg_virtual_namespacing.md, line 44 at r2:
nit: random double spaces all over the place.

docs/RFCS/pg_virtual_namespacing.md, line 102 at r2:
For reference, …

docs/RFCS/pg_virtual_namespacing.md, line 135 at r2:
To be explicit about this we should probably add support for …

docs/RFCS/pg_virtual_namespacing.md, line 186 at r2:
On that note, are you supportive of making "catalog" synonymous with "database" globally and adding these aliases? I've personally gone back and forth. On one hand, it improves our PG compatibility. On the other, it introduces more room for users to get confused and leads to more users needing to ask questions like that Stack Overflow question in the first place.

docs/RFCS/pg_virtual_namespacing.md, line 195 at r2:
👍
I like your idea, @nvanbenschoten. The story for Postgres is slightly more complicated because … Regardless, I would be in support of a change that removes the "no current db" state, and changes … What are the backward compatibility complications you hint at? Are you talking about cross-database references?

docs/RFCS/pg_virtual_namespacing.md, line 186 at r2: Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think that adding these aliases will definitely lead to more confusion. That being said, @knz, I would suggest we split the discussion of exposing the catalog==database concept as syntax into a separate work item, distinct from whether we add the compatibility-focused features. That way, if there's some disagreement about this aliasing, it doesn't have to hold up the compatibility wins we get by changing our virtual tables to behave more like Postgres's.
Hi all, thanks for the many suggestions and feedback. I think the creation of a default database is a very good way forward. Adopted. The prototype can now successfully boot a CockroachDB node and answer queries using the new rules, but the following are still not supported: …
docs/RFCS/pg_virtual_namespacing.md, line 39 at r1: Previously, jordanlewis (Jordan Lewis) wrote…
Done.

docs/RFCS/pg_virtual_namespacing.md, line 132 at r1: Previously, jordanlewis (Jordan Lewis) wrote…
Clarified.

docs/RFCS/pg_virtual_namespacing.md, line 171 at r1: Previously, jordanlewis (Jordan Lewis) wrote…
Done.

docs/RFCS/pg_virtual_namespacing.md, line 174 at r1: Previously, jordanlewis (Jordan Lewis) wrote…
Done.

docs/RFCS/pg_virtual_namespacing.md, line 102 at r2: Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

docs/RFCS/pg_virtual_namespacing.md, line 135 at r2: Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.

docs/RFCS/pg_virtual_namespacing.md, line 186 at r2: Previously, jordanlewis (Jordan Lewis) wrote…
Clarified (below).

docs/RFCS/pg_virtual_namespacing.md, line 237 at r2: Previously, bdarnell (Ben Darnell) wrote…
Updated in light of experiments. PTAL.
I find the description of catalog/database, schema and table a bit confusing. The current approach in cockroach is what I would describe as a 2-level directory hierarchy. We have the root level, which contains databases, and the second level (databases), which contains tables (e.g. …). I'm wondering if we can allow tables to exist (for name lookup purposes) both directly within a catalog/database and also within a schema. I think this can be accomplished with a tweak to the name lookup rules. Similar to Postgres, … I might very well be misunderstanding the problem here. I need to give this RFC another thorough read.

docs/RFCS/pg_virtual_namespacing.md, line 128 at r3:
This isn't accurate. Keys are not prefixed by the database; they are only prefixed by their table ID.

docs/RFCS/pg_virtual_namespacing.md, line 133 at r3:
Any mention of KV above is a bit confused. The KV structure of keys knows nothing about databases, schemas or tables. In SQL, keys that are part of a table are prefixed by their table ID. To find the database for a key, we need to look up the TableDescriptor and access the …
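To make the point concrete, here is a simplified Go sketch built on hypothetical, stripped-down types (not CockroachDB's real descriptor structs): row keys carry only the table and index IDs, the owning database is recovered from the descriptor's ParentID, and names live in a namespace map keyed by (parent ID, name). Because resolution just chains (parent ID, name) lookups, the same mechanism would in principle work for any number of levels.

```go
package main

import "fmt"

// ID is a descriptor ID (simplified).
type ID uint32

// TableDescriptor is a hypothetical, stripped-down table descriptor.
type TableDescriptor struct {
	ID       ID
	ParentID ID // ID of the database containing the table
	Name     string
}

// namespaceKey models a namespace entry: a name registered under the ID of
// its parent (0 stands for the root, where databases live).
type namespaceKey struct {
	ParentID ID
	Name     string
}

type catalog struct {
	namespace map[namespaceKey]ID
	tables    map[ID]TableDescriptor
}

// rowKeyPrefix shows that row keys only carry the table and index IDs;
// the database does not appear in the key.
func rowKeyPrefix(tableID, indexID ID) string {
	return fmt.Sprintf("/Table/%d/%d", tableID, indexID)
}

// lookupPath resolves a name path of any depth by chaining namespace
// lookups, starting from the root.
func (c *catalog) lookupPath(names ...string) (ID, bool) {
	parent := ID(0)
	for _, n := range names {
		id, ok := c.namespace[namespaceKey{ParentID: parent, Name: n}]
		if !ok {
			return 0, false
		}
		parent = id
	}
	return parent, true
}

// databaseOf finds a table's database through its descriptor's ParentID.
func (c *catalog) databaseOf(tableID ID) (ID, bool) {
	desc, ok := c.tables[tableID]
	if !ok {
		return 0, false
	}
	return desc.ParentID, true
}

func main() {
	c := &catalog{
		namespace: map[namespaceKey]ID{
			{ParentID: 0, Name: "bank"}:      50,
			{ParentID: 50, Name: "accounts"}: 51,
		},
		tables: map[ID]TableDescriptor{
			51: {ID: 51, ParentID: 50, Name: "accounts"},
		},
	}
	tableID, _ := c.lookupPath("bank", "accounts")
	dbID, _ := c.databaseOf(tableID)
	fmt.Println(rowKeyPrefix(tableID, 1), "database ID:", dbID)
}
```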
I will clarify the text to answer your general questions.
As to the specific idea of making table names exist at both the database and the schema level: this is very ad hoc, and it reminds me of the Windows/DOS refusal to create a file named "con" because someone somewhere decided that a common namespace for fundamentally different things was future-proof.
Anyway, there are immediate downsides:
- if we ever support a distinction between schema and db, it creates a syntactic ambiguity for GRANT and possibly zone configs.
- the normalization rules become ever so slightly more complex.
Also, my general reaction here is "this thinking is premature". How about this: 1) you let me first investigate how to make CockroachDB compatible with what pg does, 2) we validate that client apps that need this support work well, and 3) *only then* do we experiment with non-standard behavior, checking that apps that started working in step 2 don't break as a result.
Right now, if I give mind share to your proposal, we might not even get to step 2.
Isn't that what this whole RFC is about?
Can you clarify which support you are referring to? I'm anxious about backwards compatibility with our current semantics.

docs/RFCS/pg_virtual_namespacing.md, line 233 at r3:
The SQL-to-KV mapping could be extended without too much difficulty to support an arbitrary number of levels. The key bits are in …

docs/RFCS/pg_virtual_namespacing.md, line 251 at r3:
Treating … The downside is additional name lookup complexity, but I think that is fairly contained. Another downside is mild confusion with tables named … The upside to this approach is that there is no need to migrate existing views, no need to force sessions to always have a database, and we're backward compatible with existing apps.
In light of my code experiments and further thinking stimulated by Peter, I have more or less entirely rewritten this RFC. A detailed section about expected behavior is now also provided, so that there is no ambiguity about what the end result should comply with. Special focus is given to the information_schema tables and the contract they have with the name resolution rules. These details are necessary to understand why arbitrarily deep name hierarchies are just not a good idea.

Nevertheless, Peter, your other solution from earlier is interesting. I have copied it into the RFC in the section "Detailed solution", together with your additional comments from a later review. I think it needs further consideration. It does not have my preference yet, though, because of the pollution it implies in the logical schema name position of FQNs. That's too much painting ourselves into a corner for my taste, even considering the other benefits.

docs/RFCS/pg_virtual_namespacing.md, line 27 at r3:
docs/RFCS/pg_virtual_namespacing.md, line 687 at r4:
@petermattis Also, in other words, for every name in the SQL query entered as …
All right, I have integrated Peter's proposal. We need two different algorithms depending on whether we're accessing an existing object or creating a new object. I hadn't properly spelled that out in the previous draft, and it's now detailed separately.

docs/RFCS/pg_virtual_namespacing.md, line 128 at r3: Previously, petermattis (Peter Mattis) wrote…
Updated.

docs/RFCS/pg_virtual_namespacing.md, line 233 at r3: Previously, petermattis (Peter Mattis) wrote…
Yes, thanks.

docs/RFCS/pg_virtual_namespacing.md, line 251 at r3: Previously, petermattis (Peter Mattis) wrote…
Noted. Changed accordingly.

docs/RFCS/pg_virtual_namespacing.md, line 177 at r4: Previously, bdarnell (Ben Darnell) wrote…
Removed.

docs/RFCS/pg_virtual_namespacing.md, line 201 at r4: Previously, petermattis (Peter Mattis) wrote…
Yep, understood.

docs/RFCS/pg_virtual_namespacing.md, line 247 at r4: Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Not needed with Peter's suggestion. Removed.

docs/RFCS/pg_virtual_namespacing.md, line 539 at r4: Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I agree! But we already implement many of those. Which ones are missing?

docs/RFCS/pg_virtual_namespacing.md, line 687 at r4: Previously, petermattis (Peter Mattis) wrote…
As explained offline: it's not about performance, it's about how to explain this. The real simplicity of your proposal is that it is an additional rule that only covers cases that would otherwise cause an error in Postgres. I had missed that. Changed the RFC to reflect this.
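The two algorithms mentioned above (accessing an existing object vs. creating a new one) differ mainly in what they check. A minimal Go sketch for unqualified names, assuming Postgres-like behavior: lookup walks the search path and stops at the first schema that contains the object, while creation picks the first schema on the search path that exists and is writable, without looking at which objects already exist. Treating pg_catalog, information_schema and crdb_internal as non-writable is an assumption of the sketch, and all names are hypothetical.

```go
package main

import "fmt"

// schemas is a hypothetical model of one database: schema -> object names.
type schemas map[string]map[string]bool

// readOnly lists schemas in which objects cannot be created (assumption).
var readOnly = map[string]bool{
	"pg_catalog":         true,
	"information_schema": true,
	"crdb_internal":      true,
}

// lookupUnqualified finds an existing object: the first schema on the
// search path that contains the object wins.
func lookupUnqualified(d schemas, searchPath []string, name string) (string, bool) {
	for _, sc := range searchPath {
		if d[sc][name] {
			return sc, true
		}
	}
	return "", false
}

// creationTarget picks where a new object goes: the first schema on the
// search path that exists and is writable. Unlike lookup, it does not look
// at which objects already exist.
func creationTarget(d schemas, searchPath []string) (string, bool) {
	for _, sc := range searchPath {
		if _, exists := d[sc]; exists && !readOnly[sc] {
			return sc, true
		}
	}
	return "", false
}

func main() {
	d := schemas{
		"pg_catalog": {"pg_tables": true},
		"public":     {"kv": true},
	}
	searchPath := []string{"pg_catalog", "public"}
	fmt.Println(lookupUnqualified(d, searchPath, "kv"))
	fmt.Println(creationTarget(d, searchPath))
}
```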
@nvanbenschoten @justinj it would be swell if you could check whether the rules I propose here are a strict superset of what pg does (i.e. they resolve everything pg would resolve, in the same way, and also resolve some names that would otherwise fail in pg).

docs/RFCS/pg_virtual_namespacing.md, line 1 at r4:
with a question below about the …

docs/RFCS/pg_virtual_namespacing.md, line 198 at r5:

docs/RFCS/pg_virtual_namespacing.md, line 239 at r5:
I think we might be missing a case here. What happens if …
docs/RFCS/pg_virtual_namespacing.md, line 198 at r5: Previously, petermattis (Peter Mattis) wrote…
Done.

docs/RFCS/pg_virtual_namespacing.md, line 239 at r5: Previously, petermattis (Peter Mattis) wrote…
I had a note about this and I think I mistakenly deleted it. Adding it back. tl;dr: the …
There's an oversight in the RFC: table patterns may need to be normalized using the current db and search path too. Looking into it.
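A minimal Go sketch of what normalizing a table pattern could look like, reusing the two-part-name idea from earlier in this thread. Assumptions: "*" is anchored at the current database and the first schema on the search path; "a.*" means schema a of the current database if it exists there, otherwise database a's public schema; "a.b.*" is already fully qualified. These rules and the helper name normalizePattern are illustrative, not the final implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizePattern (hypothetical) expands "*", "a.*" or "a.b.*" into a
// fully-qualified "db.schema.*" pattern. schemasInCurrentDB is the set of
// schema names that exist in the current database.
func normalizePattern(pat, currentDB string, searchPath []string, schemasInCurrentDB map[string]bool) (string, error) {
	parts := strings.Split(pat, ".")
	if parts[len(parts)-1] != "*" {
		return "", errors.New("not a star pattern")
	}
	switch len(parts) {
	case 1: // "*": needs both a current database and a search path
		if currentDB == "" || len(searchPath) == 0 {
			return "", errors.New("no current database or empty search path")
		}
		return fmt.Sprintf("%s.%s.*", currentDB, searchPath[0]), nil
	case 2: // "a.*": schema in current db first, then database a
		a := parts[0]
		if currentDB != "" && schemasInCurrentDB[a] {
			return fmt.Sprintf("%s.%s.*", currentDB, a), nil
		}
		return fmt.Sprintf("%s.public.*", a), nil
	case 3: // "a.b.*": already fully qualified
		return pat, nil
	}
	return "", errors.New("too many name parts")
}

func main() {
	sc := map[string]bool{"public": true, "pg_catalog": true}
	fmt.Println(normalizePattern("other.*", "app", []string{"public"}, sc))
}
```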
Will update the RFC with the learnings from #22371/#22753: …
Also, I'll outline the implementation.
Release note: none
Note that I chose not to support database prefixes in …

RFC updated to reflect the implementation. Merging.
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7:
There is a level of rigor to this updated section that is at the same time awe-inspiring and disconcerting. The detail runs counter to the goal of concise, high-level descriptions in RFCs. Is this level of detail captured in the comments in the code? If yes, then there is significant duplication here; if no, this detail should be captured in comments. Perhaps I should be viewing this as detailed code documentation for future engineers, though my response would be that this detail would be better captured in a tech note or detailed comment. I guess my high-level question is: who is the expected reader?

docs/RFCS/20180219_pg_virtual_namespacing.md, line 190 at r7:
nit: …
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7:
Yes, insofar as the comments describe at a high level what the code does...
... but not so much that the comments repeat what the code does. This RFC states what the code should do in human language; Go code is no substitute for a clear English-language specification.
Yes, I think a tech note is justified. I am going to observe how further bug fixes and performance work refine this work, and then produce a tech note with the result.
Mostly 1) people who will later ask "why is this code so complex?" and 2) people who will embark on rewriting the code without bothering to look at the existing implementation. Their reviewer will then have material to evaluate whether the new code is up to spec.

docs/RFCS/20180219_pg_virtual_namespacing.md, line 190 at r7: Previously, petermattis (Peter Mattis) wrote…
Thanks, #22829.
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7:
My complaint wasn't about the duplication of this detail with the Go code, but the duplication of this detail with the comments. I agree the human language description is valuable for understanding, but I feel this belongs in detailed comments, not this RFC. My contention, perhaps mistaken, is that RFCs are read once and then forgotten, while code is read many many (many!) times.
Are we actually seeing these concerns happen in practice? Regardless, detailed commentary in the code seems a better defense. The RFC can easily be overlooked when code is rewritten. The comments have to be either ignored or deleted.
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7:
I tend to agree that comments could be more extensive. I am open to extending the comments by including some of the details from the RFC in them. I'll see what I can do. However, the two are not mutually exclusive, because...
... I think your contention is mistaken. Code is indeed read many times, but RFCs provide a time-contextual "storyline" around the changes in the code. For example, one cannot form a clear high-level picture of name resolution in SQL by looking at the hundreds of separate places where it happens in CockroachDB, nor by trying to aggregate the comments that accompany those places. RFCs and tech notes can do that. I do not claim here to favor RFCs over tech notes for providing these rationales. Yet we have already observed numerous times in the team that RFCs are re-read "after the fact", for various purposes, including but not limited to: a) digging into history …

Maybe you could make the point that a well-populated tech note repository would satisfy these various needs better than RFCs. But we're not there yet, and I'm happy that RFCs play this role in the interim rather than not being able to do those things at all.
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7: Previously, knz (kena) wrote…
I forgot to mention that if a new hire, a new contributor or a third party comes to us and asks "how can I get an intro to this area of the product?", having an RFC with a clear section "how to explain this to newcomers" is 1000x better than comments that are 1) scattered throughout the code and 2) written with an already experienced audience in mind.
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7:
I exaggerated for effect. Here is less of an exaggeration: code and comments are read at least an order of magnitude more than RFCs.
Certainly, all new contributors (new hires or 3rd parties) should read through the RFCs in areas they are interacting with, yet they need to do so with the understanding that RFCs are out of date almost as soon as they are written. Fighting against that is a losing battle.
docs/RFCS/20180219_pg_virtual_namespacing.md, line 157 at r7: Previously, petermattis (Peter Mattis) wrote…
So more comments it is then.
Do not let the size of the RFC suggest this is a lot of work. The challenge with this RFC will be to educate users / modify the docs. The technical changes are relatively minor.