Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Format] Add arbitrarily nested (metadata) schemas #320

Open
lidavidm opened this issue Jan 6, 2023 · 5 comments
Open

[Format] Add arbitrarily nested (metadata) schemas #320

lidavidm opened this issue Jan 6, 2023 · 5 comments

Comments

@lidavidm
Copy link
Member

lidavidm commented Jan 6, 2023

ADBC's model of database metadata is limited to a catalog-schema-table hierarchy; some systems allow unlimited nesting. (See #46.)

Proposal:

  • Nested schemas are allowed in the result of ConnectionGetObjects, but will be flattened in the ADBC representation. (This is somewhat unavoidable since you can’t model arbitrarily recursively nested data in Arrow.)
  • Nested schemas are flattened by joining the names with a backend-specific delimiter.
  • Add a new key for ConnectionGetInfo to get the delimiter value.

For example: a table like postgres1.dbname.schemaname.table (which has 4 layers of hierarchy rather than 3) would be represented as: catalog = postgres1, db_schema = dbname.schemaname, table = table.

This proposal applies across languages. See #317 for how to manage backwards compatibility.

@lidavidm
Copy link
Member Author

lidavidm commented May 18, 2023

There's some ambiguity here; what should we do if a schema name (for whatever reason) contains the delimiter? Normally the driver would be returning 'unescaped' names, but now there's nothing we can do here.

A different solution might be to make the name columns list<utf8>. But this complicates the common case (unfortunately) and would be a hard breaking change (even if not on the API level, no current client would be able to make use of the new schema).

It would be good to combine this with #621 for a future 'GetObjects2'-style function. (And possibly, break up the nested schema.)

@lidavidm
Copy link
Member Author

It also significantly complicates the filters in the API; again, do you supply the escaped name? Or all functions have to take a list, which is a bit painful

@lidavidm
Copy link
Member Author

And hmm, what do JDBC/ODBC do here? JDBC also assumes a fixed catalog/schema/table hierarchy. So what happens for Dremio, etc.?

Dremio doesn't even implement getDbSchemas (IIRC) despite having designed the spec, so not sure if we can use it as a real life example. What does the Trino JDBC driver do?

@WillAyd
Copy link
Contributor

WillAyd commented Jun 12, 2023

I don't know that I've personally worked with a database that allows for arbitrarily nested schemas but would be curious to know how client tools handle this via ODBC/JDBC today. I'm guessing that a nested schema would be represented as schema_parent.schema_child in most if not all cases, so I think we could fit that same pattern into the GetObjects design we have today without changing much.

You would lose a "clean" understanding of the schema hierarchy, but AFAIK that is not a SQL standard anyway

@lidavidm
Copy link
Member Author

Yup, I'm just curious about a few things like

  • Is the separator always .?
  • Do these systems allow escaping names that use the separator? If so, how do you even write the name anymore? A schema written in SQL as "schema.1" would be represented in code as just the string schema.1 (i.e. you don't work with/store escaped names, but unescaped names), but then what about "schema.1"."subschema"?

etc.

And I wonder if these systems even expose this info in their metadata in the first place, given the main use case appears to be pulling in data from external systems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants