-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MotherDuck multi-catalog configs #4001
base: main
Are you sure you want to change the base?
Conversation
78cb8cf
to
8ddd5dd
Compare
sqlmesh/core/config/connection.py
Outdated
if not ( | ||
'database with name "memory" already exists' in str(e) | ||
and path_options == ":memory:" | ||
) and not ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the ATTACH
commands for the individual MotherDuck databases included in catalogs
will raise an error since they have been automatically attached by the catch all md:
in the connection string.
We could either:
- Ignore the errors (this way)
- Skip
ATTACH
ing the MD catalogs and assume they exist - Skip ATTACHing the catalogs but verify they are already attached and available
@@ -377,8 +376,7 @@ def to_sql(self, alias: str) -> str: | |||
f" AS {alias}" if not (self.type == "motherduck" or self.path.startswith("md:")) else "" | |||
) | |||
options_sql = f" ({', '.join(options)})" if options else "" | |||
token_sql = "?motherduck_token=" + self.token if self.token else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing because it doesn't actually work
cursor.execute(query) | ||
except BinderException as e: | ||
# If a user tries to create a catalog pointing at `:memory:` and with the name `memory` | ||
# then we don't want to raise since this happens by default. They are just doing this to | ||
# set it as the default catalog. | ||
# If a user tried to attach a MotherDuck database/share which has already by attached via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# If a user tried to attach a MotherDuck database/share which has already by attached via | |
# If a user tried to attach a MotherDuck database/share which has already been attached via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not an error? Why would anyone specify the same MD catalog more than once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the way the implementation for MotherDuckConnectionConfig works right now, the db adapter uses md:
as the connection string for duckdb, implicitly adding all the databases/shares available to the token account. if you then issue an ATTACH
command to a specific db md:some_db
, duckdb will emit an error.
per my comment (#4001) above, we could either skip the ATTACH trusting that the db actually exists on the account, or we could check that it's there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. This makes sense, thank you
sqlmesh/core/config/connection.py
Outdated
connection_str = f"md:{self.database or ''}" | ||
connection_str = "md:" | ||
if self.database: | ||
# Attach MD database in single mode to block accessing other databases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we block accessing other databases here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if the motherduck connection is created with the database
argument instead of catalogs
, that implies we are only using a single MotherDuck catalog/db and there shouldn't be any reason to allow attaching another MD database (see docs). Do you think that makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Is there any reason to be this limiting? Alternatively, I might suggest to ALWAYS connect with attach_model=single
in which case users need to manually ATTACH catalogs and so we no longer need to check that error message. Or am I thinking about this wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i guess I was understanding this functionality wrong. this doesn't block you from attaching other databases but rather forces you to invoke ATTACH 'md:some_other_db'
if you end up wanting to access another catalog.
Alternatively, I might suggest to ALWAYS connect with attach_model=single in which case users need to manually ATTACH catalogs and so we no longer need to check that error message.
attach_mode=single
is only available in the connection string passed when launching duckdb - when the connection is figured with database
, the user should specify a catalog name, whereas with the catalogs:
option we invoke duckdb with the string 'md:'
so we don't have a catalog name to pass to the attach_mode=single
option.
We could possibly:
- arbitrarily pick the first MD catalog under
catalogs
and pass it to the connection string - or alternatively, launch duckdb with a blank connection string (i.e. no MD connection until we
ATTACH
the listed catalogs). maybe this makes the most sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies about the delay.
or alternatively, launch duckdb with a blank connection string
This makes sense to me
b1a71b7
to
ec48875
Compare
ec48875
to
dbbe1bc
Compare
Creating a MotherDuck connection config with the
catalogs
argument (multiple attached databases / data sources) rather than specifyingdatabase
(single db) was failing because the MotherDuck connection was never authenticated.The reason is because the
motherduck_token
has to be sent in the initial connection string when instantiating the connection factory (or otherwise made available via env var) - TIL thatATTACH md:{database_name}?motherduck_token={token_string}
is actually not valid, so the MD token can't be passed in on a per-catalog basis. In any case the same token (=MD account) should be shared throughout the gateway config.