Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MotherDuck multi-catalog configs #4001

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

naoyak
Copy link
Contributor

@naoyak naoyak commented Mar 15, 2025

Creating a MotherDuck connection config with the catalogs argument (multiple attached databases / data sources) rather than specifying database (single db) was failing because the MotherDuck connection was never authenticated.

The reason is because the motherduck_token has to be sent in the initial connection string when instantiating the connection factory (or otherwise made available via env var) - TIL that ATTACH md:{database_name}?motherduck_token={token_string} is actually not valid, so the MD token can't be passed in on a per-catalog basis. In any case the same token (=MD account) should be shared throughout the gateway config.

@naoyak naoyak force-pushed the fix-duckdb-multi branch from 78cb8cf to 8ddd5dd Compare March 15, 2025 07:39
if not (
'database with name "memory" already exists' in str(e)
and path_options == ":memory:"
) and not (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the ATTACH commands for the individual MotherDuck databases included in catalogs will raise an error since they have been automatically attached by the catch all md: in the connection string.

We could either:

  1. Ignore the errors (this way)
  2. Skip ATTACHing the MD catalogs and assume they exist
  3. Skip ATTACHing the catalogs but verify they are already attached and available

@@ -377,8 +376,7 @@ def to_sql(self, alias: str) -> str:
f" AS {alias}" if not (self.type == "motherduck" or self.path.startswith("md:")) else ""
)
options_sql = f" ({', '.join(options)})" if options else ""
token_sql = "?motherduck_token=" + self.token if self.token else ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing because it doesn't actually work

cursor.execute(query)
except BinderException as e:
# If a user tries to create a catalog pointing at `:memory:` and with the name `memory`
# then we don't want to raise since this happens by default. They are just doing this to
# set it as the default catalog.
# If a user tried to attach a MotherDuck database/share which has already by attached via
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# If a user tried to attach a MotherDuck database/share which has already by attached via
# If a user tried to attach a MotherDuck database/share which has already been attached via

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not an error? Why would anyone specify the same MD catalog more than once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the way the implementation for MotherDuckConnectionConfig works right now, the db adapter uses md: as the connection string for duckdb, implicitly adding all the databases/shares available to the token account. if you then issue an ATTACH command to a specific db md:some_db, duckdb will emit an error.

per my comment (#4001) above, we could either skip the ATTACH trusting that the db actually exists on the account, or we could check that it's there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This makes sense, thank you

connection_str = f"md:{self.database or ''}"
connection_str = "md:"
if self.database:
# Attach MD database in single mode to block accessing other databases
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we block accessing other databases here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if the motherduck connection is created with the database argument instead of catalogs, that implies we are only using a single MotherDuck catalog/db and there shouldn't be any reason to allow attaching another MD database (see docs). Do you think that makes sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is there any reason to be this limiting? Alternatively, I might suggest to ALWAYS connect with attach_model=single in which case users need to manually ATTACH catalogs and so we no longer need to check that error message. Or am I thinking about this wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i guess I was understanding this functionality wrong. this doesn't block you from attaching other databases but rather forces you to invoke ATTACH 'md:some_other_db' if you end up wanting to access another catalog.

Alternatively, I might suggest to ALWAYS connect with attach_model=single in which case users need to manually ATTACH catalogs and so we no longer need to check that error message.

attach_mode=single is only available in the connection string passed when launching duckdb - when the connection is figured with database, the user should specify a catalog name, whereas with the catalogs: option we invoke duckdb with the string 'md:' so we don't have a catalog name to pass to the attach_mode=single option.

We could possibly:

  • arbitrarily pick the first MD catalog under catalogs and pass it to the connection string
  • or alternatively, launch duckdb with a blank connection string (i.e. no MD connection until we ATTACH the listed catalogs). maybe this makes the most sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies about the delay.

or alternatively, launch duckdb with a blank connection string

This makes sense to me

@naoyak naoyak force-pushed the fix-duckdb-multi branch 3 times, most recently from b1a71b7 to ec48875 Compare March 18, 2025 22:11
@naoyak naoyak force-pushed the fix-duckdb-multi branch from ec48875 to dbbe1bc Compare March 19, 2025 05:25
@naoyak naoyak requested a review from izeigerman March 19, 2025 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants