-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MotherDuck multi-catalog configs #4001
Open
naoyak
wants to merge
13
commits into
TobikoData:main
Choose a base branch
from
naoyak:fix-duckdb-multi
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+66
−32
Open
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
c3bd55b
try to fix duckdb multi attach postgres
naoyak fc857a8
cleanup
naoyak e1a7a82
whitespace
naoyak 61c7e0d
revert
naoyak 49b9099
add MotherDuck token only one time for multi-catalog
naoyak b11a94f
remove MotherDuck token attribute from DuckDBAttachOptions
naoyak 6e20878
use single mode for single MD connection
naoyak 19ad9b6
flexible format for MD single database name
naoyak 12bc409
lint
naoyak 651ae42
rewrite motherduck token masking test
naoyak e6446f6
enforce no md: for single MD database
naoyak 821f6a9
fix test
naoyak 007a6ed
fix MD attach error handlnig
naoyak File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,7 +41,7 @@ | |
# Nullable types are problematic | ||
"clickhouse", | ||
} | ||
MOTHERDUCK_TOKEN_REGEX = re.compile(r"(\?motherduck_token=)(\S*)") | ||
MOTHERDUCK_TOKEN_REGEX = re.compile(r"(\?|\&)(motherduck_token=)(\S*)") | ||
|
||
|
||
class ConnectionConfig(abc.ABC, BaseConfig): | ||
|
@@ -269,16 +269,20 @@ def init(cursor: duckdb.DuckDBPyConnection) -> None: | |
query = f"ATTACH '{path_options}'" | ||
if not path_options.startswith("md:"): | ||
query += f" AS {alias}" | ||
elif self.token: | ||
query += f"?motherduck_token={self.token}" | ||
cursor.execute(query) | ||
except BinderException as e: | ||
# If a user tries to create a catalog pointing at `:memory:` and with the name `memory` | ||
# then we don't want to raise since this happens by default. They are just doing this to | ||
# set it as the default catalog. | ||
if not ( | ||
'database with name "memory" already exists' in str(e) | ||
and path_options == ":memory:" | ||
# If a user tried to attach a MotherDuck database/share which has already by attached via | ||
# `ATTACH 'md:'`, then we don't want to raise since this is expected. | ||
if ( | ||
not ( | ||
'database with name "memory" already exists' in str(e) | ||
and path_options == ":memory:" | ||
) | ||
and f"""database with name "{path_options.path.replace('md:', '')}" already exists""" | ||
not in str(e) | ||
): | ||
raise e | ||
if i == 0 and not getattr(self, "database", None): | ||
|
@@ -331,7 +335,9 @@ def get_catalog(self) -> t.Optional[str]: | |
return None | ||
|
||
def _mask_motherduck_token(self, string: str) -> str: | ||
return MOTHERDUCK_TOKEN_REGEX.sub(lambda m: f"{m.group(1)}{'*' * len(m.group(2))}", string) | ||
return MOTHERDUCK_TOKEN_REGEX.sub( | ||
lambda m: f"{m.group(1)}{m.group(2)}{'*' * len(m.group(3))}", string | ||
) | ||
|
||
|
||
class MotherDuckConnectionConfig(BaseDuckDBConnectionConfig): | ||
|
@@ -349,19 +355,19 @@ def _static_connection_kwargs(self) -> t.Dict[str, t.Any]: | |
from sqlmesh import __version__ | ||
|
||
custom_user_agent_config = {"custom_user_agent": f"SQLMesh/{__version__}"} | ||
if not self.database: | ||
return {"config": custom_user_agent_config} | ||
connection_str = f"md:{self.database or ''}" | ||
connection_str = "md:" | ||
if self.database: | ||
# Attach single MD database instead of all databases on the account | ||
connection_str += f"{self.database}?attach_mode=single" | ||
if self.token: | ||
connection_str += f"?motherduck_token={self.token}" | ||
connection_str += f"{'&' if self.database else '?'}motherduck_token={self.token}" | ||
return {"database": connection_str, "config": custom_user_agent_config} | ||
|
||
|
||
class DuckDBAttachOptions(BaseConfig): | ||
type: str | ||
path: str | ||
read_only: bool = False | ||
token: t.Optional[str] = None | ||
|
||
def to_sql(self, alias: str) -> str: | ||
options = [] | ||
|
@@ -371,14 +377,18 @@ def to_sql(self, alias: str) -> str: | |
options.append(f"TYPE {self.type.upper()}") | ||
if self.read_only: | ||
options.append("READ_ONLY") | ||
options_sql = f" ({', '.join(options)})" if options else "" | ||
alias_sql = "" | ||
# TODO: Add support for Postgres schema. Currently adding it blocks access to the information_schema | ||
alias_sql = ( | ||
if self.type == "motherduck": | ||
# MotherDuck does not support aliasing | ||
f" AS {alias}" if not (self.type == "motherduck" or self.path.startswith("md:")) else "" | ||
) | ||
options_sql = f" ({', '.join(options)})" if options else "" | ||
token_sql = "?motherduck_token=" + self.token if self.token else "" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. removing because it doesn't actually work |
||
return f"ATTACH '{self.path}{token_sql}'{alias_sql}{options_sql}" | ||
if (md_db := self.path.replace("md:", "")) != alias.replace('"', ""): | ||
raise ConfigError( | ||
f"MotherDuck does not support assigning an alias different from the database name {md_db}." | ||
) | ||
else: | ||
alias_sql += f" AS {alias}" | ||
return f"ATTACH '{self.path}'{alias_sql}{options_sql}" | ||
|
||
|
||
class DuckDBConnectionConfig(BaseDuckDBConnectionConfig): | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not an error? Why would anyone specify the same MD catalog more than once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the way the implementation for MotherDuckConnectionConfig works right now, the db adapter uses
md:
as the connection string for duckdb, implicitly adding all the databases/shares available to the token account. if you then issue anATTACH
command to a specific dbmd:some_db
, duckdb will emit an error.per my comment (#4001) above, we could either skip the ATTACH trusting that the db actually exists on the account, or we could check that it's there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. This makes sense, thank you