Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix!: Improve handling of the --where option in table_diff #2975

Merged
merged 2 commits into from
Aug 1, 2024

Conversation

erindru
Copy link
Collaborator

@erindru erindru commented Jul 31, 2024

Prior to this change, running sqlmesh table_diff with something like --where "foo = 'bar'" would throw an error like:

duckdb.duckdb.BinderException: Binder Error: Ambiguous reference to column name "foo" (use: "t.foo" or "s.foo")

This was because the where clause was injected verbatim without automatically qualifying the columns. The user had to know that table_diff used a subselect internally and supply --where "s.foo = 'bar' and t.foo = 'bar'" in order to get this to work.

This PR refactors the generated table_diff query to use CTE's to build the source / target / stats datasets so the --where clause can still be injected verbatim but it's now scoped to each dataset so there are no more ambiguous column references

@@ -272,7 +272,22 @@ def _column_expr(name: str, table: str) -> exp.Expression:
def name(e: exp.Expression) -> str:
return e.args["alias"].sql(identify=True)

query = (
source_query = (
exp.select(*(exp.column(c) for c in source_schema.keys()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just pass c, if it’s a string

you don’t need keys(), dict is iterable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I still need to use exp.column() due to this effect:

>>> exp.select("country code").sql()
'SELECT country AS code'

>>> exp.select(exp.column("country code")).sql()
'SELECT "country code"'

however i've removed .keys()

.where(self.where)
)
target_query = (
exp.select(*(exp.column(t) for t in target_schema.keys()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

join_alias="t",
)
.where(self.where)
.from_(exp.alias_(source_identifier, "s"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn’t seem right since it’s an identifier, is there no error here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> exp.alias_(exp.to_identifier("__source"), "s").sql()
'__source AS s'

It got the correct result. What should I have done?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exp.tables(…).as_()

.from_(exp.alias_(source_identifier, "s"))
.join(exp.alias_(target_identifier, "t"), on=self.on, join_type="FULL")
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don’t like manually creating with, can you use a builder api or use subqueries?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the builder API documented anywhere? There didnt seem to be a .with(..)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I think I found it. There appears to be no top-level with_() like there is for select() but doing exp.Select().with_(...) gets me there.

Is that better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea

@erindru erindru merged commit 1cdda84 into main Aug 1, 2024
21 checks passed
@erindru erindru deleted the erin/table-diff-where branch August 1, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants