Fix: set quote_identifiers in qualify, add normalize flag in schema #1701
Yeah, I leveraged the one you wrote: https://github.com/tobymao/sqlglot/pull/1701/files#diff-739b55fbe772c7434981d6a9f1f6d0dfa499468cb5251b26c588589fb66108cdR713-R716
What's the case for feature 2? When did you encounter a need for it? Why did annotate types fail?
```
>>> from sqlglot.optimizer import optimize
>>> optimize("select * from tbl", dialect="snowflake", schema={"tbl": {'"a"': "int"}}).sql()
> sqlglot/optimizer/annotate_types.py(297)annotate()
-> col.type = self.schema.get_column_type(source, col)
(Pdb) col
(COLUMN this:
  (IDENTIFIER this: a, quoted: False), table:
  (IDENTIFIER this: TBL, quoted: False))
```

Notice how the column

```
-> column_type = table_schema.get(normalized_column_name)
(Pdb) normalized_column_name
'A'
(Pdb) self.mapping
{'TBL': {'a': 'int'}}
```
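The mismatch in this trace can be reproduced without sqlglot. The sketch below is a minimal model under stated assumptions: Snowflake-style rules (unquoted identifiers resolve to uppercase, quoted ones keep their case), and `normalize_name`, `parse_key`, and `build_mapping` are hypothetical helpers, not sqlglot functions. It normalizes the schema keys once, then looks up the expanded column `a` as an unquoted name, which, as the trace suggests, normalizes it a second time.

```python
from typing import Dict, Tuple


def normalize_name(name: str, quoted: bool) -> str:
    """Hypothetical Snowflake-style rule: unquoted names are uppercased."""
    return name if quoted else name.upper()


def parse_key(key: str) -> Tuple[str, bool]:
    """Split a schema key such as '"a"' into ('a', True)."""
    if len(key) >= 2 and key[0] == key[-1] == '"':
        return key[1:-1], True
    return key, False


def build_mapping(schema: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Normalize table and column keys once, as a schema might on insert."""
    return {
        normalize_name(*parse_key(table)): {
            normalize_name(*parse_key(column)): col_type
            for column, col_type in columns.items()
        }
        for table, columns in schema.items()
    }


mapping = build_mapping({"tbl": {'"a"': "int"}})
print(mapping)  # {'TBL': {'a': 'int'}}

# The expanded column comes back as 'a' without quotes, so the lookup
# normalizes it again and searches for 'A', which is not a key.
lookup_key = normalize_name("a", quoted=False)
print(mapping["TBL"].get(lookup_key))  # None
```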
Discussed in Slack.
This is problematic. SQLMesh doesn't quote things, which means annotate types would fail. Could we fix annotate types to work without quotes? Maybe the problem is the double normalize?
So the issue is not actually related directly to

```python
# Fails because we have `a` instead of `A` in the schema
column_type = table_schema.get(normalized_column_name)
if isinstance(column_type, exp.DataType):
    return column_type
elif isinstance(column_type, str):
    return self._to_data_type(column_type.upper(), dialect=dialect)
raise SchemaError(f"Unknown column type '{column_type}'")
```
Why can't we add quotes over there as well?
You mean in both
Now that qualify normalizes everything, schema shouldn't need to renormalize.
OK, sounds interesting. I'll explore this idea a bit more tomorrow and see if we can get rid of the schema normalization logic altogether.
Or else just move identify before annotate types, as a standalone step.
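One way to read this suggestion: once qualification has normalized every identifier, a separate step marks them all as quoted, so any later normalization (for example inside a schema lookup) leaves them untouched. The sketch below is a minimal model of that ordering, assuming Snowflake-style uppercasing; the `Identifier` dataclass and the `normalize`, `quote_step`, and `lookup` functions are hypothetical stand-ins for the idea behind a `quote_identifiers` step, not sqlglot's API.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for an identifier node."""

    name: str
    quoted: bool = False


def normalize(ident: Identifier) -> Identifier:
    """Hypothetical Snowflake-style rule: uppercase unless quoted."""
    return ident if ident.quoted else replace(ident, name=ident.name.upper())


def quote_step(idents: List[Identifier]) -> List[Identifier]:
    """Standalone step: mark already-normalized identifiers as quoted."""
    return [replace(i, quoted=True) for i in idents]


def lookup(mapping: Dict[str, str], ident: Identifier) -> Optional[str]:
    """Schema-style lookup that normalizes its input again before reading."""
    return mapping.get(normalize(ident).name)


columns = {"a": "int"}  # key already normalized from the quoted name '"a"'
expanded = [Identifier("a")]  # star-expansion output with the quotes dropped

before = [lookup(columns, c) for c in expanded]
after = [lookup(columns, c) for c in quote_step(expanded)]
print(before, after)  # [None] ['int']
```

Because quoting makes the second normalization a no-op, the lookup no longer depends on whether the schema re-normalizes its input.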
So I'm not sure about this one: a user may supply whatever names they want for the schema, and those names won't necessarily match the normalized ones.
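The concern is that schema keys come straight from the user. If someone writes `{"tbl": {"a": "int"}}` for a Snowflake query such as `select a from tbl`, the qualified query refers to `TBL` and `A`, so the schema still has to normalize its own keys for the lookup to succeed. A minimal sketch, assuming Snowflake-style uppercasing for unquoted names; `normalize_name` is a hypothetical helper.

```python
def normalize_name(name: str) -> str:
    """Hypothetical rule for unquoted names: Snowflake-style uppercasing."""
    return name.upper()


user_schema = {"tbl": {"a": "int"}}  # keys exactly as the user typed them
query_table, query_column = "TBL", "A"  # names after qualification

# Without normalizing the user's keys, the lookup misses.
raw_hit = user_schema.get(query_table, {}).get(query_column)

# Normalizing the keys once brings them in line with the qualified query.
normalized = {
    normalize_name(t): {normalize_name(c): ty for c, ty in cols.items()}
    for t, cols in user_schema.items()
}
normalized_hit = normalized.get(query_table, {}).get(query_column)

print(raw_hit, normalized_hit)  # None int
```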
…obymao#1701)

* Fix: set quote_identifiers in qualify, add normalize flag in schema
* import typing as t
* Fixup
* PR feedback
* Use new quote_identifiers rule before annotate_types
* Reset quote_identifiers kwarg to False in optimize
* Formatting
* Set kwargs instead of positional arguments in qualify
* Include quote_identifiers rule in test_canonicalize
* Formatting
* PR feedback
* Remove copy arg from quote_identifiers
I guess I could've split this PR into two separate ones, as it introduces both a fix and a feature, but anyway.
* Set the `quote_identifiers` flag in the `qualify` rule by default, because this case is currently failing:
* Added a new flag in `MappingSchema` to control whether the normalization logic kicks in. This might be useful when, e.g., we want to add (unquoted) names that are already normalized, and a second normalization pass would mess them up.
* Added or improved some type hints and comments.
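The new schema flag can be pictured as a switch around the insertion-time normalization pass. The sketch below is a minimal model, assuming Snowflake-style uppercasing; `SketchSchema`, its `normalize` parameter, and `column_type` are hypothetical stand-ins that mirror the idea described for `MappingSchema`, not its actual signature. It stores a lowercase `a` that is already the normalized form of the quoted name `"a"`: with the pass enabled, the key is uppercased a second time and a quoted lookup for `a` misses.

```python
from typing import Dict, Optional


class SketchSchema:
    """Hypothetical mapping schema with an opt-out for key normalization."""

    def __init__(self, mapping: Dict[str, Dict[str, str]], normalize: bool = True) -> None:
        self.normalize = normalize
        self.mapping = {
            self._normalize_key(table): {
                self._normalize_key(column): col_type
                for column, col_type in columns.items()
            }
            for table, columns in mapping.items()
        }

    def _normalize_key(self, name: str) -> str:
        # Insertion-time pass (Snowflake-style uppercasing), skipped when the
        # caller says the names are already normalized.
        return name.upper() if self.normalize else name

    def column_type(self, table: str, column: str, quoted: bool = False) -> Optional[str]:
        # Lookup-time rule: unquoted names are uppercased, quoted names are used
        # verbatim. The table name is treated as unquoted for simplicity.
        key = column if quoted else column.upper()
        return self.mapping.get(table.upper(), {}).get(key)


# Lowercase 'a' is the already-normalized form of the quoted column "a".
already_normalized = {"TBL": {"a": "int"}}

with_pass = SketchSchema(already_normalized).column_type("tbl", "a", quoted=True)
without_pass = SketchSchema(already_normalized, normalize=False).column_type("tbl", "a", quoted=True)
print(with_pass, without_pass)  # None int
```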