-
-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect case-sensitive json key path extractions #269
Comments
Thanks for the report, @sgoley ! BigQuery is case-sensitive when accessing JSON properties using dot notation, also (mentioned in this discussion). I don't think I'll be solving the BQ issue, since parsing the names as field/property names instead of table/column names is out of scope as long as I'm the sole maintainer. BUT if colons are rare enough to make parsing easy (e.g., they don't appear in other, similar contexts), I would consider respecting case sensitivity here. The hardest and most time-consuming part of this will just be researching the colon operator in all of the major dialects and determining how unique its usage as a JSON property accessor is. I know the colon can also be used to slice arrays in some dialects, but that should be easy to disambiguate from this operator (since it will be directly inside of square brackets). I don't know if there are other uses for the colon in SQL. If you or anyone else could contribute this research, I'd be very grateful. Until then, the workaround is to quote all json properties that are not snake_case:
|
I'm running into this issue in with DBT + Snowflake as well. |
Thanks. Still looking for help researching the various uses for the colon in different dialects |
Oh man, i just got burned by this. took me forever to find the problem. This was for snowflake specifically. I know pyfmt is opinionated without options, but perhaps one of the opinions could be to force quotation marks around the keys e.g. |
Coming over from #568. Completely empathize with the challenges of sole maintenance and respect the call here. If I might suggest a documentation change to save others in my shoes similar pain - I would consider documenting the known cases where sqlfmt will break a project. We missed this change when implementing sqlfmt on a portion of our project, which ended up causing a significant outage for us in production (since case changes lead to silent failures). Ultimately the responsibility lies fully on us for not catching it. But given it appears to be a known concern - I might consider listing it in the Beta warning block (or similar) in the docs. We had (incorrectly in retrospect) assumed that given sqlfmt is now the default implementation in DBT cloud that it was fairly unlikely to break a dbt/snowflake project out of the box. |
@tconbeer , i was looking at this to work on i). after a quick perusal of the code, it appears the casing happens here in would a modification to Is one solution for this to add a dialect-specific check for a
to simplify this work, would it be possible/advised that we break this in to dialect-specific implementation? not meaning a different implementation but slowly adding this for snowflake first then BQ or whatever. my guess is doing snowflake first would solve the problem for the largest amount of users and potentially simplify this implementation. |
I just ran into this issue as well. |
Describe the bug
sqlfmt utility automatically formats json keys which can be case sensitive.
It will make all keys fully lower case and return nulls as values if the .sql keys do not access the actual case sensitive json keys.
To Reproduce
Use sqlfmt on any sql file that includes something like the following key path extractions:
src:salesperson.name
<- all lowercaseSRC:salesperson.name
<- column ref lower caseSRC:Salesperson.Name
<- json keys have first letter capitalsExpected behavior
sqlfmt should be sensitive to adapter specific json extraction notations.
Example: Snowflake "dot" or "bracket" semi-structured notation
Example: Postgres "colon" semi-structured notation
Exception: Bigquery at least appears to have only stringified semi-structured extracts so this adapter should be safe as-is.
The text was updated successfully, but these errors were encountered: