-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
first pass at better handling of special duckdb syntax #6
Conversation
Pull Request Test Coverage Report for Build 6388458559
💛 - Coveralls |
and self.last_keyword_normalized in TABLE_ADJUSTMENT_KEYWORDS | ||
and self.previous_token.normalized not in ["AS", "WITH"] | ||
and self.normalized not in ["AS", "SELECT", "IF", "SET", "WITH"] | ||
and not self.last_keyword_pivot_operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that I had encountered before in my playing around in the ast-service and which is on display in this PR is that the table inference logic provided by sql_metadata
is definitely not bulletproof. I had encountered some test cases where it would tack on some extra entries in the extracted tables, and the pivot
syntax appears to be particularly prone to this. I will comment inline elsewhere but there are some telling test assertions where we just check that a table is in parser.tables
instead of asserting the full contents of parser.table
because it contains incorrect entries.
I think that is mostly down to this Token.is_potential_table_name
function being pretty broad. However, of all the functions in here it is the one I am most terrified of making changes to for fear of breaking some obscure but legitimate cases where an identifier is a table reference. The most I dared change was adding this scoping which I will elaborate on inline below. This feels like the right tradeoff - we care much more about false negatives than false positives here, as false positives will generally be harmless as they won't match the names of variables in the project, and worst case would just add some unnecessary reactive edges.
source = """ | ||
SELECT * | ||
FROM monthly_sales | ||
PIVOT(SUM(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR')) | ||
AS p | ||
ORDER BY EMPID | ||
""" | ||
parser = Parser(source) | ||
assert ["monthly_sales"] == parser.tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the only test case that is affected by the changes I made in token.py
and in fact it is not correcting regressed behavior due to the change in handling of the pivot
keyword. I checked and it was parsing the same before those changes. Without the last_keyword_pivot_operator
check this would also pull out SUM
and amount
as tables.
parser = Parser("select * from pivot join other using (id)") | ||
assert "pivot" in parser.tables and "other" in parser.tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unfortunately getting confused by the new handling of pivot
and as a result interprets join
as a potential table name as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine, hopefully a pretty rare case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
parser = Parser("select * from pivot join other using (id)") | ||
assert "pivot" in parser.tables and "other" in parser.tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine, hopefully a pretty rare case!
SUP-1069 this PR pulls in hex-inc/sqlparse#5 and maps these new keywords to the
SELECT
query type since they are all variations of same.