first pass at better handling of special duckdb syntax #6

dylanscott · 2023-10-03T02:52:30Z

SUP-1069: This PR pulls in hex-inc/sqlparse#5 and maps the new keywords to the SELECT query type, since they are all variations of the same statement.
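For context on what "mapping keywords to a query type" means, here is a minimal standalone sketch that classifies a statement by its leading keyword and sends DuckDB's PIVOT/UNPIVOT statements and FROM-first queries to SELECT. The QueryType enum, the query_type helper and the exact keyword spellings are assumptions for illustration, only loosely modeled on sql_metadata; they are not its actual API or the contents of hex-inc/sqlparse#5.

```python
from enum import Enum


class QueryType(Enum):
    """Hypothetical subset of query types, loosely modeled on sql_metadata."""
    SELECT = "SELECT"
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


# Assumed statement-leading keywords. DuckDB's PIVOT/UNPIVOT statements (and
# their PIVOT_WIDER/PIVOT_LONGER aliases) and FROM-first queries all read
# data, so this sketch maps them to SELECT.
KEYWORDS_TO_QUERY_TYPE = {
    "SELECT": QueryType.SELECT,
    "FROM": QueryType.SELECT,
    "PIVOT": QueryType.SELECT,
    "PIVOT_WIDER": QueryType.SELECT,
    "UNPIVOT": QueryType.SELECT,
    "PIVOT_LONGER": QueryType.SELECT,
    "INSERT": QueryType.INSERT,
    "UPDATE": QueryType.UPDATE,
    "DELETE": QueryType.DELETE,
}


def query_type(sql: str) -> QueryType:
    """Classify a statement by its first whitespace-separated word."""
    words = sql.split(None, 1)
    if not words:
        raise ValueError("empty query")
    first = words[0].upper()
    if first not in KEYWORDS_TO_QUERY_TYPE:
        raise ValueError(f"unsupported query type: {first}")
    return KEYWORDS_TO_QUERY_TYPE[first]


print(query_type("PIVOT monthly_sales ON month USING sum(amount)"))  # QueryType.SELECT
print(query_type("FROM monthly_sales SELECT empid"))  # QueryType.SELECT
```

Mapping by first keyword alone is intentionally naive; it is only meant to show why all of these forms collapse onto SELECT.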

github-actions · 2023-10-03T03:50:05Z

Pull Request Test Coverage Report for Build 6388458559

12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 6354706848:	0.0%
Covered Lines:	888
Relevant Lines:	888

💛 - Coveralls

dylanscott · 2023-10-03T02:58:11Z

sql_metadata/token.py

            and self.last_keyword_normalized in TABLE_ADJUSTMENT_KEYWORDS
            and self.previous_token.normalized not in ["AS", "WITH"]
            and self.normalized not in ["AS", "SELECT", "IF", "SET", "WITH"]
+            and not self.last_keyword_pivot_operator


Something I had run into before while experimenting in the ast-service, and which this PR puts on display, is that the table inference logic in sql_metadata is definitely not bulletproof. I had seen test cases where it tacked extra entries onto the extracted tables, and the pivot syntax appears to be particularly prone to this. I will comment inline elsewhere, but there are some telling test assertions where we only check that a table is in parser.tables instead of asserting the full contents of parser.tables, because it contains incorrect entries.

I think that is mostly down to the Token.is_potential_table_name function being pretty broad. However, of all the functions in here, it is the one I am most terrified of changing, for fear of breaking some obscure but legitimate case where an identifier is a table reference. The most I dared change was adding this scoping, which I will elaborate on inline below. This feels like the right tradeoff: we care much more about avoiding false negatives than false positives here. False positives will generally be harmless, since they won't match the names of variables in the project; at worst they add some unnecessary reactive edges.
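The false-positive/false-negative tradeoff can be sketched as a simple filter: extracted names only turn into dependencies when they match a variable defined in the project. The reactive_edges helper and the variable names below are hypothetical, not Hex's actual implementation.

```python
from typing import Iterable, Set


def reactive_edges(extracted_tables: Iterable[str], project_variables: Set[str]) -> Set[str]:
    """Hypothetical helper: keep only extracted names that match a project variable."""
    return {name for name in extracted_tables if name in project_variables}


project_variables = {"monthly_sales", "customers"}

# False positives such as "SUM" and "amount" drop out: no variable has those names.
print(reactive_edges(["monthly_sales", "SUM", "amount"], project_variables))  # {'monthly_sales'}

# A false negative silently loses the real dependency.
print(reactive_edges([], project_variables))  # set()

# Worst case for a false positive: a variable that happens to share the name
# adds one unnecessary edge.
print(sorted(reactive_edges(["monthly_sales", "amount"], project_variables | {"amount"})))
# ['amount', 'monthly_sales']
```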

dylanscott · 2023-10-03T03:51:08Z

test/test_duckdb.py

+    source = """
+SELECT *
+  FROM monthly_sales
+    PIVOT(SUM(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR'))
+      AS p
+  ORDER BY EMPID
+    """
+    parser = Parser(source)
+    assert ["monthly_sales"] == parser.tables


This is the only test case affected by the changes I made in token.py, and it is not correcting behavior that regressed because of the change in how the pivot keyword is handled; I checked, and it was parsing the same way before those changes. Without the last_keyword_pivot_operator check, this would also pull out SUM and amount as tables.
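To make the effect of the scoping concrete, here is a deliberately tiny model of a broad table-name heuristic with an optional "skip everything inside PIVOT(...)" check. The potential_tables function, its scope_pivot flag, the keyword set and the tokenizer are all hypothetical and far simpler than Token.is_potential_table_name. One assumption of the toy: PIVOT used as an operator does not update the last keyword, which is how SUM and amount end up looking like they follow FROM.

```python
import re

# Hypothetical toy heuristic; not sql_metadata's implementation.
KEYWORDS = {"SELECT", "FROM", "JOIN", "FOR", "IN", "AS", "ORDER", "BY", "USING", "ON"}
TABLE_ADJUSTMENT_KEYWORDS = {"FROM", "JOIN"}
TOKEN_RE = re.compile(r"'[^']*'|\w+|[^\s\w]")
IDENTIFIER_RE = re.compile(r"[A-Za-z_]\w*")


def potential_tables(sql, scope_pivot=True):
    """Return identifiers that look like table names, in order of appearance."""
    tokens = TOKEN_RE.findall(sql)
    tables = []
    last_keyword = None
    in_pivot = False  # True from a PIVOT operator up to its closing ")"
    depth = 0  # parenthesis depth inside the PIVOT operator
    for i, tok in enumerate(tokens):
        upper = tok.upper()
        if in_pivot:
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
                if depth == 0:
                    in_pivot = False
            if scope_pivot:
                continue  # the scoping check: ignore everything in PIVOT(...)
        if upper == "PIVOT" and i + 1 < len(tokens) and tokens[i + 1] == "(":
            # PIVOT as an operator; deliberately leaves last_keyword unchanged.
            in_pivot = True
            continue
        if upper in KEYWORDS:
            last_keyword = upper
        elif (
            last_keyword in TABLE_ADJUSTMENT_KEYWORDS
            and IDENTIFIER_RE.fullmatch(tok)
            and tok not in tables
        ):
            tables.append(tok)
    return tables


QUERY = """
SELECT *
  FROM monthly_sales
    PIVOT(SUM(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR'))
      AS p
  ORDER BY EMPID
"""

print(potential_tables(QUERY))                     # ['monthly_sales']
print(potential_tables(QUERY, scope_pivot=False))  # ['monthly_sales', 'SUM', 'amount']
```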

dylanscott · 2023-10-03T03:52:11Z

test/test_duckdb.py

+    parser = Parser("select * from pivot join other using (id)")
+    assert "pivot" in parser.tables and "other" in parser.tables


This is unfortunately confused by the new handling of pivot, and as a result it also interprets join as a potential table name.
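One possible way to tell a table named pivot apart from the PIVOT operator is to look at the neighboring tokens. This is a hypothetical sketch, not what the PR implements; the pivot_roles helper and its rules are assumptions, and a real tokenizer would also have to deal with quoted identifiers and comments.

```python
import re

TOKEN_RE = re.compile(r"'[^']*'|\w+|[^\s\w]")
TABLE_POSITION = {"FROM", "JOIN", ","}


def pivot_roles(sql):
    """Classify each PIVOT token by its neighbors (hypothetical rule).

    - first token of the statement: a DuckDB PIVOT statement
    - right after FROM, JOIN or a comma: a table (or other identifier) named pivot
    - otherwise, followed by "(": the PIVOT operator on the preceding table
    - anything else: treated as a plain identifier
    """
    tokens = TOKEN_RE.findall(sql)
    roles = []
    for i, tok in enumerate(tokens):
        if tok.upper() != "PIVOT":
            continue
        prev = tokens[i - 1].upper() if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if prev is None:
            roles.append("statement")
        elif prev in TABLE_POSITION:
            roles.append("identifier")
        elif nxt == "(":
            roles.append("operator")
        else:
            roles.append("identifier")
    return roles


print(pivot_roles("select * from pivot join other using (id)"))  # ['identifier']
print(pivot_roles("SELECT * FROM monthly_sales PIVOT(SUM(amount) FOR MONTH IN ('JAN')) AS p"))  # ['operator']
print(pivot_roles("PIVOT monthly_sales ON month USING sum(amount)"))  # ['statement']
```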


jkillian

Looks good!

jkillian · 2023-10-03T21:33:36Z

test/test_duckdb.py

+    parser = Parser("select * from pivot join other using (id)")
+    assert "pivot" in parser.tables and "other" in parser.tables


Seems fine, hopefully a pretty rare case!

dylanscott added 6 commits October 2, 2023 19:29

pull in sqlparse with duckdb keywords

6100d34

map these statement types to select

291a1f3

try to distinguish pivot operator

61f5dad

tests for duckdb syntax

2721c85

cleanup

8cd9b62

testing for table named pivot

40d5c91

linting

b16b06a

dylanscott commented Oct 3, 2023


dylanscott requested a review from jkillian October 3, 2023 03:59

jkillian mentioned this pull request Oct 3, 2023

add keywords for duckdb pivot statements, from-first syntax hex-inc/sqlparse#5

Merged

jkillian approved these changes Oct 3, 2023


pull in merged sqlparse version

c1f4df9

dylanscott merged commit dfea863 into master Oct 4, 2023

dylanscott deleted the dscott/duckdb-syntax branch October 4, 2023 20:21
