Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misuse of alias()? #101

Closed
LeoniePhiline opened this issue Feb 18, 2023 · 1 comment
Closed

Misuse of alias()? #101

LeoniePhiline opened this issue Feb 18, 2023 · 1 comment

Comments

@LeoniePhiline
Copy link
Contributor

LeoniePhiline commented Feb 18, 2023

I noticed that while creating syntax queries for hive statements.

I found that some uses of alias() (see https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl) appear questionable to me, and I wanted to check with you.

1. Doubts: (identifier) for builtin functions

In both the following builtin functions, the function name keyword is replaced in the AST with identifier. I just wanted to check if that's really intentional?

Isn't an (identifier) rather identifying a application-domain-specific item, such as a table or field - rather than a builtin SQL feature keyword?

At first sight, not only (identifier) is wrong in this case, but the entire field('name', ...) should just be replaced by $.keyword_cast and $.keyword_count.

However, field('name', ...) gives compatibility with the generic (invocation) (

tree-sitter-sql/grammar.js

Lines 1378 to 1381 in 393e0d3

invocation: $ => seq(
field('name', $.identifier),
paren_list(field('parameter', $._expression)),
),
), where a field is necessary to extract the function name -- so it's probably only the (identifier) alias that is to be removed.

✅ Done so in e4e43ba and eac9da2

1.1. CAST

tree-sitter-sql/grammar.js

Lines 1341 to 1350 in 393e0d3

cast: $ => seq(
field('name', alias($.keyword_cast, $.identifier)),
'(',
seq(
field('parameter', $._expression),
$.keyword_as,
$._type,
),
')',
),

✅ Fixed in e4e43ba fix: Remove identifier alias from cast builtin function - PR #102

1.2. COUNT

tree-sitter-sql/grammar.js

Lines 1357 to 1362 in 393e0d3

count: $ => seq(
field('name', alias($.keyword_count, $.identifier)),
'(',
$._aggregate_expression,
')',
),

✅ Fixed in eac9da2 fix: Remove identifier alias from count builtin function - PR #102

2. Misuse? String literals as (identifier)

tree-sitter-sql/grammar.js

Lines 980 to 988 in 393e0d3

storage_location: $ => prec.right(
seq(
$.keyword_location,
field('path', alias($._literal_string, $.identifier)),
optional(
seq(
$.keyword_cached,
$.keyword_in,
field('pool', alias($._literal_string, $.identifier)),

Shouldn't these rather be alias($._literal_string, $.literal)?

✅ Fixed in 4216ecf fix: Replace identifier with literal for hive storage location and cache pool - PR #102

';', '"', '\n' and '/path/data' in the Hive CREATE TABLE test end up as identifier tree item, while they clearly are literals:

grafik

See also LOCATION under https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html#parameters

2.1 Update: ';', '"' and '\n' do not show up in the AST at all

================================================================================
Create hive table
================================================================================
CREATE EXTERNAL TABLE tab (col int)
PARTITIONED BY (col int)
SORT BY (col)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' ESCAPED BY '"'
LINES TERMINATED BY '\n'
STORED AS PARQUET
LOCATION '/path/data'
CACHED IN 'pool1' WITH REPLICATION = 2
--------------------------------------------------------------------------------
(program
(statement
(create_table
(keyword_create)
(keyword_external)
(keyword_table)
(table_reference
name: (identifier))
(column_definitions
(column_definition
name: (identifier)
type: (int (keyword_int))))
(table_partition
(keyword_partitioned)
(keyword_by)
(column_definitions
(column_definition
name: (identifier)
type: (int (keyword_int)))))
(table_sort
(keyword_sort)
(keyword_by)
(identifier))
(row_format
(keyword_row)
(keyword_format)
(keyword_delimited)
(keyword_fields)
(keyword_terminated)
(keyword_by)
(keyword_escaped)
(keyword_by)
(keyword_lines)
(keyword_terminated)
(keyword_by))
(stored_as
(keyword_stored)
(keyword_as)
(keyword_parquet))
(storage_location
(keyword_location)
path: (identifier)
(keyword_cached)
(keyword_in)
pool: (identifier)
(keyword_with)
(keyword_replication)
value: (literal)))))

✅ Fixed in d2f0f66 _ feat: Add AST-fields for row terminated/escaped-by and lines terminated-by-_ PR #102

Field names from: https://spark.apache.org/docs/latest/sql-ref-syntax-hive-format.html

@DerekStride
Copy link
Owner

However, field('name', ...) gives compatibility with the generic (invocation) (

You guessed correctlty, I initially setup count to be compatible with invocation but I think you're right that we shouldn't alias it to an identifier and instead expose the keyword named node 👍

The rest of the changes look good to me too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants