Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the parsing result for the special double number #1621

Merged
merged 14 commits into from
Dec 28, 2024

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Dec 27, 2024

close #1619
close #1622

Changed

Breaking change

  • Change the arguments of Parser::parse_identifier.

Comment on lines +12581 to +12584
let support_struct = all_dialects_where(|d| d.supports_struct_literal());
let stmt = support_struct
.verified_only_select("SELECT STRUCT(STRUCT(1 AS a, NULL AS b) AS c, NULL AS d).c.a");
let expected = SelectItem::UnnamedExpr(Expr::CompoundFieldAccess {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The access chain for the struct literal is represented by CompoundFieldAccess currently. See #1551 for the details.
cc @ayman-sigma

@goldmedal
Copy link
Contributor Author

Tested with DataFusion by goldmedal/datafusion#2

src/tokenizer.rs Outdated
.map(|c| c.is_ascii_digit())
.unwrap_or(false)
{
if self.dialect.support_unquoted_hyphenated_identifiers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goldmedal @ayman-sigma
I'm wondering if it would make sense to move this special case handling from the tokenizer and into the parser. Since this is a BigQuery specific feature and only applies in a specific context, I feel like it would be more complicated to pull off within the tokenizer without affecting other use cases/dialects. Thinking in the parser we have more context on when this syntax is valid and a lot of that work was done in #1109
(I'm imagining since the tokenizer would be left to parse decimals as usual, we won't have the consideration in #1622 ?)

Essentially I'm wondering if we can extend the work to cover this scenario doublle number scenario. Using the example from #1598 it would essentially mean that e.g. the following token stream

[Word("foo"), Minus, Number("123."), Word("bar")]

gets combined into

ObjectName(vec![Word("foo-123"), Word("bar")])

would something like that be feasible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially I'm wondering if we can extend the work to cover this scenario doublle number scenario. Using the example from #1598 it would essentially mean that e.g. the following token stream

[Word("foo"), Minus, Number("123."), Word("bar")]

gets combined into

ObjectName(vec![Word("foo-123"), Word("bar")])

would something like that be feasible?

It's a good point. "Keeping tokenizer parse decimals as usual" makes sense to me.
I'll try it. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed it in b1d02f2
The double-value tests are enabled for BigQuery.

/// similar table clause. Currently, this is used only to support unquoted hyphenated identifiers in
// this context on BigQuery.
pub fn parse_identifier(&mut self, in_table_clause: bool) -> Result<Ident, ParserError> {
pub fn parse_identifier(&mut self) -> Result<Ident, ParserError> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a breaking change for the downstream project using this public method.

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for fixing this @goldmedal!
cc @alamb

@iffyio iffyio merged commit d0d4153 into apache:main Dec 28, 2024
8 checks passed
@goldmedal goldmedal deleted the fix/1619-fix-number-value branch December 28, 2024 13:18
@goldmedal
Copy link
Contributor Author

Thanks @iffyio 👍

@alamb
Copy link
Contributor

alamb commented Dec 28, 2024

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants