Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite special comment parser #711

Merged
merged 12 commits into from
Jan 8, 2020
Merged

Conversation

kennytm
Copy link
Contributor

@kennytm kennytm commented Jan 6, 2020

What problem does this PR solve?

Rewrite the special comment parser (/*! ... */ and /*+ ... */) to better match MySQL's behavior.

This is the first step to solve pingcap/tidb#14030 (comment).

What is changed and how it works?

  • Removed specialCommentScanner. Instead:

    1. When encountering /*! or /*!12345 or /*T! or /*T!012345, drop this substring and continue parsing as if outside a comment. Use a flag to indicate we should skip the stray */.
    2. When encountering /*+, read the entire comment as a string token. This token will be parsed into []*ast.TableOptimizerHint using the new hint parser on-demand. All errors coming from that parsers are demoted to warnings.
  • The /*+ ... */ is only parsed after keywords SELECT, INSERT, REPLACE, UPDATE, DELETE, CREATE and PARTITION, otherwise it is treated as an ordinary comment. Therefore, some SQL previously failing like SELECT 1 FROM /*+ hash_agg() */ t1 now becomes successful and ignoring the hint.

    • Unlike MySQL, however, statements like SELECT 1 FOR UPDATE /*+ max_execution_time(60) */ would still produce an error. We could fix it later?
  • And we now have two parsers, the main parser.y, and the optimizer-hint specific hintparser.y

    1. All hint-related rules from deleted from parser.y. This also deletes the related TiDB-specific keywords.
    2. goyacc is modified to support generating two parsers. Also make goyacc to exit with error on conflict, so we don't need to run the parser twice on the same .y file.
    3. In the new hint parser, all optimizer hints from MySQL 5.7+ are recognized for completeness (though all of them are ignored with warnings).
  • The structure ast.TableOptimizerHint is not yet changed, so we are still API-compatible.

Check List

Tests

  • Unit test

Code changes

Side effects

  • Possible performance regression
  • Breaking backward compatibility

Related changes

  • Need to be included in the release note

Instead, simply strip off '/*!' and the pairing '*/'.

Temporarily disabled handling of /*+ ... */
This new token has type `hintComment`. The actual hint will be parsed
lazily.
change the "DO NOT EDIT" line to fit the Go standard
deleted all optimizer hint rules from parser.go

refactor the Makefile to support building both parsers (also deleted some
outdated fixup of *parser.go)
@kennytm kennytm added type/enhancement New feature or request status/PTAL labels Jan 6, 2020
@kennytm kennytm requested a review from a team January 6, 2020 20:19
@ghost ghost requested review from tangenta and removed request for a team January 6, 2020 20:19
@codecov
Copy link

codecov bot commented Jan 7, 2020

Codecov Report

Merging #711 into master will decrease coverage by 0.29%.
The diff coverage is 82.08%.

@@            Coverage Diff            @@
##           master     #711     +/-   ##
=========================================
- Coverage   77.98%   77.68%   -0.3%     
=========================================
  Files          38       40      +2     
  Lines       14020    14219    +199     
=========================================
+ Hits        10933    11046    +113     
- Misses       2435     2513     +78     
- Partials      652      660      +8

Copy link
Contributor

@tangenta tangenta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

hintparserimpl.go Show resolved Hide resolved
lexer.go Outdated Show resolved Hide resolved
@kennytm
Copy link
Contributor Author

kennytm commented Jan 7, 2020

PTAL @tangenta @tiancaiamao cc @lonng

The integration test is failing because the warnings produced by invalid hint is changed.

@tiancaiamao
Copy link
Collaborator

LGTM
Good job! @kennytm

@tiancaiamao
Copy link
Collaborator

CI failed here https://github.com/pingcap/tidb/blob/master/expression/integration_test.go#L4407

@kennytm kennytm merged commit 7de2c99 into master Jan 8, 2020
@kennytm kennytm deleted the kennytm/revamp-special-comments branch January 8, 2020 06:55
tangenta pushed a commit to tangenta/parser that referenced this pull request Apr 14, 2020
* lexer: added a function to scan version digits

* lexer: removed special comment scanner

Instead, simply strip off '/*!' and the pairing '*/'.

Temporarily disabled handling of /*+ ... */

* lexer: support collecting the entire /*+ ... */ as a single token

This new token has type `hintComment`. The actual hint will be parsed
lazily.

* lexer,yy_parser: move lastErrorAsWarn() from Parser into Scanner

* goyacc: do not allow conflict, support changing parser type name

change the "DO NOT EDIT" line to fit the Go standard

* parser,hintparser: created a new parser just for parsing optimizer hints

deleted all optimizer hint rules from parser.go

refactor the Makefile to support building both parsers (also deleted some
outdated fixup of *parser.go)

* lexer: fix comment parser

* codecov: don't wait for integration test before showing the coverage

* lexer: fix comment parsing again

* hintparser: TiDB still expects `HASH_JOIN(@qb1 foo@qb2)` to be valid :(

* parser,hintparser: provide the true line/column offset in the warnings

cache the hintparser in the main parser to avoid repeated allocation

* lexer: delete unused sqlOffsetInComment()
tangenta added a commit that referenced this pull request Apr 20, 2020
tangenta added a commit to tangenta/parser that referenced this pull request Apr 28, 2020
tiancaiamao pushed a commit to tiancaiamao/parser that referenced this pull request Apr 27, 2021
* lexer: added a function to scan version digits

* lexer: removed special comment scanner

Instead, simply strip off '/*!' and the pairing '*/'.

Temporarily disabled handling of /*+ ... */

* lexer: support collecting the entire /*+ ... */ as a single token

This new token has type `hintComment`. The actual hint will be parsed
lazily.

* lexer,yy_parser: move lastErrorAsWarn() from Parser into Scanner

* goyacc: do not allow conflict, support changing parser type name

change the "DO NOT EDIT" line to fit the Go standard

* parser,hintparser: created a new parser just for parsing optimizer hints

deleted all optimizer hint rules from parser.go

refactor the Makefile to support building both parsers (also deleted some
outdated fixup of *parser.go)

* lexer: fix comment parser

* codecov: don't wait for integration test before showing the coverage

* lexer: fix comment parsing again

* hintparser: TiDB still expects `HASH_JOIN(@qb1 foo@qb2)` to be valid :(

* parser,hintparser: provide the true line/column offset in the warnings

cache the hintparser in the main parser to avoid repeated allocation

* lexer: delete unused sqlOffsetInComment()
lyonzhi pushed a commit to lyonzhi/parser that referenced this pull request Apr 25, 2024
* lexer: added a function to scan version digits

* lexer: removed special comment scanner

Instead, simply strip off '/*!' and the pairing '*/'.

Temporarily disabled handling of /*+ ... */

* lexer: support collecting the entire /*+ ... */ as a single token

This new token has type `hintComment`. The actual hint will be parsed
lazily.

* lexer,yy_parser: move lastErrorAsWarn() from Parser into Scanner

* goyacc: do not allow conflict, support changing parser type name

change the "DO NOT EDIT" line to fit the Go standard

* parser,hintparser: created a new parser just for parsing optimizer hints

deleted all optimizer hint rules from parser.go

refactor the Makefile to support building both parsers (also deleted some
outdated fixup of *parser.go)

* lexer: fix comment parser

* codecov: don't wait for integration test before showing the coverage

* lexer: fix comment parsing again

* hintparser: TiDB still expects `HASH_JOIN(@qb1 foo@qb2)` to be valid :(

* parser,hintparser: provide the true line/column offset in the warnings

cache the hintparser in the main parser to avoid repeated allocation

* lexer: delete unused sqlOffsetInComment()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT2 LGT2 type/enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants