Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port the SQL parser from PDO #4397

Merged
merged 1 commit into from
Nov 14, 2020
Merged

Port the SQL parser from PDO #4397

merged 1 commit into from
Nov 14, 2020

Conversation

morozov
Copy link
Member

@morozov morozov commented Oct 31, 2020

Q A
Type improvement
BC Break no

Fixes: #4383

Additional Advantages:

Apart from fixing the issue with parsing comments, the new parser has the following advantages:

  1. It is better extensible. It allows adding the support for platform-specific syntaxes (e.g. using a backslash for escaping in string literals, escaping PDO parameter placeholders) without changing the parser design.
  2. It allows to separate the parsing and the transformation logic. Currently, the same parser is reused by the wrapper level connection (conversion of named and array parameters) and the OCI8 driver (conversion of positional parameters). In the future, it can be reused to address Named statement parameters are unnecessarily converted to positional on Oracle #3744 by combining multiple transformation steps on top of just one parsing pass, and only if necessary.

TODO:

  • Only MySQL supports backslash escaping and only in strings, not in identifiers.
  • Test escaping ] in SQL Server identifiers. Looks like SQL Server doesn't know how to handle them itself:
    SELECT QUOTENAME('?]?');
    -- [?]]?]
    SELECT 1 AS [?]]?]
    -- Parameter #1 has not been set.
  • Test missing positional parameter.
  • Test comments in SQL.
  • See if we can get away with not having platform-specific configuration. Somehow PDO gets away with this. Otherwise, the Parser API will have to be declared public as a dependency of the Platform API. No. In fact, it's a bug (#80340).
  • When testing escaping string literals, make sure both the ANSI and MySQL escaping syntaxes are tested evenly.
  • Test the cases w/o escaping in string literals in both the ANSI and MySQL modes.
  • Performance testing (see below).
  • Integration testing on a real app (found QueryBuilder uses 1-based positional parameter keys #4421).
  • Rework the internal API of the wrapper Connection. This @return array{0: …, 1: …, 2: …} looks ugly. This looks like too big of a change to make as part of this one. There's a lot of duplicated logic: logging, binding DBAL types, etc. that needs taking care of.

@morozov morozov linked an issue Oct 31, 2020 that may be closed by this pull request
@morozov morozov force-pushed the issues/4383 branch 6 times, most recently from 7518a0a to 5f07baf Compare November 7, 2020 18:33
@morozov morozov force-pushed the issues/4383 branch 6 times, most recently from f3d2380 to d0f0e8d Compare November 11, 2020 02:30
@morozov
Copy link
Member Author

morozov commented Nov 12, 2020

TL;DR The new parser is ~2.5 times slower than the old one (80μs vs. 30μs per query on synthetic tests).

Testing Methodology (gist):

  1. Take the 29 queries from SQLParserUtilsTest::dataExpandListParameters().
  2. Expand array parameters in each of them
  3. Make sure that the query gets actually processed (no early return).
  4. Repeat 10000 times.
  5. Calculate the average.

Additional Considerations:

  1. The performance penalty is inevitable due to the statefulness of the parser. Unlike the old implementation which does preg_match_all() once per query, this one does multiple preg_match() per query. This is necessary for the correct parsing of different token combinations (e.g. ignore parameters in comments, comments in string literals, etc).
  2. The parser is only used when the query uses named or array parameters. The share of such queries depends on the project.
  3. In the applications that use the relational database, the database is often the performance bottleneck. SQL queries can take 10-100 ms and it's rarely less than that. Adding 50μs to some of them (which is 0.05…0.5% in this case) doesn't look like a lot. Other application components like the web framework and the ORM will likely add more.

@morozov morozov marked this pull request as ready for review November 12, 2020 02:55
Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this come with changes under docs? Maybe an explanation on why the DBAL needs to have an SQL Parser, what features it allows us to implement?

src/Connection.php Outdated Show resolved Hide resolved
@morozov
Copy link
Member Author

morozov commented Nov 12, 2020

Should this come with changes under docs?

I agree it would be a good addition but I don't think it's required for this PR to be accepted. The parser is not a user-facing API, so they normally don't have to know of its existence. I'd rather work on documenting it independently.

Copy link
Member

@greg0ire greg0ire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I performed a better search than I did before and found that this is actually already documented in https://github.com/doctrine/dbal/blob/3.0.x/docs/en/reference/data-retrieval-and-manipulation.rst#list-of-parameters-conversion

Approving just in case, but please proofread it for anything that should be changed.

@morozov
Copy link
Member Author

morozov commented Nov 12, 2020

I performed a better search [...]

Thanks for digging in. I don't see any mismatch between the new implementation and the current documentation. Except perhaps "a very powerful parsing process" which is now "an even more powerful parsing process".

@greg0ire
Copy link
Member

A parsing process the likes of which you've never seen, frankly it's the best out there, everyone says it is!

@morozov morozov added this to the 3.0.0 milestone Nov 14, 2020
@morozov morozov merged commit 041df42 into doctrine:3.0.x Nov 14, 2020
@morozov morozov deleted the issues/4383 branch November 14, 2020 18:17
vitek-rostislav added a commit to shopsys/shopsys that referenced this pull request Mar 28, 2022
… SqlParametersFlattener

- the original method is not available anymore (it has been moved to Connection and made private) in the new version of doctrine/dbal
- see doctrine/dbal#4397
vitek-rostislav added a commit to shopsys/shopsys that referenced this pull request Mar 28, 2022
… SqlParametersFlattener

- the original method is not available anymore (it has been moved to Connection and made private) in the new version of doctrine/dbal
- see doctrine/dbal#4397
vitek-rostislav added a commit to shopsys/shopsys that referenced this pull request Mar 28, 2022
… SqlParametersFlattener

- the original method is not available anymore (it has been moved to Connection and made private) in the new version of doctrine/dbal
- see doctrine/dbal#4397
ShopsysBot pushed a commit to shopsys/framework that referenced this pull request Mar 29, 2022
… SqlParametersFlattener

- the original method is not available anymore (it has been moved to Connection and made private) in the new version of doctrine/dbal
- see doctrine/dbal#4397
vitek-rostislav added a commit to shopsys/shopsys that referenced this pull request Mar 30, 2022
… SqlParametersFlattener

- the original method is not available anymore (it has been moved to Connection and made private) in the new version of doctrine/dbal
- see doctrine/dbal#4397
ShopsysBot pushed a commit to shopsys/framework that referenced this pull request Mar 31, 2022
… SqlParametersFlattener

- the original method is not available anymore (it has been moved to Connection and made private) in the new version of doctrine/dbal
- see doctrine/dbal#4397
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL parser does not recognize comments
3 participants