Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a number of encoding issues when evaluating expressions with the evalengine #13509

Merged
merged 14 commits into from
Jul 19, 2023

Commits on Jul 17, 2023

  1. collations: Fix appending to initial slice

    The logic here was wrong and we were always starting from the start of
    slice if an existing buffer was passed in to append to. Instead we need
    to actually append from the end of the slice.
    
    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    e40a738 View commit details
    Browse the repository at this point in the history
  2. Fix encoding issues for evaluated expressions

    As described in vitessio#13493 we have various bugs where we transcode inputs
    incorrectly and where we lack collation information in the results
    returned.
    
    The changes here fix a number of things:
    
    - Ensure that we always return collation information for querypb.Field
      instances. This also updates a lot of the static mocked inputs to
      ensure that they match what real queries would return. Many of the
      changes associated with this part of the fixes is ensuring the mocks are
      expectations are updated properly.
    - Introduce an IntroduceExpr in the evalengine. One of the issues
      identified is that we didn't correctly handle hex escaped values and
      we would never see them being unescaped if a charset introducer was
      used. The cause of this is that we directly would change the collation
      information on the expression. This is wrong in the case of a HexNum or
      HexVal, since we loose information we need to transcode those. We
      could work around this for the AST evaluator, but the compiler couldn't
      be fixed as the real underlying value was not available with the actual
      hex encoded value. By introducing a new expression, we can also now
      properly compile this.
    - No flags information was sent along with evaluated results. This is
      similar to the collation information, although usually not as
      important for clients. We opted here to fix this nonetheless and match
      MySQL as closely as possible.
    - Introduce a `collations.SystemCollation` to use consistently for all
      internal table / column names information. It was using mostly the
      constant for utf8mb3 or no information at all which would be problematic
      as it means we woulnd't know what encoding it was.
    - Convert outputs to the connection collation / charset. This was
      entirely missing from Vitess but is behavior that MySQL has. If you
      read from say a `latin1` encoded column with a connection charset of
      `utf8mb4`, the value is converted before it's sent over the wire to
      `utf8mb4`. This fix also applies for the `evalengine` tests where we
      can now test also non UTF-8 compatible encodings, fixing a bunch of
      todos. This uncovered some other small bugs in the `evalengine` where
      we were converting things wrong.
    - The rowstreamer would return collation information from the result
      set, but that is now wrong because of the above conversion fixes. This
      means we need to use the actual fields information which we already have
      available to see the proper collation. We can copy all the attributes
      for the PK column.
    - To be explicit, it includes a rename from `collations.CollationUtf8ID`
      to `collations.CollationUtf8mb3ID` to be explicit about this collation
      name and match how MySQL is also changing this to be explicit.
    
    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    66fac65 View commit details
    Browse the repository at this point in the history
  3. normalizer: Simplify normalization

    This removes the runtime regexp compilation as that's not necessary. It
    also normalizes the casing to match MySQL to upcase things and adds a
    leading `0` if needed.
    
    We move the helpers for this to a new utility package.
    
    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    4666428 View commit details
    Browse the repository at this point in the history
  4. Fix tests

    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    d2c421f View commit details
    Browse the repository at this point in the history
  5. Use explicit types and collations for columns and bindvars

    This way we can ensure we always initialize these with the proper
    explicit typing information.
    
    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    41ab596 View commit details
    Browse the repository at this point in the history
  6. evalengine: Cleanup unneeded flag

    We know this value always based on whether the type is -1 or a valid
    value, so we don't need the additional boolean value here.
    
    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    8da8cdf View commit details
    Browse the repository at this point in the history
  7. planbuilder: Pass through type information as well

    With the type information we can better decide if we need to print the
    collation change or not, since it's only needed for textual columns.
    
    This also fixes a number of cases where we used to use the weight_string
    but we no longer need to since we now pass through the correct type
    information in more cases.
    
    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    cd02256 View commit details
    Browse the repository at this point in the history
  8. vtexplain: Update test output for improved query support

    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    2aee93a View commit details
    Browse the repository at this point in the history
  9. Fix another test issue

    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    12d3cc9 View commit details
    Browse the repository at this point in the history
  10. Fix license and do cleanup

    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 17, 2023
    Configuration menu
    Copy the full SHA
    e382a3f View commit details
    Browse the repository at this point in the history

Commits on Jul 18, 2023

  1. Add constant for sqltypes.Unknown

    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 18, 2023
    Configuration menu
    Copy the full SHA
    f780e60 View commit details
    Browse the repository at this point in the history
  2. Small code cleanup

    Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
    dbussink committed Jul 18, 2023
    Configuration menu
    Copy the full SHA
    7afedbf View commit details
    Browse the repository at this point in the history
  3. Merge remote-tracking branch 'upstream/main' into dbussink/fix-many-e…

    …ncoding-issues
    
    Signed-off-by: Andres Taylor <andres@planetscale.com>
    systay committed Jul 18, 2023
    Configuration menu
    Copy the full SHA
    67b164f View commit details
    Browse the repository at this point in the history

Commits on Jul 19, 2023

  1. fix fake test result to not have weight_string as it is not present i…

    …n the query send down
    
    Signed-off-by: Harshit Gangal <harshit@planetscale.com>
    harshit-gangal committed Jul 19, 2023
    Configuration menu
    Copy the full SHA
    afcebb2 View commit details
    Browse the repository at this point in the history