Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evalengine: numbers #9623

Merged
merged 11 commits into from
Feb 3, 2022
Merged

evalengine: numbers #9623

merged 11 commits into from
Feb 3, 2022

Conversation

vmg
Copy link
Collaborator

@vmg vmg commented Feb 3, 2022

Description

Heyyooo you thought that tagging RC1 would stop me BUT I CANT BE STOPPED.

Here's a battery of bug fixes (no new functionality) to the way the evaluation engine handles numbers. This is divided in broadly two fixes:

Hexadecimal literals were not being handled at all

This was an oversight because they've been supported by the parser forever, and the evaluation engine would just report them as unsupported. This was causing a lot of plans for gen4 to show up as unsupported, when the plan was actually fully supported except for the hex identifier.

Implementing support for these literals was not trivial, because their semantics change whenever they're used in a numeric or non-numeric context, but once I got them working properly it greatly increased my confidence in a lot more numerical operations which we were supported but which I hadn't tested well enough yet. I've expanded these tests significantly and the results are good.

As a result of these changes, several query plans that were previously unsupported are now supported. 👌

Negative numbers

"But I thought we already handled negative numbers?" Not quite. The evalengine was actually blissfully unaware of negative numbers, because of an optimization in the sqlparser which was not quite correct.

A helper method on ast_funcs.go in the parser would modify the resulting AST for the parse, so that when a numeric literal was preceded by a -, the - would be inserted as part of the literal (instead of emitting an unary "minus" operator). Likewise, when two unary minus operators appeared in a row, the same helper in ast_funcs.go would just optimize them away (since they cancel each other).

It turns out that neither of these optimizations is valid at all. The unary minus operator needs to appear in the AST for evaluation to be valid because the unary minus operator can change the types of the numerical identifier it applies to. Using - before a large enough BIGINT can cause it to be promoted to decimal if it's negative is too small to fit in the data type, and likewise for unsigneds, floats, and all kinds of janky stuff.

So, this PR removes that optimization and makes it so sqlparser always emits unary minus nodes for all literals, which now we always handle gracefully.

Interestingly, this has also fixed several other query plans which were previously failing, more specifically, query plans that involved literals with very large numbers. We now handle those properly, reducing the amount of unsupported plans in our tests and improving the quality of some of the generated plans.


The only remaining question is whether this needs a backport to the RC1 branch. My hunch is "yes", it fixes real bugs that have been here forever and that would be nice to ship in the upcoming release. It increases the amount of supported plans for gen4, and it increases our confidence in the behavior of the new evalengine.

I'll discuss this further on today's open hour.

Related Issue(s)

Checklist

  • Should this PR be backported? YES, to release-13.0
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

vmg added 6 commits February 3, 2022 14:06
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
mattlord and others added 3 commits February 3, 2022 14:57
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
@vmg vmg requested a review from shlomi-noach as a code owner February 3, 2022 14:53
Comment on lines +204 to +209
expected :=
"[[VARBINARY(\"\\x01\") INT64(1)] " +
"[VARBINARY(\"Hello Gopher!\") INT64(3)] " +
"[VARBINARY(\"\\xa5\") INT64(2)] " +
"[VARBINARY(\"\\xc2l\\xaa\\x1a^\\xb9@\\x96Қ\\x1b\\xec\") INT64(4)]]"
assert.Equal(t, expected, fmt.Sprintf("%v", result.Rows))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattlord I've cherry picked your test from #9616 here. I did have to change the expected output. Two notes:

  1. Because of the context they're in, the hex values and numbers are always coerced to VARBINARY during evaluation, so their original types are lost. AFAICT this matches MySQL's behavior -- I hope it won't be a problem in practice.

  2. Since the resulting rows are order by id, they should come as 1, 3, 2, 4, since that's the proper ordering of the values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal transformations should be fine. I did a quick manual test with this branch and the results match up with what I had seen with PlanValue before:

$ mysql customer -e "select id, hex_keyspace_id, shard from customer.binary_vdx where id = 0x997901"
+------+-----------------+-------+
| id   | hex_keyspace_id | shard |
+------+-----------------+-------+
| �y  | 997901          | 80-   |
+------+-----------------+-------+
$ mysql customer -e "select id, hex_keyspace_id, shard from customer.binary_vdx where id = x'997901'"
+------+-----------------+-------+
| id   | hex_keyspace_id | shard |
+------+-----------------+-------+
| �y  | 997901          | 80-   |
+------+-----------------+-------+

$ mysql customer -e "select id, hex_keyspace_id, shard from customer.binary_md5_vdx where id = 0x997901"
+------+----------------------------------+-------+
| id   | hex_keyspace_id                  | shard |
+------+----------------------------------+-------+
| �y  | 2d2da332b59f56aeb235ecb36d639440 | -80   |
+------+----------------------------------+-------+
vitess@003f6cbe061b:/vt/local$ mysql customer -e "select id, hex_keyspace_id, shard from customer.binary_md5_vdx where id = x'997901'"
+------+----------------------------------+-------+
| id   | hex_keyspace_id                  | shard |
+------+----------------------------------+-------+
| �y  | 2d2da332b59f56aeb235ecb36d639440 | -80   |
+------+----------------------------------+-------+

$ mysql customer -e "select id, hex_keyspace_id, shard from customer.xxhash_vdx where id = 0x997901"
+------+------------------+-------+
| id   | hex_keyspace_id  | shard |
+------+------------------+-------+
| �y  | ac29dcaa4a0b4527 | 80-   |
+------+------------------+-------+
vitess@003f6cbe061b:/vt/local$ mysql customer -e "select id, hex_keyspace_id, shard from customer.xxhash_vdx where id = x'997901'"
+------+------------------+-------+
| id   | hex_keyspace_id  | shard |
+------+------------------+-------+
| �y  | ac29dcaa4a0b4527 | 80-   |
+------+------------------+-------+

THANK YOU! 😍

@deepthi
Copy link
Member

deepthi commented Feb 3, 2022

Should this PR be backported? YES, to release-13.0

YES to release-13.0.

vmg added 2 commits February 3, 2022 16:52
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@vmg vmg merged commit dac8450 into vitessio:main Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants