opt: improve re-typing to equivalent but non-identical types #51023

RaduBerinde · 2020-07-06T20:35:00Z

I noticed that the ReType function uses Equivalent. This is not strict
enough if our expectation is that the re-typed expression will have the wanted
ResolvedType(). This commit changes this to use Identical.

Release note: None

cockroach-teamcity · 2020-07-06T20:35:07Z

This change is

RaduBerinde · 2020-07-06T20:36:07Z

I am not super-sure of this. I don't know of any specific case that is fixed by this. If we keep Equivalent then we should document better what ReType does.

jordanlewis

This is very intriguing - what does it do to a query like this:

CREATE TABLE a (a INT2, b INT8)
SELECT a+b FROM a

We've been forced to do various annoying stuff in vectorized because of the lack of optimizer-driven type casts between different integer widths, and I'm wondering if this could be the cure...

RaduBerinde · 2020-07-06T22:28:09Z

In that case I think we do what you need, because we ReType the arguments to the types in the builtin BinOp:

root@127.0.0.1:33293/defaultdb> EXPLAIN (OPT, VERBOSE) SELECT a+b FROM a;
                                 text
-----------------------------------------------------------------------
  project
   ├── columns: "?column?":4
   ├── immutable
   ├── stats: [rows=1000]
   ├── cost: 1120.03
   ├── prune: (4)
   ├── scan a
   │    ├── columns: a:1 b:2
   │    ├── stats: [rows=1000]
   │    ├── cost: 1100.02
   │    └── prune: (1,2)
   └── projections
        └── b:2 + a:1::INT8 [as="?column?":4, outer=(1,2), immutable]
(13 rows)

Time: 554.739µs

root@127.0.0.1:33293/defaultdb> EXPLAIN (VERBOSE) SELECT a+b FROM a;
    tree    |    field     | description |   columns    | ordering
------------+--------------+-------------+--------------+-----------
            | distribution | full        |              |
            | vectorized   | false       |              |
  render    |              |             | ("?column?") |
   │        | render 0     | b + a::INT8 |              |
   └── scan |              |             | (a, b)       |
            | table        | a@primary   |              |
            | spans        | FULL SCAN   |              |

On the other hand, with this change if you have some types with custom width and you add them, you always get the vanilla type as a result. It seems like we would need more sophisticated type checking.

jordanlewis · 2020-07-07T04:18:41Z

That's okay though - the vanilla type result is a limitation of the type system as you say and we already have to account for this, right? I think even before this change if you added int2 to int2 you'd get int8, no matter what.

root@127.0.0.1:65034/defaultdb> create table a (a int2, b int2);
CREATE TABLE

root@127.0.0.1:65034/defaultdb> explain(opt,types) select a+b from a;
                                 text
----------------------------------------------------------------------
  project
   ├── columns: "?column?":4(int)
   ├── immutable
   ├── stats: [rows=1000]
   ├── cost: 1120.03
   ├── prune: (4)
   ├── scan a
   │    ├── columns: a:1(int2) b:2(int2)
   │    ├── stats: [rows=1000]
   │    ├── cost: 1100.02
   │    └── prune: (1,2)
   └── projections
        └── plus [as="?column?":4, type=int, outer=(1,2), immutable]
             ├── variable: a:1 [type=int2]
             └── variable: b:2 [type=int2]
(15 rows)

Time: 990µs

Just having the casts here is a big improvement, I'm pretty sure. @yuzefovich will be able to say for sure, but I think this will permit us to remove some of the complex cross-width templates we needed to add.

yuzefovich · 2020-07-07T04:30:38Z

I agree with Jordan that it's an improvement from the execution perspective (although I'm not sure whether we will be able to remove any of the code we put in place).

RaduBerinde · 2020-07-07T14:11:32Z

Updated with some relevant changes in vectorize_local and vectorize_overloads plans. BTW these should be execbuilder tests if they depend on planning..

yuzefovich · 2020-07-07T14:48:54Z

BTW these should be execbuilder tests if they depend on planning..

Yes, we'll clean this up at some point.

jordanlewis

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

pkg/sql/logictest/testdata/logic_test/vectorize_local, line 188 at r1 (raw file):

          └ *colexec.projPlusInt64Int64ConstOp
            └ *colexec.projPlusInt64Int64Op
              └ *colexec.castOp

This is interesting - will this prevent us from ever using integer widths less than 64 bit in execution until we fix the overload system to support them fully?

I noticed that the `ReType` function uses `Equivalent`. This is not strict enough if our expectation is that the re-typed expression will have the wanted `ResolvedType()`. This commit changes this to use `Identical`. Release note: None

RaduBerinde · 2020-07-08T16:02:59Z

Yeah, sounds like it..

andy-kimball

For history, the original type system only had "equivalent" types - there was no conception of different OIDs for equivalent types. optbuilder dates back before that time. When I did the type refactor, I wanted to minimize behavior changes, and so simply ported over type equality calls to the Equivalent method rather than use the new Identical method that I added. There are therefore probably many places where we'd need to get stricter. We'd need to be OK with possible breaking changes (I think justified in this case).

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

jordanlewis · 2020-07-09T02:37:13Z

I think I'd be okay with merging this assuming it passes tests and so on. It would make performance for vectorized queries against non 64 bit integers worse, probably by a marginal amount.

But the mixed types logic in the engine is extremely annoying. Being able to remove it would be worth it, I think. @yuzefovich I'm pretty sure this would allow us to rip out a lot of hacks, right? It's sad because we already fixed all of the (known) bugs here, but all of those issues with unexpected widths would just vanish.

I think that ultimately we need to follow @otan's proposal for implicit casts, and also our overload system to allow more precise descriptions of the widths of arguments to operators. If we did that, then this change would do the right thing while still permitting things like int4+int4 without pointless upcasting.

yuzefovich · 2020-07-09T05:48:37Z

But the mixed types logic in the engine is extremely annoying. Being able to remove it would be worth it, I think. @yuzefovich I'm pretty sure this would allow us to rip out a lot of hacks, right?

Yes, it is annoying, but because of

It's sad because we already fixed all of the (known) bugs here, but all of those issues with unexpected widths would just vanish.

(meaning we have already put noticeable effort into getting that mixed-type expression evaluation right) I wouldn't be too eager to remove that logic since eventually, once the overload system is fixed properly, we'll need that code back. I'm not saying that we should definitely keep the existing annoying code, rather that we should be more thoughtful about pros and cons of keeping that about-to-become-dead code vs removing it and possibly needing to reimplement it later.

RaduBerinde · 2020-07-09T15:56:27Z

I think regardless of the type system changes, it's reasonable for the optimizer to make sure there is always an explicit upcast in mixed-width scenarios.

jordanlewis · 2020-07-10T00:00:15Z

I wouldn't be too eager to remove that logic since eventually, once the overload system is fixed properly, we'll need that code back.

I agree, but my hope is that once the overload system is fixed properly, we won't have a separate concept of widths vs top level types. We'll promote the non standard integer widths to top level types, instead.

@RaduBerinde, if we were to model ourselves after Postgres's type system, though, we would permit mixed-width operators without needing any upcasts. For example, here is the list of + operators in Postgres for int2 on the left:

jordan=# select oprleft::regtype, oprright::regtype, oprresult::regtype from pg_operator where oprname='+' and oprleft='int2'::regtype;
 oprleft  | oprright | oprresult
----------+----------+-----------
 smallint | smallint | smallint
 smallint | integer  | integer
 smallint | bigint   | bigint
(3 rows)

As you can see, they support int2+int2, int2+int4, and int2+int8, all with different result widths. Maybe this level of granularity isn't necessary, and we could just provide equal-width operators with implicit casts to pave over the differences, I'm not sure.

RaduBerinde · 2020-07-10T18:59:44Z

Yes, agreed. My thought is just that if we don't think that's useful / more efficient, we can easily set up optimizer to add casts at execbuild time (e.g. cast the smaller-width to match the other one).

RaduBerinde · 2020-07-10T19:14:37Z

I'm going to go ahead and merge this, and we'll see how this new world of strict typing feels.

bors r+

craig · 2020-07-10T19:52:28Z

Build succeeded

GitHub CI (Cockroach)

RaduBerinde requested review from jordanlewis and andy-kimball July 6, 2020 20:35

RaduBerinde requested a review from a team as a code owner July 6, 2020 20:35

jordanlewis reviewed Jul 6, 2020

View reviewed changes

RaduBerinde force-pushed the retype-identical branch from 0e99021 to 2a9c22b Compare July 7, 2020 14:11

jordanlewis reviewed Jul 8, 2020

View reviewed changes

RaduBerinde force-pushed the retype-identical branch from 2a9c22b to c28e322 Compare July 8, 2020 16:05

andy-kimball reviewed Jul 9, 2020

View reviewed changes

craig bot merged commit d0c7962 into cockroachdb:master Jul 10, 2020

RaduBerinde deleted the retype-identical branch July 15, 2020 18:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: improve re-typing to equivalent but non-identical types #51023

opt: improve re-typing to equivalent but non-identical types #51023

RaduBerinde commented Jul 6, 2020

cockroach-teamcity commented Jul 6, 2020

RaduBerinde commented Jul 6, 2020

jordanlewis left a comment

RaduBerinde commented Jul 6, 2020

jordanlewis commented Jul 7, 2020

yuzefovich commented Jul 7, 2020

RaduBerinde commented Jul 7, 2020

yuzefovich commented Jul 7, 2020

jordanlewis left a comment

RaduBerinde commented Jul 8, 2020

andy-kimball left a comment

jordanlewis commented Jul 9, 2020

yuzefovich commented Jul 9, 2020

RaduBerinde commented Jul 9, 2020 •

edited

Loading

jordanlewis commented Jul 10, 2020

RaduBerinde commented Jul 10, 2020

RaduBerinde commented Jul 10, 2020

craig bot commented Jul 10, 2020

opt: improve re-typing to equivalent but non-identical types #51023

opt: improve re-typing to equivalent but non-identical types #51023

Conversation

RaduBerinde commented Jul 6, 2020

cockroach-teamcity commented Jul 6, 2020

RaduBerinde commented Jul 6, 2020

jordanlewis left a comment

Choose a reason for hiding this comment

RaduBerinde commented Jul 6, 2020

jordanlewis commented Jul 7, 2020

yuzefovich commented Jul 7, 2020

RaduBerinde commented Jul 7, 2020

yuzefovich commented Jul 7, 2020

jordanlewis left a comment

Choose a reason for hiding this comment

RaduBerinde commented Jul 8, 2020

andy-kimball left a comment

Choose a reason for hiding this comment

jordanlewis commented Jul 9, 2020

yuzefovich commented Jul 9, 2020

RaduBerinde commented Jul 9, 2020 • edited Loading

jordanlewis commented Jul 10, 2020

RaduBerinde commented Jul 10, 2020

RaduBerinde commented Jul 10, 2020

craig bot commented Jul 10, 2020

Build succeeded

RaduBerinde commented Jul 9, 2020 •

edited

Loading