Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: improve re-typing to equivalent but non-identical types #51023

Merged
merged 1 commit into from
Jul 10, 2020

Conversation

RaduBerinde
Copy link
Member

I noticed that the ReType function uses Equivalent. This is not strict
enough if our expectation is that the re-typed expression will have the wanted
ResolvedType(). This commit changes this to use Identical.

Release note: None

@RaduBerinde RaduBerinde requested a review from a team as a code owner July 6, 2020 20:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member Author

I am not super-sure of this. I don't know of any specific case that is fixed by this. If we keep Equivalent then we should document better what ReType does.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very intriguing - what does it do to a query like this:

CREATE TABLE a (a INT2, b INT8)
SELECT a+b FROM a

We've been forced to do various annoying stuff in vectorized because of the lack of optimizer-driven type casts between different integer widths, and I'm wondering if this could be the cure...

@RaduBerinde
Copy link
Member Author

In that case I think we do what you need, because we ReType the arguments to the types in the builtin BinOp:

root@127.0.0.1:33293/defaultdb> EXPLAIN (OPT, VERBOSE) SELECT a+b FROM a;
                                 text
-----------------------------------------------------------------------
  project
   ├── columns: "?column?":4
   ├── immutable
   ├── stats: [rows=1000]
   ├── cost: 1120.03
   ├── prune: (4)
   ├── scan a
   │    ├── columns: a:1 b:2
   │    ├── stats: [rows=1000]
   │    ├── cost: 1100.02
   │    └── prune: (1,2)
   └── projections
        └── b:2 + a:1::INT8 [as="?column?":4, outer=(1,2), immutable]
(13 rows)

Time: 554.739µs

root@127.0.0.1:33293/defaultdb> EXPLAIN (VERBOSE) SELECT a+b FROM a;
    tree    |    field     | description |   columns    | ordering
------------+--------------+-------------+--------------+-----------
            | distribution | full        |              |
            | vectorized   | false       |              |
  render    |              |             | ("?column?") |
   │        | render 0     | b + a::INT8 |              |
   └── scan |              |             | (a, b)       |
            | table        | a@primary   |              |
            | spans        | FULL SCAN   |              |

On the other hand, with this change if you have some types with custom width and you add them, you always get the vanilla type as a result. It seems like we would need more sophisticated type checking.

@jordanlewis
Copy link
Member

That's okay though - the vanilla type result is a limitation of the type system as you say and we already have to account for this, right? I think even before this change if you added int2 to int2 you'd get int8, no matter what.

root@127.0.0.1:65034/defaultdb> create table a (a int2, b int2);
CREATE TABLE

root@127.0.0.1:65034/defaultdb> explain(opt,types) select a+b from a;
                                 text
----------------------------------------------------------------------
  project
   ├── columns: "?column?":4(int)
   ├── immutable
   ├── stats: [rows=1000]
   ├── cost: 1120.03
   ├── prune: (4)
   ├── scan a
   │    ├── columns: a:1(int2) b:2(int2)
   │    ├── stats: [rows=1000]
   │    ├── cost: 1100.02
   │    └── prune: (1,2)
   └── projections
        └── plus [as="?column?":4, type=int, outer=(1,2), immutable]
             ├── variable: a:1 [type=int2]
             └── variable: b:2 [type=int2]
(15 rows)

Time: 990µs

Just having the casts here is a big improvement, I'm pretty sure. @yuzefovich will be able to say for sure, but I think this will permit us to remove some of the complex cross-width templates we needed to add.

@yuzefovich
Copy link
Member

I agree with Jordan that it's an improvement from the execution perspective (although I'm not sure whether we will be able to remove any of the code we put in place).

@RaduBerinde
Copy link
Member Author

Updated with some relevant changes in vectorize_local and vectorize_overloads plans. BTW these should be execbuilder tests if they depend on planning..

@yuzefovich
Copy link
Member

BTW these should be execbuilder tests if they depend on planning..

Yes, we'll clean this up at some point.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)


pkg/sql/logictest/testdata/logic_test/vectorize_local, line 188 at r1 (raw file):

          └ *colexec.projPlusInt64Int64ConstOp
            └ *colexec.projPlusInt64Int64Op
              └ *colexec.castOp

This is interesting - will this prevent us from ever using integer widths less than 64 bit in execution until we fix the overload system to support them fully?

I noticed that the `ReType` function uses `Equivalent`. This is not strict
enough if our expectation is that the re-typed expression will have the wanted
`ResolvedType()`. This commit changes this to use `Identical`.

Release note: None
@RaduBerinde
Copy link
Member Author

Yeah, sounds like it..

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For history, the original type system only had "equivalent" types - there was no conception of different OIDs for equivalent types. optbuilder dates back before that time. When I did the type refactor, I wanted to minimize behavior changes, and so simply ported over type equality calls to the Equivalent method rather than use the new Identical method that I added. There are therefore probably many places where we'd need to get stricter. We'd need to be OK with possible breaking changes (I think justified in this case).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

@jordanlewis
Copy link
Member

I think I'd be okay with merging this assuming it passes tests and so on. It would make performance for vectorized queries against non 64 bit integers worse, probably by a marginal amount.

But the mixed types logic in the engine is extremely annoying. Being able to remove it would be worth it, I think. @yuzefovich I'm pretty sure this would allow us to rip out a lot of hacks, right? It's sad because we already fixed all of the (known) bugs here, but all of those issues with unexpected widths would just vanish.

I think that ultimately we need to follow @otan's proposal for implicit casts, and also our overload system to allow more precise descriptions of the widths of arguments to operators. If we did that, then this change would do the right thing while still permitting things like int4+int4 without pointless upcasting.

@yuzefovich
Copy link
Member

But the mixed types logic in the engine is extremely annoying. Being able to remove it would be worth it, I think. @yuzefovich I'm pretty sure this would allow us to rip out a lot of hacks, right?

Yes, it is annoying, but because of

It's sad because we already fixed all of the (known) bugs here, but all of those issues with unexpected widths would just vanish.

(meaning we have already put noticeable effort into getting that mixed-type expression evaluation right) I wouldn't be too eager to remove that logic since eventually, once the overload system is fixed properly, we'll need that code back. I'm not saying that we should definitely keep the existing annoying code, rather that we should be more thoughtful about pros and cons of keeping that about-to-become-dead code vs removing it and possibly needing to reimplement it later.

@RaduBerinde
Copy link
Member Author

RaduBerinde commented Jul 9, 2020

I think regardless of the type system changes, it's reasonable for the optimizer to make sure there is always an explicit upcast in mixed-width scenarios.

@jordanlewis
Copy link
Member

I wouldn't be too eager to remove that logic since eventually, once the overload system is fixed properly, we'll need that code back.

I agree, but my hope is that once the overload system is fixed properly, we won't have a separate concept of widths vs top level types. We'll promote the non standard integer widths to top level types, instead.

@RaduBerinde, if we were to model ourselves after Postgres's type system, though, we would permit mixed-width operators without needing any upcasts. For example, here is the list of + operators in Postgres for int2 on the left:

jordan=# select oprleft::regtype, oprright::regtype, oprresult::regtype from pg_operator where oprname='+' and oprleft='int2'::regtype;
 oprleft  | oprright | oprresult
----------+----------+-----------
 smallint | smallint | smallint
 smallint | integer  | integer
 smallint | bigint   | bigint
(3 rows)

As you can see, they support int2+int2, int2+int4, and int2+int8, all with different result widths. Maybe this level of granularity isn't necessary, and we could just provide equal-width operators with implicit casts to pave over the differences, I'm not sure.

@RaduBerinde
Copy link
Member Author

Yes, agreed. My thought is just that if we don't think that's useful / more efficient, we can easily set up optimizer to add casts at execbuild time (e.g. cast the smaller-width to match the other one).

@RaduBerinde
Copy link
Member Author

I'm going to go ahead and merge this, and we'll see how this new world of strict typing feels.

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 10, 2020

Build succeeded

@craig craig bot merged commit d0c7962 into cockroachdb:master Jul 10, 2020
@RaduBerinde RaduBerinde deleted the retype-identical branch July 15, 2020 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants