-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: improve re-typing to equivalent but non-identical types #51023
Conversation
I am not super-sure of this. I don't know of any specific case that is fixed by this. If we keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very intriguing - what does it do to a query like this:
CREATE TABLE a (a INT2, b INT8)
SELECT a+b FROM a
We've been forced to do various annoying stuff in vectorized because of the lack of optimizer-driven type casts between different integer widths, and I'm wondering if this could be the cure...
In that case I think we do what you need, because we
On the other hand, with this change if you have some types with custom width and you add them, you always get the vanilla type as a result. It seems like we would need more sophisticated type checking. |
That's okay though - the vanilla type result is a limitation of the type system as you say and we already have to account for this, right? I think even before this change if you added int2 to int2 you'd get int8, no matter what.
Just having the casts here is a big improvement, I'm pretty sure. @yuzefovich will be able to say for sure, but I think this will permit us to remove some of the complex cross-width templates we needed to add. |
I agree with Jordan that it's an improvement from the execution perspective (although I'm not sure whether we will be able to remove any of the code we put in place). |
0e99021
to
2a9c22b
Compare
Updated with some relevant changes in vectorize_local and vectorize_overloads plans. BTW these should be execbuilder tests if they depend on planning.. |
Yes, we'll clean this up at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/logictest/testdata/logic_test/vectorize_local, line 188 at r1 (raw file):
└ *colexec.projPlusInt64Int64ConstOp └ *colexec.projPlusInt64Int64Op └ *colexec.castOp
This is interesting - will this prevent us from ever using integer widths less than 64 bit in execution until we fix the overload system to support them fully?
I noticed that the `ReType` function uses `Equivalent`. This is not strict enough if our expectation is that the re-typed expression will have the wanted `ResolvedType()`. This commit changes this to use `Identical`. Release note: None
Yeah, sounds like it.. |
2a9c22b
to
c28e322
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For history, the original type system only had "equivalent" types - there was no conception of different OIDs for equivalent types. optbuilder
dates back before that time. When I did the type refactor, I wanted to minimize behavior changes, and so simply ported over type equality calls to the Equivalent
method rather than use the new Identical
method that I added. There are therefore probably many places where we'd need to get stricter. We'd need to be OK with possible breaking changes (I think justified in this case).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)
I think I'd be okay with merging this assuming it passes tests and so on. It would make performance for vectorized queries against non 64 bit integers worse, probably by a marginal amount. But the mixed types logic in the engine is extremely annoying. Being able to remove it would be worth it, I think. @yuzefovich I'm pretty sure this would allow us to rip out a lot of hacks, right? It's sad because we already fixed all of the (known) bugs here, but all of those issues with unexpected widths would just vanish. I think that ultimately we need to follow @otan's proposal for implicit casts, and also our overload system to allow more precise descriptions of the widths of arguments to operators. If we did that, then this change would do the right thing while still permitting things like |
Yes, it is annoying, but because of
(meaning we have already put noticeable effort into getting that mixed-type expression evaluation right) I wouldn't be too eager to remove that logic since eventually, once the overload system is fixed properly, we'll need that code back. I'm not saying that we should definitely keep the existing annoying code, rather that we should be more thoughtful about pros and cons of keeping that about-to-become-dead code vs removing it and possibly needing to reimplement it later. |
I think regardless of the type system changes, it's reasonable for the optimizer to make sure there is always an explicit upcast in mixed-width scenarios. |
I agree, but my hope is that once the overload system is fixed properly, we won't have a separate concept of widths vs top level types. We'll promote the non standard integer widths to top level types, instead. @RaduBerinde, if we were to model ourselves after Postgres's type system, though, we would permit mixed-width operators without needing any upcasts. For example, here is the list of
As you can see, they support int2+int2, int2+int4, and int2+int8, all with different result widths. Maybe this level of granularity isn't necessary, and we could just provide equal-width operators with implicit casts to pave over the differences, I'm not sure. |
Yes, agreed. My thought is just that if we don't think that's useful / more efficient, we can easily set up optimizer to add casts at execbuild time (e.g. cast the smaller-width to match the other one). |
I'm going to go ahead and merge this, and we'll see how this new world of strict typing feels. bors r+ |
Build succeeded |
I noticed that the
ReType
function usesEquivalent
. This is not strictenough if our expectation is that the re-typed expression will have the wanted
ResolvedType()
. This commit changes this to useIdentical
.Release note: None