-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sessiondata: extract all non-local parameters into a protobuf #55569
Conversation
4ae021f
to
2654bc4
Compare
I didn't extract all the "simple" fields, so currently this PR doesn't close the issue, yet it does solve a problem with default int size propagation and makes a step in the right direction. I've audited all settings, and there were no other omissions that I found (i.e. all others influence only pre-distributed-execution stages), so I think it should be good enough for now. Thoughts? |
b7b61f3
to
0e3822e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- What is left in to do in the issue that this PR doesn't address?
- Why not move all fields into the protobuf? I guess there's no point for local-only session settings.
Reviewed 32 of 43 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/internal.go, line 539 at r1 (raw file):
_ []pgwirebase.FormatCode, _ sessiondatapb.DataConversionConfig, _ *time.Location,
Was this added because this was moved out of DataConversionConfig
? What's the reasoning?
pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):
optional string temporary_schema_name = 13 [(gogoproto.nullable) = false]; reserved 14; optional sessiondatapb.SessionData session_data = 15 [(gogoproto.nullable) = false];
Why is this optional?
pkg/sql/parser/parse.go, line 98 at r1 (raw file):
// of how a "naked" INT type should be parsed returns the corresponding integer // type. func NakedIntTypeFromDefaultIntSize(defaultIntSize int32) *types.T {
What's the reasoning behind putting this in the parser
package?
pkg/sql/sessiondata/session_data.go, line 26 at r1 (raw file):
// A SQL Session changes fields in SessionData through sql.sessionDataMutator. type SessionData struct { sessiondatapb.SessionData
Comment please
pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):
// WARNING: consider whether a session parameter you're adding needs to // // be propagated to the remote nodes. If so, consider adding it to // // sessiondatapb.SessionData (preferred) or execinfrapb.EvalContext. //
I would remove a mention of EvalContext
. If we're adding session data, that should always live in sessiondatapb.SessionData
, right?
pkg/sql/sessiondatapb/session_data.proto, line 11 at r1 (raw file):
// licenses/APL.txt. syntax = "proto2";
Why not make this proto3?
0e3822e
to
8d39435
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- There are many fields in
sessiondata.SessionData
object that can be easily moved intosessiondatapb.SessionData
(because most of them areint
s orbool
s), so I think in order to close the issue we should decide to either move them or ignore them completely. - Yes, there is not much benefit to moving local-only settings, and it'd be more work. I do think that in the long run it would be beneficial to move as much as possible into the protobuf, so that instead of thinking whether a setting is local-only or not, we would be adding new things to the protobuf by default (unless it is not a "simple" field).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/internal.go, line 539 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Was this added because this was moved out of
DataConversionConfig
? What's the reasoning?
Yes. Location is serialized as string
, but then we're operation on *time.Location
probably in order to cache the result of a lookup. This field requires "custom" logic to deserialize it, so I moved it out of DataConversionConfig
. Do you know a way to not do that?
pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why is this optional?
Hm, good point - I didn't realize there is a required
keyword, I thought that only optional
and repeated
are there, updated.
pkg/sql/parser/parse.go, line 98 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
What's the reasoning behind putting this in the
parser
package?
Hm, I guess because defaultNakedIntType
is defined here. Do you have a better suggestion? Currently the method is called from sql
and execinfrapb
packages, and the method is related to parsing - it determines how to parse ::INT
casts, so the current location seems reasonable to me.
pkg/sql/sessiondata/session_data.go, line 26 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Comment please
Done.
pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I would remove a mention of
EvalContext
. If we're adding session data, that should always live insessiondatapb.SessionData
, right?
Not everything can be easily serialized - that *time.Location
is an example. Hm, I think it is possible to have custom Marshal/Unmarshal
methods, but I don't know how to store *time.Location
which doesn't seem to have a protobuf representation in a proto struct.
pkg/sql/sessiondatapb/session_data.proto, line 11 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why not make this proto3?
No reason, just followed the example from api.proto
, updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if a field doesn't need to be sent to a remote node, it doesn't make sense to have it in the protobuf since it produces unnecessary serialization overhead so in my opinion this PR does enough (although I'd like to see the EvalContext proto disappear if possible). Maybe we could find a way to force an engineer to mark a field as local only, maybe using an init
-time check with reflection that checks some map keyed by the field names? If someone adds a field they would add it to this map to indicate that this field is local-only. Possibly this is overkill and what we should do is rename SessionData
to LocalOnlySessionData
Reviewed 1 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/internal.go, line 539 at r1 (raw file):
Previously, yuzefovich wrote…
Yes. Location is serialized as
string
, but then we're operation on*time.Location
probably in order to cache the result of a lookup. This field requires "custom" logic to deserialize it, so I moved it out ofDataConversionConfig
. Do you know a way to not do that?
Gotcha. Well I think we should still keep a string location in DataConversionConfig
and deserialize that similar to what we do with the EvalContext
protobuf. I think this PR should aim to replace that (the EvalContext protobuf) with this new SessionData protobuf. The nice thing is that we can use this protobuf natively for many fields. Unfortunately time.Location
is not one of those fields, so we'll just have to bite the bullet and find a different form of accessing it (moving it into the SessionData
non-protobuf struct and "deserializing" the string into the in-memory representation is ok I think)
pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):
Previously, yuzefovich wrote…
Hm, good point - I didn't realize there is a
required
keyword, I thought that onlyoptional
andrepeated
are there, updated.
Actually it might be better to keep this as you had it. In the future, if we for some reason decide to not use this field, we can avoid the need to send this just to keep the protobuf code happy.
pkg/sql/parser/parse.go, line 98 at r1 (raw file):
Previously, yuzefovich wrote…
Hm, I guess because
defaultNakedIntType
is defined here. Do you have a better suggestion? Currently the method is called fromsql
andexecinfrapb
packages, and the method is related to parsing - it determines how to parse::INT
casts, so the current location seems reasonable to me.
No objections
pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):
Previously, yuzefovich wrote…
Not everything can be easily serialized - that
*time.Location
is an example. Hm, I think it is possible to have customMarshal/Unmarshal
methods, but I don't know how to store*time.Location
which doesn't seem to have a protobuf representation in a proto struct.
Forgetting about time.Location which needs a special solution, if you're adding a field to either session data or the eval context, aren't you serializing that anyway and isn't the point to remove the EvalContext protobuf since it's an extra layer of translation that this PR attempts to resolve? I think this comment needs to be expanded to indicate when to choose which at least.
pkg/sql/sessiondatapb/session_data.proto, line 46 at r2 (raw file):
// DataConversionConfig contains the parameters that influence the conversion // between SQL data types and strings/byte arrays. message DataConversionConfig {
Is the location string serialized anywhere or are we still relying on the eval context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the drive-by review, I was monitoring the convo as I am interested in the end result.
Regarding locations: we have a general problem which is that the time zone database may not be the same on every node. So if we rely on the Location setting for computational results (I assume it's used for timestamp conversations) just transporting the name of the timezone and looking it back up from the filesystem on the target node does not guarantee stability.
I wanted to raise this because the general thrust behind this PR was to once and for all guarantee a "consistent input state" between the gateway and the remote processors, so they all operate on the same data. This escape to the node-specific timezone database breaks this guarantee.
I don't know if it's possible to actually serialize a time.Location object completely. Maybe we can capture the computed time offset? (I don't know if the offset is sufficient. For example if we convert timestamps in the past, a different location-specific offset applies. I think a location is in fact a more complex data structure.)
This discussion intimately relates to issue #36864 and @bdarnell 's comment here.
I'm not sure we can fully address this in this PR (you'd need to find a way to effectively serialize time.Location. Is that possible?) If not, there should be a clear explanation in the protobuf definition that points to issue #36864 with an outline of the problem to solve, and a comment on the issue thread for #36864 on github that referes back to the sessiondata proto so that the next person to work on this area thinks about looking here too.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can capture the computed time offset? (I don't know if the offset is sufficient. For example if we convert timestamps in the past, a different location-specific offset applies. I think a location is in fact a more complex data structure.)
A time.Location is more than an offset - it includes historical and future rules including daylight savings changes. (It's the DST changes that are responsible for most of the churn and complexity)
There is no interface in the time
package for serializing or deserializing a location, but that's exactly what the zoneinfo.zip file is (or one of the small per-zone files within it), so in theory we could ship that around (both in the session state and in the eval context). But I don't think duplicating this data into the session state and eval context is the right approach. I'd think of it as something more like a global descriptor to be leased by the sessions.
In any case, I don't think this adds significantly to the timezone issues we already have so I don't think we need to solve that problem here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
a7c4f25
to
27e1ed0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly this is overkill and what we should do is rename
SessionData
toLocalOnlySessionData
.
I agree that sounds like an overkill, but I implemented your suggestion with separating out a separate struct.
I also added quick comments about the timezone issues to the proto definition.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/internal.go, line 539 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Gotcha. Well I think we should still keep a string location in
DataConversionConfig
and deserialize that similar to what we do with theEvalContext
protobuf. I think this PR should aim to replace that (the EvalContext protobuf) with this new SessionData protobuf. The nice thing is that we can use this protobuf natively for many fields. Unfortunatelytime.Location
is not one of those fields, so we'll just have to bite the bullet and find a different form of accessing it (moving it into theSessionData
non-protobuf struct and "deserializing" the string into the in-memory representation is ok I think)
Not everything in the eval context proto comes from the session data object - namely, StmtTimestamp
and TxnTimestamp
come from the eval context itself. It sounds wrong to me if we move those fields into the new sessiondatapb.SessionData
object.
Note that there are also other objects that do live in sessiondata.SessionData
but don't have a simple serialization, so I didn't move them either (in addition to the location one). Are you proposing that we move them as well?
To me the current layout kinda makes the most sense:
- "simple" fields from
sessiondata.SessionData
now live in easily serializable protobuf - "complicated" fields from
sessiondata.SessionData
continue to live in there and continue to have a custom serialization via the eval context proto - a few fields from the eval context are also serialized via the eval context proto.
It seems like you're suggesting that we wanna pretty much squash all the serialization into the same protobuf, but I don't like the idea of squashing the third category - timestamps don't have the session-level scope, so to me it seems wrong. Then, I also don't see much benefit in trying to combine the first and the second categories.
pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Actually it might be better to keep this as you had it. In the future, if we for some reason decide to not use this field, we can avoid the need to send this just to keep the protobuf code happy.
Done.
pkg/sql/parser/parse.go, line 98 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
No objections
Done.
pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Forgetting about time.Location which needs a special solution, if you're adding a field to either session data or the eval context, aren't you serializing that anyway and isn't the point to remove the EvalContext protobuf since it's an extra layer of translation that this PR attempts to resolve? I think this comment needs to be expanded to indicate when to choose which at least.
No, that wasn't my goal with this PR. The main goal I'm pursuing is making sure that all parameters that might influence the execution of the queries on the remote nodes are correctly propagated.
Moved some things around and added some comments.
pkg/sql/sessiondatapb/session_data.proto, line 46 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is the location string serialized anywhere or are we still relying on the eval context?
We're still relying on the eval context proto, and I don't think we want to get rid of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 43 files at r1, 1 of 5 files at r2, 5 of 12 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/internal.go, line 539 at r1 (raw file):
Previously, yuzefovich wrote…
Not everything in the eval context proto comes from the session data object - namely,
StmtTimestamp
andTxnTimestamp
come from the eval context itself. It sounds wrong to me if we move those fields into the newsessiondatapb.SessionData
object.Note that there are also other objects that do live in
sessiondata.SessionData
but don't have a simple serialization, so I didn't move them either (in addition to the location one). Are you proposing that we move them as well?To me the current layout kinda makes the most sense:
- "simple" fields from
sessiondata.SessionData
now live in easily serializable protobuf- "complicated" fields from
sessiondata.SessionData
continue to live in there and continue to have a custom serialization via the eval context proto- a few fields from the eval context are also serialized via the eval context proto.
It seems like you're suggesting that we wanna pretty much squash all the serialization into the same protobuf, but I don't like the idea of squashing the third category - timestamps don't have the session-level scope, so to me it seems wrong. Then, I also don't see much benefit in trying to combine the first and the second categories.
In that case, I think the EvalContext
should only have non-session variables like the ones you mentioned, but I do think we should move the current session fields that are in EvalContext
into SessionData
(e.g. Location
).
The only reason they are in the eval context proto is because it was the only place they could be serialized. Since this PR tries to make that state of affairs better (i.e. not overload the eval context proto), these fields should also be moved into the session data proto. If not, it makes it very confusing to future developers where a new field should live. If we change that it's obvious, the answer becomes: "if the field is in the eval context and you need to serialize it, put it in the eval ctx proto, if the field is a local session data variable, put it in the session data struct, if you need to serialize it, put it in the session data protobuf"
To recap, I think it's fine to leave *Timestamp
in the eval context proto, but the "hard to serialize" fields should not be in the eval context proto and should instead be moved to the new session data proto.
pkg/sql/internal_test.go, line 359 at r3 (raw file):
}, LocalOnlySessionData: sessiondata.LocalOnlySessionData{ DisallowFullTableScans: true,
Why the addition?
27e1ed0
to
a61d741
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/internal.go, line 539 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
In that case, I think the
EvalContext
should only have non-session variables like the ones you mentioned, but I do think we should move the current session fields that are inEvalContext
intoSessionData
(e.g.Location
).The only reason they are in the eval context proto is because it was the only place they could be serialized. Since this PR tries to make that state of affairs better (i.e. not overload the eval context proto), these fields should also be moved into the session data proto. If not, it makes it very confusing to future developers where a new field should live. If we change that it's obvious, the answer becomes: "if the field is in the eval context and you need to serialize it, put it in the eval ctx proto, if the field is a local session data variable, put it in the session data struct, if you need to serialize it, put it in the session data protobuf"
To recap, I think it's fine to leave
*Timestamp
in the eval context proto, but the "hard to serialize" fields should not be in the eval context proto and should instead be moved to the new session data proto.
Alright, I moved everything session-related out of the eval context proto. I agree, it does look nicer - now we have a single place for the struct definition and for the custom logic for serialization.
pkg/sql/internal_test.go, line 359 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why the addition?
Nice catch, remnant of copy-pasting in the very first revision, fixed.
a61d741
to
c7e1c7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 43 files at r1, 4 of 12 files at r3, 13 of 13 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/sessiondata/session_data.go, line 220 at r4 (raw file):
// WARNING: consider whether a session parameter you're adding needs to // // be propagated to the remote nodes. If so, that parameter should live // // in SessionData struct above. //
nit: s/in SessionData/in the SessionData
This commit divides up all the session parameters into two categories: local only (those that don't influence the execution if not propagated to the remote nodes) and non-local (those that must be propagated). The former are extracted into a separate struct, and the latter are now mostly stored directly in a protobuf with a few exceptions having custom serialization/deserialization logic. This allows us to clean up the eval context proto and centralize the custom logic and the definition. This commit also fixed an issue of not propagating the default int size which - in theory - could influence the execution on the remote nodes (e.g. when performing a cast by an internal executor). Release note: None
c7e1c7b
to
ab19e5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)
Build succeeded: |
This commit divides up all the session parameters into two categories:
local only (those that don't influence the execution if not propagated
to the remote nodes) and non-local (those that must be propagated). The
former are extracted into a separate struct, and the latter are now
mostly stored directly in a protobuf with a few exceptions having custom
serialization/deserialization logic. This allows us to clean up the eval
context proto and centralize the custom logic and the definition.
This commit also fixed an issue of not propagating the default int size
which - in theory - could influence the execution on the remote nodes
(e.g. when performing a cast by an internal executor).
Fixes: #51075.
Release note: None