Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sessiondata: extract all non-local parameters into a protobuf #55569

Merged
merged 1 commit into from
Oct 22, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Oct 15, 2020

This commit divides up all the session parameters into two categories:
local only (those that don't influence the execution if not propagated
to the remote nodes) and non-local (those that must be propagated). The
former are extracted into a separate struct, and the latter are now
mostly stored directly in a protobuf with a few exceptions having custom
serialization/deserialization logic. This allows us to clean up the eval
context proto and centralize the custom logic and the definition.

This commit also fixed an issue of not propagating the default int size
which - in theory - could influence the execution on the remote nodes
(e.g. when performing a cast by an internal executor).

Fixes: #51075.

Release note: None

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Oct 15, 2020
@yuzefovich yuzefovich requested a review from a team October 15, 2020 02:08
@yuzefovich yuzefovich requested a review from a team as a code owner October 15, 2020 02:08
@yuzefovich yuzefovich requested review from pbardea and removed request for a team October 15, 2020 02:08
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich removed request for a team and pbardea October 15, 2020 02:08
@yuzefovich yuzefovich force-pushed the sessiondata-proto branch 2 times, most recently from 4ae021f to 2654bc4 Compare October 15, 2020 03:15
@yuzefovich yuzefovich changed the title WIP sessiondata: extract many fields into a protobuf sessiondata: extract many fields into a protobuf Oct 15, 2020
@yuzefovich yuzefovich requested review from asubiotto and a team October 15, 2020 03:16
@yuzefovich
Copy link
Member Author

I didn't extract all the "simple" fields, so currently this PR doesn't close the issue, yet it does solve a problem with default int size propagation and makes a step in the right direction. I've audited all settings, and there were no other omissions that I found (i.e. all others influence only pre-distributed-execution stages), so I think it should be good enough for now. Thoughts?

@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Oct 15, 2020
@yuzefovich yuzefovich force-pushed the sessiondata-proto branch 2 times, most recently from b7b61f3 to 0e3822e Compare October 15, 2020 04:24
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. What is left in to do in the issue that this PR doesn't address?
  2. Why not move all fields into the protobuf? I guess there's no point for local-only session settings.

Reviewed 32 of 43 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/internal.go, line 539 at r1 (raw file):

	_ []pgwirebase.FormatCode,
	_ sessiondatapb.DataConversionConfig,
	_ *time.Location,

Was this added because this was moved out of DataConversionConfig? What's the reasoning?


pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):

  optional string temporary_schema_name = 13 [(gogoproto.nullable) = false];
  reserved 14;
  optional sessiondatapb.SessionData session_data = 15 [(gogoproto.nullable) = false];

Why is this optional?


pkg/sql/parser/parse.go, line 98 at r1 (raw file):

// of how a "naked" INT type should be parsed returns the corresponding integer
// type.
func NakedIntTypeFromDefaultIntSize(defaultIntSize int32) *types.T {

What's the reasoning behind putting this in the parser package?


pkg/sql/sessiondata/session_data.go, line 26 at r1 (raw file):

// A SQL Session changes fields in SessionData through sql.sessionDataMutator.
type SessionData struct {
	sessiondatapb.SessionData

Comment please


pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):

	// WARNING: consider whether a session parameter you're adding needs to  //
	// be propagated to the remote nodes. If so, consider adding it to       //
	// sessiondatapb.SessionData (preferred) or execinfrapb.EvalContext.     //

I would remove a mention of EvalContext. If we're adding session data, that should always live in sessiondatapb.SessionData, right?


pkg/sql/sessiondatapb/session_data.proto, line 11 at r1 (raw file):

// licenses/APL.txt.

syntax = "proto2";

Why not make this proto3?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. There are many fields in sessiondata.SessionData object that can be easily moved into sessiondatapb.SessionData (because most of them are ints or bools), so I think in order to close the issue we should decide to either move them or ignore them completely.
  2. Yes, there is not much benefit to moving local-only settings, and it'd be more work. I do think that in the long run it would be beneficial to move as much as possible into the protobuf, so that instead of thinking whether a setting is local-only or not, we would be adding new things to the protobuf by default (unless it is not a "simple" field).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/internal.go, line 539 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Was this added because this was moved out of DataConversionConfig? What's the reasoning?

Yes. Location is serialized as string, but then we're operation on *time.Location probably in order to cache the result of a lookup. This field requires "custom" logic to deserialize it, so I moved it out of DataConversionConfig. Do you know a way to not do that?


pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why is this optional?

Hm, good point - I didn't realize there is a required keyword, I thought that only optional and repeated are there, updated.


pkg/sql/parser/parse.go, line 98 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What's the reasoning behind putting this in the parser package?

Hm, I guess because defaultNakedIntType is defined here. Do you have a better suggestion? Currently the method is called from sql and execinfrapb packages, and the method is related to parsing - it determines how to parse ::INT casts, so the current location seems reasonable to me.


pkg/sql/sessiondata/session_data.go, line 26 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Comment please

Done.


pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I would remove a mention of EvalContext. If we're adding session data, that should always live in sessiondatapb.SessionData, right?

Not everything can be easily serialized - that *time.Location is an example. Hm, I think it is possible to have custom Marshal/Unmarshal methods, but I don't know how to store *time.Location which doesn't seem to have a protobuf representation in a proto struct.


pkg/sql/sessiondatapb/session_data.proto, line 11 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why not make this proto3?

No reason, just followed the example from api.proto, updated.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if a field doesn't need to be sent to a remote node, it doesn't make sense to have it in the protobuf since it produces unnecessary serialization overhead so in my opinion this PR does enough (although I'd like to see the EvalContext proto disappear if possible). Maybe we could find a way to force an engineer to mark a field as local only, maybe using an init-time check with reflection that checks some map keyed by the field names? If someone adds a field they would add it to this map to indicate that this field is local-only. Possibly this is overkill and what we should do is rename SessionData to LocalOnlySessionData

Reviewed 1 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/internal.go, line 539 at r1 (raw file):

Previously, yuzefovich wrote…

Yes. Location is serialized as string, but then we're operation on *time.Location probably in order to cache the result of a lookup. This field requires "custom" logic to deserialize it, so I moved it out of DataConversionConfig. Do you know a way to not do that?

Gotcha. Well I think we should still keep a string location in DataConversionConfig and deserialize that similar to what we do with the EvalContext protobuf. I think this PR should aim to replace that (the EvalContext protobuf) with this new SessionData protobuf. The nice thing is that we can use this protobuf natively for many fields. Unfortunately time.Location is not one of those fields, so we'll just have to bite the bullet and find a different form of accessing it (moving it into the SessionData non-protobuf struct and "deserializing" the string into the in-memory representation is ok I think)


pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):

Previously, yuzefovich wrote…

Hm, good point - I didn't realize there is a required keyword, I thought that only optional and repeated are there, updated.

Actually it might be better to keep this as you had it. In the future, if we for some reason decide to not use this field, we can avoid the need to send this just to keep the protobuf code happy.


pkg/sql/parser/parse.go, line 98 at r1 (raw file):

Previously, yuzefovich wrote…

Hm, I guess because defaultNakedIntType is defined here. Do you have a better suggestion? Currently the method is called from sql and execinfrapb packages, and the method is related to parsing - it determines how to parse ::INT casts, so the current location seems reasonable to me.

No objections


pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):

Previously, yuzefovich wrote…

Not everything can be easily serialized - that *time.Location is an example. Hm, I think it is possible to have custom Marshal/Unmarshal methods, but I don't know how to store *time.Location which doesn't seem to have a protobuf representation in a proto struct.

Forgetting about time.Location which needs a special solution, if you're adding a field to either session data or the eval context, aren't you serializing that anyway and isn't the point to remove the EvalContext protobuf since it's an extra layer of translation that this PR attempts to resolve? I think this comment needs to be expanded to indicate when to choose which at least.


pkg/sql/sessiondatapb/session_data.proto, line 46 at r2 (raw file):

// DataConversionConfig contains the parameters that influence the conversion
// between SQL data types and strings/byte arrays.
message DataConversionConfig {

Is the location string serialized anywhere or are we still relying on the eval context?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the drive-by review, I was monitoring the convo as I am interested in the end result.

Regarding locations: we have a general problem which is that the time zone database may not be the same on every node. So if we rely on the Location setting for computational results (I assume it's used for timestamp conversations) just transporting the name of the timezone and looking it back up from the filesystem on the target node does not guarantee stability.

I wanted to raise this because the general thrust behind this PR was to once and for all guarantee a "consistent input state" between the gateway and the remote processors, so they all operate on the same data. This escape to the node-specific timezone database breaks this guarantee.

I don't know if it's possible to actually serialize a time.Location object completely. Maybe we can capture the computed time offset? (I don't know if the offset is sufficient. For example if we convert timestamps in the past, a different location-specific offset applies. I think a location is in fact a more complex data structure.)

This discussion intimately relates to issue #36864 and @bdarnell 's comment here.

I'm not sure we can fully address this in this PR (you'd need to find a way to effectively serialize time.Location. Is that possible?) If not, there should be a clear explanation in the protobuf definition that points to issue #36864 with an outline of the problem to solve, and a comment on the issue thread for #36864 on github that referes back to the sessiondata proto so that the next person to work on this area thinks about looking here too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can capture the computed time offset? (I don't know if the offset is sufficient. For example if we convert timestamps in the past, a different location-specific offset applies. I think a location is in fact a more complex data structure.)

A time.Location is more than an offset - it includes historical and future rules including daylight savings changes. (It's the DST changes that are responsible for most of the churn and complexity)

There is no interface in the time package for serializing or deserializing a location, but that's exactly what the zoneinfo.zip file is (or one of the small per-zone files within it), so in theory we could ship that around (both in the session state and in the eval context). But I don't think duplicating this data into the session state and eval context is the right approach. I'd think of it as something more like a global descriptor to be leased by the sessions.

In any case, I don't think this adds significantly to the timezone issues we already have so I don't think we need to solve that problem here.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)

@yuzefovich yuzefovich force-pushed the sessiondata-proto branch 2 times, most recently from a7c4f25 to 27e1ed0 Compare October 20, 2020 01:48
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly this is overkill and what we should do is rename SessionData to LocalOnlySessionData.

I agree that sounds like an overkill, but I implemented your suggestion with separating out a separate struct.

I also added quick comments about the timezone issues to the proto definition.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/internal.go, line 539 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Gotcha. Well I think we should still keep a string location in DataConversionConfig and deserialize that similar to what we do with the EvalContext protobuf. I think this PR should aim to replace that (the EvalContext protobuf) with this new SessionData protobuf. The nice thing is that we can use this protobuf natively for many fields. Unfortunately time.Location is not one of those fields, so we'll just have to bite the bullet and find a different form of accessing it (moving it into the SessionData non-protobuf struct and "deserializing" the string into the in-memory representation is ok I think)

Not everything in the eval context proto comes from the session data object - namely, StmtTimestamp and TxnTimestamp come from the eval context itself. It sounds wrong to me if we move those fields into the new sessiondatapb.SessionData object.

Note that there are also other objects that do live in sessiondata.SessionData but don't have a simple serialization, so I didn't move them either (in addition to the location one). Are you proposing that we move them as well?

To me the current layout kinda makes the most sense:

  • "simple" fields from sessiondata.SessionData now live in easily serializable protobuf
  • "complicated" fields from sessiondata.SessionData continue to live in there and continue to have a custom serialization via the eval context proto
  • a few fields from the eval context are also serialized via the eval context proto.

It seems like you're suggesting that we wanna pretty much squash all the serialization into the same protobuf, but I don't like the idea of squashing the third category - timestamps don't have the session-level scope, so to me it seems wrong. Then, I also don't see much benefit in trying to combine the first and the second categories.


pkg/sql/execinfrapb/api.proto, line 78 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Actually it might be better to keep this as you had it. In the future, if we for some reason decide to not use this field, we can avoid the need to send this just to keep the protobuf code happy.

Done.


pkg/sql/parser/parse.go, line 98 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

No objections

Done.


pkg/sql/sessiondata/session_data.go, line 141 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Forgetting about time.Location which needs a special solution, if you're adding a field to either session data or the eval context, aren't you serializing that anyway and isn't the point to remove the EvalContext protobuf since it's an extra layer of translation that this PR attempts to resolve? I think this comment needs to be expanded to indicate when to choose which at least.

No, that wasn't my goal with this PR. The main goal I'm pursuing is making sure that all parameters that might influence the execution of the queries on the remote nodes are correctly propagated.

Moved some things around and added some comments.


pkg/sql/sessiondatapb/session_data.proto, line 46 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Is the location string serialized anywhere or are we still relying on the eval context?

We're still relying on the eval context proto, and I don't think we want to get rid of it.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 43 files at r1, 1 of 5 files at r2, 5 of 12 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/internal.go, line 539 at r1 (raw file):

Previously, yuzefovich wrote…

Not everything in the eval context proto comes from the session data object - namely, StmtTimestamp and TxnTimestamp come from the eval context itself. It sounds wrong to me if we move those fields into the new sessiondatapb.SessionData object.

Note that there are also other objects that do live in sessiondata.SessionData but don't have a simple serialization, so I didn't move them either (in addition to the location one). Are you proposing that we move them as well?

To me the current layout kinda makes the most sense:

  • "simple" fields from sessiondata.SessionData now live in easily serializable protobuf
  • "complicated" fields from sessiondata.SessionData continue to live in there and continue to have a custom serialization via the eval context proto
  • a few fields from the eval context are also serialized via the eval context proto.

It seems like you're suggesting that we wanna pretty much squash all the serialization into the same protobuf, but I don't like the idea of squashing the third category - timestamps don't have the session-level scope, so to me it seems wrong. Then, I also don't see much benefit in trying to combine the first and the second categories.

In that case, I think the EvalContext should only have non-session variables like the ones you mentioned, but I do think we should move the current session fields that are in EvalContext into SessionData (e.g. Location).

The only reason they are in the eval context proto is because it was the only place they could be serialized. Since this PR tries to make that state of affairs better (i.e. not overload the eval context proto), these fields should also be moved into the session data proto. If not, it makes it very confusing to future developers where a new field should live. If we change that it's obvious, the answer becomes: "if the field is in the eval context and you need to serialize it, put it in the eval ctx proto, if the field is a local session data variable, put it in the session data struct, if you need to serialize it, put it in the session data protobuf"

To recap, I think it's fine to leave *Timestamp in the eval context proto, but the "hard to serialize" fields should not be in the eval context proto and should instead be moved to the new session data proto.


pkg/sql/internal_test.go, line 359 at r3 (raw file):

				},
				LocalOnlySessionData: sessiondata.LocalOnlySessionData{
					DisallowFullTableScans: true,

Why the addition?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/internal.go, line 539 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

In that case, I think the EvalContext should only have non-session variables like the ones you mentioned, but I do think we should move the current session fields that are in EvalContext into SessionData (e.g. Location).

The only reason they are in the eval context proto is because it was the only place they could be serialized. Since this PR tries to make that state of affairs better (i.e. not overload the eval context proto), these fields should also be moved into the session data proto. If not, it makes it very confusing to future developers where a new field should live. If we change that it's obvious, the answer becomes: "if the field is in the eval context and you need to serialize it, put it in the eval ctx proto, if the field is a local session data variable, put it in the session data struct, if you need to serialize it, put it in the session data protobuf"

To recap, I think it's fine to leave *Timestamp in the eval context proto, but the "hard to serialize" fields should not be in the eval context proto and should instead be moved to the new session data proto.

Alright, I moved everything session-related out of the eval context proto. I agree, it does look nicer - now we have a single place for the struct definition and for the custom logic for serialization.


pkg/sql/internal_test.go, line 359 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why the addition?

Nice catch, remnant of copy-pasting in the very first revision, fixed.

@yuzefovich yuzefovich changed the title sessiondata: extract many fields into a protobuf sessiondata: extract all non-local parameters into a protobuf Oct 20, 2020
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 43 files at r1, 4 of 12 files at r3, 13 of 13 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/sessiondata/session_data.go, line 220 at r4 (raw file):

	// WARNING: consider whether a session parameter you're adding needs to  //
	// be propagated to the remote nodes. If so, that parameter should live  //
	// in SessionData struct above.                                          //

nit: s/in SessionData/in the SessionData

This commit divides up all the session parameters into two categories:
local only (those that don't influence the execution if not propagated
to the remote nodes) and non-local (those that must be propagated). The
former are extracted into a separate struct, and the latter are now
mostly stored directly in a protobuf with a few exceptions having custom
serialization/deserialization logic. This allows us to clean up the eval
context proto and centralize the custom logic and the definition.

This commit also fixed an issue of not propagating the default int size
which - in theory - could influence the execution on the remote nodes
(e.g. when performing a cast by an internal executor).

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)

@craig
Copy link
Contributor

craig bot commented Oct 22, 2020

Build succeeded:

@craig craig bot merged commit dd647f5 into cockroachdb:master Oct 22, 2020
@yuzefovich yuzefovich deleted the sessiondata-proto branch October 22, 2020 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sessiondata: extract some fields of SessionData into a protobuf
5 participants