sql: Store instance ID and network addresses of SQL pods #66600
My comment from yesterday still stands, and I'd like to hear what you think about the node ID vs instance ID distinction. It'd probably help David to know.
Also from that last comment:
Also nit for this PR, throughout the change: all comments need to be full sentences with a capital at the beginning and a period at the end.
Also acronyms like "HTTP", "ID" etc need to be capitalized.
(including in commit messages!)
Also it seems you've broken the last commit by doing a global search and replace "instance" -> "Instance". Many of these need to be reverted.
Reviewed 31 of 31 files at r1, 8 of 8 files at r2, 23 of 23 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rimadeodhar)
pkg/sql/sqlinstance/instancestorage/instance.go, line 19 at r2 (raw file):
type instance struct {
	id base.SQLInstanceID
id -> instanceID
pkg/sql/sqlinstance/instancestorage/instance.go, line 28 at r3 (raw file):
// NewSQLInstance constructs a new instancestorage.Instance.
func NewSQLInstance(
	id base.SQLInstanceID, httpAddr string, sessionID sqlliveness.SessionID,
id -> instanceID
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 80 at r2 (raw file):
// Start initiates the instancestorage internals.
// TODO(rima): Add rangefeed functionality
nit: no need to break this line into two.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 91 at r3 (raw file):
}

// CreateInstance allocates a unique Instance identifier for the
Here and in all the comments below, I think the capital is not needed in this place.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 119 at r3 (raw file):
	return err
}
// If Instance id is available, reuse it.
ditto for "instance" but id -> ID
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 133 at r3 (raw file):
value, err := s.rowcodec.encodeRow(instance)
if err != nil {
	log.Warningf(ctx, "failed to encode row for Instance id %d: %v", instance.id, err)
no capital in warnings
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 144 at r3 (raw file):
}

// GetInstanceAddr retrieves the network address for an Instance
no capitals needed here
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 162 at r3 (raw file):
}
if instanceExists := row.Value != nil; !instanceExists {
	return errors.New("non existent Instance")
no capitals in errors
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go, line 38 at r2 (raw file):
)

func TestStorage(t *testing.T) {
It's customary to add a comment to explain what a test does and does not exercise.
pkg/sql/sqlinstance/instancestorage/test_helpers.go, line 90 at r3 (raw file):
// GetInstanceAddr implements the instanceprovider.storage interface
func (f *FakeStorage) GetInstanceAddr(
	_ context.Context, id base.SQLInstanceID,
id -> instanceID
pkg/sql/sqlliveness/sqlliveness.go, line 71 at r3 (raw file):
// are known to be valid.
//
// See discussion in Open Questions in
I think it's worth summarizing the status of the discussion here instead of fully deferring to the external document.
I think you should update the PR description to be more verbose and wrapped like the commits because bors picks that up in the merge commit message.
Reviewed 31 of 31 files at r1, 8 of 8 files at r2, 23 of 23 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rimadeodhar)
pkg/migration/migrations/migrations.go, line 70 at r1 (raw file):
	deleteDeprecatedNamespaceTableDescriptorMigration),
migration.NewTenantMigration(
	"add the systems.sql_instances table",
should this be singular system?
pkg/sql/logictest/testdata/logic_test/ranges, line 314 at r1 (raw file):
[175] /Table/39 [176] /Table/40 system sqlliveness · {1} 1
[176] /Table/40 [177] /Table/41 system migrations · {1} 1
[177] /Table/41 [178] /Table/42 system join_tokens · {1} 1
Minor: not sure what this means, but is it significant that the table ID and ranges change for the existing join_tokens row?
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 37 at r2 (raw file):
type Storage struct {
	codec   keys.SQLCodec
	stopper *stop.Stopper
I don't see usage of the stopper in this file. Am I missing something?
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 63 at r2 (raw file):
cacheConfig := cache.Config{
	Policy: cache.CacheLRU,
	ShouldEvict: func(size int, key, value interface{}) bool {
nit: can these args just be _?
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 161 at r2 (raw file):
	return err
}
if instanceExists := row.Value != nil; !instanceExists {
nit: do we need instanceExists at all if this is just a conditional?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
// ShutdownSQLInstance shuts down the sqlinstance.Instance.
func (p *provider) ShutdownSQLInstance(ctx context.Context) {
@rimadeodhar does this need to be invoked when a node is inadvertently shut down? I was just trying this out locally and my tenant panicked a few times and left stale entries in the storage layer. Then when I start a new tenant up and it calls GetAllInstancesForTenant, I see multiple results with the same http_addr entry and different IDs.
Or is there a different mechanism that can clean these up? Or maybe I should filter them on my end when iterating through the instances.
My comment from yesterday still stands, and I'd like to hear what you think about the node ID vs instance ID distinction. It'd probably help David to know.
Thanks for carrying this comment over from the prototype PR! I'm inclined to keep them separate because they refer to concepts that are semantically different. Node ids are for KV nodes that are in effect permanent and never reused. Whereas instance ids are unique identifiers for ephemeral SQL instances and can be reused. So it feels like they should be kept separate in code too to make that distinction clear.
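The separation argued for above can be made concrete with distinct named types. This is a minimal, self-contained sketch (NodeID and SQLInstanceID here are illustrative stand-ins for roachpb.NodeID and base.SQLInstanceID); the point is that the compiler then rejects any accidental mixing of the two identifier spaces:

```go
package main

import "fmt"

// NodeID identifies a (permanent, never-reused) KV node, while SQLInstanceID
// identifies an ephemeral SQL instance whose ID may be reused. Distinct named
// types make the semantic distinction a compile-time guarantee.
type NodeID int32
type SQLInstanceID int32

func dialKVNode(id NodeID) string        { return fmt.Sprintf("kv-node-%d", id) }
func dialSQLPod(id SQLInstanceID) string { return fmt.Sprintf("sql-pod-%d", id) }

func main() {
	n := NodeID(1)
	i := SQLInstanceID(1)
	fmt.Println(dialKVNode(n))
	fmt.Println(dialSQLPod(i))
	// dialKVNode(i) would not compile: cannot use i (SQLInstanceID) as NodeID.
}
```

Passing the wrong kind of ID fails at compile time rather than producing a subtle runtime bug, which is the main payoff of keeping the types separate in code.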
Also it seems you've broken the last commit by doing a global search and replace "instance" -> "Instance". Many of these need to be reverted.
Aah yes, I did a refactor to rename the struct in GoLand and looks like it modified the comments too. I will fix these.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
@rimadeodhar does this need to be invoked when a node is inadvertently shut down? I was just trying this out locally and my tenant panicked a few times and left stale entries in the storage layer. Then when I start a new tenant up and it calls GetAllInstancesForTenant, I see multiple results with the same http_addr entry and different IDs. Or is there a different mechanism that can clean these up? Or maybe I should filter them on my end when iterating through the instances.
This is interesting. Currently, the only time this will be invoked is when the clearSession call within the session provider system detects an expired session. But it seems like instance ids should be released in such cases too. I wonder what would be the best way to accomplish this so that we don't have multiple instance ids associated with the same http address. @ajwerner wondering what would be a good way by which I can invoke shutdownSqlInstance when the tenant/SQL server shuts down uncleanly? Not sure if I can rely on session expiry here since it seems like those session entries still continue to be in the sqlliveness subsystem until they expire?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
This is interesting. Currently, the only time this will be invoked is when the clearSession call within the session provider system detects an expired session. But it seems like instance ids should be released in such cases too. I wonder what would be the best way to accomplish this so that we don't have multiple instance ids associated with the same http address. @ajwerner wondering what would be a good way by which I can invoke shutdownSqlInstance when the tenant/SQL server shuts down uncleanly? Not sure if I can rely on session expiry here since it seems like those session entries still continue to be in the sqlliveness subsystem until they expire?
Huh, that's a fun one. I think there are (at least) two options. One option is to detect address conflicts when writing and then decide that you can delete the old row if you know better. Another is to not try to prevent the conflict but tie-break in some form or another. I think I prefer the second approach. We could keep track of the timestamp at which the row was written without much work.
I guess it only matters for the use-case where you get all of the addresses.
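The timestamp tie-break described above can be sketched in a few lines. Row and its fields here are hypothetical stand-ins for whatever the storage layer actually records (the real system would use an hlc.Timestamp, not an int64):

```go
package main

import "fmt"

// Row is a hypothetical stand-in for a stored instance row.
type Row struct {
	Addr      string
	Timestamp int64 // written-at time
}

// latestPerAddr keeps, for each address, only the row with the greatest
// written-at timestamp: the newest claimant of an address wins the tie-break.
func latestPerAddr(rows []Row) []Row {
	best := make(map[string]Row)
	for _, r := range rows {
		if cur, ok := best[r.Addr]; !ok || r.Timestamp > cur.Timestamp {
			best[r.Addr] = r
		}
	}
	out := make([]Row, 0, len(best))
	for _, r := range best {
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []Row{
		{"10.0.0.1:26257", 1}, // stale entry from a crashed pod
		{"10.0.0.1:26257", 2}, // newer claimant of the same address
		{"10.0.0.2:26257", 1},
	}
	fmt.Println(len(latestPerAddr(rows))) // one row per address survives
}
```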
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, ajwerner wrote…
Huh, that's a fun one. I think that there's (at least) two options. One option is to find if there are address conflicts when writing and then decide that you can delete it if you know better. Another is to not try to prevent the conflict but tie-break in some form or another. I think I prefer the second approach. We could keep track of the timestamp at which the row was written without much work.
I guess it only matters for the use-case where you get all of the addresses.
Yes, it only matters for this particular use case. I like the timestamp approach; that should be easy enough and will take care of GetAddressesForAllInstances. I can fix it in this PR. But now I'm thinking about whether we should also have a background job to regularly clean the sql_instances table of old instance ids that are potentially tied to expired sessions which were not cleaned out by clearSession for any reason. I could do that in a follow-up PR and add a TODO for that here.
Start()
}

type provider struct {
It'd be good to have a block of commentary here explaining what a "provider" is and how to use one. Or, if you have those comments elsewhere, a reference to those is good too.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
I don't understand why having duplicate addresses in stale records is a problem?
We also have this with regular KV nodes: if you remove a node and then add another one with the same address, it's going to be a new node ID with the same address, and the previous record is still there (same address, different node ID) side by side, albeit in a deactivated/decommissioned state.
Again can you explain why this is a problem?
maybe I should filter them on my end when iterating through the instances.
Regardless of duplicate address, yes it's important to filter dead instances out when iterating (so as to prevent connection errors, or worse, TCP connection timeouts, on dead instances).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 32 at r2 (raw file):
Quoted 8 lines of code…
// SQLInstance exposes information on a SQL
// instance such as ID, network address and
// the associated sqlliveness Session.
type SQLInstance interface {
	InstanceID() base.SQLInstanceID
	InstanceAddr() string
	SessionID() sqlliveness.SessionID
}
This feels like plain-old-data. I'm not so sure that this should be an interface. I might call this something like sqlinstance.Metadata or sqlinstance.Info and have it be a struct.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
We also have this with regular KV nodes, if you remove a node then add another one with the same address, it's going to be a new node ID with the same address, and the previous record is still there (with the same address, different node ID) side-by-side, albeit in a deactivated/decommissioned state.
There are two differences here. The biggest is that if a KV node restarts on the same host with the same address and disk, it'll use the same node ID, whereas this process will grab a new instance ID. That makes the scenario seemingly more common. Another difference is the timescales of these liveness mechanisms. The sqlliveness layer defaults to 1 minute as opposed to the ~10s of node liveness. That difference is not fundamental.
Regardless of duplicate address, yes it's important to filter dead instances out when iterating (so as to prevent connection errors, or worse, TCP connection timeouts, on dead instances).
I agree. This should be done by injecting a sqlliveness.Reader to filter entries which are not live.
One problem with the implementation as it currently exists is that if we don't know whether a session is alive then we'll go read, synchronously. It's probably not ideal to have the RPCs which are scanning addresses go read synchronously. I think it should be pretty trivial to make a non-blocking implementation of Reader which defaults to telling you that a session is live if it does not know and then performs the lookup async, or to extend the interface to expose a method that implies an async lookup.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 46 at r2 (raw file):
// liveInstances caches information on all
// live instances for a tenant.
liveInstances *cache.UnorderedCache
My sense is that the layering here could be clearer. I like that you've implemented the storage layer as its own component. I don't think this struct needs to be involved in caching. We're going to do something smarter for caching above with the rangefeed, and we can use the sqlliveness.Reader to determine whether claims are now invalid. Just having this package serve as the raw access to the durable state seems good.
What I'd suggest is that you make this layer a dependency of the "provider" which manages the current node's instance and of a "reader" which manages tracking the whole set. In the first pass, the "reader" just reads from the storage on every call (no caching) but does do filtering based on the liveness of the relevant sessions. Then, after that, we can add rangefeeds to bootstrap and watch the cache as well as some cleanup logic. What do you think?
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 183 at r2 (raw file):
// GetAllInstancesForTenant retrieves instance information
// on all active instances for the tenant.
func (s *Storage) GetAllInstancesForTenant(
I don't think that this needs to reference Tenant.
pkg/sql/sqlliveness/sqlliveness.go, line 74 at r3 (raw file):
// http://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20200615_sql_liveness.md
Expiration() hlc.Timestamp
RegisterCallbackForSessionExpiry(func(ctx context.Context))
This needs documentation. Also, I think you should support more than one, potentially with unregistration support.
pkg/sql/sqlliveness/slinstance/slinstance.go, line 73 at r3 (raw file):
func (s *session) RegisterCallbackForSessionExpiry(sExp func(context.Context)) {
	s.sessionExpiry = sExp
We need to think about locking here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @knz, and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
This should be done by injecting a sqlliveness.Reader to filter entries which are not live.
Appreciate the discussion. I'll move ahead w/ this implementation on my fan-out PR.
One problem with the implementation as it currently exists is that if we don't know whether a session is alive then we'll go read, synchronously. It's probably not ideal to have the RPCs which are scanning addresses go read synchronously. I think it should be pretty trivial to make a non-blocking implementation of Reader which defaults to telling you that a session is live if it does not know and then performs the lookup async or to extend the interface to expose a method that implies an async lookup.
I'm not familiar w/ the liveness implementation, but is it possible to support a cache similar to the one in this PR?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
I'm not familiar w/ the liveness implementation, but is it possible to support a cache similar to the one in this PR?
It already has all the caching you'd want. The only problem is that it synchronously goes and reads when it doesn't know the answer. We'd want a way to tell it to do that async.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, ajwerner wrote…
I'm not familiar w/ the liveness implementation, but is it possible to support a cache similar to the one in this PR?
It already has all the caching you'd want. The only problem is that it synchronously goes and reads when it doesn't know the answer. We'd want a way to tell it to do that async.
Here's a PR that exposes what I think you want: #67075
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dhartunian, and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, ajwerner wrote…
Here's a PR that exposes what I think you want: #67075
@knz: Another problem with keeping instance ids tied to a deactivated state is that we cannot reuse them. This is not a problem for node ids, which we don't reuse. Which is why I wonder whether we also need a background cleanup thread that periodically resets http_addr and session information for dead instances.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 46 at r2 (raw file):
Previously, ajwerner wrote…
My sense is that the layering here could be clearer. I like that you've implemented the storage layer as its own component. I don't think this struct needs to be involved in caching. We're going to do something smarter for caching above with the rangefeed, and we can use the sqlliveness.Reader to determine whether claims are now invalid. Just having this package serve as the raw access to the durable state seems good. What I'd suggest is that you make this layer a dependency of the "provider" which manages the current node's instance and of a "reader" which manages tracking the whole set. In the first pass, the "reader" just reads from the storage on every call (no caching) but does do filtering based on the liveness of the relevant sessions. Then, after that, we can add rangefeeds to bootstrap and watch the cache as well as some cleanup logic. What do you think?
I think that seems reasonable. I'll restructure this to split the read functionality into a separate reader.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dhartunian, and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Which is why I wonder whether we also need a background cleanup thread that periodically resets http_addr and session information for dead instances.
I don't think I understand this. We will definitely need something to revoke instances from dead sessions. There are different shapes that that can take. My first pass would be to have that happen when looking for a new instance ID. I suspect that you could drive any other cleanup off of the caching layer.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dhartunian, and @rimadeodhar)
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, ajwerner wrote…
Which is why I wonder whether we also need a background cleanup thread that periodically resets http_addr and session information for dead instances.
I don't think I understand this. We will definitely need something to revoke instances from dead sessions. There are different shapes that that can take. My first pass would be to have that happen when looking for a new instance ID. I suspect that you could drive any other cleanup off of the caching layer.
Yes, that's what I meant: we need to ensure that we have a mechanism to reuse instance ids from dead sessions, especially if they haven't been shut down cleanly. A background thread would just be one way of doing it. Doing it at startup or within the caching layer should be fine too.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dhartunian, and @rimadeodhar)
pkg/sql/catalog/systemschema/system.go, line 375 at r1 (raw file):
CREATE TABLE system.sql_instances (
	id INT NOT NULL PRIMARY KEY,
	http_addr STRING,
Had a realization as I was integrating with this code. We should be storing the sql/grpc address here, not http. The fan-out happens over GRPC with non-tenant nodes so I'm assuming we'll be keeping the same architecture on the tenant.
My intention in the fan-out PR is to add multiplexed GRPC on the SQL port just like we do on the dedicated clusters so the instances will need to record that address to facilitate fan-out.
For example: the existing dialNode function relies on AddressForLocality to get an addr to dial, and that call uses an Address field on NodeDescriptor: https://github.com/cockroachdb/cockroach/blob/master/pkg/roachpb/metadata.go#L563-L581
@rimadeodhar maybe the best thing to do here is to call it addr and not http_addr and set it to be the SQL address in your PR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/server/server_sql.go, line 542 at r18 (raw file):
Previously, knz (kena) wrote…
see my comment above about ensuring the container is always allocated.
Done.
pkg/server/server_sql.go, line 765 at r18 (raw file):
Previously, knz (kena) wrote…
ditto
Done.
pkg/server/server_sql.go, line 769 at r18 (raw file):
Previously, knz (kena) wrote…
Maybe actually the reporter should store a reference to the container, so that it can pick up the initialization even if it occurs late.
The reporter holds a reference to the InstanceID method in any case, so it will work fine now that the nodeIDContainer is always initialized.
pkg/server/server_sql.go, line 844 at r18 (raw file):
Previously, knz (kena) wrote…
See discussion above: if all these data structures hold a reference to the already-defined container, then it becomes possible to initialize the instance ID inside the container with just 1 call and all the other structures will pick it up directly.
Done.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 90 at r18 (raw file):
Previously, knz (kena) wrote…
log.Ops.Infof
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 190 at r18 (raw file):
Previously, ajwerner wrote…
// getAllInstanceRows decodes and returns all instance rows from the system.sql_instances table
func (s *Storage) getAllInstanceRows(
	ctx context.Context, txn *kv.Txn,
) (instances []instancerow, _ error) {
I don't think I get the purpose of paginating if you're going to buffer them all in ram without any filtering.
I have removed the pagination since the data size will likely be small anyway.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 195 at r18 (raw file):
Previously, ajwerner wrote…
I learned recently about txn.Iterate, it may be valuable.
I'm no longer using pagination so I'm using txn.Scan but thanks for the reference to txn.Iterate. This method looks useful.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 205 at r18 (raw file):
Previously, knz (kena) wrote…
Don't call String() here. This will muck up the redactability of the key. Use %v as the format.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage_test.go, line 38 at r2 (raw file):
Previously, knz (kena) wrote…
I don't see it?
Not sure why it doesn't show up; if you compare the file version with base you should be able to see it.
pkg/jobs/registry.go, line 166 at r18 (raw file):
Previously, knz (kena) wrote…
yes
Done.
Reviewed 26 of 58 files at r19, 1 of 16 files at r21, 4 of 28 files at r22.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 30 at r15 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
This was intentional to keep the instanceID fields read-only. Since instanceID fields should be initialized once and never edited after, I kept the visibility of this field at instancestorage package level to make that explicit. This way only instancestorage package can write to them which is how it should be.
As for exposing sessionID, I'm not sure if any upstream code would need access to that. I have used it in some of my tests, but I can always move the getter to a test helper to avoid leaking it elsewhere.
I remain a tad skeptical but I defer to you. Generally there's an understanding that a struct which has public fields passed around by value is itself a value.
pkg/sql/sqlinstance/sqlinstance.go, line 35 at r22 (raw file):
// AddressResolver exposes API for retrieving the instance address and all live instances for a tenant.
type AddressResolver interface {
	GetInstanceAddr(context.Context, base.SQLInstanceID) (string, error)
Is there some error that indicates that such an instance does not exist? Might want to put that in the API.
pkg/sql/sqlinstance/sqlinstance.go, line 51 at r22 (raw file):
Quoted 4 lines of code…
// NewSQLInstanceInfo constructs a new InstanceInfo struct.
func NewSQLInstanceInfo(
	instanceID base.SQLInstanceID, addr string, sessionID sqlliveness.SessionID,
) *InstanceInfo {
nit: this is a small struct that, as you noted, should never be modified after it has been constructed. Should we just pass it around by value and call this make?
pkg/sql/sqlinstance/test_helpers.go, line 17 at r22 (raw file):
// GetSessionIDForTest returns the sqlliveness.SessionID associated with the SQL Instance
// for test purposes.
func (i *InstanceInfo) GetSessionIDForTest() sqlliveness.SessionID {
I think we're going to want to end up exposing this.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 79 at r22 (raw file):
Quoted 10 lines of code…
p.mu.Lock()
defer p.mu.Unlock()
select {
case <-p.stopper.ShouldQuiesce():
	return base.SQLInstanceID(0), stop.ErrUnavailable
case <-ctx.Done():
	return base.SQLInstanceID(0), ctx.Err()
default:
}
p.mu.initOnce.Do(func() {
Something in all of this synchronization smells. I don't think you need all of it. What's especially alarming is that the call to Session might block and go to the network; in the meantime, subsequent callers will block on the mutex. Also, if there is an error, then subsequent callers will not see it. What about something like:
type provider struct {
	sqlinstance.AddressResolver
	storage      writer
	stopper      *stop.Stopper
	session      sqlliveness.Instance // source of the current session
	instanceAddr string
	initOnce     sync.Once
	initialized  chan struct{}
	// The below fields are populated before the above channel is closed.
	instanceID base.SQLInstanceID
	err        error
}

func (p *provider) Instance(ctx context.Context) (base.SQLInstanceID, error) {
	p.maybeInitialize()
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case <-p.initialized:
		return p.instanceID, p.err
	}
}

func (p *provider) maybeInitialize() {
	p.initOnce.Do(func() {
		ctx := context.Background() // TODO: add log tags
		if err := p.stopper.RunAsyncTask(ctx, "initialize-instance", func(ctx context.Context) {
			p.err = p.initialize(ctx)
			close(p.initialized)
		}); err != nil {
			p.err = err
			close(p.initialized)
		}
	})
}

func (p *provider) initialize(ctx context.Context) error {
	// Create a new instance if one has not been initialized.
	session, err := p.session.Session(ctx)
	if err != nil {
		return errors.Wrap(err, "constructing session")
	}
	instanceID, err := p.storage.CreateInstance(ctx, session.ID(), p.instanceAddr)
	if err != nil {
		return errors.Wrap(err, "creating instance")
	}
	p.instanceID = instanceID
	return nil
}
Anyway, I'm not saying use the above code verbatim, it doesn't deal with shutdown, but this code as is does not seem like the right balance of synchronization.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 51 at r22 (raw file):
```go
}
if !sessionAlive {
	return "", errors.Newf("inactive instance ID %d", instanceID)
```
My sense is that "inactive" is an implementation detail. I don't see a lot of value in differentiating between inactive and the row not existing at all. Also, my sense is that this API needs a way to indicate that there is no such instance.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 76 at r22 (raw file):
```go
func (r *Reader) filterInactiveInstances(
	ctx context.Context, instanceRows []instancerow,
) (instances []instancerow) {
```
This all feels pretty complex. What about:
```go
func (r *Reader) filterInactiveInstances(
	ctx context.Context, rows []instancerow,
) ([]instancerow, error) {
	// Filter inactive instances.
	{
		truncated := rows[:0]
		for _, row := range rows {
			isAlive, err := r.slReader.IsAlive(ctx, row.sessionID)
			if err != nil {
				return nil, err
			}
			if isAlive {
				truncated = append(truncated, row)
			}
		}
		rows = truncated
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].addr == rows[j].addr {
			return rows[j].timestamp.Less(rows[i].timestamp) // decreasing timestamp order
		}
		return rows[i].addr < rows[j].addr
	})
	// Only provide the latest entry for a given address.
	{
		truncated := rows[:0]
		for i := 0; i < len(rows); i++ {
			if i == 0 || rows[i].addr != rows[i-1].addr {
				truncated = append(truncated, rows[i])
			}
		}
		rows = truncated
	}
	return rows, nil
}
```
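The filter-then-dedupe idea can also be exercised as a runnable reduction on plain values. The `row` type and the liveness map below are stand-ins for the real `instancerow` and `sqlliveness.Reader`, chosen only so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"sort"
)

type row struct {
	addr      string
	sessionID string
	ts        int64 // stand-in for an hlc.Timestamp
}

// latestAlivePerAddr drops rows whose session is not alive, then keeps only
// the most recent row for each address, reusing the backing array via the
// rows[:0] truncation trick from the suggestion above.
func latestAlivePerAddr(rows []row, alive map[string]bool) []row {
	truncated := rows[:0]
	for _, r := range rows {
		if alive[r.sessionID] {
			truncated = append(truncated, r)
		}
	}
	rows = truncated
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].addr == rows[j].addr {
			return rows[j].ts < rows[i].ts // decreasing timestamp order
		}
		return rows[i].addr < rows[j].addr
	})
	truncated = rows[:0]
	for i := range rows {
		if i == 0 || rows[i].addr != rows[i-1].addr {
			truncated = append(truncated, rows[i])
		}
	}
	return truncated
}

func main() {
	rows := []row{
		{"a:26257", "s1", 1},
		{"a:26257", "s2", 2}, // newer entry for the same address wins
		{"b:26257", "s3", 1},
		{"c:26257", "dead", 9}, // filtered out: session not alive
	}
	alive := map[string]bool{"s1": true, "s2": true, "s3": true}
	fmt.Println(latestAlivePerAddr(rows, alive))
}
```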
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 71 at r22 (raw file):
```go
// associates it with its SQL address and session information.
func (s *Storage) CreateInstance(
	ctx context.Context, sessionID sqlliveness.SessionID, addr string,
```
One thing with these session IDs is that, in principle, they should have an expiration. I think when you use this, to be safe, we should be propagating an expiration timestamp to set as the expiration for the writing transaction.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 79 at r22 (raw file):
```go
	return base.SQLInstanceID(0), errors.New("no session information for instance")
}
instanceID := base.SQLInstanceID(1) // Starter value
```
I don't think you're using the "Starter value" any longer.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 92 at r22 (raw file):
```go
	return err
}
return s.db.Put(ctx, key, &value)
```
This needs to use the transaction.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 150 at r22 (raw file):
```go
err = s.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	k := s.makeInstanceKey(instanceID)
	row, err := s.db.Get(ctx, k)
```
You're not using the transaction, and I don't think you need it.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 221 at r22 (raw file):
```go
func (s *Storage) ReleaseInstanceID(ctx context.Context, id base.SQLInstanceID) error {
	err := s.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
		key := s.makeInstanceKey(id)
```
You can construct the key above the closure.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 222 at r22 (raw file):
```go
err := s.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	key := s.makeInstanceKey(id)
	return s.db.Del(ctx, key)
```
This is not using the transaction, and, indeed, I don't think you need one here for this single row delete.
Previously, the Statements endpoint implementation was purely local in its execution and reported only in-memory data on the current tenant. This change initializes a gRPC and gRPC-gateway server on the tenant, much like we do on the hosts already, and uses the newly added InstanceID subsystem for tenants to implement a gRPC-based fanout for the Statements endpoint implementation. The fanout is done much in the same manner as on hosts, and we expose the status server endpoints via HTTP on the tenant as well. This change is necessary to support serverless observability features and provide our UIs access to the Statements endpoint. Future work may move this API to SQL-only.

Resolves cockroachdb#64477

Release note (api change): tenant pods now expose the Statements API at `/_status/statements` on their HTTP port.

REVIEWER NOTE: This change is based on cockroachdb#66600 which is still in progress, please only review the final commit. Once cockroachdb#66600 is merged, only the final commit will remain on rebase.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 71 at r22 (raw file):
Previously, ajwerner wrote…
One thing with these session IDs is that, in principle, they should have an expiration. I think when you use this, to be safe, we should be propagating an expiration timestamp so set as the expiration for the writing transaction.
Can you elaborate on this? What's the API to set the expiration for a writing transaction? Would it be through `UpdateDeadline`?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 71 at r22 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Can you elaborate on this? Whats the API to set expiration for a writing transaction? Would it be through UpdateDeadline?
Yes, `UpdateDeadline` is the API for setting the deadline for the transaction to the lease expiration.
Force-pushed from 43439c1 to 0757e85
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 30 at r15 (raw file):
Previously, ajwerner wrote…
I remain a tad skeptical but I defer to you. Generally there's an understanding that a struct which has public fields passed around by value is itself a value.
Done. I ended up making the fields public since it seemed fine to pass by value.
pkg/sql/sqlinstance/sqlinstance.go, line 35 at r22 (raw file):
Previously, ajwerner wrote…
Is there some error that indicates that such an instance does not exist? Might want to put that in the API.
I have added a new error specifically to indicate a non-existent instance.
pkg/sql/sqlinstance/sqlinstance.go, line 51 at r22 (raw file):
Previously, ajwerner wrote…
```go
// NewSQLInstanceInfo constructs a new InstanceInfo struct.
func NewSQLInstanceInfo(
	instanceID base.SQLInstanceID, addr string, sessionID sqlliveness.SessionID,
) *InstanceInfo {
```
nit: this is a small struct that, as you noted, should never be modified after it has been constructed. Should we just pass it around by value and call this make?
Done.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 79 at r22 (raw file):
Previously, ajwerner wrote…
```go
p.mu.Lock()
defer p.mu.Unlock()
select {
case <-p.stopper.ShouldQuiesce():
	return base.SQLInstanceID(0), stop.ErrUnavailable
case <-ctx.Done():
	return base.SQLInstanceID(0), ctx.Err()
default:
}
p.mu.initOnce.Do(func() {
```
Something in all of this synchronization smells. I don't think you need all of it. What's especially alarming is that the call to `Session` might block and go to the network; in the meantime, subsequent callers will block on the mutex. Also, if there is an error, then subsequent callers will not see it. What about something like:
```go
type provider struct {
	sqlinstance.AddressResolver
	storage writer
	stopper *stop.Stopper

	initOnce    sync.Once
	initialized chan struct{}
	// The fields below are populated before the above channel is closed.
	instanceID base.SQLInstanceID
	err        error
}

func (p *provider) Instance(ctx context.Context) (base.SQLInstanceID, error) {
	p.maybeInitialize()
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case <-p.initialized:
		return p.instanceID, p.err
	}
}

func (p *provider) maybeInitialize() {
	p.initOnce.Do(func() {
		ctx := context.Background() // TODO: add log tags
		if err := p.stopper.RunAsyncTask(ctx, "initialize-instance", func(ctx context.Context) {
			p.err = p.initialize(ctx)
			close(p.initialized)
		}); err != nil {
			p.err = err
			close(p.initialized)
		}
	})
}

func (p *provider) initialize(ctx context.Context) (err error) {
	// Create a new instance if instanceInfo has not been initialized.
	session, err := p.session.Session(ctx)
	if err != nil {
		return errors.Wrap(err, "constructing session")
	}
	instanceID, err := p.storage.CreateInstance(ctx, session.ID(), p.instanceAddr)
	if err != nil {
		return errors.Wrap(err, "creating instance")
	}
	p.instanceInfo = sqlinstance.NewSQLInstanceInfo(instanceID, p.instanceAddr, session.ID())
	return nil
}
```
Anyway, I'm not saying use the above code verbatim, it doesn't deal with shutdown, but this code as is does not seem like the right balance of synchronization.
I have updated it to use async channels. I think at this point, we should not need a mutex since initError and instanceID fields are only read within shutdown code while releasing the instance.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 51 at r22 (raw file):
Previously, ajwerner wrote…
My sense is that "inactive" is an implementation detail. I don't see a lot of value in differentiating between inactive and the row not existing at all. Also, my sense is that this API needs a way to indicate that there is no such instance.
Done.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 76 at r22 (raw file):
Previously, ajwerner wrote…
```go
func (r *Reader) filterInactiveInstances(
	ctx context.Context, instanceRows []instancerow,
) (instances []instancerow) {
```
This all feels pretty complex. What about:
```go
func (r *Reader) filterInactiveInstances(
	ctx context.Context, rows []instancerow,
) ([]instancerow, error) {
	// Filter inactive instances.
	{
		truncated := rows[:0]
		for _, row := range rows {
			isAlive, err := r.slReader.IsAlive(ctx, row.sessionID)
			if err != nil {
				return nil, err
			}
			if isAlive {
				truncated = append(truncated, row)
			}
		}
		rows = truncated
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].addr == rows[j].addr {
			return rows[j].timestamp.Less(rows[i].timestamp) // decreasing timestamp order
		}
		return rows[i].addr < rows[j].addr
	})
	// Only provide the latest entry for a given address.
	{
		truncated := rows[:0]
		for i := 0; i < len(rows); i++ {
			if i == 0 || rows[i].addr != rows[i-1].addr {
				truncated = append(truncated, rows[i])
			}
		}
		rows = truncated
	}
	return rows, nil
}
```
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 91 at r3 (raw file):
Previously, knz (kena) wrote…
here and all the comments below, I think the capital was not needed in this place.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 119 at r3 (raw file):
Previously, knz (kena) wrote…
ditto for "instance" but id -> ID
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 133 at r3 (raw file):
Previously, knz (kena) wrote…
no capital in warnings
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 144 at r3 (raw file):
Previously, knz (kena) wrote…
no capitals needed here
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 162 at r3 (raw file):
Previously, knz (kena) wrote…
no capitals in errors
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 203 at r6 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
No. The reasoning here is that when we release the instance ID, we simply set the SQL address and session ID associated with it to null so that we can reuse the instance ID for future SQL pods.
Switched to use s.db.Del now instead of marking it as null.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 229 at r18 (raw file):
Previously, knz (kena) wrote…
Instance -> instance
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 71 at r22 (raw file):
Previously, ajwerner wrote…
Yes, `UpdateDeadline` is the API for setting the deadline for the transaction to the lease expiration.
Okay, I have modified the API to take in the session expiration which is used to update the txn deadline.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 79 at r22 (raw file):
Previously, ajwerner wrote…
I don't think you're using the "Starter value" any longer.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 92 at r22 (raw file):
Previously, ajwerner wrote…
This needs to use the transaction.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 150 at r22 (raw file):
Previously, ajwerner wrote…
```go
err = s.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	k := s.makeInstanceKey(instanceID)
	row, err := s.db.Get(ctx, k)
```
You're not using the transaction, and I don't think you need it.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 221 at r22 (raw file):
Previously, ajwerner wrote…
you can construct the key above the closure
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 222 at r22 (raw file):
Previously, ajwerner wrote…
This is not using the transaction, and, indeed, I don't think you need one here for this single row delete.
Done.
pkg/sql/sqlinstance/instancestorage/test_helpers.go, line 90 at r3 (raw file):
Previously, knz (kena) wrote…
id -> instanceID
Done.
pkg/sql/sqlliveness/sqlliveness.go, line 71 at r3 (raw file):
Previously, knz (kena) wrote…
I think it's worth summarizing the status of the discussion here instead of fully deferring to the external document.
Done.
LGTM for the server bits and the overall API. I did not review the instance provisioning code (trusting Andrew on that bit).
All my comments have been addressed and I don't have further comments.
You'll need to adjust the expected output for a few tests as indicated in CI, then this is good to go when Andrew is also happy about it.
Reviewed 20 of 58 files at r19, 6 of 33 files at r23, 4 of 13 files at r24, 16 of 20 files at r27.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
mod nits and some minor concerns
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 39 at r27 (raw file):
```go
// Returns a NonExistentInstanceError if an instance with the given ID
// does not exist.
GetInstanceAddr(context.Context, base.SQLInstanceID) (string, error)
```
nit: just return an `InstanceInfo` for symmetry?
pkg/sql/sqlinstance/sqlinstance.go, line 47 at r27 (raw file):
```go
type Provider interface {
	AddressResolver
	Instance(context.Context) (base.SQLInstanceID, error)
```
My sense is that we're going to want a separate interface for the `Instance` method as SQL may use it, but we can leave that for later.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 31 at r27 (raw file):
```go
type writer interface {
	CreateInstance(context.Context, sqlliveness.SessionID, hlc.Timestamp, string) (base.SQLInstanceID, error)
```
nit: can you name the last two arguments to clarify the interface?
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 89 at r27 (raw file):
```go
func (p *provider) maybeInitialize() {
	p.initOnce.Do(func() {
		ctx := context.Background()
```
nit: add some log tags to this context.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 116 at r27 (raw file):
```go
// shutdownSQLInstance shuts down the SQL instance.
func (p *provider) shutdownSQLInstance(ctx context.Context) {
	err := p.storage.ReleaseInstanceID(ctx, p.instanceID)
```
I'm a bit anxious about this call in the case where initialization failed or is in progress. Perhaps the following and some testing:
```go
go func() {
	p.initOnce.Do(func() {
		p.initErr = errors.New("never started")
		close(p.initialized)
	})
}()
select {
case <-ctx.Done():
	return
case <-p.initialized:
}
if p.initErr != nil {
	return
}
```
pkg/sql/sqlinstance/instanceprovider/instanceprovider_test.go, line 80 at r27 (raw file):
```go
	return err == stop.ErrUnavailable
}, time.Second, 10*time.Millisecond,
```
My guess is that you'll want to make the timeout longer for stressrace. By default we use 45 seconds; see `testutils.DefaultSucceedsSoonDuration`. Also, `eventually` is sort of worse than `testutils.SucceedsSoon` because it doesn't back off and it doesn't log when it fails.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 110 at r27 (raw file):
```go
	}
	rows = truncated
}
```
I wonder if we want to sort it back by instance ID before returning.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 190 at r27 (raw file):
```go
}

// getAllInstanceRows decodes and returns all instance rows from the system.sql_instances table
```
Can you add a TODO to provide a mechanism for locking? If we don't lock, this is going to lead to some thrashing at startup as all of the instances read the same data and then try to claim the same ID.
pkg/sql/sqlinstance/instancestorage/row_codec.go, line 52 at r27 (raw file):
```go
func (d *rowCodec) encodeRow(
	addr string, sessionID sqlliveness.SessionID,
) (v roachpb.Value, err error) {
```
nit: this is called `encodeRow` but it only encodes the value, whereas `decodeRow` takes the `KeyValue`. For symmetry, can you have this take the instanceID and return the `KeyValue`?
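The symmetry being asked for can be sketched with a toy codec: `encodeRow` takes all three fields, including the ID that forms the key, and `decodeRow` recovers them from the key/value pair. The 4-byte big-endian key and the `addr\x00sessionID` value layout below are illustrative only, not the real encoding:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// keyValue is a toy analogue of roachpb.KeyValue.
type keyValue struct {
	key, value []byte
}

// encodeRow produces the full key/value pair from the instance ID, address,
// and session ID, mirroring decodeRow exactly.
func encodeRow(instanceID uint32, addr, sessionID string) keyValue {
	key := make([]byte, 4)
	binary.BigEndian.PutUint32(key, instanceID)
	value := append(append([]byte(addr), 0), sessionID...)
	return keyValue{key: key, value: value}
}

// decodeRow recovers all three fields from the pair encodeRow produced.
func decodeRow(kv keyValue) (instanceID uint32, addr, sessionID string) {
	instanceID = binary.BigEndian.Uint32(kv.key)
	for i, b := range kv.value {
		if b == 0 {
			return instanceID, string(kv.value[:i]), string(kv.value[i+1:])
		}
	}
	return instanceID, string(kv.value), ""
}

func main() {
	kv := encodeRow(3, "10.0.0.1:26257", "session-abc")
	fmt.Println(decodeRow(kv))
}
```

With this shape, a caller never has to assemble the key separately from the value, which is the asymmetry the nit points at.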
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @andy-kimball, @Azhng, @dhartunian, @knz, and @rimadeodhar)
pkg/migration/migrations/migrations.go, line 70 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Done.
Done.
pkg/sql/catalog/systemschema/system.go, line 375 at r1 (raw file):
Previously, knz (kena) wrote…
yep good
Done.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 39 at r3 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Done.
Done.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 121 at r3 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Yes, that's what I meant: we need to ensure we have a mechanism to reuse instance IDs from dead sessions, especially if they haven't been shut down cleanly. A background thread would just be one way of doing it. Doing it at startup or within the caching layer should be fine too.
Done.
pkg/sql/sqlinstance/instanceprovider/instanceprovider.go, line 116 at r27 (raw file):
Previously, ajwerner wrote…
I'm a bit anxious about this call in the case where initialization failed or is in progress. Perhaps the following and some testing:
```go
go func() {
	p.initOnce.Do(func() {
		p.initErr = errors.New("never started")
		close(p.initialized)
	})
}()
select {
case <-ctx.Done():
	return
case <-p.initialized:
}
if p.initErr != nil {
	return
}
```
I did think about that case, but it feels unrealistic: this method will only be called once an instance has been initialized, since this callback is bound to the session only after instance initialization. Anyway, I'll add some safeguards.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 110 at r27 (raw file):
Previously, ajwerner wrote…
I wonder if we want to sort it back by instance ID before returning.
Personally, I don't think it's necessary. Feels like unnecessary overhead, the client can always sort if need be. Also, we can update this if it turns out later we have sufficient need for sorting.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 37 at r2 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Done.
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 190 at r27 (raw file):
Previously, ajwerner wrote…
Can you add a TODO to provide a mechanism for locking? If we don't lock, this is going to lead to some thrashing at startup as all of the instances read the same data and then try to claim the same ID.
Good point, will add a TODO. I will have a follow up PR to add caching through rangefeeds, will fix the TODO then.
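The allocation being discussed, claiming the smallest instance ID not held by a live pod so released IDs get reused, reduces to a small standalone function. This is a hypothetical sketch of just the selection logic, without the KV scan or the locking the TODO asks for:

```go
package main

import (
	"fmt"
	"sort"
)

// smallestFreeID returns the lowest positive instance ID not currently
// claimed, so IDs released by dead pods are reused before the ID space
// grows. Without locking, two pods scanning concurrently can both pick the
// same answer, which is the startup-thrashing concern above.
func smallestFreeID(taken []int) int {
	sort.Ints(taken)
	next := 1
	for _, id := range taken {
		if id > next {
			break // found a gap left by a released instance
		}
		if id == next {
			next++
		}
	}
	return next
}

func main() {
	fmt.Println(smallestFreeID([]int{1, 2, 4})) // reuses the gap at 3
	fmt.Println(smallestFreeID(nil))            // first pod gets ID 1
}
```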
Force-pushed from 0757e85 to ac31018
In this commit, I create a new system table called sql_instances which will store the instance ID, SQL address, and session ID for all instances associated with a tenant. If the instance is currently active, the SQL address and session ID fields will be non-null. For inactive instances, the SQL address and session ID fields will be null. An instance ID can be reused if it is no longer being used by an active SQL pod. Release note: None
In this commit, I implement the KV API for accessing and updating the sql_instances table. We need to use the KV API for the sqlinstance subsystem as this table will be used before the initialization of the SQL server. Release note: None
In this commit, I create and implement the provider API to create and manage instance information corresponding to individual SQL pods. The provider takes care of initializing a unique instance ID using the sql_instances system table. It also propagates the SQL address and session information corresponding to a SQL pod using the sql_instances system table. This system is dependent on the sqlliveness subsystem. Release note: None
Force-pushed from ac31018 to b143d46
TFTR! bors r+
Build succeeded.
This change adds a new system table, sql_instances,
for storing SQL instance information in a multi-tenant
environment. This table will be used to store and
propagate unique instance ids and network address
information on individual SQL pods.
This change also implements a storage layer for
accessing the sql_instances system table using
the KV API. We need to use KV API as this table
will be accessed and updated prior to the construction
of a SQL server for the tenant. It also implements the
provider interface for using this new sqlinstance
subsystem to assign unique instance ids per pod and
propagate their network information.
This is part of #62818,
resolves #64475.
Release note (general change): Add a sqlinstance subsystem for creating
unique instance IDs per SQL pod and for propagating network addresses
for SQL pods within a single tenant.