-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-23.1: ccl/sqlproxyccl: add CIDR Ranges and Private Endpoints ACL support #103751
release-23.1: ccl/sqlproxyccl: add CIDR Ranges and Private Endpoints ACL support #103751
Conversation
Fixes cockroachdb#76839 and cockroachdb#71365. This commit rewrites the test to use the static directory server instead of the dynamic one. The old one is susceptible to GRPC port reuse and deadlock issues. With this change, only directory_cache_test.go is relying on the dynamic directory server. Once we rewrite that, we can remove the entire test directory implementation. Release note: None
The Load field has been deprecated for months, and is no longer being used in production. The tenant directory implementation is no longer emitting pod loads. This commit addresses an old TODO. Release note: None
This commit adds a new WatchTenants API to the directory server interface. The goal is for this API to stream tenant metadata change notifications. At the same time, this commit adds a new Tenant proto message to represent tenant fields instead. This would allow us to add more fields to the tenant object (e.g. IP allowlist and private endpoint entries). Release note: None
This commits adds a new tenant metadata watcher mechanism to the directory cache. This will be used to receive notifications about tenants being created, modified, or deleted. Previously, we were only calling GetTenant once throughout the lifetime of the tenant entry. In this new model, GetTenant is only called if the tenant entry isn't valid. Release note: None
This commit includes two new fields in the tenant proto: ConnectivityType and PrivateEndpoints. The first will be used to indicate how the cluster can be accessed (i.e. public only, private only, or both), whereas the second will be used to indicate which endpoints can access the cluster, assuming that the cluster allows private connectivity. A subsequent commit will implement the necessary ACLs. Release note: None
Previously, the CheckConnection API in the AccessController interface took in a TimeSource object, and that seemed a little awkward. Given that the timeSource field is only used by the denylist ACL, we will plumb this into the denylis ACL. At the same time, we plumb context.Context into the CheckConnection API. Once we add a new PrivateEndpoints ACL, that will have to use a context.Context object for methods on the directory cache interface. Release note: None
This commit adds a new PrivateEndpoints ACL support to the access controller. This ACL will only apply if the cluster allows private connectivity. If the connection does not have an EndpointID, it is assumed to be a public connection, and the ACL is a no-op, which is the current behavior. A subsequent commit will extract the private endpoints from the PROXY header, and populate the EndpointID for the connection. Release note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
Note that this backport adds new functionalities to the proxy. That said, all these will be rolled out to CC through our automation. |
This commit parses the private endpoint IDs within the PROXY header. Doing this would also mean that the ACL for private endpoints will be enforced under the right conditions. The parsing logic assumes that there can only be one type of endpoint ID within the PROXY header (i.e. AWS, GCP, or Azure). Release note: None
There is a requirement of automatically disconnecting existing connections if the ACL rules no longer match the connection's state. For the file-based ACLs, we poll every 1 minute, read the file, and check on every single connection. Here, we implement a similar mechanism for private endpoint updates. This is reasonable for now as: 1. Calls to LookupTenant are cached most of the time. 2. This is only applied to existing connections, and a polling interval of 1 minute isn't too bad. 3. We are already iterating all the connections for the other types ACLs That said, this approach isn't efficient and we're limited by the existing design of AccessController. Checking all the connections each time the watch ticks isn't ideal. In fact, we're checking three times here, one for each ACL. The directory cache already knows which tenants were updated through WatchTenants, and we can definitely do better here. A follow up TODO has been added to refactor AccessController in a way that allows updates to be batched. We should also only check connections only for tenants that have been updated. Release note: None
Previously, we were only validating cluster names when LookupTenantPods get called within the connector, which happens after the ACL check. The rationale behind that is that we didn't want a malicious actor who's iterating through all tenant IDs spinning up pods for them. The cluster name check ensures that the incoming connection knows something about the tenant. Now that we have introduced LookupTenant within the ACL logic, it is possible for a malicious actor to iterate through all the tenant IDs, and figure out which tenant IDs are in use (since it returns "connection refused" if the tenant exists). To address that, we will start validating cluster names before running the ACL check (i.e. at the start of the proxy handler) before proceeding. This ensures that we will return a NotFound error if the tenant doesn't exist, or there's a cluster name mismatch. At the same time, the clusterName parameter has been removed from LookupTenantPods since that is no longer needed. Release note: None Epic: none
…ities Previously, we had introduced 3 states (ALLOW_NONE, ALLOW_PUBLIC, and ALLOW_PRIVATE) with the intention that ALLOW_NONE would be useful. Thinking about it further, it does not make sense for a cluster to support ALLOW_NONE (i.e. what would that actually mean?). This commit updates the connectivity types to support ALLOW_ALL, ALLOW_PUBLIC_ONLY, and ALLOW_PRIVATE_ONLY. By default, ConnectivityType=0, which maps to ALLOW_ALL. Release note: None Epic: none
1b28e58
to
b5a711e
Compare
Previously, the proxy only supported file based IP allowlist entries. This commit adds a new ACL for allowed CIDR ranges, and will only apply if the allowlist file isn't supplied to the proxy. Unlike the file-based ACL, this makes use of the per-tenant metadata to store allowed CIDR ranges (i.e. in the new AllowedCIDRRanges field). By default, if there are no CIDR ranges, it is expected to block all incoming connections. Release note: None Epic: none
This commit renames PrivateEndpoints to AllowedPrivateEndpoints in the tenant proto message to be consistent with AllowedCIDRRanges. In theory, we could have done AllowedEndpoints, but we'll leave "Private" in to be more explicit. Release note: None Epic: none
Fixes cockroachdb#103494. This commit fixes a flake on TestWatchTenants. There's a possibility where the cache invalidation logic races with the watcher termination logic in the test. This commit updates the test to wait for the cache invalidation before proceeding. Release note: None Epic: none
b5a711e
to
768d496
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes are in the SQL Proxy, which is only used in Serverless clusters. This patch will have no effect on Dedicated or self-host clusters.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson)
TFTRs! |
Backport 14/14 commits from #101864, #103070, #103479, #103487, #103550, and #103625.
/cc @cockroachdb/release
See individual commits for more information. On a high-level overview, this backport introduces two new ACL mechanisms to the proxy: CIDR Ranges and Private Endpoints. For that to work, we reworked the tenant directory to watch tenant metadata changes, which were previously constant.
Release note: None
Epic: none
Release justification: SQL proxy ACL change that is needed for CC.