-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Centralize SNI config #6237
Conversation
TODO: - thread `CompileRequest.CurrentTrustDomain` through other tests and real call sites - add `Compile` tests for other permutations of overridden SNI - fix xDS package to use the new `DiscoveryTarget.SNI` field where it can. Probably needs some of the other SNI stuff put back to cover code paths not using discovery chain yet. - maybe fix other tests that broke in API etc?
#6242 introduces a |
} | ||
|
||
func (c *compiler) sniForTarget(t structs.DiscoveryTarget, externalSNI string) string { | ||
return connect.ServiceSNI(t.Service, t.ServiceSubset, t.Namespace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function return externalSNI
if it's not ""
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess that is still a WIP part :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple thoughts on this.
-
I think the correct integration point for proxies (including if we were to split the xDS stuff out of the Consul binary) would be to provide an agent endpoint to pull the generic proxy config that looks something like the
proxycfg.ConfigSnapshot
struct (probably not identical because that has a lot of tracking information in it for watches currently). This endpoint would need to return all the listener, cluster, endpoint and routing information that the proxy would need to configure itself. -
Yes much of this will need to be put back in place for all the things which do not use the discovery chain (all of Mesh Gateways and prepared query upstreams to name two of them).
|
||
// Work out if SNI have been overridden for this service | ||
externalSNI := "" | ||
if serviceDefaults != nil && serviceDefaults.ExternalSNI != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not going to allow modifying the SNI of something at a service instance level? I haven't thought through completely whether that would be good or bad (I have a hunch that instance level SNI is probably not a good UX).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now because in our own model SNI needs to be varied based on the discovery path taken (e.g. to identify the intended subset) which an individual instance can't know about so it seems like it would get horribly confusing.
Hey @mkeeler I'm trying to catch up with your thought process on those comments.
I think this is in response to "I spent a long time thinking about where semantically this should be exposed in our API.". The integration point for external proxies that want to support L7 features (including routing via gateways) we settled on is the new HTTP API RB added to the RC which is effectively a cleaned up Compile endpoint. That leaves proxy integrations doing a bunch of the I think for now the HTTP API for the compiled chain at least makes it possible for external integrations to have equivalent functionality to Envoy. Long term I think we want to make that a more concrete plugin API that would not necessarily use HTTP and so could bypass the potential efficiency concern above. It would also need to treat each watched resource more independently than the whole proxy snapshot to avoid needing to resend the complete thing on each update though? In the interest of making it possible to do this for the upcoming integration deadline, are you happy with moving forward with adding it here?
Does your PR fix this RB? |
@banks That was in response to the "how should we expose this in our API" bit that I saw. The discovery chain is fine as an integration point for things that want to support L7 features. However if someone wanted to implement their own Mesh Gateway then that endpoint is useless. None of the Mesh Gateway stuff cares about the discovery chain (nor should it). Thats why exposing the proxycfg somehow seems like the correct place. Its one place that can do both normal proxies and mesh gateways. I don't think that has a big impact on this PR (maybe none at all). One problem I can see that the PR doesn't address yet (it is WIP) is how Mesh Gateways should route traffic with overridden SNI values.
Maybe we just say that if you are using an external SNI then you cannot have the same service in multiple datacenters? Another alternative would be to define some sort of routing rules so that we could send a specific SNI to a specific datacenter even when it hasn't encoded the datacenter in the name. |
One piece of awkwardness about the compiler-based approach: the normal trust domain has to be known before the discovery chain compiler request is triggered. This means that the initial watch on the compiler output has to wait until we first receive the roots back from that notify call before we can even flight a request for the discovery chain. Since those two would be interlinked we'd need to track the |
Good questions @mkeeler. My thinking was that the real use case here was only for external services and that we'd never try to route them through our own mesh gateways. e.g. assume that anything with an ExternalSNI set is going directly to some SNI gateway outside of Consul. In that sense our gateways don't need to change at all. That said, in the current formulation here there is nothing stopping you setting ExternalSNI on a regular service which does indeed have confusing semantics when it comes to gateways and routing. I'd be happy to find the least-bad minimal change we can make so we don't have to support any customisation of SNI through mesh-gateways for now and still keep the UX not terrible. Perhaps we can validate that ExternalSNI is only set for external services somehow at config write time. Or perhaps we can define that setting that value explicitly declares that service to be external and fundamentally alters how we handle it - i.e. we explicit treat all instances returned by the resolver as external gateways so they never go via our own gateways and never failover etc? @rboyer interesting. I don't have good ideas about how to fix that really. Do you? |
Currently we rely on "integration" code xDS package "knowing" how to build the right SNI for a given cluster. This is kinda gross because that same logic would need to be copied into any other proxy integration in the future. We should centralize it and so make integrations able to just consume it.
I spent a long time thinking about where semantically this should be exposed in our API. Initially it seemed like it could be exposed during service discovery, but we can't because the service discovery query itself doesn't know specifically which
subset
is being addressed currently. We also need to resolve exactly one SNI value for all instances per cluster which means it would really need to appear at the top-level of the response but our HTTP API currently is a list at the top level 😭 .This means that tools that federate workloads from external systems will need to register the service instances in Consul and then create or modify the
service-defaults
for the service to set theExternalSNI
param such that outbound connections will get the correct (non-connect) SNI value set. For now we assume that any single service (even if there are multiple instances of that service or multiple gateways through which it can be accessed) will have a single SNI value to address them which seems inline with other mesh implementations to date.TODO:
CompileRequest.CurrentTrustDomain
through other tests and real call sitesCompile
tests for other permutations of overridden SNIDiscoveryTarget.SNI
field where it can. Probably needs some of the other SNI stuff put back to cover code paths not using discovery chain yet.