
Split auth.AccessPoint into variant specific interfaces #8471

Merged: 3 commits from tross/reduce-network-load into master on Nov 4, 2021

Conversation

rosstimothy (Contributor)

Having one interface for all cache variants makes it possible to request resources which are not replicated to that variant. This produces subtle bugs: code that looks correct and compiles fine won't behave as expected at runtime. For example, if you called GetKubeServices() on a Node's cache, the code would still compile even though the Node doesn't replicate that resource, but you would never receive any KubeServices from the call. One shared interface also makes it challenging to know which variants never use the data being replicated to them. Attempting to stop replicating a resource to a particular cache variant becomes a game of mental gymnastics instead of an easy check against the variant-specific cache interface.

Splitting the auth.AccessPoint interface into several variant-specific interfaces alleviates the issues mentioned above. As long as the cache interfaces match the configured resource replication lists, the compiler can be leveraged to ensure no extraneous resources are loaded for a cache. It also makes reasoning about the code much more manageable, since you don't have to guess whether the resource you are trying to load exists in a particular cache.
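
To make the compile-time guarantee concrete, here is a minimal Go sketch of the idea. Apart from GetKubeServices (named in the description above), the interface and method names and their signatures are hypothetical and simplified, not the actual interfaces introduced by this PR.

// Sketch only: baseAccessPoint, ProxyAccessPoint, NodeAccessPoint, GetClusterName
// and GetNodes are illustrative names with simplified signatures, not the actual
// interfaces added in this change.
package accesspoint

import "context"

// Resources replicated to every cache variant.
type baseAccessPoint interface {
	GetClusterName(ctx context.Context) (string, error)
}

// ProxyAccessPoint exposes only the resources replicated to proxy caches,
// which include kubernetes services.
type ProxyAccessPoint interface {
	baseAccessPoint
	GetKubeServices(ctx context.Context) ([]string, error)
}

// NodeAccessPoint exposes only what a node's cache replicates;
// GetKubeServices is deliberately absent.
type NodeAccessPoint interface {
	baseAccessPoint
	GetNodes(ctx context.Context) ([]string, error)
}

func nodeHandler(ctx context.Context, ap NodeAccessPoint) error {
	// ap.GetKubeServices(ctx) would now be a compile-time error instead of a
	// call that silently returns nothing at runtime.
	_, err := ap.GetNodes(ctx)
	return err
}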

@rosstimothy rosstimothy force-pushed the tross/reduce-network-load branch 3 times, most recently from 7e9e16a to b72200b on October 8, 2021 16:24
@rosstimothy rosstimothy marked this pull request as ready for review October 8, 2021 16:24
@russjones (Contributor)

@fspmarshall @andrejtokarcik Can you review?

@rosstimothy rosstimothy removed the request for review from andrejtokarcik October 18, 2021 13:07
lib/auth/api.go: three review threads resolved (two outdated)
func ExtractFromCertificate(access UserGetter, cert *ssh.Certificate) ([]string, wrappers.Traits, error) {
	// For legacy certificates, fetch roles and traits from the services.User
	// object in the backend.
	if isFormatOld(cert) {
Collaborator:

Can we safely remove this? IIRC @russjones mentioned at some point that some customer may still be using old-style certificates.

rosstimothy (Contributor, Author):

I did run this by @russjones and he seemed to think it would be okay to remove it.

Contributor:

I think it's okay to remove. I think RHEL 5 was the main reason we were supporting these, but we can drop them now.

lib/srv/ctx.go: review thread resolved (outdated)
}

return proxyHost
//TODO(tross): Get the proxy address somehow - types.KindProxy is not replicated to Nodes
Collaborator:

So how was this working before? What is this used for?

rosstimothy (Contributor, Author):

This gets consumed by teleport status, and as far as I can tell it has never returned the actual proxy address. @russjones and @fspmarshall suggested that replicating types.KindProxy to nodes could be detrimental given how frequently the resource is updated. This should definitely be investigated and addressed in a separate PR.

@russjones (Contributor), Oct 22, 2021:

So at some point it did work; in fact, I remember it actually working. I just don't know how/when it changed.

With the current permission model, I'm not sure how it could work, since we build it from types.KindProxy, which the node does not have access to. I'm actually not that opposed to allowing nodes to read proxies because I don't think it would generate too much load, but I will defer to @fspmarshall here.

Contributor:

I'm not a big fan of replicating proxy resources to nodes. We've seen folks running clusters with 8 proxies and 30k nodes, and we want to push scaling further in the future. It seems to me that we should avoid replicating to nodes any resource that gets updated frequently, and proxy resources get updated every heartbeat.

IMO it's fine to leave this behavior as-is for now. If we wanted to start displaying the correct address again in the future, we could have the auth server sync the public address info to a separate resource that only gets updated on change.
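
A rough Go sketch of that update-on-change idea (all names below are hypothetical; nothing here exists in Teleport today): the auth server would compare the last-published proxy public address with the current one and only write the separate resource when it actually differs, so a node watching that resource sees almost no churn even though proxies heartbeat constantly.

// Hypothetical sketch of syncing the proxy public address into a resource
// that is written only when the value changes; none of these names exist in
// Teleport.
package proxysync

import "context"

type addrStore interface {
	GetProxyPublicAddr(ctx context.Context) (string, error)
	PutProxyPublicAddr(ctx context.Context, addr string) error
}

// syncProxyAddr publishes the address only on change, keeping replication of
// this resource quiet compared to per-heartbeat proxy updates.
func syncProxyAddr(ctx context.Context, store addrStore, current string) error {
	prev, err := store.GetProxyPublicAddr(ctx)
	if err == nil && prev == current {
		// Unchanged: skip the write so watchers see no event.
		return nil
	}
	return store.PutProxyPublicAddr(ctx, current)
}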

@@ -739,13 +747,11 @@ func getPAMConfig(c *ServerContext) (*PAMConfig, error) {
 	environment["TELEPORT_LOGIN"] = c.Identity.Login
 	environment["TELEPORT_ROLES"] = strings.Join(roleNames, " ")
 	if localPAMConfig.Environment != nil {
-		user, err := c.srv.GetAccessPoint().GetUser(c.Identity.TeleportUser, false)
+		traits, err := services.ExtractTraitsFromCert(c.Identity.Certificate)
Collaborator:

Is this going to work for trusted clusters? I don't know if the certificate here would contain the user's traits when a leaf cluster's server is accessed. @russjones might know.

rosstimothy (Contributor, Author):

I did run this by @russjones as well. I'm not sure if he thought of that use case, though.

Contributor:

This might not work this way, but I think we can still do it. c.Identity here should contain the traits: https://github.com/gravitational/teleport/blob/master/lib/auth/permissions.go#L243

So we should just be able to pull them out directly since they correctly get mapped when the connection is established.

@r0mant what do you think?

rosstimothy (Contributor, Author):

c.Identity is not a tlsca.Identity though, it is a srv.IdentityContext.

Contributor:

@rosstimothy We need to do something similar to what we do in auth/permissions.go here but for SSH certificates: https://github.com/gravitational/teleport/blob/master/lib/srv/authhandlers.go#L108-L149.

@r0mant can you help with this?

Collaborator:

@rosstimothy @russjones Actually, looking at CreateIdentityContext, I think it may be fine as it is. It basically fills the identity.Certificate field with the user certificate, so extracting traits from it would be the same as what is being done here. It might make sense to run a quick test (even if using just some print statements), but I think it's ok.

rosstimothy (Contributor, Author):

Spun up a root and leaf cluster locally and verified that the traits were the same when connecting to nodes in the root and leaf clusters. Connected to the leaf cluster node directly through the leaf cluster and through the root cluster.
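
For context on the approach this thread converges on, here is a minimal Go sketch of reading traits straight from an SSH certificate extension rather than looking the user up through the access point. The extension name and JSON encoding below are assumptions for illustration only; the PR itself uses Teleport's services.ExtractTraitsFromCert, whose actual wire format may differ.

// Illustrative only: the extension name and encoding are assumptions, not
// Teleport's actual implementation of services.ExtractTraitsFromCert.
package traits

import (
	"encoding/json"
	"fmt"

	"golang.org/x/crypto/ssh"
)

// extractTraits pulls a map of trait name -> values out of a certificate
// extension, avoiding any round trip to the auth server or the node's cache.
func extractTraits(cert *ssh.Certificate, extension string) (map[string][]string, error) {
	raw, ok := cert.Permissions.Extensions[extension]
	if !ok {
		return nil, fmt.Errorf("certificate has no %q extension", extension)
	}
	traits := make(map[string][]string)
	if err := json.Unmarshal([]byte(raw), &traits); err != nil {
		return nil, fmt.Errorf("failed to decode traits: %w", err)
	}
	return traits, nil
}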

@rosstimothy rosstimothy force-pushed the tross/reduce-network-load branch 3 times, most recently from 00c5f41 to 3a78d9d on October 22, 2021 18:05
@rosstimothy rosstimothy force-pushed the tross/reduce-network-load branch 3 times, most recently from 489624d to 97e1371 on November 3, 2021 13:56
@rosstimothy rosstimothy merged commit 5cd7c3c into master Nov 4, 2021
@rosstimothy rosstimothy deleted the tross/reduce-network-load branch November 4, 2021 13:42