Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use auth_servers when proxying an auth connection #13310

Merged
merged 9 commits into from
Jun 10, 2022

Conversation

espadolini
Copy link
Contributor

@espadolini espadolini commented Jun 8, 2022

As a followup to #12862, this PR makes it so that every instance of dialing an auth through a proxy (not just the reverse tunnel's @remote-auth-server) will select one random entry from the proxy's own auth_servers configuration rather than checking the auth server heartbeats in the backend. This is preferrable as the backend info could be out of date (10 minutes for expiration plus backend expiration and cache propagation delays), and doesn't allow for any flexibility in choosing a specific load balancer or running health checks.

note: backporting this requires backporting #12862 at the same time.

@espadolini espadolini force-pushed the espadolini/static-authservers branch 4 times, most recently from ab272c5 to 631e253 Compare June 9, 2022 14:10
@espadolini espadolini force-pushed the espadolini/static-authservers branch from 631e253 to 7625562 Compare June 9, 2022 14:28
@espadolini espadolini marked this pull request as ready for review June 9, 2022 14:51
@espadolini espadolini changed the title espadolini/static-authservers Always use auth_servers when proxying an auth connection Jun 9, 2022
Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

This isn't a part of Teleport that I usually inhabit, so something might escape me, but I like that the change is both simpler and makes a clearly random choice.

lib/reversetunnel/localsite.go Show resolved Hide resolved
lib/srv/alpnproxy/auth/auth_proxy.go Show resolved Hide resolved
lib/srv/alpnproxy/auth/auth_proxy_test.go Outdated Show resolved Hide resolved
lib/utils/utils.go Outdated Show resolved Hide resolved
lib/srv/alpnproxy/auth/auth_proxy.go Outdated Show resolved Hide resolved
lib/srv/alpnproxy/auth/auth_proxy_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@LKozlowski LKozlowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I'm not very familiar with this part of code base, but just one thing came to my mind.

In the previous version we were checking authorization servers for connection one by one. Now, we are picking up one server randomly and if it fails then we throws and error. There's no fallback or checking the other servers from the list. Isn't it going to affect the availability?

@espadolini
Copy link
Contributor Author

@LKozlowski

There's no fallback or checking the other servers from the list. Isn't it going to affect the availability?

See #13310 (comment), the tl;dr is that it should be quite rare and only for tsh - if it turns out to be a problem, any retry logic should be in tsh itself rather than hidden away in the proxy. We never noticed problems because the usual setup in kubernetes will notify the proxy of a failure to connect to auth really quickly, and the retry logic masked the design flaw of using the backend info over the explicit user configuration.

@espadolini
Copy link
Contributor Author

@codingllama

makes a clearly random choice

I specialcased length 1 in the utility function specifically because in almost every configuration (and in all supported ones) there will only be a single address in auth_servers; the random choice is just there to still guarantee the correct load balancing behavior for old and/or weird setups in which proxies have a hardcoded list of more than one auth, but we don't really expect it to happen.

For a DNS load balancer setup, net.Dial itself will connect to the resolved DNS entries (if there's more than one) in order, but that's because the entries from the DNS server are supposed to be in random order.

@LKozlowski
Copy link
Contributor

@LKozlowski

There's no fallback or checking the other servers from the list. Isn't it going to affect the availability?

See #13310 (comment), the tl;dr is that it should be quite rare and only for tsh - if it turns out to be a problem, any retry logic should be in tsh itself rather than hidden away in the proxy. We never noticed problems because the usual setup in kubernetes will notify the proxy of a failure to connect to auth really quickly, and the retry logic masked the design flaw of using the backend info over the explicit user configuration.

Thanks for the explanation! Makes sense now. I must have written my comment when you were writing yours as I didn't notice it before sending my feedback 😄

espadolini and others added 2 commits June 10, 2022 12:22
Co-authored-by: Alan Parra <alan.parra@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
@espadolini espadolini enabled auto-merge (rebase) June 10, 2022 12:26
@espadolini espadolini disabled auto-merge June 10, 2022 12:32
@espadolini espadolini enabled auto-merge (squash) June 10, 2022 12:32
@espadolini espadolini merged commit 9bb07ea into master Jun 10, 2022
@espadolini
Copy link
Contributor Author

Holding off on the v7 backport.

espadolini added a commit that referenced this pull request Jun 13, 2022
* Don't GetAuthServers in transport.start

* Don't GetAuthServers in AuthProxyDialerService

* Don't GetAuthServers in localSite

* Fix lib/web tests

* Review comments

Co-authored-by: Alan Parra <alan.parra@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
espadolini added a commit that referenced this pull request Jun 13, 2022
* Don't GetAuthServers in transport.start

* Don't GetAuthServers in AuthProxyDialerService

* Don't GetAuthServers in localSite

* Fix lib/web tests

* Review comments

Co-authored-by: Alan Parra <alan.parra@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants