Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix mTLS certificate check on agent to agent RPCs #11998

Merged
merged 2 commits into from
Feb 5, 2022

Conversation

lgfa29
Copy link
Contributor

@lgfa29 lgfa29 commented Feb 3, 2022

PR #11956 implemented a new mTLS RPC check to validate the role of the
certificate used in the request, but further testing revealed two flaws:

  1. client-only endpoints did not accept server certificates so the
    request would fail when forwarded from one server to another.
  2. the certificate was being checked after the request was forwarded,
    so the check would happen over the server certificate, not the
    actual source.

This PR checks for the desired mTLS level, where the client level
accepts both, a server or a client certificate. It also validates the
cercertificate before the request is forwarded.

#11956 was never released, so no CHANGELOG needed.

PR #11956 implemented a new mTLS RPC check to validate the role of the
certificate used in the request, but further testing revealed two flaws:

  1. client-only endpoints did not accept server certificates so the
     request would fail when forwarded from one server to another.
  2. the certificate was being checked after the request was forwarded,
     so the check would happen over the server certificate, not the
     actual source.

This commit checks for the desired mTLS level, where the client level
accepts both, a server or a client certificate. It also validates the
cercertificate before the request is forwarded.
@@ -1186,7 +1186,7 @@ func TestRPC_TLS_Enforcement_RPC(t *testing.T) {
name: "local server/clients only rpc",
cn: "server.global.nomad",
rpcs: localClientsOnlyRPCs,
canRPC: false,
canRPC: true,
Copy link
Contributor Author

@lgfa29 lgfa29 Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caused problem 1 to go unnoticed. Server certificates should be able to call client RPCs

if tc.canRPC {
if err != nil {
require.NotContains(t, err, "certificate")
for _, srv := range []*Server{tlsHelper.mtlsServer1, tlsHelper.mtlsServer2} {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests problem 2 by using a two server cluster and making requests to both. One of the requests will need to be forwarded.

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@lgfa29 lgfa29 merged commit 29ffa02 into main Feb 5, 2022
@lgfa29 lgfa29 deleted the fix-rpc-tls-certificate-check branch February 5, 2022 01:35
@samed
Copy link

samed commented Feb 14, 2022

@schmichael @lgfa29 With a recent upgrade we've lost our tls verification between servers and clients. Normally we're using vault's mtls certificate endpoint with the alt_names option, which covered our certificate needs. But after upgrading to 1.2.5 (and also noticed on 1.2.6), we're getting unauthorized raft connection until we set tls_verify to false.

This is from consul-template, it's obvious that alt_names cover *.eu-central-1.nomad

{{ with secret "mtls_int/issue/nomad" "common_name=server.global.nomad" "ttl=43800h" "alt_names=*.eu-central-1.nomad,*.eu-west-1.nomad,*.service.consul" "ip_sans=127.0.0.1"}}

And this is what I got from nomad log:

nomad.rpc: unauthorized raft connection nomad.rpc: unauthorized raft connection
addr=10.2.7.239:49834 required_hostname=server.eu-central-1.nomad found=["*.eu-central-1.nomad,*.eu-west-1.nomad,*.service.consul"]

There's certainly a problem with the recent changes, I'm just trying to figure out what caused this :)

@lgfa29
Copy link
Contributor Author

lgfa29 commented Feb 14, 2022

Hi @samed

This PR hasn't been released yet so it can't be causing that error (the error message is also slightly different in main).

Did anything change in your mTLS setup? From the error message it seems like you may be using a wildcard certificate in your servers?

@samed
Copy link

samed commented Feb 14, 2022

Thanks for the reply. We haven't changed anything in mtls setup but the upgrade was from 1.1.2 to 1.2.5, so I'll be checking for diffs between those :)

Note on slightly different log, I tried to merge/combine several lines from my clipboard, but the exact error was that.

@lgfa29
Copy link
Contributor Author

lgfa29 commented Feb 14, 2022

but the upgrade was from 1.1.2 to 1.2.5

Ah OK, so this check was introduced in 1.1.4, that's why you weren't seeing it before.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/1.1.x backport to 1.1.x release line backport/1.2.x backport to 1.1.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants