-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TLS cert provisioning #5597
TLS cert provisioning #5597
Conversation
9516acd
to
750c2dd
Compare
7e2d9fd
to
b4a9d88
Compare
3cc0983
to
ab9bcb5
Compare
4093d23
to
e9d841a
Compare
ff421f1
to
db86194
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, this looks good to me. 🎉
Just have minor comments inline.
agent/agent.go
Outdated
} | ||
a.serviceManager = NewServiceManager(a) | ||
|
||
if err := a.initializeACLs(); err != nil { | ||
return nil, err | ||
} | ||
|
||
a.logger = logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come the logger isn't added above when a
is initialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
agent/agent.go
Outdated
@@ -433,6 +436,18 @@ func (a *Agent) Start() error { | |||
// populated from above. | |||
a.registerCache() | |||
|
|||
if a.config.AutoEncryptTLS && !a.config.ServerMode { | |||
var reply *structs.SignResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come you're declaring the reply/err variables here rather than using the short variable declaration :=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! I did it so that I can use the if on the same line. But I guess it doesn't matter as much.
agent/consul/auto_encrypt.go
Outdated
retryJitterWindow = 30 * time.Second | ||
) | ||
|
||
func (c *Client) AutoEncrypt(servers []string, port int, token string, interruptCh chan struct{}) (*structs.SignResponse, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functions feels like it should have a different name. From what I understand it's just making a request to sign a certificate, and then returning that certificate.
Maybe something like RequestCert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I settled on RequestAutoEncryptCerts
.
// exactly what is needed. | ||
c := &ConnectCA{srv: a.srv} | ||
|
||
rootsArgs := &structs.DCSpecificRequest{Datacenter: args.Datacenter} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another case where initializing directly to a pointer looks a bit odd to me.
I don't have strong feelings about it, but I would prefer to use &rootsArgs
and &roots
as the inputs for c.Roots
.
Same below in c.Sign
with &cert
. Then we can do reply.IssuedCert = cert
rather than reply.IssuedCert = *cert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change it to what you suggested.
root := "../../test/ca/root.cer" | ||
badRoot := "../../test/ca_path/cert1.crt" | ||
|
||
variants := []variant{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One improvement that could be made here is to add a subtests where the name of each subtest is something like VerifyOutgoing fails
.
Here is a short read on table driven tests with subtests:
https://blog.golang.org/subtests
There's a few benefits to table driven subtests:
- You can run individual subtests on their own:
go test -run TestAutoEncryptSign/VerifyOutgoing_fails
- The test case information is displayed in the test output, which makes it easier to line up what you expect vs what you got
Here's an example from envconsul where the name of the subtest is embedded in the test case struct:
https://github.com/hashicorp/envconsul/blob/master/runner_test.go#L181
This talk from Mitchell covers them as well:
https://www.youtube.com/watch?v=8hQG7QlcLBk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
agent/retry_join_test.go
Outdated
expected []string | ||
} | ||
|
||
variants := []variant{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in TestAutoEncryptSign
agent/consul/status_endpoint_test.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
wrap := configurator.OutgoingRPCWrapper() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming this to wrapper
might be a little clearer, since wrapper is also used in DialTimeoutInsecure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
80fde56
to
20b9752
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good. Just the one minor comment/question.
20b9752
to
541424a
Compare
cc9b56c
to
7332a40
Compare
@@ -853,6 +853,15 @@ default will automatically work with some tooling. | |||
contents of each entry. | |||
|
|||
|
|||
* <a name="auto_encrypt"></a><a href="#auto_encrypt">`auto_encrypt`</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This page is (mostly) alphabetized, can we move this just under Autopilot
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
* <a name="tls"></a><a href="#tls">`tls`</a> makes the client request CA and certificates for encrypting RPC communication from the servers provided in `-join` or `-retry-join`. It only works if one of these provided addresses is a server `auto_encrypt.allow_tls` has to be enabled on the server in order to activate the RPC endpoint and `auto_encrypt.tls` on the client that wants to provision certificates automatically. If the `-server-port` is not the default one, it has to be provided to the clients as well. Usually this is discovered through serf, but `auto_encrypt` is happening earlier and cannot use that information. The most secure `auto_encrypt` setup is when the client is provided with the CA, `verify_server_hostname` is turned on, and when ACL are enabled and a token with `node.write` permissions. It is also possible to use `auto_encrypt` with a CA and ACL, but without `verify_server_hostname`, or only with a ACL enabled, or only with CA and `verify_server_hostname`, or only with a CA, or finally without a CA and without ACL enabled. In any case, the communication to the `auto_encrypt` endpoint is always TLS encrypted. Defaults to false. | ||
|
||
* <a name="allow_tls"></a><a href="#allow_tls">`allow_tls`</a> enables connect and the endpoint that is serving the CA and the certificates for `auto_encrypt` to the clients. If disabled, the endpoint is not available and a client configured with `auto_encrypt.tls` cannot start. If enabled, the server starts to accept the manual and the connect CA for incoming connections as well as their corresponding certificates. It will always only present its manually configured CA and certificate though, which the client can verify using the CA it got from `auto_encrypt` endpoint. Defaults to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this setting above auto_encrypt.tls
since auto_encrypt.tls
will be affected by auto_encrypt.allow_tls
.
|
||
The following sub-keys are available: | ||
|
||
* <a name="tls"></a><a href="#tls">`tls`</a> makes the client request CA and certificates for encrypting RPC communication from the servers provided in `-join` or `-retry-join`. It only works if one of these provided addresses is a server `auto_encrypt.allow_tls` has to be enabled on the server in order to activate the RPC endpoint and `auto_encrypt.tls` on the client that wants to provision certificates automatically. If the `-server-port` is not the default one, it has to be provided to the clients as well. Usually this is discovered through serf, but `auto_encrypt` is happening earlier and cannot use that information. The most secure `auto_encrypt` setup is when the client is provided with the CA, `verify_server_hostname` is turned on, and when ACL are enabled and a token with `node.write` permissions. It is also possible to use `auto_encrypt` with a CA and ACL, but without `verify_server_hostname`, or only with a ACL enabled, or only with CA and `verify_server_hostname`, or only with a CA, or finally without a CA and without ACL enabled. In any case, the communication to the `auto_encrypt` endpoint is always TLS encrypted. Defaults to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if users don't have a server listed in join
or retry_join
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't work. Meaning your agent won't be able to start up ever, and it keeps retrying indefinitely.
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intro
This PR introduces automatic certificate provisioning for consul clients to communicate securely via RPC. The goal is to not provision client certificate and key, but get that from the servers instead. Depending on the client configuration, the setup is as secure as manually provisioning certificates or less secure.
verify_server_hostname
enabledverify_server_hostname
not providedEven with an insecure configuration, the setup ends up being more secure because TLS is enabled nevertheless.
Implementation
New RPCType
When calling
AutoEncrypt.Sign
, the client has a CA at best, maybe not even that. But this endpoint has to have TLS enabled, no matter what. Since this is different than every other existing RPC a new RPCType was necessary:RPCTLSInsecure
. On the server side every TLS connection is accepted and no client cert is required. The client will establish the TLS connection even though it might not have the server CA and certainly doesn't have a client cert.New RPCEndpoint
There is a new Endpoint
AutoEncrypt.Sign
which returns the combined result ofConnectCA.Roots
andConnectCA.Sign
. This is the only endpoint so far that always requires TLS and that accepts TLS connections without a client certificate.Client Startup
If
auto_encrypt.tls
is set on the client, it will retry indefinitely to get certificates from the server. If it fails, it will never continue startup. Because the operator set the configuration, we assume TLS is wanted and the client won't proceed without it. The client does not rely on discovering servers with serf, but will use every address in-join
and-retry-join
. That means one of these addresses needs to be a server in order to allow the client to fetch certificates.Also the Consul startup banner now announces that the agent is about to start instead of the successful start. I had to change that in order to be able to display the attempts to reach the
AutoEncrypt
endpoint.Configuration
Prerequisites
TLS needs to be turned on for RPC on the server. (has now its own warning, so that this configuration is invalid)
New options
This PR introduces a new option for servers:
And one for clients:
Enabling
auto_encrypt.tls
will have the following implications:verify_outgoing
is assumedverify_outgoing
is enabledverify_server_hostname
and the client will turn that on if it enabled in the response. if it is turned on on the server, it will be turned on for the clients.If the
server_port
is not the default8300
it has to be provided to the agent. Sinceauto_encrypt
is potentially happening before the agent joins the serf cluster, it cannot rely on that data coming from serf.Example configurations
Most secure auto_encrypt setup with ACL token and CA for the client.
Less secure setup without ACL, but still with CA:
Least secure setup without ACL and without CA:
verify_incoming
This PR sets up TLS between client and server, it only affects
verify_outgoing
on the client side.Todo
Server
maybe choose a better name forAutoEncrypt.Sign
andInsecure
verify_server_hostname
configurationClient
if CA present, enableverify_server_hostname
make sureverify_incoming_rpc
andverify_incoming_https
have a higher precedence than the "imported"verify_incoming
.start watching only after serf came upverify_server_hostname
withca_{file,path}
works.queryServer
intoclient.go
and useCallWithCodec
agent.Start()
which is too late. Needs thinking how to solve that.Docs