Modifying Connect CA config attribute results in Envoy TLS connection failures #16580
Labels
theme/certificates
Related to creating, distributing, and rotating certificates in Consul
theme/consul-vault
Relating to Consul & Vault interactions
Overview of the issue
I am attempting to modify the Connect CA attribute "CSRMaxPerSecond" from its default value of 50. Our clusters use Vault as the Connect CA and Envoy as the proxy. Attempts to modify this setting have left the system in a state where any new registrants to the mesh after the config change are unable to establish TLS connections with peers.
It is possible that I have the format/syntax incorrect for that setting, but I'll also note that re-applying the original CA config without the CSRMaxPerSecond line does not fix the condition.
Consul Info
consul version: 1.14.4
envoy version: 1.23.1
Reproduction Steps
This is the process I have followed:
1. Run `consul connect ca get-config > config.json` to get the existing configuration.
2. Remove the "CreateIndex" and "ModifyIndex" fields from the json file.
3. Update the "Config" block of the json file with "CSRMaxPerSecond": 100.
4. Run `consul connect ca set-config -config-file config.json`.
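The edit to config.json described above (dropping the "CreateIndex" and "ModifyIndex" fields and setting "CSRMaxPerSecond" inside the "Config" block) can be sketched in Python. This is only an illustration, not the exact file that was applied: the helper name `prepare_ca_config` is hypothetical, the sample values (Vault address, PKI paths, index numbers) are placeholders, and it assumes the `get-config` output is a JSON object with a top-level "Config" map.

```python
import copy
import json


def prepare_ca_config(current: dict, csr_max_per_second: float) -> dict:
    """Return an edited copy of a get-config document; the input is not modified."""
    updated = copy.deepcopy(current)
    # Drop the index fields, as in the reproduction steps.
    updated.pop("CreateIndex", None)
    updated.pop("ModifyIndex", None)
    # Place the setting inside the "Config" block, as in the reproduction steps.
    updated.setdefault("Config", {})["CSRMaxPerSecond"] = csr_max_per_second
    return updated


# Hypothetical get-config output; every value below is a placeholder.
sample = {
    "Provider": "vault",
    "Config": {
        "Address": "https://vault.example.internal:8200",
        "RootPKIPath": "connect-root",
        "IntermediatePKIPath": "connect-intermediate",
    },
    "CreateIndex": 5,
    "ModifyIndex": 5,
}

if __name__ == "__main__":
    # Print the edited document so it can be compared with the original file.
    print(json.dumps(prepare_ca_config(sample, 100), indent=2))
```

Working on a copy keeps the original document available for a later rollback attempt.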
This is the config.json file being set with the updated CSRMaxPerSecond setting:
The only logs generated from the leader server when this is applied are:
connect.ca: CA provider config updated
and
connect.ca.vault: Successfully renewed token for Vault provider
From that point forward, any new registrants to the Connect service mesh cannot establish TLS connections with other mesh participants. All connection attempts result in Envoy generating an error:
SSL routines:OPENSSL_internal:TLSV1_ALERT_UNKNOWN_CA
I have pulled the Envoy configs from a working instance and a broken instance, and they have the same certificate chain info in the common_tls_context -> validation_context -> certificate_chain -> inline_string field. Consul server and client agents don't generate any error logs from what I can see.
This was attempted on two distinct Consul/mesh clusters with the same results. All Consul clusters are operated as a single datacenter.
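For anyone repeating the Envoy comparison, a rough way to check that two config dumps carry the same inline certificate material is to collect every "inline_string" value and compare digests. The sketch below assumes each dump has already been saved and loaded as JSON. The function names are hypothetical, and it matches on the "inline_string" key anywhere in the document rather than on one fixed path.

```python
import hashlib


def collect_inline_strings(node, found=None):
    """Recursively gather every string stored under an "inline_string" key."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "inline_string" and isinstance(value, str):
                found.append(value)
            else:
                collect_inline_strings(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_inline_strings(item, found)
    return found


def pem_digests(config):
    """Return the sorted, de-duplicated SHA-256 digests of all inline strings."""
    digests = {
        hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        for text in collect_inline_strings(config)
    }
    return sorted(digests)


def same_inline_certificates(config_a, config_b):
    """True when both configs contain the same set of inline strings."""
    return pem_digests(config_a) == pem_digests(config_b)
```

Comparing digests rather than raw text ignores leading and trailing whitespace and makes a mismatch easy to spot, though it says nothing about which certificates a peer actually presents during the handshake.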
Rollback Steps
In an attempt to roll back, I tried the steps below in the following order:
1. `consul connect ca set-config`, passing in the configuration generated from my original `get-config` request, without `CSRMaxPerSecond`. This did not work.
2. [...] `set-config` that included `CSRMaxPerSecond`. This did not work.
If the syntax I used for `CSRMaxPerSecond` is incorrect I would appreciate guidance on that; however, the fact that a snapshot restore and leadership change is required to restore service is not ideal.
Operating System
CentOS 7. Servers run on EC2; clients are a mix of EC2 instances and EKS pods.