Modifying Connect CA config attribute results in Envoy TLS connection failures #16580

Closed
jbrandhorst opened this issue Mar 9, 2023 · 4 comments · Fixed by #16592
Labels
theme/certificates: Related to creating, distributing, and rotating certificates in Consul
theme/consul-vault: Relating to Consul & Vault interactions

Comments

@jbrandhorst

Overview of the issue

I am attempting to modify the Connect CA attribute "CSRMaxPerSecond" from its default value of 50. Our clusters use Vault as the Connect CA and Envoy as the proxy. After modifying this setting, the system is left in a state where any service that registers with the mesh after the config change cannot establish TLS connections with its peers.

It is possible that I have the format/syntax incorrect for that setting, but I'll also note that re-applying the original CA config without the CSRMaxPerSecond line does not fix the condition.

Consul Info

consul version: 1.14.4
envoy version: 1.23.1

Reproduction Steps

This is the process I have followed:

  1. consul connect ca get-config > config.json to get the existing configuration
  2. Remove the "CreateIndex" and "ModifyIndex" fields from the json file.
  3. Insert a line into the "Config" block of the json file with "CSRMaxPerSecond": 100
  4. Apply the new configuration with consul connect ca set-config -config-file config.json
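
Steps 2 and 3 can also be scripted instead of edited by hand. The following is a minimal Python sketch, assuming the get-config output has already been saved as JSON. The helper names prepare_ca_config and rewrite_file are hypothetical and not part of Consul.

```python
import json


def prepare_ca_config(raw: dict, csr_max_per_second: float) -> dict:
    """Return a copy of a get-config result that is ready to pass to set-config."""
    # Step 2: drop the index fields that get-config includes in its output.
    cleaned = {k: v for k, v in raw.items() if k not in ("CreateIndex", "ModifyIndex")}
    # Step 3: copy the provider config (so the input is not mutated) and set the rate limit.
    provider_config = dict(cleaned.get("Config") or {})
    provider_config["CSRMaxPerSecond"] = csr_max_per_second
    cleaned["Config"] = provider_config
    return cleaned


def rewrite_file(src_path: str, dst_path: str, csr_max_per_second: float) -> None:
    """Read the saved get-config output and write the modified config to a new file."""
    with open(src_path) as src:
        original = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(prepare_ca_config(original, csr_max_per_second), dst, indent=2)
```

Writing to a separate destination file keeps the unmodified get-config output available for a later rollback attempt.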

This is the config.json file being set with the updated CSRMaxPerSecond setting:

{
    "Provider": "vault",
    "Config": {
        "Address": "https://vault.hostname.goes.here:8200",
        "CSRMaxPerSecond": 100.0,
        "IntermediateCertTTL": "8760h",
        "IntermediatePKIPath": "intermediate-path-goes-here",
        "LeafCertTTL": "72h",
        "RootPKIPath": "root-path-goes-here",
        "RotationPeriod": "2160h",
        "Token": "token-goes-here"
    },
    "State": null,
    "ForceWithoutCrossSigning": false
}

The only logs generated from the leader server when this is applied are:

connect.ca: CA provider config updated
and
connect.ca.vault: Successfully renewed token for Vault provider

From that point forward, any new registrants to the Connect service mesh cannot establish TLS connections with other mesh participants. Every connection attempt results in Envoy generating this error:
SSL routines:OPENSSL_internal:TLSV1_ALERT_UNKNOWN_CA.

I have pulled the Envoy configs from a working instance and a broken instance, and they have the same certificate chain info in the common_tls_context -> validation_context -> certificate_chain -> inline_string field. Consul server and client agents don't generate any error logs from what I can see.
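
A comparison like the one described here could be automated roughly as follows. This is only a sketch, assuming each proxy's configuration has been saved to a JSON file. Rather than depending on one exact path, it gathers every inline_string value nested anywhere beneath a validation_context key. All function names are hypothetical.

```python
import json


def load_dump(path: str) -> dict:
    """Load a saved Envoy configuration dump from a JSON file."""
    with open(path) as f:
        return json.load(f)


def collect_inline_strings(node, under_validation_context=False):
    """Recursively gather inline_string values nested below any validation_context key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "inline_string" and under_validation_context and isinstance(value, str):
                found.append(value)
            else:
                found.extend(
                    collect_inline_strings(
                        value, under_validation_context or key == "validation_context"
                    )
                )
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_inline_strings(item, under_validation_context))
    return found


def same_validation_certs(dump_a: dict, dump_b: dict) -> bool:
    """True when both dumps carry the same set of validation certificate strings."""
    return set(collect_inline_strings(dump_a)) == set(collect_inline_strings(dump_b))
```

A True result only shows that both proxies were handed the same validation material; as noted above, the failure occurred even though these fields matched.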

This was attempted on two distinct consul/mesh clusters with the same results. All consul clusters are operated as a single datacenter.

Rollback Steps

In an attempt to roll back, I tried the steps below in the following order:

  1. Run consul connect ca set-config passing in the configuration generated from my original get-config request, without CSRMaxPerSecond. This did not work.
  2. Forced a consul leadership change. This did not work.
  3. Restore from a consul snapshot taken before the original set-config that included CSRMaxPerSecond. This did not work.
  4. Forced a consul leadership change. This finally did work; any new mesh registrants can now connect.

If the syntax I used for CSRMaxPerSecond is incorrect, I would appreciate guidance on that. However, the fact that a snapshot restore and a leadership change are required to restore service is not ideal.

Operating System

CentOS 7. Servers run on EC2; clients are a mix of EC2 instances and EKS pods.

@jkirschner-hashicorp added the theme/certificates label Mar 9, 2023
@kisunji added the theme/consul-vault label Mar 9, 2023
@kisunji

kisunji commented Mar 9, 2023

Thanks for raising this issue.

For reference, could you share your Vault version as well?

After some investigation I found a related issue, "Updating connect Vault CA configuration drops intermediate certificates", which seems to show the same behavior after modifying the Vault provider's CA config.

We will get this triaged with the team and update you on its status.

@jbrandhorst

Excellent, thank you @kisunji. We are running Vault v1.9.3.

@kisunji

kisunji commented Mar 14, 2023

This should be resolved in the next patch releases (v1.15.2, v1.14.6, and v1.13.8), which should be out sometime this week.

@jkirschner-hashicorp

Hi @jbrandhorst,

Since v1.15.2, v1.14.6, and v1.13.8 have been released with a fix, I'm closing this issue for now. If you try one of those releases out but the behavior persists, please let us know and re-open this issue. Thanks!
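
For anyone checking a fleet of agents against the releases named above, a version comparison might look like the sketch below. It assumes plain semantic version strings such as "1.14.4", and it assumes (this is not stated in the thread) that release lines newer than 1.15 also carry the fix while lines older than 1.13 do not. The names FIXED_IN, parse_version, and has_fix are hypothetical.

```python
# Patch releases named in this thread as containing the fix, keyed by (major, minor).
FIXED_IN = {(1, 13): 8, (1, 14): 6, (1, 15): 2}


def parse_version(text: str) -> tuple:
    """Parse 'v1.14.4' or '1.14.4' into (1, 14, 4); suffixes such as '+ent' are ignored."""
    core = text.strip().lstrip("v").split("+")[0].split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def has_fix(version: str) -> bool:
    """Report whether a Consul version is at or above the patched release on its line."""
    major, minor, patch = parse_version(version)
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    # Assumption: release lines newer than 1.15 include the fix; older lines do not.
    return (major, minor) > (1, 15)
```

With this sketch, the 1.14.4 version from the original report is flagged as lacking the fix, while 1.14.6 passes.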
