Vault default parallelism with Consul backend exceeds new Consul 1.6.3 default limits #7284

Closed
ncabatoff opened this issue Feb 12, 2020 · 2 comments · Fixed by #7289

@ncabatoff

Overview of the Issue

Since #7159, there is a default limit of 100 simultaneous HTTP connections from a single IP. When Vault uses Consul as a backend, its max_parallel setting by default allows it to open up to 128 connections to Consul. When Vault exceeds the connection limit, it treats the resulting errors as a backend failure and seals itself.
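
To make the mismatch concrete, here is a minimal sketch of a per-IP connection cap in Python. It is a toy model only, not Consul's actual implementation: the class and function names are hypothetical, and the only numbers taken from this issue are the limit of 100 and the max_parallel default of 128.

```python
from collections import Counter


class PerClientConnLimiter:
    """Toy model of a per-source-IP cap on simultaneous connections.

    Hypothetical illustration; not Consul's real code.
    """

    def __init__(self, max_conns_per_client: int) -> None:
        self.max_conns_per_client = max_conns_per_client
        self.open_conns = Counter()  # client IP -> currently open connections

    def try_open(self, client_ip: str) -> bool:
        """Return True if the connection is accepted, False if it would be reset."""
        if self.open_conns[client_ip] >= self.max_conns_per_client:
            return False
        self.open_conns[client_ip] += 1
        return True

    def close(self, client_ip: str) -> None:
        """Release one connection slot for the given client IP."""
        if self.open_conns[client_ip] > 0:
            self.open_conns[client_ip] -= 1


def simulate_burst(max_parallel: int, conn_limit: int, client_ip: str = "127.0.0.1"):
    """Open max_parallel connections at once from one IP; return (accepted, rejected)."""
    limiter = PerClientConnLimiter(conn_limit)
    results = [limiter.try_open(client_ip) for _ in range(max_parallel)]
    accepted = sum(results)
    return accepted, max_parallel - accepted


# Worst case with the defaults described above: 128 parallel requests, limit of 100.
accepted, rejected = simulate_burst(max_parallel=128, conn_limit=100)
print(accepted, rejected)  # 100 28
```

In this model, any burst above the cap has its excess connections refused, which on the Vault side would show up as failed backend reads.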

Reproduction Steps

Steps to reproduce this issue:

I haven't tried to reproduce it directly; it was my analysis of logs after a failure that led me to conclude this is the problem. I expect it could be reproduced by inducing a high number of parallel requests to Vault with the default Consul/Vault settings. I observed it during the lease restoration phase of Vault startup, which runs 64 parallel requests. There must have been additional traffic to account for the remaining 37 requests needed to exceed the limit of 100, but I don't yet know where it came from, since Vault isn't unsealed while the restoration is ongoing. Either way, I'm confident that this new 1.6.3 behaviour will be problematic for some existing Vault+Consul users.

Consul info for both Client and Server

Client info

Vault 1.2.3+ent.

Server info

I don't have direct access to the system in question, but this was observed multiple times with Consul 1.6.3, and does not manifest with Consul 1.6.1.

Operating system and Environment details

Best I can do right now is "linux, amd64".

Log Fragments

vault: 2020-02-11T12:46:53.984-0600 [ERROR] expiration: error restoring leases: error="failed to read lease entry XXXX: Get http://127.0.0.1:8500/v1/kv/vault/sys/expire/id/XXXX: read tcp 127.0.0.1:50460->127.0.0.1:8500: read: connection reset by peer"
vault: 2020-02-11T12:38:36.634-0600 [ERROR] expiration: error restoring leases: error="failed to read lease entry YYYY: Get http://127.0.0.1:8500/v1/kv/vault/sys/expire/id/YYYY: EOF"
@banks banks added the type/bug Feature does not function as expected label Feb 12, 2020
@banks banks added this to the 1.7.x milestone Feb 12, 2020
@banks
Member

banks commented Feb 12, 2020

FYI @ncabatoff, if you can access Consul logs while this is happening: we explicitly log when we reset connections due to that limit, which could further confirm your hypothesis.

That said, I'm pretty sure you're right and we should increase the default to allow this case.

A workaround for the interim is that operators can change the config in Consul (and presumably Vault) to match, but we should make them compatible by default.
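
A rough sketch of what "matching" the two settings could look like, written as a small Python helper that checks compatibility and renders a JSON config fragment for Consul. Assumptions to verify against the docs for your versions: `limits.http_max_conns_per_client` is, as far as I know, the Consul agent option behind this limit, and `max_parallel` is the Vault Consul storage setting mentioned above. The function names and the `headroom` value are hypothetical.

```python
import json


def settings_compatible(vault_max_parallel: int, consul_conn_limit: int) -> bool:
    """True if Vault's maximum parallelism fits within Consul's per-IP limit."""
    return vault_max_parallel <= consul_conn_limit


def consul_limits_fragment(vault_max_parallel: int, headroom: int = 32) -> dict:
    """Build a Consul config fragment that raises the per-client connection limit.

    headroom is an arbitrary illustrative allowance for other clients sharing
    the same source IP; the key name is an assumption to check against the
    Consul documentation.
    """
    return {"limits": {"http_max_conns_per_client": vault_max_parallel + headroom}}


# Defaults described in this issue: Vault max_parallel 128, Consul limit 100.
print(settings_compatible(128, 100))  # False
print(json.dumps(consul_limits_fragment(128), indent=2))
```

The other direction is to lower max_parallel in Vault's Consul storage config below the Consul limit, trading some backend throughput for staying under the cap.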

@ncabatoff
Author

I didn't see that in the logs, but there were sporadic gaps due to "Suppressed 2317 messages from /system.slice/consul.service", so who knows.
