
Feature Request: Support Let's Encrypt secret engine #4950

Closed
tamalsaha opened this issue Jul 18, 2018 · 39 comments
Labels
community-sentiment Tracking high-profile issues from the community core/secret feature-request

Comments

@tamalsaha

tamalsaha commented Jul 18, 2018

We would like to be able to issue SSL certificates from Let's Encrypt using Vault and auto-refresh them when certs are about to expire. Do you think Vault can support this as a secret engine?

We would be interested in contributing this feature if that is acceptable.

@jefferai
Member

How exactly would this work? What will Vault use to authorize all of the different names to LE?

@tamalsaha
Author

tamalsaha commented Jul 19, 2018

@jefferai, for DNS challenges Vault can handle all the parts. This is also required for issuing wildcard certificates. The user provides the domain names and the DNS provider (Route53, etc.) credentials, and Vault sets up the TXT record needed to pass domain validation.
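
For reference, the TXT record value for a DNS-01 challenge is derived from the challenge token and the account key thumbprint (RFC 8555, section 8.4); a minimal sketch in Go with placeholder values:

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// dns01TXTValue computes the value an ACME CA expects in the
// _acme-challenge TXT record: base64url(SHA-256(token + "." + thumbprint)).
func dns01TXTValue(token, accountKeyThumbprint string) string {
	keyAuth := token + "." + accountKeyThumbprint
	sum := sha256.Sum256([]byte(keyAuth))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	// Placeholder values for illustration only.
	fmt.Println("_acme-challenge.example.com TXT",
		dns01TXTValue("some-challenge-token", "some-account-key-thumbprint"))
}
```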

For the HTTP challenge, I am not sure. The domain validation happens by responding on a well-known path. Vault could potentially expose this path, and users would have to manually set up their LB (nginx/haproxy) to expose it under their domains. If the user is running on Kubernetes, they just have to configure their Ingress object to pass the well-known paths to Vault.

We are using the https://github.com/xenolf/lego project as a library in our Voyager project. We struggled with dealing with LE's rate limiting, but I think we have a good handle on it now.

My general motivation is to make Vault the centerpiece of secret management for Kubernetes (not an official project). We have written an operator, are using Vault as a database secret manager, and are looking into adding a CSI driver and handling TLS secrets. Kubernetes has a crazy maze of mTLS. I feel life will be easier for ops folks if there is a central place to deal with TLS stuff.

@munnerz

munnerz commented Jul 28, 2018

I am not a Vault expert, but from what I understand, no other secret backend has to deal with a similar type of authorisation in order to issue secrets.

As you've already noted @tamalsaha, it can be difficult to deal with rate limits properly, and the wide variety of HTTP servers, differing APIs, and authentication mechanisms involved with ACME can quickly become very complex.

From what I understand, Vault tries to keep this kind of code (either listening for unauthenticated connections or calling out to many different types of APIs) to a minimum, given the sensitivity of data stored in Vault. Solving the HTTP challenge, at least, would involve exposing Vault on the public internet (for LE/public ACME servers).

Beyond that, I think waiting until the ACME draft spec has stabilised would be wise - there are currently still a number of divergences between Let's Encrypt and the ACME spec (these are gradually going away!) which could cause quite a lot of backwards compatibility pain.

My general motivation is to make Vault the centerpiece of secret management for Kubernetes (not an official project) ....
I feel life will be easier for ops folks if there was a central place to deal with TLS stuff.

Regarding Kubernetes specifically (and a shameless self-plug), that is one of the main drivers/goals for cert-manager, albeit we don't have a well-defined charter that notes this anywhere 😄, and we don't yet have a CSI plugin or flex volume plugin.

@jefferai
Member

All well said, @munnerz. We generally take the approach of making Vault a one-stop shop for secrets needs within your intranet, and as such what we often find is people using Vault for easy deployment of certificates to apps and machines communicating with each other, and LE for anything on the edge. It's a reasonable model given the different goals of the products.

FWIW we have many endeavors going on to make it easier to use Vault when on Kubernetes. The automatic authentication provided by Vault Agent that was just released is a step towards that. We plan to eventually have a sidecar and/or CSI container mixing Vault and Consul-Template to make it easy to do the rest.

@tamalsaha
Author

tamalsaha commented Jul 31, 2018

@jefferai, does this mean that LE-based certificate issuance should be outside the scope of Vault?

Regarding "We plan to eventually have a sidecar and/or CSI container mixing Vault and Consul-Template to make it easy to do the rest": do you have an issue that we can follow? We have been interested in that too (kubernetes/kubernetes#66362). If the Vault project wants to do this, we can wait :).

@jefferai
Member

@tamalsaha They're really quite different paradigms. I don't know the ACME spec well enough to know whether it's feasible at all in Vault, but we generally have viewed Vault's PKI capability and LE as complementary, rather than replacements for each other. Vault is significantly more flexible but not really suited for issuing publicly-trusted certs, and vice versa.

Re an init/sidecar/CSI container, there's no issue to follow. All I can tell you right now is that it will happen, but I can't give any concrete timeline. The release of Vault Agent was a step made very specifically in that direction, however (as well as other directions, of course) as it's a key component of any such solution.

@jefferai
Member

To clarify: an init/sidecar container is definitely on our roadmap. CSI is something we'd like to support but as you are probably aware from kubernetes/kubernetes#64984 there are still gaps.

@tamalsaha
Author

Thanks @jefferai. If there is any way we can contribute, we will be interested.

@remilapeyre
Contributor

Hi, I'm having the same issue as the author: we would like to use Vault to distribute Let's Encrypt-signed certificates to our services. Using Vault's PKI engine means we would need to distribute the root CA to all our users, which would be OK for our servers but is pretty inconvenient and difficult to do securely for our users.

While we could use certbot, this would mean giving the API key of our DNS provider to every service that needs a TLS cert, but as Cloudflare (and many others) does not support granular permissions, we would like to avoid that.

We are currently using https://www.terraform.io/docs/providers/acme/index.html which solves the signing part but not how to distribute the cert to the service nor how to renew them.

It seems to me that making Vault use the ACME DNS challenge would be great for this; it should be easy enough, as most of the work has already been done in lego and the ACME Terraform provider.

I don't think implementing the HTTP challenge would be a great idea: in most deployments Vault is not accessible from the Internet and does not listen on 80/443, and this would require cooperation from the load balancers, which would make the configuration complex.

If this seems good to you, I will start working on an implementation for this.

@remilapeyre
Contributor

BTW #4362 is related to this and has 46 👍

@mholt

mholt commented Oct 16, 2019

May I recommend using CertMagic for this?

It's the same well-vetted library used by the Caddy Web Server, and it is the most mature ACME client implementation in Go that is available.

It also supports pluggable storage, so you could have CertMagic store certificates directly in Vault.

It also coordinates certificate management in a cluster, as long as all instances in the cluster use the same storage configuration (i.e. a Vault instance).

You could also side-car Caddy if you prefer an external solution, but I'd strongly recommend bundling it directly into the application as a library whenever possible.

@p3lim

p3lim commented Oct 21, 2019

I'm currently developing a secrets plugin for Vault using lego. In my brief overview of certmagic I found that the pluggable storage adds unnecessary complexity to the logic. It seems more targeted at web developers, just wrapping the logic of lego (no offense intended, it looks like a great project, just not suited for this in my opinion).

I'm currently writing tests for the plugin, the basic functionality (register, obtain, renew and revoke) is in. It only supports dns-01 right now, and I'm unsure about supporting other challenges (like http-01 and tls-alpn-01) due to the fact that you'd have to expose the server running Vault.

@mholt

mholt commented Oct 21, 2019

@p3lim Lego is a raw ACME client library -- it simply facilitates the ACME protocol for you. It has methods like Register() and Obtain(). The difference is that CertMagic is all about keeping certificates renewed in the long run: while your server/process is running, ensure certificates stay renewed and that your TLS config can always serve the current certificates, without downtime.

So for anything long-running (and not a once-and-done command, for instance), use CertMagic.
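
As a rough, self-contained sketch of that raw flow (Register() and Obtain()) using lego's DNS-01 support, assuming the go-acme/lego v4 import paths; the provider name, email and domain are placeholders, and the DNS credentials come from environment variables as lego's providers expect:

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"log"

	"github.com/go-acme/lego/v4/certificate"
	"github.com/go-acme/lego/v4/lego"
	"github.com/go-acme/lego/v4/providers/dns"
	"github.com/go-acme/lego/v4/registration"
)

// acmeUser satisfies lego's registration.User interface.
type acmeUser struct {
	email string
	reg   *registration.Resource
	key   crypto.PrivateKey
}

func (u *acmeUser) GetEmail() string                        { return u.email }
func (u *acmeUser) GetRegistration() *registration.Resource { return u.reg }
func (u *acmeUser) GetPrivateKey() crypto.PrivateKey        { return u.key }

func main() {
	accountKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	user := &acmeUser{email: "ops@example.com", key: accountKey}

	cfg := lego.NewConfig(user)
	cfg.CADirURL = lego.LEDirectoryStaging // the staging directory avoids production rate limits
	client, err := lego.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// DNS-01 via a named provider; credentials come from environment
	// variables (e.g. AWS_* for route53). "route53" is only an example.
	provider, err := dns.NewDNSChallengeProviderByName("route53")
	if err != nil {
		log.Fatal(err)
	}
	if err := client.Challenge.SetDNS01Provider(provider); err != nil {
		log.Fatal(err)
	}

	user.reg, err = client.Registration.Register(registration.RegisterOptions{TermsOfServiceAgreed: true})
	if err != nil {
		log.Fatal(err)
	}

	certs, err := client.Certificate.Obtain(certificate.ObtainRequest{
		Domains: []string{"example.com"},
		Bundle:  true,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("obtained certificate for %s (%d bytes of PEM)", certs.Domain, len(certs.Certificate))
}
```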

In my brief overview of certmagic I found that the pluggable storage adds unnecessary complexity to the logic.

Can you explain what you mean? CertMagic's storage interface basically just requires Load, Store, List, Delete, and Lock/Unlock.
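
As an illustration, a Vault-backed store exposing roughly those operations could be a thin wrapper around Vault's KV API. Here is a sketch using the hashicorp/vault/api client and an assumed KV v1 mount under secret/certmagic/; the exact certmagic.Storage method signatures vary between releases, so treat this as a shape rather than a drop-in implementation:

```go
package vaultstore

import (
	"encoding/base64"
	"fmt"

	vault "github.com/hashicorp/vault/api"
)

// vaultStorage sketches the Load/Store/List/Delete operations a CertMagic
// storage adapter would need, backed by an assumed KV v1 mount.
type vaultStorage struct {
	client *vault.Client
	prefix string // e.g. "secret/certmagic/"
}

func (s *vaultStorage) Store(key string, value []byte) error {
	_, err := s.client.Logical().Write(s.prefix+key, map[string]interface{}{
		"value": base64.StdEncoding.EncodeToString(value),
	})
	return err
}

func (s *vaultStorage) Load(key string) ([]byte, error) {
	secret, err := s.client.Logical().Read(s.prefix + key)
	if err != nil {
		return nil, err
	}
	if secret == nil {
		return nil, fmt.Errorf("key %q not found", key)
	}
	encoded, ok := secret.Data["value"].(string)
	if !ok {
		return nil, fmt.Errorf("key %q has no value field", key)
	}
	return base64.StdEncoding.DecodeString(encoded)
}

func (s *vaultStorage) Delete(key string) error {
	_, err := s.client.Logical().Delete(s.prefix + key)
	return err
}

func (s *vaultStorage) List(prefix string) ([]string, error) {
	secret, err := s.client.Logical().List(s.prefix + prefix)
	if err != nil || secret == nil {
		return nil, err
	}
	keys, _ := secret.Data["keys"].([]interface{})
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		if name, ok := k.(string); ok {
			out = append(out, name)
		}
	}
	return out, nil
}
```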

@remilapeyre
Contributor

Hi everybody, I published my current implementation for this at https://github.com/remilapeyre/vault-acme.

It does most of what was asked for in this thread and we've been using it for some days and it seems to work fine.

Regarding CertMagic, I'm not sure I see what it adds over lego that we need. One thing I needed from lego that is not implemented is the ability to update an ACME account, but I should be able to contribute that next week. Lego is also the library used by the Terraform ACME provider, and it may make maintenance easier to share the same library in Terraform and Vault.

A few notes regarding the current plugin (I will probably forget plenty of things and will complete this later):

  • Contrary to the other secret backends in Vault, the ACME provider may enforce arbitrary rate limits; Let's Encrypt documents those at https://letsencrypt.org/docs/rate-limits/. This means we may have to implement a cache for this backend to reduce the number of CSRs sent to the provider; during development I hit those limits a few times. Caching would also help with latency, as setting the DNS records, waiting for DNS propagation, asking the ACME provider to validate them, generating the CSR and sending it to the provider can take some time. During our tests and in our internal deployment it has been fast though, generally a few seconds. (A sketch of such a cache check follows this list.)
  • I did not implement support for the tls-alpn-* challenges as those were withdrawn from the RFC after some vulnerabilities were found in them.
  • I did not implement support for HTTP challenges as this would require exposing Vault to the Internet or some convoluted way to set up an external webserver. If we wish to add support for something like this, I think we could implement a way to use an external program to do it, like the external program provider does.
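
As a rough sketch (not the plugin's actual code) of the kind of check such a cache could perform before contacting the ACME provider, reusing a stored certificate while it still has enough usable lifetime left:

```go
package acmecache

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"time"
)

// Reusable reports whether a cached PEM-encoded certificate can still be
// served instead of requesting a new one from the ACME provider.
// renewRatio is the fraction of the certificate lifetime after which we
// prefer to renew (e.g. 0.7 = renew once 70% of the lifetime has elapsed).
func Reusable(certPEM []byte, now time.Time, renewRatio float64) (bool, error) {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return false, errors.New("no PEM block found in cached certificate")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return false, err
	}
	lifetime := cert.NotAfter.Sub(cert.NotBefore)
	renewAt := cert.NotBefore.Add(time.Duration(float64(lifetime) * renewRatio))
	return now.Before(renewAt) && now.Before(cert.NotAfter), nil
}
```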

I'm looking forward to feedback regarding this first implementation, let me know if you try it :)

@mholt

mholt commented Oct 26, 2019

@remilapeyre Great, I am looking forward to taking a closer look when I have a chance.

A few points of initial feedback:

  • I'm glad to see you're using lego at least, that's definitely the best low-level ACME implementation in Go.

This means we may have to implement a cache for this backend to reduce the number of CSRs sent to the provider; during development I hit those limits a few times.

  • I'd highly recommend reading this draft of an upcoming document which advises best practices for ACME clients: https://github.com/https-dev/docs/blob/master/acme-ops.md -- it's written in collaboration with Google, Let's Encrypt, the EFF, and some others. It will especially help with regards to rate limits, error handling, and improving uptime.

I did not implement support for the tls-alpn-* challenges as those were withdrawn from the RFC after some vulnerabilities were found in them.

  • I strongly advise you support the TLS-ALPN challenge. It is the only challenge that works over port 443 (the TLS port), and is required if port 80 is not available or if the HTTP challenge has trouble getting a certificate. Lego does not support the TLS-SNI challenge, which is deprecated.

I did not implement support for HTTP challenges as this would require exposing Vault to the Internet or some convoluted way to set up an external webserver.

  • You'll have to expose something on port 443 for the TLS-ALPN challenge, too. With either the HTTP or TLS-ALPN challenges, you have to expose your ACME client on port 80 or 443, respectively. The only challenge type that does not require any external access is the DNS challenge, which requires integration with the DNS provider to automatically change zone files (or do it manually).

If you have in fact disabled the HTTP and TLS-ALPN challenges, it sounds like the only challenge that is enabled is the DNS challenge (?), which is fine if necessary, but it's also the challenge that requires the most configuration and has the most moving parts. Users should be aware that this DNS integration is required.

Does vault-acme automatically renew certificates too? Or does an outside program/user/lib trigger the reload?

@remilapeyre
Contributor

Thanks @mholt, I will update my secrets backend based on your feedback.

I'd highly recommend reading this draft of an upcoming document which advises best practices for ACME clients: https://github.com/https-dev/docs/blob/master/acme-ops.md

Here's what will need to be changed to match the recommendations:

  • Upon obtaining a certificate, immediately write it to persistent storage.
    Currently each client requesting a certificate for foo.com will get a new one. This is the usual behavior for Vault secrets backends, and it means I didn't need to save the certificate.

  • If a valid certificate already exists in storage, use that one instead of obtaining a new one.
    Since I didn't save certificates to persistent storage, I could not do this. It is in line with the rest of the secrets engines, which give a new secret on each call so that usage can be tracked and secrets can be revoked separately. Since this new secrets engine uses resources from the ACME provider, it makes sense to change the behavior and add caching. I will add a disable_cache attribute on the acme/roles/:role resource that defaults to false, so the cache will be enabled by default and users who want to disable it can do so.

  • Renew certificates after ⅔ of usable lifetime.
    Currently, the TTL of the lease associated with the certificate is min(TTL of certificate, max TTL of the backend). When the secret is actually renewed depends on the client; for example, Consul Template will renew the secret when it reaches 90% of the lease TTL, which is later than the recommended 2/3. To improve this I will add a cache_duration_ratio that defaults to 0.7 and make the lease TTL min(cache_duration_ratio * TTL of certificate, max TTL of the backend). This should make clients renew the certificate at around 2/3 of its lifetime (see the sketch after this list).
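
A small sketch of that proposed lease computation; the cache_duration_ratio name and the 0.7 default come from the comment above, everything else is illustrative:

```go
package acmecache

import "time"

// LeaseTTL returns min(cacheDurationRatio * certificate lifetime, backendMaxTTL),
// so a client that renews near the end of its lease re-requests the certificate
// at roughly two-thirds of the certificate's usable lifetime.
func LeaseTTL(notBefore, notAfter time.Time, cacheDurationRatio float64, backendMaxTTL time.Duration) time.Duration {
	scaled := time.Duration(float64(notAfter.Sub(notBefore)) * cacheDurationRatio)
	if scaled < backendMaxTTL {
		return scaled
	}
	return backendMaxTTL
}
```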

I strongly advise you support the TLS-ALPN challenge. It is the only challenge that works over port 443 (the TLS port), and is required if port 80 is not available or if the HTTP challenge has trouble getting a certificate.

I mistook TLS-ALPN for TLS-SNI as I had not seen the RFC that introduced TLS-ALPN.

Vault won't be able to handle those challenges without substantial changes, as secrets engines cannot serve the .well-known/acme-challenge/ path or set the content-type and the response body. Furthermore, it is not recommended for Vault to listen for external traffic.

I think this could be achieved with a simple sidecar that handles .well-known/acme-challenge/ and looks in Vault to respond to the challenge. I will write it next week.
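
A rough sketch of what such a sidecar could look like, assuming the hashicorp/vault/api client and a hypothetical acme/challenges/<token> path where the plugin would expose pending key authorizations (the real path and field names are up to the plugin):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// Assumes VAULT_ADDR and VAULT_TOKEN are set in the environment.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/.well-known/acme-challenge/", func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.URL.Path, "/.well-known/acme-challenge/")
		// "acme/challenges/<token>" is a hypothetical path; the plugin decides
		// where pending key authorizations are actually exposed.
		secret, err := client.Logical().Read("acme/challenges/" + token)
		if err != nil || secret == nil {
			http.NotFound(w, r)
			return
		}
		keyAuth, ok := secret.Data["key_authorization"].(string)
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprint(w, keyAuth)
	})

	log.Fatal(http.ListenAndServe(":80", nil))
}
```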

Does vault-acme automatically renew certificates too? Or does an outside program/user/lib trigger the reload?

Vault ACME does not renew certificates before a client asks for them when a lease expires. All secrets given by Vault have an associated lease though, so clients should already know to get a new secret when the current one expires.

@mholt

mholt commented Oct 31, 2019

Vault ACME does not renew certificates before a client asks for them when a lease expires. All secrets given by Vault have an associated lease though, so clients should already know to get a new secret when the current one expires.

I see, so this essentially acts as a liaison between Vault clients and an ACME service, establishing 1:1 functions for Obtain() and Renew() (etc).

In other words, it is up to the client to take care of certificate management, and this tool only provides the functions to do ACME transactions. (Correct?)

If so, then hopefully, whatever clients use this tool will abide by the best practices. :)

I think this could be achieved with a simple sidecar that handles .well-known/acme-challenge/ and looks in Vault to respond to the challenge. I will write it next week.

Any way that Caddy or CertMagic can help here? Their storage implementation is pluggable, so they could dump the certificates directly into Vault.

@remilapeyre
Contributor

In other words, it is up to the client to take care of certificate management, and this tool only provides the functions to do ACME transactions. (Correct?)

This is the idea, yes.

If so, then hopefully, whatever clients use this tool will abide by the best practices. :)

The Vault operator can still enforce some of those at the server level and I added cache support in remilapeyre/vault-acme@bd95891

Any way that Caddy or CertMagic can help here?

I don't think so. A huge advantage of having the plugin generate the cert is that it is mlock-ed and only speaks to Vault through a secure channel, with temporary credentials usable only once. We would lose some of those benefits by using Caddy or CertMagic.

@remilapeyre
Contributor

@mholt I've finally added support for the HTTP-01 and TLS-ALPN-01 challenges in remilapeyre/vault-acme@6ac7a95 so all basic functionality should now be present.

We are not using TLS-ALPN-01 and HTTP-01 in our infra, so they are less tested than DNS-01. We've been using DNS-01 for a few weeks now without issues, except when we hit Let's Encrypt rate limits once before I implemented the cache.

I will look at the code in the next few days to see what can be simplified or better documented, but I think it may be ready for a cursory review.

I also plan to make a few changes in the next few days:

  • There can only be one contact for an account and it must be an email, which is more restrictive than the RFC allows. I need to make changes in Lego for this, which I will do once Add support to update account go-acme/lego#1002 is merged.
  • The DNS-01 challenge solver currently supports one DNS provider; it would be useful to make it work with multiple DNS providers, e.g. cloudflare,route53. This is already the behavior of the Terraform ACME provider so this should not be an issue.
  • I'll implement support for updating accounts once Add support to update account go-acme/lego#1002 is merged.

@heri16

heri16 commented Nov 18, 2019

Looking forward to support for more DNS providers. What's needed to get it working?

@remilapeyre
Contributor

All DNS providers supported by Lego are supported; I need to document this. You can look at the list at https://www.terraform.io/docs/providers/acme/dns_providers/index.html. So far it's only possible to use environment variables for the credentials.

If you are using a provider not yet supported, adding it to go-acme/lego would be the path forward.

@heri16

heri16 commented Nov 19, 2019

Not sure how the environment variables are passed into Vault. Is the relevant document here?

https://github.com/remilapeyre/vault-acme/blob/master/website/source/docs/secrets/acme/index.html.md

@remilapeyre
Contributor

Vault's environment variables are passed to the plugin, so it depends on the way you are running it, but doing export AKAMAI_ACCESS_TOKEN=... before running Vault, using AKAMAI_ACCESS_TOKEN=... vault server -config-file=vault.hcl, or docker run -e AKAMAI_ACCESS_TOKEN=... vault should work.

@binlab
Contributor

binlab commented Dec 24, 2019

@remilapeyre nice work!

@karl-tpio

karl-tpio commented Mar 9, 2020

@remilapeyre Thank you. We use Auto Scaling EXTENSIVELY here and all "people-facing" applications use Let's Encrypt for certificates. For many reasons, these application hosts can't use the web-based challenge for Let's Encrypt, so we rely exclusively on the DNS-based challenge.

It's unacceptable to give each auto-scale host permission to edit the entire DNS zone, and even if they all did have this permission, we'd quickly hit the rate limits (I think it's 7 certs in 7 days?) due to auto-scale activity.

I was about to embark on building a Let's Encrypt cache service using a small client-based daemon I'd have to write, plus MQTT to link that client daemon to some small service that I'd also have to write, which would use S3/KMS and/or Secure Parameter Store to cache the certificates and do the DNS challenge.

I'm so glad I checked to see if there was already any work around this subject involving Vault. You've saved me a ton of headache w/r/t re-inventing the wheel!

@kfox1111

kfox1111 commented Mar 9, 2020

We are using acme-dns (https://github.com/joohoi/acme-dns) to restrict external DNS for handshaking to limited records for individual users in a private environment. Seems to work OK.

@remilapeyre
Contributor

@karl-tpio thanks for the feedback. Keep in mind that while there are tests and I should fix all bugs shortly, it has not been reviewed so far. I would love to get your feedback and fix any issues you find though.

@weitzj

weitzj commented Jul 15, 2020

Hi. I am looking for the other side:

Having Vault implement the ACME server protocol, so I can just use Caddy/Traefik/... with Vault as an ACME server to issue certificates from a private CA.
Does anybody have a hint for me?

@remilapeyre
Contributor

I think support for the ACME protocol as a server was previously discussed and deemed out of scope, as it is very different from the way Vault currently works. ACME assumes that you are unauthenticated and uses a DNS or HTTP challenge to make sure you have access to the domain names you claimed, while Vault has authentication built in and only has an API.

I think the best way to use the Vault private CA with Traefik and Caddy might be to use Vault Agent templates to fetch and renew the SSL certificates.

@nvx
Contributor

nvx commented Jul 15, 2020

At any rate #8690 would be the issue to track for that

@mholt

mholt commented Jul 15, 2020

@remilapeyre @weitzj Even if Vault does not become an ACME server, I can suggest a couple of ways Vault can be used with Caddy, at least (this thread is two years old, and things have advanced now that Caddy 2 is released):

There are numerous options here to support all sorts of use cases. I'm sure Vault could be compatible with at least one or two of these.

@mohag

mohag commented Aug 19, 2020

@remilapeyre's vault-acme looks like what I want, except that the DNS provider credentials should be inside vault...

My main reason for this is that my DNS provider is Cloudflare, which has very wide API permissions - I really don't want a token / API key lying around on several servers with the ability to edit / delete every DNS record on a domain. Keeping it in vault is one option, just getting the certificate out is better....

(I don't care about HTTP-01 or TLS-ALPN-01 support. In cases where that is an option, certbot or cert-manager (on Kubernetes) tends to work fine. The use cases with traffic actually hitting Vault for the hostname also seem quite limited. DNS-01 is much more useful in a more centralised (onto Vault) application.)

@remilapeyre
Contributor

@remilapeyre's vault-acme looks like what I want, except that the DNS provider credentials should be inside vault...

Hi @mohag, I'm not sure what issue you are referring to exactly. Vault plugins store their data in Vault's secure storage. You may be referring to the fact that the configuration is done using environment variables; since the v0.0.6 release that went out yesterday, you can now set them in the provider_configuration map when creating the account, so if you don't give read permissions to acme/account/name, nobody should be able to access them.
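
As an illustration, creating the account with the provider credentials kept server-side might look roughly like this with the Vault Go API; the acme/accounts/my-account path and the field names are assumptions based on this thread, so check the plugin's documentation for the real parameters:

```go
package main

import (
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// Assumes VAULT_ADDR and VAULT_TOKEN are set in the environment.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// The path and field names below are illustrative guesses based on this
	// thread, not the plugin's documented API; check the vault-acme docs.
	_, err = client.Logical().Write("acme/accounts/my-account", map[string]interface{}{
		"contact":                 "ops@example.com",
		"terms_of_service_agreed": true,
		"provider":                "cloudflare",
		"provider_configuration": map[string]interface{}{
			"CLOUDFLARE_DNS_API_TOKEN": "...",
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("ACME account created; provider credentials stay in Vault")
}
```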

The use cases with traffic actually hitting Vault for the hostname also seem quite limited

For the HTTP-01 and TLS-ALPN-01 challenges, they are not answered by Vault directly but by a sidecar utility that you can deploy at the edge of your network and that connects back to Vault to answer the requests.

Let me know if I missed something and this was not the information you were looking for.

If you have some issues while using this backend, please open a new bug report at https://github.com/remilapeyre/vault-acme/

@mohag

mohag commented Aug 19, 2020

@remilapeyre Ah, that sounds like a good solution. (I saw some notes about env vars either here or in the docs and was a bit worried about that. With that resolved, it seems as close to an ideal solution as something that is not built in can get...)

@OmegaRogue

I'm currently in the process of switching my environment from a bodgy combination of Proxmox and Kubernetes (most things are running in LXC containers in Proxmox, and a few things are running in VMs in Proxmox, for example a microk8s single-node cluster running ingress-nginx as a gateway to the outside world, and cert-manager, which at the moment both manages an internal CA chain and provides TLS certs from Let's Encrypt to ingress-nginx) to mostly HashiCorp tools and then maybe a Kubernetes cluster for Kubernetes workloads (Consul instead of ingress-nginx, an attempt to get Istio working in the environment, and Vault for certificates), because the Kubernetes VM currently uses more resources than all the other things running in LXC combined.

For the HTTP-01 and TLS-ALPN-01 challenges, they are not answered by Vault directly but by a sidecar utility that you can deploy at the edge of your network and that connects back to Vault to answer the requests.

I was wondering if it were possible to optionally use Consul to answer HTTP-01, but maybe I'm too used to cert-manager handling everything to see why that wouldn't work.

@heatherezell added the community-sentiment (Tracking high-profile issues from the community) label Oct 12, 2021
@jamroks

jamroks commented May 20, 2022

@remilapeyre hello, thanks for the work done so far. Is this something you're still working on?

@cipherboy
Contributor

HashiCorp Vault doesn't currently support this functionality and has no plans to support Let's Encrypt ACME integration in the near future.

As discussed above in the thread, vault-acme is a community-maintained, third-party plugin that provides the requested functionality. We suggest individuals looking for this functionality consider evaluating this plugin for their use.

@F21
Contributor

F21 commented Sep 25, 2023

Any interest in revisiting this and bringing it into Vault core? I note that the PKI backend now supports ACME clients, so it would not be a stretch to have Vault issue certificates via ACME from Let's Encrypt and other issuers.

@nneul

nneul commented Sep 25, 2023

@F21 A bit off-thread, but I recently contributed "ACME proxy" capability to the Serles ACME server. We considered going with https://github.com/dvtirol/serles-acme.git but decided having more vanilla clients was a better model. You basically set up Serles as an ACME client using certbot with DNS validation, and then point other servers to it, using http-01-based validation.
