Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LETSENCRYPT_MIN_VALIDITY variable #485

Merged
merged 3 commits into from
Jan 8, 2019
Merged

Add LETSENCRYPT_MIN_VALIDITY variable #485

merged 3 commits into from
Jan 8, 2019

Conversation

Greek64
Copy link
Contributor

@Greek64 Greek64 commented Dec 22, 2018

This PR adds another environment variable called LETSENCRYPT_MIN_VALIDITY, that can be used to set the minimum validity of certificates per container.

Closes #477

@buchdag
Copy link
Member

buchdag commented Dec 23, 2018

LETSENCRYPT_MIN_VALIDITY should be capped a bit lower than 90 days, because capping at 90 days will have the same end result as no capping at all: the container will attempt to renew the certificate on every run of the service loop and will hit LE rate limits pretty fast.

Considering LE rate limiting on this is 5 renewal / week for a given cert, 7655040 seconds (7776000 - ( ( 24 * 7 * 60 * 60 ) / 5 ) seems to be the highest sensible capping.

88 days / 7603200 seconds, ie taking a small security margin over LE rate limit and rounding it to the nearest whole day would also make sense.

README.md Outdated Show resolved Hide resolved
@buchdag buchdag changed the title Add LETSENCRYPT_MIN_VALIDIDTY variable Add LETSENCRYPT_MIN_VALIDITY variable Dec 23, 2018
@Greek64
Copy link
Contributor Author

Greek64 commented Dec 23, 2018

LETSENCRYPT_MIN_VALIDITY should be capped a bit lower than 90 days, because capping at 90 days will have the same end result as no capping at all: the container will attempt to renew the certificate on every run of the service loop and will hit LE rate limits pretty fast.

Yeah, originally I wanted to cap it to a lower value for the stated reasons, but then comes the problem of testing.
How can the test unit test this feature by "simulating" a passing of 48 hours in a reasonable time?
I could add special exception cases for the test case (not a fan of this), but then who tests this exception code?

Considering LE rate limiting on this is 5 renewal / week for a given cert

According to their site, what you are referring to are the Duplicate Certificate rate limits. But they also have a Renewal Exemption (which applies to Duplicate Certificates).
I quote:

To make sure you can always renew your certificates when you need to, we have a Renewal Exemption to the Certificates per Registered Domain limit. Even if you’ve hit the limit for the week, you can still issue new certificates that count as renewals.

I also originally wanted to add also a lower bound, to prevent unwanted downtime due to misconfiguration, since the check is triggered hourly and the user could give a period smaller than 3600 seconds. But I wanted to first discuss this with you.

@buchdag
Copy link
Member

buchdag commented Dec 23, 2018

My understanding is that Duplicate and Renewal are considered to be the same from a rate limit perspective:

This is the same definition used for the Duplicate Certificate limit described above. Renewals are still subject to the Duplicate Certificate limit.

So (from what I understood) each pre-existing discrete certificate (same set of domains) can be renewed up to five times a week. @cpu did I get this right ?

Yeah, originally I wanted to cap it to a lower value for the stated reasons, but then comes the problem of testing. How can the test unit test this feature by "simulating" a passing of 48 hours in a reasonable time?

Rigth, I did not think about this. User adjustable MAX_VALIDITY perhaps ?

@Greek64
Copy link
Contributor Author

Greek64 commented Dec 23, 2018

So (from what I understood) each pre-existing discrete certificate (same set of domains) can be renewed up to five times a week. @cpu did I get this right ?

From what I understood, the first 5 renewals count as Duplicates and the rest as normal renewals. (In the sense that Duplicate certificates can coexist [new key], but renewed certificates not [same key]).
Else this example given further down does not really make sense to me:

You can issue 50 certificates in week 1, 50 more certificates in week 2, and so on, while not interfering with renewals of existing certificates.

Nevertheless I would be interested in what @cpu has to say.

Rigth, I did not think about this. User adjustable MAX_VALIDITY perhaps ?

Well, the thing is that LE CA always gives certificates with a lifespan of 90 days (not adjustable through simp_le). So even when adjusting MAX_VALIDITY we would have to somehow trick simp_le to think that the period given through --min_valid has been reached.

Nevertheless I was about to suggest user adjustable MAX_VALIDITY for an entirely different reason.
Namely, as it currently stands, we make the assumption that all ACME compliant CAs issue certificates with a lifespan of 90 days. The --force-renew function is based upon this assumption.
But as we allow the user to define an ACME CA other than LE through the ACME_CA_URI environment variable - and we cannot guarantee what lifetimes these CAs will issue - a user may run into unexpected behavior when doing --force-renew on a different CA.
So my suggestion would be to have a user adjustable MAX_VALIDITY variable that must be set by the user whenever the ACME_CA_URI is set.

@Greek64
Copy link
Contributor Author

Greek64 commented Dec 23, 2018

Another idea would be to modify the localtime of the letsencrypt-companion docker container. (e.g. by mounting a custom localtime inside the container, that we can modify from the outside)
This would not affect anything on the host system.

Merry Christmas, btw (Or Happy Holidays)

@buchdag
Copy link
Member

buchdag commented Dec 24, 2018

Well, the thing is that LE CA always gives certificates with a lifespan of 90 days (not adjustable through simp_le). So even when adjusting MAX_VALIDITY we would have to somehow trick simp_le to think that the period given through --min_valid has been reached.

You might have mixed up things here, or I did. If the MAX_VALIDITY is adjustable, it can default to say 88 days and we can manually set it to 90 days for the test so that the rest of the existing test unit will work as is (and we don't have to fiddle with the container localtime).

Maybe MAX_VALIDITY should be renamed MIN_VALIDITY_CAP, that might be a bit clearer.

Regarding the improved support of other ACME CA I was about to answer that I'm not aware of any other CA offering free certificate through ACME v1 protocol with http-01 challenge, but a quick research seems to indicate that there is now at least one candidate.

@Greek64
Copy link
Contributor Author

Greek64 commented Dec 24, 2018

If the MAX_VALIDITY is adjustable, it can default to say 88 days and we can manually set it to 90 days for the test so that the rest of the existing test unit will work as is (and we don't have to fiddle with the container localtime).

Oh, yes of course. That would work.
But that would mean, that we trust the user to give a value that has already taken account the rate limits (e.g. 88 days instead of 90), IF he sets the variable manually and doesn't use the default value.
Either way it's his fault at this point, as long as we document it.

Regarding the improved support of other ACME CA I was about to answer that I'm not aware of any other CA offering free certificate through ACME v1 protocol with http-01 challenge, but a quick research seems to indicate that there is now at least one candidate.

And according to them they give certificates with a lifetime of 180 days.
So, what do you say? (I could integrate the check in this PR)

@buchdag
Copy link
Member

buchdag commented Dec 26, 2018

we trust the user to give a value that has already taken account the rate limits (e.g. 88 days instead of 90), IF he sets the variable manually and doesn't use the default value.

Maybe it should be documented that this value is (for now) only adjustable for test purpose.

And according to them they give certificates with a lifetime of 180 days.
So, what do you say? (I could integrate the check in this PR)

They also seems to have a limit of one domain per certificate:

This is a Domain Validated (DV) certificate including a single domain name per certificate.

So we'll probably have to add checks for that to if we intend to support this ACME CA as well. I'll do some tests with Buypass Go as soon as possible and we'll handle the "other ACME CA" compatibility in another, later PR.

app/letsencrypt_service Outdated Show resolved Hide resolved
@Greek64
Copy link
Contributor Author

Greek64 commented Dec 27, 2018

Did all the mentioned changes.
What's left now is to discuss the aforementioned lower bound

I also originally wanted to add also a lower bound, to prevent unwanted downtime due to misconfiguration, since the check is triggered hourly and the user could give a period smaller than 3600 seconds.

@buchdag
Copy link
Member

buchdag commented Dec 29, 2018

I'm okay with the lower bound too. 7200 seconds ? More ?

@Greek64
Copy link
Contributor Author

Greek64 commented Dec 29, 2018

The absolute minimum would be 3600 + worst case execution time of update_certs.
So I think 7200 are more than enough.

I will add the lower bound some time later today in this PR.

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 4, 2019

Done
(Just in case you didn't get a push notification)

@buchdag
Copy link
Member

buchdag commented Jan 4, 2019

@Greek64 thanks for the notification. I'm currently bed ridden due to flu, so merge will have to wait a bit. Don't want to merge stuff while my brain is half functional.

Additionally I thought ~ one week ago that maybe the adjustable MIN_VALIDITY_CAP for test purpose wasn't the cleanest idea : we could just change the test boulder config to issue certificates with a 88 days validity. We're already touching the config to change ports for the http-01 challenge back to the usual 80/443 and whitelist the test domains, so adding another change should be easy.

Your thought on this ?

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 4, 2019

I'm currently bed ridden due to flu

I'm sorry to hear that.
Get well soon.

we could just change the test boulder config to issue certificates with a 88 days validity

That seems indeed to be the cleanest way. Didn't even think of touching the Boulder...
So, I will redo this PR in the next few days.

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 4, 2019

Although, now that I think about it again, we still have a problem.
If we edit the ca-a.json and ca-b.json the same way we modify the va.json for the ports, the change would be made with sed during the test setup phase.

That means that all tests will get certificates with a validity of 88 days.
And that will conflict with the force_renew test, which assumes a 90 day validity in order for it to work.

If we want to make it with the boulder change we either have to add an exception code inside letsencrypt-service to use a 88 day validity for the force renew command during a test, or restart the boulder with different settings in between the tests.

@cpu
Copy link

cpu commented Jan 4, 2019

I'm catching up on email/github from the holiday period and only quickly skimmed this thread. Apologies if I'm misunderstanding something.

From what I understood, the first 5 renewals count as Duplicates and the rest as normal renewals. (In the sense that Duplicate certificates can coexist [new key], but renewed certificates not [same key]).

The public key of the certificate doesn't have any bearing on whether or not Let's Encrypt considers it a renewal. Certificate B is considered a renewal of Certificate A if and only if the set of subject identifiers (CN and SANs) between A and B are an exact match. If Certificate A and B have the same names but different public keys Certificate B is still a renewal for rate limiting purposes. If Certificate B added or removed a name it would not be considered a renewal.

If we want to make it with the boulder change we either have to add an exception code inside letsencrypt-service to use a 88 day validity for the force renew command during a test, or restart the boulder with different settings in between the tests.

I think testing this in an end-to-end integration testing scenario with Boulder will be tricky for the reasons you describe.

It might be helpful to know that Boulder's existing integration tests take the same approach you describe here (starting Boulder with one set of configs, doing some setup, tearing Boulder down, changing the config, and then restarting Boulder) to simulate the passage of time. See: https://github.com/letsencrypt/boulder/blob/9afa0f7f1967fd4b8c63f4a100a2795461b36609/test/integration-test.py#L722-L737 (setup_seventy_days_ago and setup_twenty_days_ago() run Boulder with the FAKECLOCK configured accordingly).

@buchdag
Copy link
Member

buchdag commented Jan 7, 2019

And that will conflict with the force_renew test, which assumes a 90 day validity in order for it to work.

Hmmm as I view it the test will still work, /app/force_renew script only assume that the default cert validity is X (seconds days or whatever) and run simp_le on the existing certificates with a minimum required validity greater than or equal to X, hence forcing the renewal.

The minimum validity of the /app/force_renew script might as well be 300 days or even 10 years, as long as it's above the default validity of the issued certs (wether 90 days for real LE certs or 88 days for our boulder certs), it would work the same.

We just have to make sure that changes made still allow the /app/force_renew to use a --min-validity of 90 days or above.

This seems to be the same type of confusion as #485 (comment). The different time value relationship and current values (in seconds) is as follow:

0 < A (7200) B (LETSENCRYPT_MIN_VALIDITY) C/D (7603200) < E (7776000) F (7776000)

  • A: lower bound to LETSENCRYPT_MIN_VALIDITY (not user defined)
  • B: user defined or default LETSENCRYPT_MIN_VALIDITY value
  • C: upper bound to LETSENCRYPT_MIN_VALIDITY (not user defined)
  • D: default validity of cert issued by the test suite's Boulder
  • E: default validity of cert issued by Let's Encrypt
  • F: value of the --min-validity flag used by /app/force_renew

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 7, 2019

This seems to be the same type of confusion as #485 (comment).

Yes indeed.
Dunno what's the problem with me...
Thanks for the clear up.

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 8, 2019

Done.
Interestingly some times a wrong certificate is issued during tests and fail, which is fixed by a re-run. (Not related to this PR)
Because the first travis run failed, I just force pushed again to re-trigger travis and the checks now pass.

@buchdag
Copy link
Member

buchdag commented Jan 8, 2019

Yes, I'm aware of this, and I haven't managed to understand why this is happening yet.

@buchdag
Copy link
Member

buchdag commented Jan 8, 2019

Oh, one last thing, I've noticed that for some reason your commit messages are prefixed with a star and a space (* ), I'd prefer that to be removed if possible.

  Allows to specify the minimum validity of certificates
  per container.
Modified Boulder to issue certificates with 88 day lifetime
@Greek64
Copy link
Contributor Author

Greek64 commented Jan 8, 2019

I've noticed that for some reason your commit messages are prefixed with a star and a space (* )

That's just the commit message convention that I use.

I'd prefer that to be removed if possible.

Done

Add MIN_VALIDITY_CAP info to the README
@buchdag
Copy link
Member

buchdag commented Jan 8, 2019

Thanks for the good work 👍

@buchdag buchdag merged commit 7dd2cd6 into nginx-proxy:master Jan 8, 2019
@buchdag
Copy link
Member

buchdag commented Jan 12, 2019

@Greek64 the test unit fails quite often, if you have some spare time could you try to investigate and see if it could be made a bit more reliable ?

edit : it seems to be failing way more often on the three containers setup than on the two containers setup.

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 12, 2019

Will do. (although until now I had it never fail before... Then again I'm mostly testing with the two container setup...)

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 12, 2019

@buchdag
As a quick fix, do you want to try if increasing the sleep time in Line 63 to e.g. sleep 15 increases reliability in your setup?

@Greek64
Copy link
Contributor Author

Greek64 commented Jan 12, 2019

I ran the certs_validity test 10 consecutive times in a three container setup, and it passed every time.

It will be difficult for me to debug this.
Nevertheless I have two possible solutions, although you would have to test them.

First a small Explanation of what the tests does

The test unit creates 3 containers of which only one (le3.wtf) has its LETSENCRYPT_MIN_VALIDITY set to something less than the lifetime of the issued certificates (specifically 10 seconds less).
After waiting for the symlinks of the certificates and parsing the current lifetime of the issued certificates, the test unit waits 10 seconds, then retriggers update_certs, reparses the lifetime of the certificates and checks if the certificate for le3.wtf was renewed.

Possible Solution 1

Since the LETSENCRYPT_MIN_VALIDITY is set to a value 10 seconds less than the lifetime of the issue certificates, if the "waiting for symlinks" and "parsing lifetimes of issued certificates" takes less than a second the update_certs script will be triggered too close to the threshold and may not renew the certificate. Thus increasing the sleep timeout (like described above) should fix that.

Possible Solution 2

The certificates are parsed again immediately after the triggering of the update_certs script for their lifetimes, which may be too soon (i.e. before the certificate was actually renewed). Putting a sleep between the two calls should mitigate this (even better would be a wait function that waits for the cert to be renewed). Though if this is the fix, we should probably also update the force_renew test, since it does exactly the same.

@buchdag
Copy link
Member

buchdag commented Feb 15, 2019

I haven't seen CI errors related to this in weeks, so I guess this is no longer relevant. The errors are probably more Travis related than anything.

@buchdag buchdag added the kind/feature-request Issue requesting a new feature label Oct 10, 2019
buchdag added a commit to buchdag/acme-companion that referenced this pull request Oct 8, 2020
This feature is not supported in this form by acme.sh

This reverts commit 7dd2cd6, reversing
changes made to 6a90d53.
buchdag added a commit to buchdag/acme-companion that referenced this pull request Oct 9, 2020
This feature is not supported in this form by acme.sh

This reverts commit 7dd2cd6, reversing
changes made to 6a90d53.
buchdag added a commit to buchdag/acme-companion that referenced this pull request Oct 20, 2020
This feature is not supported in this form by acme.sh

This reverts commit 7dd2cd6, reversing
changes made to 6a90d53.
buchdag added a commit to buchdag/acme-companion that referenced this pull request Nov 27, 2020
This feature is not supported in this form by acme.sh

This reverts commit 7dd2cd6, reversing
changes made to 6a90d53.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature-request Issue requesting a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatic Renewal
3 participants