Error JWS has invalid anti-replay nonce #318

Closed
emilevauge opened this issue Nov 15, 2016 · 20 comments

Comments

@emilevauge

Hello

As seen on the Gitter channel, lego should handle errors of type JWS has invalid anti-replay nonce returned by Let's Encrypt.

@xenolf

Well the easy fix is for me to add an error type for this nonce error and return that.

This would be better than nothing :)

The more involved fix is to find why the nonces are invalid.
I had a look around the boulder source and it would be nice to know which case is encountered here

Sadly, I don't know the reason why nonces are invalid in our case. We only get this log:

time="2016-10-25T15:34:59Z" level=error msg="Error renewing certificate: acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce UQ9kdNZTAFPRYEaNVmIEWCEBW3.............."
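
For illustration, the "easy fix" of a dedicated error type could look roughly like the sketch below. It assumes the server's error body is a JSON problem document with `type`, `detail` and `status` fields, matching the three parts visible in the log line above. `problem`, `NonceError` and `classify` are hypothetical names made up for this sketch, not lego's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// problem mirrors the parts of an ACME error body that show up in the log
// line: status code, error type URN and human-readable detail.
// (Hypothetical type, for illustration only.)
type problem struct {
	Type   string `json:"type"`
	Detail string `json:"detail"`
	Status int    `json:"status"`
}

// NonceError is a hypothetical dedicated error type for badNonce responses.
type NonceError struct {
	Detail string
}

func (e *NonceError) Error() string {
	return "acme: bad nonce: " + e.Detail
}

// classify turns a raw error body into a *NonceError when the problem type is
// badNonce, and into a generic error otherwise.
func classify(body []byte) error {
	var p problem
	if err := json.Unmarshal(body, &p); err != nil {
		return fmt.Errorf("acme: unreadable error body: %w", err)
	}
	if p.Type == "urn:acme:error:badNonce" {
		return &NonceError{Detail: p.Detail}
	}
	return fmt.Errorf("acme: Error %d - %s - %s", p.Status, p.Type, p.Detail)
}
```

A caller could then check for `*NonceError` (for example with `errors.As`) and decide whether to retry, instead of matching on the error string.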

@xenolf
Member

xenolf commented Nov 18, 2016

@emilevauge Is it possible to have someone who has this problem test the code in the fix_nonce_error branch?

@emilevauge
Author

@xenolf thanks a lot for helping on this :) I will make some tests with your branch and give you some feedback.

@tribou

tribou commented Nov 27, 2016

@xenolf just wanted to report I'm getting the nonce error consistently the past couple days (haven't been able to get a cert using the prod ACME URL yet with numerous retries). I'm using https://github.com/PalmStoneGames/kube-cert-manager which returns errors like these:

2016/11/27 17:18:16 [INFO][example.com] acme: Obtaining bundled SAN certificate
2016/11/27 17:18:17 Error while processing certificate during sync: Error while obtaining certificate for new domain example.com: acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce nMkkF2_SoyGYBSUU7obiZ4h0612cH25ldAzL6Rcphow

2016/11/27 17:18:47 [INFO][example.com] acme: Obtaining bundled SAN certificate
2016/11/27 17:18:47 Error while processing certificate during sync: Error while obtaining certificate for new domain example.com: acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce bGRoCwSZ7ZSijBAuZpgEOhNZRUm_LYMw-MDsS4-4x0U

However, using the staging ACME URL works fine for the same domains. I'm guessing we won't know any more details until a test can be made with the fix_nonce_error branch?

@mholt
Contributor

mholt commented Nov 27, 2016

If you're using the same account on the live endpoint as you are on staging, you'll get this error. I think.

@cpu
Contributor

cpu commented Nov 28, 2016

> If you're using the same account on the live endpoint as you are on staging, you'll get this error. I think.

That's correct - nonces are not transportable between the two environments.

@jipperinbham
Contributor

I'm hitting this problem as well, and it seems to be limited to cases where we're issuing a certificate for a domain with a SAN. It always seems to throw an error when calling getChallenges for one of the two domains.

@tribou

tribou commented Nov 28, 2016

Thanks for the info so far, everyone!

To give some more backstory on my project, I'm currently trying to migrate from letsencrypt-express to kube-cert-manager. So one of the domains I attempted already has a working prod cert with letsencrypt-express. However, the second domain was just a one-off test that I only ran in kube-cert-manager; but I did use the same email initially to test staging and then prod for that one-off test.

For clarification, are the nonces stored on the Let's Encrypt side and associated by email? Or are they stored by the client library's implementation (boltdb used by kube-cert-manager), and perhaps I just need to wipe out the boltdb database and try again for just prod?

...or *gulp*, will I need to find some way to sync the letsencrypt-express account meta files with the kube-cert-manager meta info stored in boltdb?

@tribou

tribou commented Nov 28, 2016

Wiping out the existing staging metadata worked!

So for anyone hitting this issue in the future, this is what worked for me:

  1. Test using https://acme-staging.api.letsencrypt.org/directory (staging certs) to make sure your LE implementation is correct.
  2. Find where your LE library stores its account metadata and delete that file/directory (kube-cert-manager uses a data.db boltdb file).
  3. Update to using the https://acme-v01.api.letsencrypt.org/directory prod URL, and redeploy/re-run your library's cert acquisition.

This should force the Let's Encrypt negotiation process to regenerate new account info for prod.
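
The pitfall here is account data saved for one ACME server being reused against another. A minimal sketch of one way a client could avoid that is to key the storage location on the directory URL's host, so staging and production never share account files. `accountDir` and the directory layout are assumptions made up for illustration, not a description of how kube-cert-manager stores its data.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// accountDir returns a storage directory that is unique per ACME server and
// email, so account data created against staging is never picked up when the
// client is pointed at production. (Hypothetical helper and layout.)
func accountDir(baseDir, directoryURL, email string) (string, error) {
	u, err := url.Parse(directoryURL)
	if err != nil {
		return "", fmt.Errorf("invalid directory URL %q: %w", directoryURL, err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("directory URL %q has no host", directoryURL)
	}
	return filepath.Join(baseDir, "accounts", u.Host, email), nil
}
```

With this layout, switching from the staging URL to the prod URL resolves to a different directory, so the client registers a fresh account instead of needing a manual wipe.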

@mholt
Contributor

mholt commented Nov 28, 2016

Sounds like your application was using the wrong account for the transaction ;) Glad you figured it out.

@mholt mholt closed this as completed Nov 28, 2016
@BusyBusinessCat

I have also encountered this issue, but it was a little different from what I've been reading here.

At the beginning, I issued a staging cert to check that my lego installation and app configuration were correct. As everything was OK, I switched to standard mode successfully: just removing the "--server staging-url" from my automatic lego call (and thus keeping the same email address) allowed me to get a LE certificate.

Things started to get strange right after the staging cert expired, at the next renewal of my cert.

I was getting the acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce error at each call. After reading this thread, I tried removing the old staging account info (and the standard one too), but it didn't help.

Changing the email to one that I had never used with LE worked immediately.

I can assume that, for a reason I don't know, we cannot use the same email in staging and in classic LE, but in that case I do not understand why it was working perfectly in the first place.

Maybe someone here can explain to me what happened? Or maybe I missed something?

@cpu
Contributor

cpu commented Jan 9, 2017

Hi folks, just passing by & wanted to answer a few Q's since my last reply on-thread.

> I can assume that, for a reason I don't know, we cannot use the same email in staging and in classic LE

There's no constraint like this from the Let's Encrypt side. You can use the same email for staging and production without causing errors.

> For clarification, are the nonces stored on the Let's Encrypt side and associated by email? Or are they stored by the client library's implementation (boltdb used by kube-cert-manager)

Nonces aren't associated with an email/account on the server side. Roughly speaking, for a given environment (staging/prod) they are simply a number given to a client and noted on a list. There's no additional metadata. As mentioned earlier, they do not work across environments (e.g. a nonce from staging is unknown to the prod environment); each environment maintains an independent nonce list.
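
As a rough mental model only (this is a toy sketch, not Boulder's implementation), each environment can be pictured as holding its own list of issued nonces, each accepted exactly once. `nonceStore`, `issue` and `redeem` are hypothetical names invented for this sketch.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"sync"
)

// nonceStore is a toy model of one environment's nonce list: issued nonces
// are remembered with no account metadata, and each is accepted only once.
type nonceStore struct {
	mu     sync.Mutex
	issued map[string]struct{}
}

func newNonceStore() *nonceStore {
	return &nonceStore{issued: make(map[string]struct{})}
}

// issue hands out a fresh random nonce and notes it on the list.
func (s *nonceStore) issue() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	n := base64.RawURLEncoding.EncodeToString(buf)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.issued[n] = struct{}{}
	return n, nil
}

// redeem reports whether the nonce is on this store's list, and removes it so
// that a replayed request carrying the same nonce is rejected.
func (s *nonceStore) redeem(n string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.issued[n]; !ok {
		return false
	}
	delete(s.issued, n)
	return true
}
```

In this model, a nonce issued by a "staging" store is simply unknown to a separate "prod" store, and a nonce that has already been redeemed is rejected on a second use.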

I'm not familiar with how Lego stores its nonces. Internally it could be using its own binding with the account email address - if so then it seems like it would be a bug if that nonce can end up reused across a switch from staging to prod.

@BusyBusinessCat

BusyBusinessCat commented Jan 9, 2017

Does lego store account info somewhere other than in the working folder (inside the "account" folder)?

I managed to remove everything related to accounts while I was trying to keep the same bogus mail, but I always ended up with the nonce error (it was a different nonce each time, by the way).

I can try to reproduce it, adding some extra logs where I can, if someone thinks it's useful.

@cpu Thanks for the clarification about LE accounts. That's in fact what I was expecting from LE, so I really don't understand why changing the email made it work for me.

@mithrandi

The nonces are only valid for an hour or so, as far as I know; if they are being stored for longer than that, or the server-side nonce store is purged (e.g. I think this happens when Boulder is restarted), the stored nonce will be invalid. I think the easiest way to handle this is to retry any request that fails with an "invalid nonce" error, using the new nonce returned along with the response; this should handle pretty much all of the common scenarios without any complicated logic required.
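
A minimal sketch of that retry idea, under stated assumptions: `postFunc` is a hypothetical stand-in for whatever sends the signed request and hands back the fresh nonce from the response, and `errBadNonce` stands in for however the client reports urn:acme:error:badNonce. Neither is lego's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadNonce stands in for the client's representation of
// urn:acme:error:badNonce. (Hypothetical sentinel.)
var errBadNonce = errors.New("urn:acme:error:badNonce")

// postFunc is a hypothetical signed-request function: it sends the request
// with the given nonce and returns the new nonce that came with the response.
type postFunc func(nonce string) (nextNonce string, err error)

// postWithRetry retries a request that fails with a bad-nonce error, each
// time using the fresh nonce returned along with the failed response.
func postWithRetry(post postFunc, nonce string, maxAttempts int) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		var next string
		next, err = post(nonce)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errBadNonce) || next == "" {
			// Not a nonce problem, or no fresh nonce to retry with.
			return err
		}
		nonce = next
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}
```

Capping the number of attempts keeps a persistent failure (for example, credentials being used against the wrong server) from looping forever.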

@emilevauge
Author

> I think the easiest way to handle this is to retry any request that fails with an "invalid nonce" error, using the new nonce returned along with the response; this should handle pretty much all of the common scenarios without any complicated logic required.

I agree with this. We are still getting this error once in a while on Traefik and I would love lego to retry this kind of request ;)

@xenolf
Member

xenolf commented Jan 27, 2017

I will implement this over the weekend.

@emilevauge
Author

@xenolf awesome 😍

@ubershmekel

@mholt or @xenolf can you reopen this issue? Has the retry been implemented? I got this nonce problem when using traefik. The problem was repeating itself until I added a dot to my gmail address to cause a fresh transaction to take place.

time="2017-02-19T02:13:17Z" level=error msg="map[www.example.com:acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce 91WN0.....nE0SindU]" 
time="2017-02-19T02:13:17Z" level=error msg="Error getting ACME certificates [www.example.com] : Cannot obtain certificates map[www.example.com:acme: Error 400 - urn:acme:error:badNonce - JWS has invalid anti-replay nonce 91WN0.....nE0SindU]+v" 

@mholt
Contributor

mholt commented Feb 19, 2017

I'm still not convinced this is not a cross-account reuse problem. Yes, of course you can use the same email address with different ACME CAs, but you cannot use the same account credentials between them.

@ubershmekel See, your error and how you fixed it seem to reinforce this idea. By changing your email address, the client created a new account with the server with new credentials instead of reusing an account created for another server.

@ubershmekel

ubershmekel commented Feb 19, 2017

It seems this may be an issue with Let's Encrypt itself.

We just finished up reverting a CDN config change that was causing this problem. There was caching in some places where there should not be.

...

I also found that I had to change email address to get letsencrypt to give me new certificates.

letsencrypt/boulder#1217

@cpu
Contributor

cpu commented Feb 20, 2017

@ubershmekel I followed up on the linked Boulder #1217 thread from Jan 2016 - this is not a related issue with Let's Encrypt itself. As @mholt mentioned, changing contact information is not a fix, but it likely ends up producing a fresh nonce or otherwise working around the true underlying issue.
