[BUG] - Traefik rate limited by Let's Encrypt after several pod restarts #2174

Closed
sblair-metrostar opened this issue Dec 28, 2023 · 2 comments
Labels
project: JATIC (Work item needed for the JATIC project) · type: bug 🐛 (Something isn't working)

Comments

@sblair-metrostar
Contributor

sblair-metrostar commented Dec 28, 2023

Describe the bug

The Traefik documentation warns that the acme.json certificate storage should be persisted to avoid being rate limited by Let's Encrypt, but Nebari's deployment does not configure this. Without persistent storage, each restart of the traefik pod requests a new certificate. If the pod is restarted enough times within a given week, Let's Encrypt rate limits the certificate issuance request and the site falls back to a self-signed certificate.

This has happened to me many, many times while developing or testing changes to the platform, or while dealing with node crashes related to conda storage. The usual workaround is to install cert-manager and switch to ZeroSSL or another certificate issuer until the Let's Encrypt backoff period elapses.
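For readers unfamiliar with that workaround, a rough sketch of what a cert-manager setup with ZeroSSL could look like is shown here. It is not taken from the reporter's environment: the resource names, namespace, domain, EAB key ID, and secret names are all placeholders, and it assumes cert-manager is already installed, that a Secret holding the ZeroSSL EAB HMAC key exists, and that HTTP-01 challenges can be served through an ingress class named `traefik`.

```yaml
# Sketch only: every name below is a placeholder, not a Nebari default.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: zerossl-example              # hypothetical issuer name
spec:
  acme:
    server: https://acme.zerossl.com/v2/DV90
    email: notreal@example.com
    externalAccountBinding:
      keyID: REPLACE_WITH_EAB_KEY_ID # issued by ZeroSSL for your account
      keySecretRef:
        name: zerossl-eab            # assumed pre-created Secret
        key: secret
    privateKeySecretRef:
      name: zerossl-example-account-key
    solvers:
      - http01:
          ingress:
            class: traefik           # assumed ingress class
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: nebari-example-cert          # hypothetical
  namespace: dev                     # assumed namespace
spec:
  secretName: nebari-example-tls     # Secret that will hold the issued certificate
  dnsNames:
    - nebari.example.com             # placeholder domain
  issuerRef:
    name: zerossl-example
    kind: ClusterIssuer
```

Because cert-manager stores the issued certificate in a Kubernetes Secret, it survives traefik pod restarts, which is what sidesteps the repeated issuance described above.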

Expected behavior

Restarting the traefik pod should not cause a new certificate to be issued when the certificate type is lets-encrypt.

Options:

  1. Configure a persistent volume mount for the acme.json storage path.
  2. Replace Traefik's native Let's Encrypt integration with cert-manager for more issuer options and improved scalability.
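As an illustration of option 1, the following is a minimal sketch of a persistent volume backing Traefik's ACME storage. It is not Nebari's actual deployment: the resource names, namespace, image tag, resolver name (`letsencrypt`), mount path, and storage size are assumptions chosen for the example, and ports and service accounts are omitted. The `--certificatesresolvers.<name>.acme.*` flags are standard Traefik v2 static configuration options.

```yaml
# Sketch only: names, namespace, image tag and sizes are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: traefik-acme                 # hypothetical claim name
  namespace: dev                     # assumed namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi                 # acme.json is small; size is arbitrary
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik                      # hypothetical deployment name
  namespace: dev
spec:
  replicas: 1                        # file-based ACME storage is not shared across replicas
  strategy:
    type: Recreate                   # release the ReadWriteOnce volume before the new pod starts
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
        - name: traefik
          image: traefik:v2.10       # assumed version
          args:
            - --entrypoints.web.address=:80
            - --entrypoints.websecure.address=:443
            - --certificatesresolvers.letsencrypt.acme.email=notreal@example.com
            - --certificatesresolvers.letsencrypt.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
            # Point the ACME storage at the mounted volume so it outlives the pod.
            - --certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
            - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
          volumeMounts:
            - name: acme
              mountPath: /data
      volumes:
        - name: acme
          persistentVolumeClaim:
            claimName: traefik-acme
```

With the storage path on a persistent volume, a restarted pod can reload the existing account and certificate from acme.json instead of requesting a new one.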

OS and architecture in which you are running Nebari

Linux, x64

How to Reproduce the problem?

  1. Configure the Nebari certificate with lets-encrypt:

     certificate:
       type: lets-encrypt
       acme_email: notreal@example.com
       acme_server: https://acme-v02.api.letsencrypt.org/directory

  2. Delete the traefik pod 5 or more times in 1 week.
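When experimenting with restart behavior on a development deployment, one way to avoid consuming the production quota is to point `acme_server` at the Let's Encrypt staging directory. This is a suggestion rather than part of the original report. The config keys are the same as in step 1, but staging certificates are not trusted by browsers, and staging applies much higher limits, so it is suited to checking whether a restart triggers re-issuance rather than to reproducing the rate limit itself.

```yaml
certificate:
  type: lets-encrypt
  acme_email: notreal@example.com
  # Let's Encrypt staging directory: far higher rate limits than production,
  # but the issued certificates are NOT browser-trusted.
  acme_server: https://acme-staging-v02.api.letsencrypt.org/directory
```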

Command output

No response

Versions and dependencies used.

Nebari 2023.11.1

Compute environment

AWS

Integrations

No response

Anything else?

No response

@sblair-metrostar sblair-metrostar added needs: triage 🚦 Someone needs to have a look at this issue and triage type: bug 🐛 Something isn't working labels Dec 28, 2023
@kcpevey kcpevey added the project: JATIC Work item needed for the JATIC project label Jan 31, 2024
@kcpevey kcpevey added this to the 2024.2.1 milestone Jan 31, 2024
@Princess4ogb

Princess4ogb commented Feb 12, 2024

I am working on this ticket. The corresponding GitLab issue is linked below.
https://gitlab.jatic.net/jatic/team-metrostar/t-e-platform/-/issues/347

@pavithraes pavithraes removed the needs: triage 🚦 Someone needs to have a look at this issue and triage label Feb 14, 2024
@pavithraes pavithraes moved this from New 🚦 to In progress 🏗 in 🪴 Nebari Project Management Feb 14, 2024
@pavithraes pavithraes modified the milestones: 2024.2.1, Release Q2 2024 Feb 16, 2024
@kcpevey
Contributor

kcpevey commented Apr 11, 2024

Resolved by #2352

@kcpevey kcpevey closed this as completed Apr 11, 2024
@github-project-automation github-project-automation bot moved this from In progress 🏗 to Done 💪🏾 in 🪴 Nebari Project Management Apr 11, 2024

5 participants