[BUG] - Traefik rate limited by Let's Encrypt after several pod restarts #2174

sblair-metrostar · 2023-12-28T20:56:39Z

Describe the bug

The Traefik documentation warns that the acme.json certificate storage must be persisted to avoid being rate limited by Let's Encrypt, but Nebari's deployment does not configure this. Without persistent storage, every restart of the Traefik pod requests a new certificate. If the pod is restarted enough times within a given week, Let's Encrypt rate limits the certificate issuance requests and the site falls back to a self-signed certificate.

This has happened to me many times while developing or testing changes to the platform, or while dealing with node crashes related to conda storage. The usual workaround is to install cert-manager and switch to ZeroSSL or another certificate issuer until the Let's Encrypt backoff period elapses.
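For development and testing, one way to stay clear of the production limits is to point `acme_server` at the Let's Encrypt staging endpoint. This is a sketch that reuses the certificate fields from the reproduction config in this issue. It assumes Nebari passes `acme_server` through to Traefik unchanged. Staging certificates are not trusted by browsers, so this only suits non-production deployments.

```yaml
certificate:
  type: lets-encrypt
  acme_email: notreal@example.com
  # Let's Encrypt staging directory: much higher rate limits,
  # but the issued certificates are not browser-trusted.
  acme_server: https://acme-staging-v02.api.letsencrypt.org/directory
```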

Expected behavior

Traefik pod restarts should not cause new certificates to be issued when the certificate type is lets-encrypt.

Options:

Configure a persistent volume mount for the acme.json storage path.
Replace Traefik's native Let's Encrypt integration with cert-manager for more issuer options and improved scalability.
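The first option could look roughly like the plain Kubernetes manifests below. This is a minimal sketch, not Nebari's actual deployment code. The names `traefik-acme`, `acme-storage`, and the resolver name `letsencrypt`, along with the image tag, mount path `/data`, and storage size, are all assumptions chosen for illustration.

```yaml
# PersistentVolumeClaim that outlives the Traefik pod (hypothetical name).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: traefik-acme
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 128Mi   # acme.json is small; size is an assumption
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
spec:
  replicas: 1          # acme.json should not be shared by multiple Traefik instances
  strategy:
    type: Recreate     # old pod releases the ReadWriteOnce volume before the new one starts
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
        - name: traefik
          image: traefik:v2.10   # assumed version
          args:
            - --certificatesresolvers.letsencrypt.acme.email=notreal@example.com
            - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
            # Store certificates on the mounted volume instead of the container filesystem.
            - --certificatesresolvers.letsencrypt.acme.storage=/data/acme.json
          volumeMounts:
            - name: acme-storage
              mountPath: /data
      volumes:
        - name: acme-storage
          persistentVolumeClaim:
            claimName: traefik-acme
```

With the storage path on a persistent volume, a restarted pod should find the existing acme.json and reuse the stored certificate rather than requesting a new one.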

OS and architecture in which you are running Nebari

Linux, x64

How to Reproduce the problem?

Configure the Nebari certificate with lets-encrypt

certificate:
  type: lets-encrypt
  acme_email: notreal@example.com
  acme_server: https://acme-v02.api.letsencrypt.org/directory

Delete the traefik pod 5 or more times in 1 week.

Command output

No response

Versions and dependencies used.

Nebari 2023.11.1

Compute environment

AWS

Integrations

No response

Anything else?

No response

Princess4ogb · 2024-02-12T18:03:35Z

I am working on this ticket. The GitLab issue is linked below:
https://gitlab.jatic.net/jatic/team-metrostar/t-e-platform/-/issues/347

kcpevey · 2024-04-11T13:37:14Z

Resolved by #2352

sblair-metrostar added the needs: triage 🚦 and type: bug 🐛 labels Dec 28, 2023

github-project-automation bot added this to 🪴 Nebari Project Management Dec 28, 2023

github-project-automation bot moved this to New 🚦 in 🪴 Nebari Project Management Dec 28, 2023

kcpevey added the project: JATIC label Jan 31, 2024

kcpevey added this to the 2024.2.1 milestone Jan 31, 2024

pavithraes removed the needs: triage 🚦 label Feb 14, 2024

pavithraes moved this from New 🚦 to In progress 🏗 in 🪴 Nebari Project Management Feb 14, 2024

pavithraes modified the milestones: 2024.2.1, Release Q2 2024 Feb 16, 2024

kenafoster mentioned this issue Mar 20, 2024 in "PVC for Traefik Ingress (prevent LetsEncrypt throttling) #2352" (merged)

kcpevey assigned kenafoster Apr 11, 2024

kcpevey closed this as completed Apr 11, 2024

github-project-automation bot moved this from In progress 🏗 to Done 💪🏾 in 🪴 Nebari Project Management Apr 11, 2024
