Accumulating new AuthRequest objects can cause a denial of service on storage. #1292
Comments
Thanks for reporting this, and including all that information. 👍
I suppose that making the "totally arbitrary value" configurable is a great idea, and low-hanging fruit. OTOH, it's also a bit of a band-aid... Looking at the GC code, I think this is an issue specific to the k8s CRD usage -- the SQL query is likely much more efficient. Also, I know next to nothing about CRDs, so I don't know if there are ways to tweak its garbage collection code in any way... as you mention, pagination could be one thing (no idea if that's possible). I think some kind of rate limiting would be good, even regardless of this specific bug. (However, I also wonder if this wouldn't often be handled by some reverse proxy sitting in front of Dex.) For the AuthRequest generation, the issue referenced above about that totally arbitrary value seems related, too: #646. If fewer AuthRequests were generated, that could relieve the pressure, too.
I think the crux of the issue is "prevent unauthenticated users from creating too much backend state." Rate limiting would be good, but overall we probably want to tie the auth request to an IP or something. The proposal in #646 is one way, but doesn't stop a motivated attacker from messing with your cluster. Removing "help wanted" for now. I don't think there's a clear enough solution to warrant the label.
TBH I would prefer a band-aid over downtime.
@mxey |
Make expiry of auth requests configurable. This is a band-aid against #1292. I did not change the default from 24h, but I think it should be much lower for safety.
I bet there's a way we can get away from storing state until we know a user's authenticated by using a cookie. The naive way would be to serialize the AuthRequest, sign it using dex's signing keys, and shove it in the user's session: https://godoc.org/github.com/coreos/dex/storage#AuthRequest Then only persist the AuthRequest once we get a valid response from the backend (e.g. the LDAP server), at which point we'd fill in the Claims and ConnectorData. Overall, I think breaking up the AuthRequest into different classes of data would help forward that effort. Even just adding comments to the struct about which fields are generated by the initial request, which are filled in once the user's authenticated with a backend, and any other data that's accumulated on the way. It was probably wrong to lump all that data together in the first place.
One thing to consider here is that cookies are per browser, not per tab, whereas the current way of redirecting with the request ID in the URL is per tab. As an alternative, we might be able to pass the authreq as a signed JWT instead -- so it wouldn't need to be persisted in storage.
I would expect the application you are logging in to with OIDC to be using session cookies anyway.
@mxey indeed -- we do the same. But I'm a little uneasy with imposing this on everyone... On a related note, I don't think my previous proposal (JWT authreq) makes the need for rate limiting go away: right now, unauthenticated requests cause database entries; with the proposal, they'd cause some calculations (checksums for the JWT signatures). So, it's still worth limiting the impact of what an unauthenticated peer can do... 🤔
We're being hit by this now and then due to Apache Flink Admin UIs we have put behind authentication with oauth2proxy and dex, with a SAML IdP. The Admin UI will auto-refresh, and once the authentication expires, it will continue to auto-refresh and cause a new authrequest every time it does so, eventually filling up etcd :-( Just my two cents on ways this bug makes life harder.
Cookies also would not prevent a deliberate filling of etcd by unauthenticated users. In our case, our own blackbox exporter had created over 500,000 auth entries in less than 24 hours.
Slides for my failure story about the default [dex](https://github.com/dexidp/dex/) configuration storing authrequests as CustomResources and its potential for nuking your Kubernetes control plane. Ref dexidp/dex#1292. Shared at https://www.meetup.com/Dutch-Kubernetes-Meetup/events/262313920/
We just got hit by this issue pretty bad using the Kubernetes storage backend. In one of our clusters we ended up with 260K authrequest objects.
We reached 500K authrequest objects and then the issues started.
The only way to clear the objects was to act directly on the storage layer; in our case, since it was Kubernetes, we just deleted all the authrequest custom resources. In our case we had a "misconfiguration" in the healthcheck of some OIDC proxy that was causing every healthcheck to create an authrequest. Make sure you find the source of your issue, or cleaning up will be pointless.
@primeroz |
Is there any real solution for this issue?
Not a real solution, but we just decided to move the storage away from Kubernetes to its own backend; this way the worst that can happen is that the dex service gets DDoSed, not the Kubernetes cluster.
FWIW I took similar precautions, moving storage to a dedicated etcd cluster (of 1 node) to contain the impact of this bug to that etcd cluster instead of the main k8s API. This works out for me so far.
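For anyone taking the same route, here is a minimal sketch of what that looks like in the dex config; the endpoint address is a placeholder and the exact options may differ between dex versions:

```yaml
# Hypothetical excerpt from a dex config file: store state in a dedicated etcd
# instead of Kubernetes CRDs, so runaway auth requests fill that etcd rather
# than the cluster's control plane.
storage:
  type: etcd
  config:
    endpoints:
      - http://dex-etcd:2379   # placeholder address of the dedicated etcd
```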
Anyone have some tips for locating what's hammering our dex? I tried shutting down the services I thought were the culprit, but to no avail.
@Kyrremann if you describe / look into the authrequest objects you will see information about the client that created the request; that might help you identify the offender. Good luck!
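For anyone hunting the offender this way, here is an abridged, illustrative example of what an authrequest object can look like with the CRD storage; the field names are from memory and may vary between dex versions, but the client ID and redirect URI are usually the quickest way to spot what is generating the requests:

```yaml
# Illustrative authrequest custom resource (abridged); all values are placeholders.
apiVersion: dex.coreos.com/v1
kind: AuthRequest
metadata:
  name: q4ziv7t3examplerequestid        # the auth request ID
clientID: oauth2-proxy                   # which OAuth2 client started the flow
redirectURI: https://app.example.com/oauth2/callback
loggedIn: false                          # stays false for requests abandoned before login
expiry: "2019-07-01T12:00:00Z"
```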
I've tried that, but maybe I misunderstand something -- or is the redirect URI the client?
I do have a lot of
in the logs from
Yeah, the redirect URI is sort of the client that initiated the auth request -- it is the URI where an authenticated request will be redirected to.
I'm wondering if the issue is that we don't have an internal load balancer, so we're redirecting between two ports externally when logging in (if I remember/understand our network correctly). So it may be the refresh token, or something similar, that triggers all these calls.
Is this still an active issue with the latest Dex code?
We are still seeing this issue with v2.24.0.
Hey, is this still the case? Anyone working on this? CC'ing @sagikazarmark for awareness.
Was there any update or fix for this issue? Or maybe a workaround? I'm facing a similar issue.
The problem can only occur due to a high rate of authentication requests. The best way to solve the issue is to protect the authentication endpoint with a rate-limiting solution, e.g., a reverse proxy that supports it. Here is an example of how you can protect your authentication endpoint with ingress-nginx in Kubernetes:

```yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dex
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - dex.example.com
      secretName: ingress-tls
  rules:
    - host: dex.example.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: dex
                port:
                  number: 443
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dex-auth
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/limit-rpm: "20"
    # Works only for ingress controllers >= 0.40. It is here so we do not forget
    # to add the annotation after upgrading the ingress controller.
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - dex.example.com
      secretName: ingress-tls
  rules:
    - host: dex.example.com
      http:
        paths:
          - path: /auth
            pathType: ImplementationSpecific
            backend:
              service:
                name: dex
                port:
                  number: 443
```

One more thing: do not forget to set the lifetime for auth requests (`expiry.authRequests` in the dex configuration).
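A sketch of the corresponding dex configuration, assuming the configurable expiry added by the PR referenced above; the value is only an example, pick something that fits your environment:

```yaml
# Hypothetical excerpt from a dex config file; only the expiry section is shown.
# Shorter lifetimes mean abandoned auth requests are garbage-collected sooner
# and accumulate less in storage.
expiry:
  authRequests: "10m"   # default is 24h; a lower value reduces storage pressure
```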
@nabokihms, thanks for the proposed solution. Let me ask a few questions about this.
You need to use the port of your service. Two Ingresses are required to distinguish auth requests from other requests. As for certificates, it is up to you 🙂
We are running dex in a Kubernetes cluster using Kubernetes CRDs as our storage implementation. We recently came across a scenario in our testing in which a misconfiguration resulted in the storage of a large number of authorization requests.
The implications of this scenario for our implementation were that:
Do you have any thoughts as to how this issue could be avoided?
Some possible ideas I was thinking of which may help were: