First request with 0 replicas sometimes takes ~20s #586
Comments
Okay, something I've noticed: if I restart the interceptors, it's fast again:
So maybe something with caching? Hmm
Hi,
Hi @JorTurFer! Yes, sorry, I should have highlighted that more in my original issue submission:
I'd love to contribute the fix. I'm working through trying to find the problem. If you have any tips or ideas, though, I'd love to hear them.
Oh, sorry, I read it fast and didn't get that part :( TBH, I'm not sure what kind of cache there could be in the interceptor.
@JorTurFer We did it! I found the problem that was causing this and opened a PR to fix it. Please take a look when you can!
The backoff is created in `interceptor/main.go`, and `.Step()` has a pointer receiver, so every call to `.Step()` mutates the underlying `.Steps` count within the struct. When you have 0 replicas, the Dial's default 500ms timeout is hit, which all but guarantees that you will end up calling `.Step()`. The value inside the struct is never reset, so every time you scale from 0, you're guaranteed to reduce `.Steps` by 1, and it won't reset until the application itself is restarted. This change makes sure that every time we execute `DialContextWithRetry`, we start with a fresh `backoff` whose `.Steps` starts at 5, since it's a clone of the original `backoff`. This clone is available in the context of the function returned by `DialContextWithRetry`; it is the one that gets decremented, and it is then garbage collected, so we get a brand-new 5 steps the next time we execute `DialContextWithRetry`.
Closes kedacore#586
Signed-off-by: Aaron Batilo <AaronBatilo@gmail.com>
Report
Sometimes, when I send a request to an application that I have configured to scale to 0, the first request takes around 20 seconds to get a response, even though the pod is scheduled and ready after only 2-3 seconds.
Expected Behavior
I would expect that, when I send a request to an application with 0 replicas, the interceptor returns immediately once the pod is up and healthy.
Actual Behavior
Instead, it can take 20-25 seconds before a response is returned.
Steps to Reproduce the Problem
I have a website up that hosts just a stock boilerplate React application plus a Golang-backed service. Feel free to hit it. It's a domain/codebase that I like to use for testing things.
Below is the entire YAML that I use for deploying the site:
Logs from KEDA HTTP operator
No relevant logs, as far as I can tell.
What version of the KEDA HTTP Add-on are you running?
v0.4.0
Kubernetes Version
1.24
Platform
Amazon Web Services
Anything else?
I'd love to be a contributor and try to debug this on my own. I'm going to do my best to figure out what's happening, but if anyone could give me ideas or point me in the right direction, that'd be amazing too.
Thank you all for this project. It's really neat and I like it a lot.