Disabling GET requests on upstream retries logic makes upstream selection unreliable #5838
The logic for this is here. I don't understand why you would want to disable retries for GET; can you explain your use case?
Thanks for the work to document this.
I'm not so sure about THAT necessarily; but it DOES at least mean that the load balancer could not find an upstream within parameters (healthy, not too many connections, etc.) to use for the request. I suppose I could run the repro -- thank you for providing that -- but if you have the (debug) logs handy, could you post them? I'm curious why the requests are failing in the first place, but I'm short on time 😅, so posting them may lead to faster resolution.
I think your
I thought that too, but the try_duration is set to 20s. (Haven't had a chance to look at the code yet; been in graphics/frontend world today) -- I imagine it should try again after half a second and then again after another half a second, and by then the other request should be done... 🤔 At some point within those 20s you'd think it'd consider the upstream available again.
I don't think it does retry after an interval, @mholt, because of the
Oh, right -- so, I mean, I feel like that's working as intended then, yeah? Retries are disabled, the one upstream is unavailable, so... accurate error message. Right?
Here is the log for 10 iterations of the loop (10 sets of 2 simultaneous requests):
You haven't answered my question though:
That's key to understanding the motivation here. As-is, this seems to be working as intended, but you're configuring Caddy in a way that doesn't make sense.
I want GET requests to be handled exactly the same as POST/PUT/DELETE/etc. I want them to be:

- retried as long as no connection to an upstream could be established, and
- not retried once an upstream has accepted the request.
I don't see why they should be handled differently from the rest of the methods. Besides, in my test case, I don't understand the "glitchiness" of the process: why would it "sometimes" succeed in finding an upstream for both requests and "sometimes" not? That part looks like a bug to me. The rest is probably me misunderstanding the docs, but broadly: I don't understand what is retried by default, what is not, and what is disabled when setting
Because by the HTTP spec, GET is meant to be read-only and idempotent (which means safely retryable with the same result). They should not affect state on the server. If they do, then that's a bug in the application that needs to be fixed.
I think it's because of your one-request-per-upstream limit. Having such a restrictive limitation of 1 concurrent request seems very weird. Why would you ever want a server that can only handle one request at a time? That seems like an incredibly poor bottleneck. Browsers make multiple requests in parallel to load CSS/JS/image assets, so this would make it impossible to serve any kind of website for a single user, let alone multiple users.
I'm still looking into this; I agree that seems probably wrong. I think we might have unintentionally broken this with a refactor a few versions ago (rework of the proxy loop).
This is why I specified "before a connection could be established to an upstream". Before a connection could be established to an upstream, no request, regardless of its method, could affect the state on the server... because it hasn't reached it. But to answer another possible angle of your question, which could be "why would you want GET requests not to be retried after they reached an upstream": some of them fail because of an upstream issue (think an external API that we're calling ourselves) and we don't want to blindly retry. In those scenarios our clients are in charge of the retry (we inform them with a "Retry-After" header).
The first request will not be handled before the second comes in: in my test case the upstream server will sleep for 1s before answering (which gives plenty of time for the second request to come in).
That's a pretty broad question to answer but I'll try:
Could you give #5841 a try? I think it might do what you expect.
It works! Thanks a lot for your work on this issue! But re-reading this conversation and considering my test case, I think there are two issues:
In my test case, the following Caddy logs should not happen: there is a single upstream with
If you'd rather have this conversation in a new issue, I can open a new one.
Oh, right -- this is indeed approximate currently -- it's not a data race because we use atomic operations. We could implement locking, but I don't know the performance implications there. Do you require an exact restriction? I'm not sure I've heard of that use case before.
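To illustrate why the count stays approximate, here is a small hypothetical Go sketch (not Caddy's actual code): with plain atomic counters, the availability check and the increment are two separate steps, so two concurrent requests can both slip past a limit of 1.

```go
// Hypothetical sketch of a check-then-act race: each step is atomic,
// but the check and the increment together are not, so the limit can
// occasionally be exceeded.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type upstream struct {
	inFlight    int64
	maxRequests int64
}

// available does an atomic load, but the later increment is a separate step.
func (u *upstream) available() bool {
	return atomic.LoadInt64(&u.inFlight) < u.maxRequests
}

func (u *upstream) admit() {
	atomic.AddInt64(&u.inFlight, 1)
}

func main() {
	u := &upstream{maxRequests: 1}
	var admitted int64
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if u.available() { // both goroutines may see inFlight == 0 here
				u.admit()
				atomic.AddInt64(&admitted, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("admitted:", admitted) // usually 1, occasionally 2
}
```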
Well, I have mixed feelings about this. It's not the end of the world if the max request count is "best effort" and approximate, but:
Either way, it should be documented there. Thanks a lot for your time on this issue. |
I'd argue that no, you wouldn't want any added latency if you set the number to something still reasonably high (like 100+). That's the use case we intended for this; we didn't expect anyone to try it with very low values like 1 (and IMO I still think that's extremely weird). Also, we increase the request count even if max requests isn't configured, because we have the

Since you're using Python, shouldn't you be using something like gunicorn to allow for multiple concurrent requests via a worker model? This seems like it would totally be a non-issue if you just used the right tools for the job. It looks like your example uses Flask, and their docs recommend using gunicorn as well: https://flask.palletsprojects.com/en/2.3.x/deploying/gunicorn/
Well, I'm not entirely sure it would add any noticeable latency. Locks don't tend to increase throughput, however... I cannot think of a way to implement the lock right now that would allow more than one request to be load-balanced at any given moment. At first I thought we could just lock within

If there's an elegant solution for this I'd be down to try it. In the meantime we can document that it's approximate, to maximize throughput.
I agree.
For the first part, if you could document it, that would be super useful to your users; otherwise others, like me, will "hold it wrong". For the second part, our industry is pretty vast; what is "extremely weird" to you could be someone else's day job.
Understood.
We do use

At our scale

Regardless, thank you for your work on #5841; I hope it makes its way into the mainline, as it solves an important issue.
Most people who use Python with Caddy tend to be using something like gunicorn, from what we've seen. So yes, that would be the way to go, IMO. Trying to configure Caddy to do gunicorn's job via a one-request-per-upstream limit isn't the right approach.
👍
To be clear, I want the count to be exact. 💯 ... if we can find a good way to do it. |
As I'm not familiar with Caddy's source code (yet), take the following with a huge grain of salt; it's only food for thought at this stage. I guess the "ongoing requests count for a given upstream" is increased after a successful ack from an upstream (and decreased after the response has been handed back to Caddy). It would explain why

I think you could introduce another count, representing the "possibly incoming requests count for a given upstream", which would be increased when selecting the upstream and decreased when the connection with the upstream is settled (whether it's a success or a failure doesn't matter). When given a
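A rough Go sketch of that two-counter idea (hypothetical names, not Caddy's actual API): track requests already accepted by the upstream separately from requests that have merely been routed to it, and compare the sum against the limit when selecting.

```go
// Hypothetical two-counter scheme: "reserved" covers the window between
// selecting an upstream and settling the connection; "inFlight" covers
// requests the upstream has actually accepted.
package upstreamlimit

import "sync/atomic"

type counters struct {
	inFlight    int64 // accepted by the upstream, response not yet returned
	reserved    int64 // selected for this upstream, connection not yet settled
	maxRequests int64
}

// TrySelect reserves a slot if the combined count stays under the limit.
// Incrementing first and then checking the result avoids a check-then-act
// gap on the reservation itself.
func (c *counters) TrySelect() bool {
	if atomic.AddInt64(&c.reserved, 1)+atomic.LoadInt64(&c.inFlight) > c.maxRequests {
		atomic.AddInt64(&c.reserved, -1) // over the limit: release the slot
		return false
	}
	return true
}

// Settled is called once the connection attempt finishes, success or not:
// the reservation is released and, on success, the in-flight count takes over.
func (c *counters) Settled(connected bool) {
	atomic.AddInt64(&c.reserved, -1)
	if connected {
		atomic.AddInt64(&c.inFlight, 1)
	}
}

// Done is called after the response has been handed back to the client.
func (c *counters) Done() {
	atomic.AddInt64(&c.inFlight, -1)
}
```

Even in this sketch the hand-off inside Settled is two separate atomic operations, which lines up with the reply below: the result is still an approximation.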
Yeah, I actually was thinking something similar, but I think it still gives us an approximation. I'll think on it more. But I might be busy for a few weeks with a baby on the way.
In a reverse proxy scenario, I don't want Caddy to retry GET requests when an upstream has accepted the request from Caddy. As far as I understand, this is doable in Caddy by setting the following config:
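For reference, the intent corresponds roughly to a Caddyfile along these lines (a sketch only: the listen address, upstream address, durations, and the `lb_retry_match` block limited to non-GET methods are assumptions, not necessarily the repro's actual config):

```
:8080 {
	reverse_proxy 127.0.0.1:9001 {
		# Keep trying to find a usable upstream for up to 20s,
		# re-checking every half second.
		lb_try_duration 20s
		lb_try_interval 500ms

		# Only requests matching this may be re-sent after an upstream
		# has accepted them; listing only non-GET methods is meant to
		# exclude GETs from that kind of retry.
		lb_retry_match {
			method POST PUT PATCH DELETE
		}
	}
}
```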
My problem is that, doing so, I get intermittent `no upstreams available` errors from Caddy. I've put a full repro in this repository. As `no upstreams available` suggests that the error occurred before a connection was established to the upstream, and reading this comment, I don't get why the request is not retried.