Envoy (re)uses connection after receiving FIN from upstream #6815
Comments
I'm a bit confused at this trace. I'm reading "Upstream sends Envoy an GET" and "Envoy sends the upstream 1 200/OK". Do you have the names inverted in your trace? Assuming so, I believe there's always going to be a race condition (helped by but not entirely fixed by #2871) where Envoy may reuse the connection before processing the FIN. In your case it looks like you're sending HTTP/1.0 responses where the connection should not by default be reused. You could file a feature request for better upstream HTTP/1.0 handling, or you could configure retries to handle the race. |
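The retry suggestion above can be illustrated with a minimal client-side sketch. This is not Envoy's retry policy, only the general idea: if a reused connection turns out to have been closed by the peer, repeat the request. The `send_once` callable is a hypothetical stand-in for "pick a connection, send the request, return the response", and the approach is only safe for idempotent requests.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retry(send_once: Callable[[], T], max_retries: int = 1) -> T:
    """Call send_once(); on a connection reset, retry up to max_retries times.

    send_once is a hypothetical callable (an assumption of this sketch) that
    sends one request and returns the response. After a failure it should
    ideally use a fresh connection rather than the stale pooled one.
    """
    attempt = 0
    while True:
        try:
            return send_once()
        except (ConnectionResetError, BrokenPipeError):
            # The peer closed the connection before or while we reused it.
            if attempt >= max_retries:
                raise
            attempt += 1
```

A retry hides the race from the downstream client but does not remove it, which matches the "handle the race" wording above.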
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions. |
I observed this issue in our production environment recently. We are using an older version. As you can see in the packets, Node.js sends a FIN and Envoy has already ACKed the FIN, but Envoy still sends a new HTTP request to the upstream; Node.js answers with RST and terminates the request. The downstream of Envoy will see

Our cluster config is simple:

name: local_cluster
connect_timeout: 15s
type: STATIC
lb_policy: ROUND_ROBIN
hosts:
- socket_address:
    address: 127.0.0.1
    port_value: 15225
health_checks:
- timeout: 5s
  interval: 10s
  healthy_threshold: 1
  unhealthy_threshold: 2
  reuse_connection: true
  http_health_check:
    path: /api/health

I believe this is a race condition. |
We worked around this by adding:

common_http_protocol_options:
  # Node.js closes the connection after it has been idle for 5s,
  # so we set this to 4s to let Envoy close the connection first
  idle_timeout: 4s

But I believe this should be permanently fixed in Envoy. |
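The reasoning behind this workaround can be sketched as follows. The 5s and 4s values come from the comment above; the `PooledConnection` class and its `reusable` method are hypothetical names for illustration and do not reflect how Envoy's connection pool is implemented.

```python
from dataclasses import dataclass

SERVER_IDLE_TIMEOUT = 5.0  # seconds; Node.js closes idle connections after this
CLIENT_IDLE_TIMEOUT = 4.0  # seconds; the proxy gives up on idle connections first


@dataclass
class PooledConnection:
    """Hypothetical pooled upstream connection (illustration only)."""

    last_used: float  # timestamp, in seconds, of the last completed request

    def reusable(self, now: float, idle_timeout: float = CLIENT_IDLE_TIMEOUT) -> bool:
        """True while the connection has been idle for less than idle_timeout."""
        return (now - self.last_used) < idle_timeout
```

Because the client-side timeout is shorter than the server-side one, a connection stops being eligible for reuse about a second before the server would close it. This narrows the window for the race but relies on both sides measuring idle time in roughly the same way.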
Hi @alyssawilk, could you check whether this issue is still valid? |
Hey @winguse |
Thanks @alyssawilk
As we can see in the screenshot, Envoy had ACKed the FIN and then sent the request again 1.4 seconds later, so I think this race condition is avoidable. Does that seem valid to you? On the other hand, in our case, we see this happen quite frequently. |
Acked what fin? I don't see upstream sending a FIN. I had assumed Envoy initiated connection close - it's possible for a close to be FIN+Ack-of-prior-data. From the trace I'm not seeing this being [FIN + ack-of-peer-fin] What I see is Envoy sending an HTTP/1.0 response, without keep-alive, which means the connection will be implicitly closed. It then closes the connection with a FIN and the peer attempts to send an illegal follow-up request (/1) and gets RST. Not clear to me why Envoy is sending a 1.0 response to a 1.1 request, or why your Envoy is acting as a server to an upstream client (did you mean downstream?) but I don't see anything problematic at the TCP level. I think your peer needs to understand to not reuse a TCP connection if getting an HTTP/1.0 response without "connection: keep-alive" Are you using an older version of Envoy? There was a bug with HTTP/1.0 processing where it used to latch "speaking HTTP/1.0" and that caused some confusion, which might be causing the downgrade behavior I observe, and might clear up your problem if your peer isn't paying close attention to the version it's getting back. |
@alyssawilk sorry, my bad: I did not describe my situation clearly. See my #6815 (comment), the last 4 packets: Envoy did ACK the FIN but started to send another request 1.4 seconds later. In my case, Envoy <--> Node.js are speaking HTTP/1.1 to each other. This issue was found in 1.9; I am not sure whether it still happens with the latest build of Envoy. |
I see Envoy sending [FIN ACK]. That does not mean it's acking a fin. It can be sending a fin and acking prior data. I do not see node.js sending a FIN, so I believe the [FIN, ACK] is [FIN+prior-data-ack] not [FIN + ack-of-peer-fin] as I said above. If you're running 1.9 you are almost certainly encountering this |
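The distinction between [FIN + prior-data-ack] and [FIN + ack-of-peer-fin] can be checked from sequence numbers rather than from the flags alone. The sketch below is a simplified model with hypothetical `Segment` fields and made-up numbers in the usage; it relies only on the TCP rule that a FIN consumes one sequence number, so an acknowledgement that covers the peer's FIN carries the FIN's sequence number plus one.

```python
from dataclasses import dataclass
from typing import Optional

FIN = 0x01  # TCP FIN flag bit
ACK = 0x10  # TCP ACK flag bit
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


@dataclass
class Segment:
    """Simplified view of one captured TCP segment (illustration only)."""

    seq: int
    ack: int
    flags: int
    payload_len: int = 0


def acks_peer_fin(segment: Segment, peer_fin: Optional[Segment]) -> bool:
    """True if `segment` acknowledges the FIN carried by `peer_fin`.

    A [FIN, ACK] segment only acknowledges a FIN when the peer actually sent
    one and the ack number has advanced past it.
    """
    if peer_fin is None or not (peer_fin.flags & FIN):
        return False  # the peer never sent a FIN, so nothing to acknowledge
    if not (segment.flags & ACK):
        return False
    # The FIN occupies the sequence number right after the segment's payload.
    fin_acked_at = (peer_fin.seq + peer_fin.payload_len + 1) % SEQ_MOD
    return segment.ack == fin_acked_at
```

With no FIN seen from the peer, `acks_peer_fin(Segment(seq=1000, ack=5000, flags=FIN | ACK), None)` is false: the segment is a FIN that acknowledges earlier data, which is the reading given above.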
A few notes:
In #6815 (comment), Node.js is listening on port 15225, and Envoy starts the connection from port 45146, which the
But anyway, once we have upgraded to the latest Envoy, I will let you know if we still see this problem. Thank you! |
We are facing this issue on Envoy 1.15, where the server sends a FIN but the sidecar/Envoy does not pass it back to the downstream/client. |
Sorry, need a bit more detail to be able to help here. |
Description:
With Envoy serving as an HTTP/1.1 proxy, Envoy sometimes tries to reuse a connection even after receiving a FIN from the upstream. In production I saw this issue even with a couple of seconds between the FIN and the next request, and Envoy never returned FIN ACK (just a FIN from the upstream to Envoy, then a PUSH with a new HTTP request from Envoy to the upstream). Envoy then returns 503 UC even though the upstream is up and operational.
The steps to reproduce show a slightly modified version of this issue: to encourage Envoy to reuse a connection, I set the circuit breakers to low max_connections and max_requests. Instead of returning UO, Envoy tries to reuse a connection that has received a FIN, and so we get 503 UC.
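For readers unfamiliar with how a client can notice a FIN before reusing a connection, here is a minimal sketch. It is not Envoy's implementation; `peer_has_closed` is a hypothetical helper that peeks at the socket without consuming data, relying on the standard behaviour that a read returns zero bytes once the peer has shut down its side.

```python
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer has already closed (FIN or reset observed).

    Peeks at the receive buffer in non-blocking mode so that no application
    data is consumed and the call never waits.
    """
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        # An empty read means end-of-stream: the peer sent FIN.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False  # nothing to read yet, and no FIN seen so far
    except ConnectionResetError:
        return True  # the peer reset the connection
    finally:
        sock.settimeout(previous_timeout)  # restore the caller's blocking mode
```

A pool could call this just before handing out an idle connection. The check only shrinks the window: a FIN that arrives between the check and the write can still cause the reset described here.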
Repro steps:
![image](https://user-images.githubusercontent.com/9298707/57228355-4a552500-701c-11e9-8d19-c8c120fcece8.png)
The upstream is a simple web server that sleeps 2s, then returns simple JSON, running as a simple Docker container in the same network as Envoy. Envoy is running as a container as well, using version
8c8b068/1.11.0-dev/Clean/RELEASE/BoringSSL
The client is just cURL, run about 7 times concurrently.
We can see the error:
In the logs it looks like
Of most interest is the 4th line, GET /1 HTTP/1.1" 503 UC
Config:
envoy config
Logs:
trace.log