-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http2: read timeout applies to TCP connection and not requests; interrupts valid requests #1422
Comments
To be explicit: this happens rarely on my network connection (1 in hundreds), but 4 out of 5 times for another user. So it is a problem. I have not been able to reproduce it outside of the browser yet. It has been reproduced with clients on OSX and Windows and IE11 (suspected, since I wasn't able to inspect the network, it was a user report), as well as Chrome. |
I have tried setting |
Have you disabled timeouts? |
OK, well, previously it happened on the 4th attempt and the 15th attempt, and adding Does this mean there is a bug in the timeout functionality? |
I'm getting reports of it happening with HTTP1 as well. |
No, it just means that you have to increase or disable timeouts if your clients (or the server) holds legitimate connections open for longer. https://caddyserver.com/docs/timeouts |
The requests are lasting only a very short time <1s. The my understanding is that the connection timeout arrives at an arbitrary point during the HTTP pipeline, since timeouts are connection based, not request based. Is this plausible? |
Hmm, it's unclear to me from the docs. I think the timeouts set Deadline values which are changed per-request... |
I wrote a reproducer which proves it on localhost. It makes a series of requests, and fails after the timeout, even though no request lasts for the duration of the timeout. Caddyfile
slowserver.gopackage main
import (
"bytes"
"fmt"
"log"
"net/http"
"time"
)
const content = `<!DOCTYPE html>
<script>
i = 0;
function makeRequest() {
err = "<span style=\"color: red\">ERROR</span><br>";
r = new XMLHttpRequest();
r.onerror = function() { document.write(err); };
r.onload = function() {
if (r.status != 200) {
document.write(err);
return;
}
document.write("Loaded " + i + "<br>");
window.scrollTo(0,document.body.scrollHeight);
i++;
setTimeout(makeRequest, 1);
}
r.open("GET", "/slow")
r.send()
}
setTimeout(makeRequest, 1)
</script>
Starting..<br>
`
func main() {
x := bytes.Repeat([]byte("Z"), 1024)
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintln(w, content)
})
http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
for i := 0; i < 950; i++ {
time.Sleep(1 * time.Millisecond)
w.Write(x)
}
})
log.Fatal(http.ListenAndServe(":8080", nil))
} |
Edit: I pasted the wrong version of the above slowserver.go originally, which was broken. I have fixed it. |
Huh, that is interesting. I do not experience problems when browsing sites normally where the browser reuses the same connection... I know there are people who know more about Go's net/http timeouts than I do 😬 and I would love to find out why this is behaving like this for you! In the meantime, disable timeouts... but you'll open yourself to certain kinds of attacks. |
Yeah the thing is, it works fine for me (99.9% of the time) because I have a good internet connection. However, for my client it happens 4 out of 5 times. It caused problems when they were showing the project to senior management. The other problem here is that there is no indication server side that there is a problem. It would be great to see timeouts in the logs at least, and then it would be clear what the problem was. |
I think it is this Go bug: golang/go#16450 (net/http: http.Server.ReadTimeout never reset in http2). It seems fixed in Go 1.8. I just built caddy with it and ran a test, now at 120 requests and counting without a problem. (By the way, when testing the above code, sometimes the bug is not triggered). By the way, the WriteTimeout has been disabled entirely in Go 1.8, it looks like: golang/go#18437 - and will be reintroduced in Go 1.9. |
A quick note about why this might not ordinarily manifest: it's quite hard to trigger this issue such that you would notice it unless you are unlucky. In our case, it was blindingly obvious when it failed because the asset being interrupted was the application js bundle built in development mode, which is ~900kiB. When it fails to load, nothing works. So it's obvious that it is broken. I guess ordinarily, your website would take less than 10 seconds to load, or at least you would not hit the case where you're transferring something in a window such that the 10 second connection timeout is about to hit. Or if you are, it's just a small asset like an image somewhere. If your assets are aligned poorly with the stars, however, you're out of luck. Because there are no log messages except on the client side, this could be happening all the time to our users who aren't on connections as good as ours and we wouldn't be any the wiser. |
The Go bug is unfortunate, we'll just have to live with that for now. In the meantime, are you sure you're not seeing anything in the logs? I get Thanks for diving into this and analyzing it! |
Aha! I was using Also, is it possible to configure that from the Caddyfile? |
Yes, sure. I can see why that's confusing. The request log is at the HTTP layer, not the transport layer, so it doesn't indicate when there are connection problems.
No, because the process log can span multiple different server types; also it's way simpler this way. :) Thanks for your patience and discussion about this! (I've made a note in the docs.) |
@mholt Where? I see nothing about timeouts here https://caddyserver.com/docs/log |
@elcore It's changed locally, I haven't deployed the updated site yet, I definitely will with a new Caddy release. |
1. What version of Caddy are you running (
caddy -version
)?2. What are you trying to do?
Serve a ~900kiB javascript file.
3. What is your entire Caddyfile?
4. How did you run Caddy (give the full command and describe the execution environment)?
Ubuntu 16.04
,Linux 4.4.0-62-generic
,amd64
.c4.xlarge
instance.Via systemd with the caddy file in
/home/ubuntu/.caddy/Caddyfile
.5. What did you expect to see?
The javascript file should be served successfully. OR: errors in the log indicating a problem.
6. What did you see instead (give full error messages and/or log)?
The log indicates no errors at all. However, the site intermittently does not load correctly. The network devtools indicate that the javascript started to be served, but some fraction (e.g, 400kiB or 700kiB) of the file is served before the devtools console shows
GET https://my.example.com/js/bundle.js net::ERR_CONNECTION_RESET
ornet::ERR_CONNECTION_CLOSED
.I have tried this from three different ISPs ("network views"). It happens at different rates depending on the network, but it happens even on a decent ISP with wired connections. The rate there is something like 1/100 times. The caddy log appears to show a 200 success, but potentially with a truncated filesize.
Running caddy with
-http2=false
appears to fix the problem, or at least make it much rarer.7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?
I'm unsure. It is a difficult to reproduce problem, on my good internet connection I couldn't reproduce it at all. However, a client who was trying to test the website couldn't get it to work, period. So I believe the rate at which the issue happens is dependent on the network quality.
I am not usually comfortable submitting bug reports without a reproduction case, but I searched through the bug tracker and saw nobody else has reported this issue yet, so I thought it was worth posting this.
I was unable to reproduce the issue with
curl
ornghttp
at the command line from my good internet connection even after many hundreds of requests.I have collected TCP dump logs and chrome://net-export which I will analyse and report back on if I am able to find the time.
Has anyone else encountered these issues? Is there a bug in the Go http/2 stack?
The text was updated successfully, but these errors were encountered: