Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http2: read timeout applies to TCP connection and not requests; interrupts valid requests #1422

Closed
pwaller opened this issue Feb 10, 2017 · 19 comments

Comments

@pwaller
Copy link

pwaller commented Feb 10, 2017

1. What version of Caddy are you running (caddy -version)?

Caddy 0.9.5

2. What are you trying to do?

Serve a ~900kiB javascript file.

3. What is your entire Caddyfile?

my.website.example.com
proxy / localhost:8080
tls {
    max_certs 1
}
log / stdout "{combined}"

4. How did you run Caddy (give the full command and describe the execution environment)?

  • Ubuntu 16.04, Linux 4.4.0-62-generic, amd64.
  • Amazon EC2 c4.xlarge instance.

Via systemd with the caddy file in /home/ubuntu/.caddy/Caddyfile.

[Unit]
Description=Caddy HTTP(s) server
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/.caddy
ExecStart=/home/ubuntu/.local/bin/caddy -agree
LimitNOFILE=1000000

[Install]
WantedBy=multi-user.target

5. What did you expect to see?

The javascript file should be served successfully. OR: errors in the log indicating a problem.

6. What did you see instead (give full error messages and/or log)?

The log indicates no errors at all. However, the site intermittently does not load correctly. The network devtools indicate that the javascript started to be served, but some fraction (e.g, 400kiB or 700kiB) of the file is served before the devtools console shows GET https://my.example.com/js/bundle.js net::ERR_CONNECTION_RESET or net::ERR_CONNECTION_CLOSED.

I have tried this from three different ISPs ("network views"). It happens at different rates depending on the network, but it happens even on a decent ISP with wired connections. The rate there is something like 1/100 times. The caddy log appears to show a 200 success, but potentially with a truncated filesize.

Running caddy with -http2=false appears to fix the problem, or at least make it much rarer.

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

I'm unsure. It is a difficult to reproduce problem, on my good internet connection I couldn't reproduce it at all. However, a client who was trying to test the website couldn't get it to work, period. So I believe the rate at which the issue happens is dependent on the network quality.

I am not usually comfortable submitting bug reports without a reproduction case, but I searched through the bug tracker and saw nobody else has reported this issue yet, so I thought it was worth posting this.

I was unable to reproduce the issue with curl or nghttp at the command line from my good internet connection even after many hundreds of requests.

I have collected TCP dump logs and chrome://net-export which I will analyse and report back on if I am able to find the time.

Has anyone else encountered these issues? Is there a bug in the Go http/2 stack?

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

To be explicit: this happens rarely on my network connection (1 in hundreds), but 4 out of 5 times for another user. So it is a problem. I have not been able to reproduce it outside of the browser yet. It has been reproduced with clients on OSX and Windows and IE11 (suspected, since I wasn't able to inspect the network, it was a user report), as well as Chrome.

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

I have tried setting GODEBUG=http2debug=2 but this doesn't seem to work with Caddy. Is there a way to enable that?

@mholt
Copy link
Member

mholt commented Feb 10, 2017

Have you disabled timeouts? timeouts none

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

OK, well, previously it happened on the 4th attempt and the 15th attempt, and adding timeouts none appears to have fixed it, with >100 attempts working without a problem.

Does this mean there is a bug in the timeout functionality?

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

I'm getting reports of it happening with HTTP1 as well.

@mholt
Copy link
Member

mholt commented Feb 10, 2017

No, it just means that you have to increase or disable timeouts if your clients (or the server) holds legitimate connections open for longer. https://caddyserver.com/docs/timeouts

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

The requests are lasting only a very short time <1s. The my understanding is that the connection timeout arrives at an arbitrary point during the HTTP pipeline, since timeouts are connection based, not request based.

Is this plausible?

@mholt
Copy link
Member

mholt commented Feb 10, 2017

Hmm, it's unclear to me from the docs. I think the timeouts set Deadline values which are changed per-request...

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

I wrote a reproducer which proves it on localhost. It makes a series of requests, and fails after the timeout, even though no request lasts for the duration of the timeout.

broken

Caddyfile

*
proxy / localhost:8080
tls self_signed

slowserver.go

package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

const content = `<!DOCTYPE html>
<script>
i = 0;

function makeRequest() {
        err = "<span style=\"color: red\">ERROR</span><br>";
        r = new XMLHttpRequest();
        r.onerror = function() { document.write(err); };
        r.onload = function() {
                if (r.status != 200) {
                        document.write(err);
                        return;
                }

                document.write("Loaded " + i + "<br>");
                window.scrollTo(0,document.body.scrollHeight);
                i++;
                setTimeout(makeRequest, 1);
        }
        r.open("GET", "/slow")
        r.send()
}
setTimeout(makeRequest, 1)

</script>
Starting..<br>
`

func main() {
	x := bytes.Repeat([]byte("Z"), 1024)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, content)
	})
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < 950; i++ {
			time.Sleep(1 * time.Millisecond)
			w.Write(x)
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

Edit: I pasted the wrong version of the above slowserver.go originally, which was broken. I have fixed it.

@mholt
Copy link
Member

mholt commented Feb 10, 2017

Huh, that is interesting. I do not experience problems when browsing sites normally where the browser reuses the same connection... I know there are people who know more about Go's net/http timeouts than I do 😬 and I would love to find out why this is behaving like this for you!

In the meantime, disable timeouts... but you'll open yourself to certain kinds of attacks.

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

Yeah the thing is, it works fine for me (99.9% of the time) because I have a good internet connection. However, for my client it happens 4 out of 5 times. It caused problems when they were showing the project to senior management. The other problem here is that there is no indication server side that there is a problem. It would be great to see timeouts in the logs at least, and then it would be clear what the problem was.

@pwaller pwaller changed the title http2: intermittent net::ERR_CONNECTION_{CLOSED,RESET} in Chrome http2: read timeout applies to TCP connection and not requests; interrupts valid requests Feb 10, 2017
@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

I think it is this Go bug: golang/go#16450 (net/http: http.Server.ReadTimeout never reset in http2). It seems fixed in Go 1.8. I just built caddy with it and ran a test, now at 120 requests and counting without a problem. (By the way, when testing the above code, sometimes the bug is not triggered).

By the way, the WriteTimeout has been disabled entirely in Go 1.8, it looks like: golang/go#18437 - and will be reintroduced in Go 1.9.

@pwaller
Copy link
Author

pwaller commented Feb 10, 2017

A quick note about why this might not ordinarily manifest: it's quite hard to trigger this issue such that you would notice it unless you are unlucky. In our case, it was blindingly obvious when it failed because the asset being interrupted was the application js bundle built in development mode, which is ~900kiB. When it fails to load, nothing works. So it's obvious that it is broken.

I guess ordinarily, your website would take less than 10 seconds to load, or at least you would not hit the case where you're transferring something in a window such that the 10 second connection timeout is about to hit. Or if you are, it's just a small asset like an image somewhere. If your assets are aligned poorly with the stars, however, you're out of luck.

Because there are no log messages except on the client side, this could be happening all the time to our users who aren't on connections as good as ours and we wouldn't be any the wiser.

@mholt
Copy link
Member

mholt commented Feb 16, 2017

The Go bug is unfortunate, we'll just have to live with that for now. In the meantime, are you sure you're not seeing anything in the logs? I get i/o timeout in my process log when connections are closed due to timeout. (You need the -log flag.)

Thanks for diving into this and analyzing it!

@pwaller
Copy link
Author

pwaller commented Feb 16, 2017

Aha! I was using log stderr from the Caddyfile. It was not apparent that this was not enough to see such errors. Is it worth having a note in the log docs to that end?

Also, is it possible to configure that from the Caddyfile?

@mholt
Copy link
Member

mholt commented Feb 16, 2017

@pwaller

Is it worth having a note in the log docs to that end?

Yes, sure. I can see why that's confusing. The request log is at the HTTP layer, not the transport layer, so it doesn't indicate when there are connection problems.

Also, is it possible to configure that from the Caddyfile?

No, because the process log can span multiple different server types; also it's way simpler this way. :)

Thanks for your patience and discussion about this! (I've made a note in the docs.)

@mholt mholt closed this as completed Feb 16, 2017
@elcore
Copy link
Collaborator

elcore commented Feb 20, 2017

I've made a note in the docs

@mholt Where? I see nothing about timeouts here https://caddyserver.com/docs/log

@mholt
Copy link
Member

mholt commented Feb 20, 2017

@elcore It's changed locally, I haven't deployed the updated site yet, I definitely will with a new Caddy release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants