Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #1292 and resulting issues from #1300 #1309

Merged
merged 8 commits into from
Jan 1, 2017
Merged

Fixed #1292 and resulting issues from #1300 #1309

merged 8 commits into from
Jan 1, 2017

Conversation

lhecker
Copy link

@lhecker lhecker commented Dec 26, 2016

@mholt We already talked in length about the contents of this PR and here it is!

I tried to split up this PR in 4 commits to clearly separate the fixes. The commit messages probably say it all. 🙂

This issue was caused by connHijackerTransport trying to record HTTP
response headers by "hijacking" the Read() method of the plain net.Conn.
This does not simply work over TLS though since this will record the TLS
handshake and encrypted data instead of the actual content.
This commit fixes the problem by providing an alternative transport.DialTLS
which correctly hijacks the overlying tls.Conn instead.
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the test looks good. Thanks for working on this! And you've confirmed it fixes your use case as described in #1292 and #1300?

io.CopyBuffer(dst, src, buf.([]byte))

// `CopyBuffer` only uses `buf` up to it's length and
// panics if it's 0 => Extend it's length up to it's capacity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is hard to understand. Can it be phrased a little more clearly (and fix the gammar, haha)? Maybe something like:

CopyBuffer only uses buf up to its length and panics if it's 0, so we extend its length up to its capacity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... I always forget about "its" because I pronounce it the same as "it's".

CopyBuffer only uses buf up to its length and panics if it's 0. Due to that we extend buf's length to its capacity here and ensure it's always non-zero.

Is that fine too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

if err := <-errc; err != nil {
plainConn.Close()
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errc can buffer 2 values but only 1 is ever consumed. And only the plainConn is closed if there's an error. Is that expected?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this straight out of the Go source code from here and thus didn't dare to fiddle with this rather sensitive piece of code.
I forgot to add a comment with the above URL though. I'll add one tomorrow.

errc probably has a capacity of 2 to ensure that the 2 concurrent operations (handshake and timeout) will never block while sending (and thus finishing the Goroutine as fast as possible), especially since the (memory) overhead of a small buffered channel is neglectible anyways.
I'm not entirely sure why the call to tlsConn.Close() is missing, but that's what every Go 1.7 app is running on so I guess what they're doing there is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably fine as-is then. Didn't realize this came from the std lib. (Wonder if we should just put a comment in saying that we've adapted this code from the standard library and modified it a bit.)

idx := strings.LastIndex(serverName, ":")
if idx != -1 {
serverName = serverName[:idx]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use net.SplitHostPort() instead.

Copy link
Author

@lhecker lhecker Dec 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay... That's quite a method though.
The reason I wrote it myself is because I had to replace the call to cm.tlsHost() in the original Go source, which also didn't use net.SplitHostPort() for some reason (performance maybe?). It's written like this and my code was basically an inlined version of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, that's a good question -- SplitHostPort handles IPv6 addresses. If this function doesn't see IPv6 addresses then I imagine it's fine as-is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You where right and the function was broken. I forgot about IPv6 addresses.
I fixed this problem and moved the code to a stripPort() method which is basically 4 LOC, easy to understand and makes the code quite clean now IMO.

transport.Dial = baseTransport.Dial
transport.DialTLS = baseTransport.DialTLS
transport.MaxIdleConnsPerHost = -1
if b, _ := base.(*http.Transport); b != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting; is there a specific reason you're not using the ok (the second value, which is currently _) for the check here?

Copy link
Author

@lhecker lhecker Dec 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! If ok is false (and the cast thus failed) b must be nil right? (As per Go spec.)

So if we have this code (which was the old code and what we want to do):

if base != nil {
    if b, ok := base.(*http.Transport); ok == true {
        ...
    }
}

It's the same as this:

if b, _ := base.(*http.Transport); b != nil {
}

because if base is nil it stays nil and if it's not a *http.Transport it will be converted to nil. I usually use this coding pattern in Go if I want to write sth. along the former snippet. Should I revert that or add a comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I stand corrected. Check this out: https://play.golang.org/p/dWlvLPXab6

In other words, you can both have b == nil and ok == true. So if you want to check that the type is correct, use the ok value; if you want to check that a pointer is not nil, then do that separately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand… Do you agree that the code is fine? (My English skills are seriously lacking lately 😩)

Basically what I said is that the first code snipped (the old code) above is the same as:

if b, ok := base.(*http.Transport); b != nil && ok == true {
}

But since b being non-nil must imply that ok is true we can drop that check and get the latter snipped (the new code). You can try that in your Playground by making it return e.g. an &sync.Mutex{} instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! Your English is actually really good. Yes, you are right. Sorry for the confusion. Been a long day.

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

And you've confirmed it fixes your use case as described in #1292 and #1300?

Yeah proxying HTTP/2 requests as well as WebSockets generally works flawless now. I tested it using self signed certs and Go 1.8 beta 2.


I'm still having issues with my gRPC backend though, because requests and responses seem to buffered by caddy somehow. Maybe they're even dropped - I've got to properly investigate that first. Due to that my gRPC server seems to only receive the first and last stream chunks. Any idea why that could be the case?
I don't think it's because of the missing support for HTTP/2 push, because in my case it's a client sending a stream to the server! I really suspect some form of request caching or something...

@mholt
Copy link
Member

mholt commented Dec 28, 2016

I'm still having issues with my gRPC backend though, because requests and responses seem to buffered by caddy somehow. Maybe they're even dropped - I've got to properly investigate that first. Due to that my gRPC server seems to only receive the first and last stream chunks. Any idea why that could be the case?

Nope :( Although I wonder if there is something up with the buffering somewhere in the proxy. For example, in #1310 there is a report of high memory use when proxying for big files.

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

@sanderhahn I was able to reproduce your problem. wss: connections only properly work if you specify insecure_skip_verify too.

@mholt Due to this I'd consider this PR incomplete. I'll try to fix this but I'm not entirely sure yet where to start, since unlike before this does not cause any debug/error output to be generated by Go.

@mholt
Copy link
Member

mholt commented Dec 28, 2016

@lhecker I'm not entirely clear on what the error is, but maybe dumping the HTTP request would be a helpful start? See https://golang.org/pkg/net/http/httputil/.

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

@mholt I actually found the root cause again only minutes after writing the comment. It's because I only add the DialTLS method to the transport if a transport.TLSClientConfig is provided, but that's not the case for verified TLS connections (i.e. transport.TLSClientConfig is nil).

So… I fixed that and it kind of works, but now I get the next problem:

2016/12/28 15:59:35 http: TLS handshake error from [::1]:2816: EOF

(That's from the backend and not from Caddy!)

sigh…

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

AHA! I forgot to add the errors stderr directive to that particular domain I'm testing on. I can't claim that I particularly like that behavior (swallowing errors by default) and IMHO you might want to consider changing it to stderr as the safer default.

Anyways: Fix inbound.

@mholt
Copy link
Member

mholt commented Dec 28, 2016

@lhecker Nice work :) Sorry about the default of hiding errors. We can probably open a discussion on that in a new issue if you want.

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

@sanderhahn Do you mind trying out the newest commit?

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

@mholt I found the cause for #1310 and I think I fixed it for the most part. Unfortunately I'm still experiencing issues with my gRPC backend so I can't be fully sure yet. Do you mind if I attach the fix for that to this PR in case I finish with it before you merge this?

It's caused by this line. I fixed it by introducing a special case where the request body is not buffered if len(proxy.Upstreams) is 1 or less. If only one upstream exists we don't ever have to rewind the body and thus don't have to cache it right? That should really help improve performance (especially latency obviously) for the common proxy case!

@sanderhahn
Copy link

@lhecker Checked out latest commit b857265 and you have fixed the tls support for websockets in the reverse proxy: https://proxy.codevelop.nl/ Very nice :) Congrats!

@sanderhahn
Copy link

Additional note is that the -http2=false flag doesn't seem to be necessary anymore. The proxy works both with and without http2 enabled.

@mholt
Copy link
Member

mholt commented Dec 28, 2016

@lhecker Maybe make a new PR for the gRPC issue? I'll probably merge this soon unless somebody tells me to hold the presses.

Let's make you a collaborator on the project too so you can have push access. This way you can work in branches instead of forks and also help by reviewing and merging pull requests, etc. I really appreciate your efforts, you've done a great job!

@sanderhahn Thanks for testing it!

@mholt
Copy link
Member

mholt commented Dec 28, 2016

Oh, missed the second half of your comment @lhecker:

It's caused by this line. I fixed it by introducing a special case where the request body is not buffered if len(proxy.Upstreams) is 1 or less. If only one upstream exists we don't ever have to rewind the body and thus don't have to cache it right? That should really help improve performance (especially latency obviously) for the common proxy case!

I had never considered that. I guess not, then. I don't see where or why else we're using body other than to replay it for the next upstream.

I wonder if buffering has been causing some of other proxy-related issues too. It'd be interesting to see if their issues are fixed with that change too.

Let me know when all the changes for this PR are pushed. :)

@lhecker
Copy link
Author

lhecker commented Dec 28, 2016

Let's make you a collaborator on the project too so you can have push access. This way you can work in branches instead of forks and also help by reviewing and merging pull requests, etc. I really appreciate your efforts, you've done a great job!

Thanks! I probably won't be able to contribute much, because most of the current work resulted from me needing a viable nginx alternative as the frontend for a commercial product with HTTP/2 upstream support while being easy to mod. Caddy turned out to not be quite bug free yet so I simply contributed my fixes. It's quid pro quo right? 😅
Oh and I'll of course promise to handle in good faith with my new access rights to this repo though! And if I ever have a fix for caddy in the future I'll make sure to create a PR. 🙂

The next thing I'll do is try to get gRPC proxying working. The buffering issue is e.g. the first step in this case and I'll submit a PR (on a new branch here) soon.

Let me know when all the changes for this PR are pushed. :)

Since @sanderhahn confirmed that it's working for him too I'd consider this PR to be finished and you can merge it whenever you see fit. This PR turned out to be mostly a fix for #1292, but since the issues from #1300 are quite closely related I think it's okay we squashed both together.

@lhecker
Copy link
Author

lhecker commented Dec 29, 2016

@mholt I pushed one further change just now. Please review it carefully, because I don't know wether it has any implications:
I looked into the git history of those 4 removed lines in 533039e but never found an explanation as to why they where added originally. Well apart from outreq.Close = false which can be removed without any concerns, because false is the default anyways.

But I found that the removal of those 4 lines has no side effects for my sites and seemed to be essential to make gRPC proxying work properly, which is why I added this to my follow-up fixes for #1300.

@mholt
Copy link
Member

mholt commented Dec 29, 2016

@lhecker We added those lines originally because we borrowed the ServeHTTP for ReverseProxy in https://golang.org/src/net/http/httputil/reverseproxy.go?s=3924:3999#L134 of the Go standard library.

It's quite possible, given all the handling/configuration of transports and directors and such in our case that we don't need to set these there.

But it's good to hear that's what makes gRPC work! And the tests still pass, so that's promising. I'll take a look.

Others are welcome/invited to review carefully too. @nemothekid - hope you're having a happy holiday, I dunno where you're at but maybe you'd have a chance to look this over and see if you notice anything?

@lhecker
Copy link
Author

lhecker commented Dec 30, 2016

@mholt I found one more bug… 😕 You're not handling any trailers in the proxy! Seeing as trailers are gaining quite some interest with HTTP/2 I'll update this PR with a fix soon.

@lhecker
Copy link
Author

lhecker commented Dec 30, 2016

@mholt I pushed the support for HTTP trailers just now. As usual I made sure to add tests (I extended TestReverseProxy).
Again: Without the fix it doesn't work at all, but after applying it it works flawless (trailers in general as well as proxying gRPC metadata, which partially relies on it). I also tested everything on my staging system (basic REST, WebSockets, gRPC).

The code stems from the current reverseproxy.go code just like it did before. The big difference is that the version in caddy was outdated, missing the trailer feature as well as containing a trailer-related bug ("Trailers" != "Trailer").

@mholt
Copy link
Member

mholt commented Dec 30, 2016

That's awesome, thanks @lhecker . Can't wait to take a look this weekend. 👍

MaxVersion: cfg.MaxVersion,
CurvePreferences: cfg.CurvePreferences,
DynamicRecordSizingDisabled: cfg.DynamicRecordSizingDisabled,
Renegotiation: cfg.Renegotiation,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note, I think this won't be needed in Go 1.8 as TLS configs have a Clone() method ^_^

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool. I'll take note of that and fix this after 1.8 is out.

I've seen a insane amount of code already which copies structs like tls.Config by value instead of creating a new proper clone and thus causing all sorts of very subtle issues by inproperly copying the embedded mutexes. Things like that really make me personally believe that Go might be our best tool but the worst language we've come up with. 😅

@mholt mholt merged commit 7cbbb01 into caddyserver:master Jan 1, 2017
@mholt
Copy link
Member

mholt commented Jan 1, 2017

Thank you @lhecker!! You are welcome to hack on Caddy at any time. 😄

@petey petey mentioned this pull request Jan 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants