Skip to content
This repository has been archived by the owner on Jun 6, 2023. It is now read-only.

Slow sending 50000+ push #31

Closed
aimalygin opened this issue Feb 18, 2016 · 29 comments · Fixed by #54
Closed

Slow sending 50000+ push #31

aimalygin opened this issue Feb 18, 2016 · 29 comments · Fixed by #54
Labels

Comments

@aimalygin
Copy link

for i := 0; i < len(js.Tokens); i++ {
tok := js.Tokens[i]
if tok != "" {
id, err := service.Push(token, nil, p)
}
}
Slow sending. What can you think up to speed sending?

@nathany
Copy link
Contributor

nathany commented Feb 18, 2016

Thanks for reporting this issue. I haven't run any benchmarks or profiling yet. Network latency is probably the bulk of it, which I don't have much control over, but it would be good to confirm my suspicion.

How long does it take to send the same notification to 50,000 tokens?

@aimalygin
Copy link
Author

I did not wait. About 30 seconds dispatched 2000 tokens. In my DB 67000 tokens.

@nathany
Copy link
Contributor

nathany commented Feb 23, 2016

Ouch. That's not good. Sorry I haven't had a chance to look into this yet.

What I would probably do is run it with an environment variable set. GODEBUG=http2debug=1 or GODEBUG=http2debug=2 for even more. That should give some insight into what's happening in the net/http2 layer.

Also, just to eliminate it as a factor, is your internet connectivity good?
http://www.speedtest.net/
I get 24 Mbps down and 66 Mbps up from here.

@aimalygin
Copy link
Author

I run it on the VPS from 123systems. Connection speed to the internet there is good.
There I have now works timehop very quickly. 67000 sends tokens for 5-6 seconds.

@bfitzsimmons
Copy link

That's much faster. Are you still using Buford? If so, what did you change to speed things up?

@nathany
Copy link
Contributor

nathany commented Feb 24, 2016

@Valrox So you switched to https://github.com/timehop/apns and it runs quickly?

I don't have a bunch of device tokens to test with, but I'm curious how long it would take to hit https://http2.golang.org/reqinfo a few thousands times.

@bfitzsimmons
Copy link

So, Timehop has its own push lib. I see. I'll just see myself out.

@nathany
Copy link
Contributor

nathany commented Feb 24, 2016

Yah, I used Timehop's push lib prior to writing Buford. It is based on Apple's old TCP API.

Unfortunately I don't have Buford in production yet, so I haven't run into issues like @Valrox mentioned. Hopefully I'll get some time to investigate this soon.

@aimalygin
Copy link
Author

So I use timehop. I've used it before buford. In timehop have a bug that I could not resolve. Sometimes one token comes several notifications at once. So I decided to try a new API on the Apple, just buford implements it. But here I was faced with the problem of speed.

@nathany
Copy link
Contributor

nathany commented Feb 25, 2016

Thanks for reporting the issue. My work on Buford is sponsored by RnP, but I'm on other projects right now. Hopefully i'll have some time to investigate soon, or if anyone else wants to look into this issue, please do.

@aagahi
Copy link

aagahi commented Mar 3, 2016

It seems that you create a connection at each push request: https://github.com/RobotsAndPencils/buford/blob/master/push/service.go#L164
You could get benefit of http/2 multiplexing by keeping alive one connection and pipelining push messages.

@nathany
Copy link
Contributor

nathany commented Mar 3, 2016

Hi Alexis. I did some testing with Wireshark a few weeks ago and was under the impression that net/http was managing the multiplexing internally. At least it looks very different over the wire when using the same client (as it is) vs. creating new http clients for each request.

In light of this issue, it certainly warrants some followup.

@aagahi
Copy link

aagahi commented Mar 3, 2016

Ok I'm going to investigate on my side and see if I can try to bring some enlightenment too.

@nathany
Copy link
Contributor

nathany commented Mar 3, 2016

Thanks.

@nathany nathany added the bug label Mar 3, 2016
@nathany
Copy link
Contributor

nathany commented Mar 3, 2016

It appears that new connections are being made, but not on every request.

I'm seeing this message about 5 times in the course of 20 requests:

http2: no cached connection was available
http2: Transport creating client conn to 130.211.116.44:443

GODEBUG=http2debug=1 go test -bench .
package main

import (
    "net/http"
    "testing"

    "golang.org/x/net/http2"
)

const proto = "HTTP/2.0"

func BenchmarkReqInfo(b *testing.B) {
    transport := &http.Transport{}
    if err := http2.ConfigureTransport(transport); err != nil {
        b.Fatal(err)
    }
    client := &http.Client{Transport: transport}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        fmt.Println("\n\nGET https://http2.golang.org/reqinfo\n\n")

        res, err := client.Get("https://http2.golang.org/reqinfo")
        if err != nil {
            b.Fatal(err)
        }
        res.Body.Close()
        if res.Proto != proto {
            b.Fatalf("Expected proto %s, got %s", proto, res.Proto)
        }
    }
}

@nathany
Copy link
Contributor

nathany commented Mar 3, 2016

This could be interesting

// TODO: add support for sharing conns based on cert names

https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L26

@nathany
Copy link
Contributor

nathany commented Mar 3, 2016

I've tried ensuring that the Transport has everything a default transport does. Ive also tried always reading the response body like Buford does (could matter for http 1.0 but not for 2.0 afaik). Same issues.

@nathany
Copy link
Contributor

nathany commented Mar 3, 2016

I looked around in ClientConnPool and did some logging. It appears that the connection for the addr is being cleaned up and removed from the pool. I'm not sure what triggers the cleanup yet.

@nathany
Copy link
Contributor

nathany commented Mar 8, 2016

It’s quite possible that it’s Go doing something with the connection pool. I would trying and get a packet capture and see what’s actually happening from a TCP perspective. You hopefully will be able to tell which end is initiating the disconnect. - @mbrevoort

It's a good point. I'm not entirely sure if Apple or other servers are dropping the connection or if it's something on the client side.

@nathany
Copy link
Contributor

nathany commented Mar 17, 2016

There is another HTTP/2 client for Go (#23). I sent an email asking if they had experienced this issue. This is what Adam Jones had to say:

We are getting 8,000-11,000 pushes per second with HTTP/2 so far, but need to init the client and then push inside either a worker/channel or a goroutine. Crude example https://gist.github.com/sideshow/397f2d729fcf483ec304#file-example-go-L33 - @sideshow

I need to experiment with this.

@nathany
Copy link
Contributor

nathany commented Apr 2, 2016

@bradfitz says:

It stops caching the connection if the other side tells you to GOAWAY or closes the connection.
https://groups.google.com/forum/#!msg/golang-nuts/vF89LqoThDs/oujFln9rAwAJ

So I need to confirm that we are getting a GOAWAY or closed connection, and then the question becomes why?

For now I'm going to try the reqinfo test inside a goroutine based on @sideshow's suggestion. If I can get a bunch of device tokens, I can do a proper test hitting Apple's servers (I should be able to at work).

@bradfitz
Copy link

bradfitz commented Apr 2, 2016

Use Go's http2 debugging to watch what's happening on a live client or server. See the net/http package docs.

@nathany
Copy link
Contributor

nathany commented Apr 2, 2016

Hi @bradfitz. Right now I don't have a live environment to test this in, so I've resorted to simple tests like: #31 (comment)

@Valrox can you try your push with GODEBUG=http2debug=1 and report back? (see https://golang.org/pkg/net/http/)

In summary:

It sounds like the approach @sideshow took when using his https://github.com/sideshow/apns2 library is to fire off multiple requests in separate goroutines. If that's the best solution, perhaps a worker pool could be added to the Buford example as a suggestion.

The way @Valrox and I have been using Buford is to sequentially send each request. The old TCP-based API used by https://github.com/timehop/apns only has a single goroutine sending requests off to Apple and managed 67000 tokens in 5-6 seconds (as per #31 (comment)), but then those are via a single connection and never waited to be acknowledged. There's obviously going to be some overhead with HTTP/2.

What concerns me is that while investigating this issue with a more minimal test, the connection to https://http2.golang.org/reqinfo had to be reestablished more often than I would have expected. Though I have yet to confirm if the same behaviour occurs sending many requests to Apple's servers.

From the http2debug logs using #31 (comment), it's not entirely clear to me when the connection is closed, but it would seem that every few requests the connection closes down as the message "no cached connection was available" appears and the next request is back on stream 1.

nathany added a commit that referenced this issue Jun 6, 2016
@nathany
Copy link
Contributor

nathany commented Jun 6, 2016

In #49 there is a benchmark that pushes notifications to Apple using a single device token.

Using a command like this against the Apple sandbox is resulting in 75-100ms per notification from my MacBook:

❯ GODEBUG=http2debug=1 go test ./push -cert ../cert.p12 -token device-token -v -bench . -benchtime 30s

     500      96173515 ns/op

Across the 500 notifications sent, I'm seeing it establish a connection ~3 times.

Transport failed to get client conn for api.development.push.apple.com:443: http2: no cached connection was available

I'm not sure exactly why it's taking more than one connection, but I don't see any GOAWAYs. It's opening a new connection relatively few times though.

As far as speed, it's possible that Apple is limiting us somehow, whether throttling, or only providing a few concurrent streams per connection.

The APNs server allows multiple concurrent streams for each connection. The exact number of streams is based on server load, so do not assume a specific number of streams. Connections

So far I haven't done any profiling or determined whether the time is taken up on the client, the server, or network latency. But I'm seeing a similar ~77ms per request on a benchmark against https://http2.golang.org/reqinfo (#31 (comment)).

Even at an best time I've seen of 75ms, it would take over one hour to send 50,000 notifications over a single connection. Of course multiple connections can be opened to Apple.

nathany added a commit that referenced this issue Jun 6, 2016
@nathany
Copy link
Contributor

nathany commented Jun 6, 2016

In terms of the connection needing to be established multiple times, I did more experimentation with my reqinfo benchmark now that Go 1.6.2 upgrades clients to HTTP/2 properly.

package main

import (
    "io/ioutil"
    "net/http"
    "testing"
)

const proto = "HTTP/2.0"

func BenchmarkReqInfo(b *testing.B) {
    for i := 0; i < b.N; i++ {
        res, err := http.Get("https://http2.golang.org/reqinfo")
        if err != nil {
            b.Fatal(err)
        }
        _, err = ioutil.ReadAll(res.Body)
        if err != nil {
            b.Fatal(err)
        }
        res.Body.Close()
        if res.Proto != proto {
            b.Fatalf("Expected proto %s, got %s", proto, res.Proto)
        }
    }
}

When I create the Transport myself and use http2.ConfigureTransport, it was establishing the connection multiple times, whereas a single connection was enough when using http.Get with the DefaultTransport. Perhaps there is something wrong with how I am doing the set up at: https://github.com/RobotsAndPencils/buford/blob/master/push/service.go#L52

@nathany
Copy link
Contributor

nathany commented Jun 6, 2016

More experiments to try:

  • test the reqinfo benchmark with net/http2/h2demo running locally to get a baseline overhead of HTTP/2 client/server without network latency (or similar with go-apns-server)
  • try running the benchmarks from AWS / CI instead of my laptop
  • do the same tests with a different HTTP/2 implementation or language to see if response times are similar
  • review the approach taken in apns2. how well is it working for them? @sideshow
  • chat with Apple staff during WWDC 2016 (Paul Danbold?), hopefully learning more about stream limits, dev/prod differences, optimal/max connections to open @idyll

@nathany
Copy link
Contributor

nathany commented Jun 8, 2016

Running from Nitrous US West to Apple dev environment with -benchtime 10s.

BenchmarkPush 500 32169893 ns/op
ok github.com/RobotsAndPencils/buford/push 21.455s

32ms/notification

That's ~27 minutes for 50,000 notifications on a single connection. Make 20-30 connections to Apple and it's not so bad (the question is, how many connections should we be opening).

I did a test with Pigeon (Elixir) that was about 200ms/notification (timed with stopwatch) from my laptop.

nathany added a commit that referenced this issue Jun 9, 2016
nathany added a commit that referenced this issue Jun 9, 2016
@nathany
Copy link
Contributor

nathany commented Jun 9, 2016

Right now my examples of using Buford all wait for the response before sending another notification. As @sideshow pointed out some time ago, we need to send the notifications with a bunch of goroutines. For now I'll prepare an example on concurrent use and update the README.

After that's done, I'll be closing this issue.

@nathany
Copy link
Contributor

nathany commented Jun 9, 2016

Sending 1000 notifications using 20 workers with the example at #54 results in:

5.5 ms/notification from my laptop
2.5 ms/notification from Nitrous.io US West

2.5 ms/notification = 24,000 notifications per minute which IMO is pretty good. It may go faster still with more workers or more than a single HTTP/2 connection.

Closing this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants