Performance (v0.4 and older)

Jon Chambers edited this page Jun 4, 2016 · 1 revision

Please note: these performance notes relate to Pushy 0.4 and earlier. Pushy 0.5 and newer, which use Apple's HTTP/2-based APNs protocol, do not need to maintain parallel connections in quite the same way.

Characterizing Pushy's performance (or the performance of any APNs library) is a little tricky. There are two ways you could go about it:

  1. Actually blast messages through the [sandbox] APNs gateway and see how long it takes them all to go through. This is a little tricky in theory because in most cases, the results will be heavily dependent upon available bandwidth. In practice, it's even trickier because you need devices to send push notifications to (or you could send a ton of notifications to one device) and also because the APNs sandbox seems to rate-limit providers significantly.
  2. Use a mock server. This is tricky in theory because the provider (Pushy) and the mock server are competing for computational resources, so to some extent what you're really benchmarking is the mock server.

Because we know so little about how the APNs sandbox really operates, we decided that the most controlled way to go about benchmarking would be to use a local mock server.

Benchmark environment and setup

The benchmark (see BenchmarkApp) works by creating a MockApnsServer with its own event loop group and then sending notifications as quickly as possible through a PushManager with a separate event loop group. We loop through several iterations with different combinations of concurrent connection counts and numbers of threads in the PushManager's event loop group (the server always has two threads in its group). Tests where the number of concurrent connections would be less than the number of threads in the pool are skipped because Netty binds each "channel" (connection) to a single thread, so the excess threads would never be used.
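
As a rough illustration of the skipping rule described above, the test matrix could be enumerated along the following lines. This is only a sketch, not the code in BenchmarkApp: the class name BenchmarkMatrix and its method are hypothetical, and the thread and connection counts are taken from the values that appear in the results table rather than from the benchmark source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pushy or BenchmarkApp.
final class BenchmarkMatrix {
    // Assumed values, inferred from the rows of the results table.
    static final int[] THREAD_COUNTS = {1, 2, 4, 8, 16};
    static final int[] CONNECTION_COUNTS = {1, 2, 4, 8, 16, 32};

    // Returns {threads, connections} pairs, skipping any combination where
    // some threads could never be assigned a connection.
    static List<int[]> configurations() {
        List<int[]> configs = new ArrayList<>();
        for (int threads : THREAD_COUNTS) {
            for (int connections : CONNECTION_COUNTS) {
                if (connections < threads) {
                    continue; // excess threads would sit idle
                }
                configs.add(new int[] {threads, connections});
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        for (int[] config : configurations()) {
            System.out.printf("threads=%d connections=%d%n", config[0], config[1]);
        }
    }
}
```

Under these assumed values the loop yields 20 combinations, which matches the number of rows reported in the results.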

For each test, we send 100,000 notifications and measure -- to the best of our ability -- the amount of time that passes between the first and last notification's arrival at the mock server.
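
The measurement described above boils down to dividing the number of notifications by the time between the first and last arrival. A minimal sketch of that arithmetic follows, assuming the receiving side can call a method each time a notification arrives. The ArrivalTimer class and its methods are hypothetical names invented for this example; they are not part of MockApnsServer or the actual benchmark.

```java
// Hypothetical helper for illustration only; not part of Pushy.
final class ArrivalTimer {
    private long firstNanos = Long.MAX_VALUE;
    private long lastNanos = Long.MIN_VALUE;
    private long count = 0;

    // Called once per received notification; may be called from several threads.
    synchronized void recordArrival(long nowNanos) {
        firstNanos = Math.min(firstNanos, nowNanos);
        lastNanos = Math.max(lastNanos, nowNanos);
        count++;
    }

    // Notifications per second over the window between first and last arrival.
    synchronized double notificationsPerSecond() {
        if (count < 2 || lastNanos <= firstNanos) {
            throw new IllegalStateException("need at least two arrivals at distinct times");
        }
        long elapsedNanos = lastNanos - firstNanos;
        return count * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        ArrivalTimer timer = new ArrivalTimer();
        // Stand-in for 100,000 notifications arriving at a server.
        for (int i = 0; i < 100_000; i++) {
            timer.recordArrival(System.nanoTime());
        }
        System.out.printf("%.1f k/sec%n", timer.notificationsPerSecond() / 1000.0);
    }
}
```

Tracking the minimum and maximum timestamps, rather than the first and last calls, keeps the window correct even if arrivals are recorded slightly out of order by different threads.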

These tests were run on a 2.3GHz i7 MacBook Pro.

Results

Without further ado, here are the benchmark results for Pushy v0.3:

Threads   Connections   Throughput
1         1             45.6 k/sec
1         2             46.7 k/sec
1         4             47.5 k/sec
1         8             46.8 k/sec
1         16            46.6 k/sec
1         32            44.6 k/sec
2         2             81.6 k/sec
2         4             78.1 k/sec
2         8             48.5 k/sec
2         16            47.1 k/sec
2         32            43.3 k/sec
4         4             65.7 k/sec
4         8             95.2 k/sec
4         16            63.1 k/sec
4         32            45.8 k/sec
8         8             82.0 k/sec
8         16            48.3 k/sec
8         32            47.9 k/sec
16        16            81.4 k/sec
16        32            47.8 k/sec

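Unsurprisingly, we tend to achieve the highest throughput when each client connection has a dedicated thread (roughly 81 to 82 k/sec with 2, 8, or 16 threads), although the single best result, 95.2 k/sec, came from 4 threads serving 8 connections. Throughput deteriorates when lots of connections are fighting over threads (see the cases with 32 connections, which all fall below 48 k/sec regardless of thread count).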

These tests should be taken with a large grain of salt because the client and server were running on the same machine and because no matter how many connections we opened, we were always sending notifications to the same server instance (Apple claims that, in production, multiple connections may be routed to different servers to increase throughput). Still, there are some hopefully non-controversial conclusions to draw from these results:

  • Using more connections with more uncontested threads generally improves throughput.
  • Increasing competition for threads generally decreases throughput.
  • Maximum throughput generally appears to be well above the 9k notifications/second mentioned in TN2265 from Apple, though it's unclear where that number came from. It's likely that most users will be limited by bandwidth or by APNs gateway processing speed rather than by Pushy's outbound throughput.