
Add request pool to improve delivery performance #10353

Merged

Gargron merged 13 commits into master from feature-request-pool on Jul 1, 2019
Conversation

@Gargron (Member) commented Mar 23, 2019

Fix #7909

Use keep-alive connections to each site, in connection pools, to reduce DNS/TCP/SSL handshake overhead. Keep connections for 300 idle seconds, then clean them up.

Benchmark: 400 ActivityPub::DeliveryWorker jobs addressed to the same inbox on 5 Sidekiq threads

Before: 18 s
After: 8 s
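
As an illustrative sketch only (the class and method names below are invented, not Mastodon's actual RequestPool), the core idea of a per-host pool of persistent http.rb clients could look roughly like this, using the connection_pool gem:

```ruby
require 'http'
require 'connection_pool'
require 'uri'

# Minimal sketch: one pool of persistent (keep-alive) http.rb clients per host,
# so repeated deliveries to the same inbox skip DNS/TCP/TLS setup.
class SimpleRequestPool
  def initialize(size: 5)
    @size  = size
    @pools = {}
    @mutex = Mutex.new
  end

  def with(url)
    uri  = URI.parse(url)
    host = "#{uri.scheme}://#{uri.host}:#{uri.port}"

    pool = @mutex.synchronize do
      @pools[host] ||= ConnectionPool.new(size: @size, timeout: 5) do
        HTTP.persistent(host)
      end
    end

    pool.with { |http| yield http }
  end
end

# Usage: the second call to the same host reuses the open connection.
# pool = SimpleRequestPool.new
# pool.with('https://example.com/inbox') { |http| http.post('/inbox', body: '{}') }
```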

@Gargron added the performance (Runtime performance) label on Mar 23, 2019
@Gargron force-pushed the feature-request-pool branch 12 times, most recently from c0d7bc9 to 1a0bc64, on March 24, 2019 00:47
@Gargron marked this pull request as ready for review on March 24, 2019 00:47
@Gargron force-pushed the feature-request-pool branch 3 times, most recently from 6ce5105 to ac6e32a, on March 24, 2019 02:52
@zunda (Contributor) commented Mar 24, 2019

Such a cool new feature! I've cherry-picked feature-request-pool at ac6e32a onto my single-user instance and saw an increase in peak delivery speed from about 230 toots/min to 290 toots/min with Sidekiq running 3 threads.

When the app is starting, I noticed a single occurrence of NoMethodError: undefined method 'call' for nil:NilClass at app/lib/request_pool.rb:26:in 'try_create' (full backtrace at https://gist.github.com/zunda/19162442ae84256633becb1f273795ee ). Might this be the code that reaps unused pools?

I'll keep the server running the code for a while and see what happens.

@ClearlyClaire (Contributor) commented:

From https://github.com/mperham/connection_pool

WARNING: Don't ever use Timeout.timeout in your Ruby code or you will see occasional silent corruption and mysterious errors

Both http.rb and our own Request class do make use of Timeout.timeout
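
For illustration, a hedged stdlib-only sketch of the distinction (host, port, and timeouts are placeholders): interrupting socket I/O with Timeout.timeout versus giving the socket its own deadlines.

```ruby
require 'socket'
require 'io/wait'

HOST = 'example.com' # placeholder
PORT = 443

# Risky with pooled connections: Timeout.timeout can interrupt the block at any
# point, so a half-written or half-read socket may be checked back into the pool.
# Timeout.timeout(5) { socket.write(payload) }

# Safer: push timeouts down to the socket layer.
socket = Socket.tcp(HOST, PORT, connect_timeout: 5)

# Wait for readability with a deadline, then read without blocking.
if socket.wait_readable(5)
  data = socket.read_nonblock(4096, exception: false)
  puts data.inspect
else
  socket.close
  raise 'read timed out'
end
```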

@Gargron (Member, Author) commented Mar 24, 2019

When the app is starting, I noticed a single occurrence of NoMethodError: undefined method 'call' for nil:NilClass at app/lib/request_pool.rb:26:in 'try_create'

@zunda I have noticed this too. It seems to disappear when I remove the monkey-patch of the try_create method. The only difference between the original try_create and the monkey-patched one is the unless @created == @max guard.

The intent of the monkey-patch is to remove the fixed size of the pool. If more threads try to access the same URL than the pool size allows, they will wait up to 5 seconds (and then fail), which would be a performance regression. Without a fixed size, even though those threads won't all benefit from keep-alive on their first call, performance improves over repeated calls, and the reaper removes unused connections over time.

There should be no way that @create_block is nil in try_create: it is set once, during initialize, and nothing changes it. I assume this is some kind of thread-safety issue with monkey-patching.
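
For readers following along, the difference boils down to the size guard in ConnectionPool::TimedStack#try_create; the following is a simplified paraphrase of both variants, not the exact code from this branch:

```ruby
# Upstream ConnectionPool::TimedStack#try_create (simplified): a new connection
# is only created while the pool is below its fixed maximum size.
def try_create(options = nil)
  unless @created == @max
    object = @create_block.call
    @created += 1
    object
  end
end

# Patched variant: drop the guard so a connection is always created when none is
# idle, trading the bounded size for better keep-alive reuse.
def try_create(options = nil)
  object = @create_block.call
  @created += 1
  object
end
```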

WARNING: Don't ever use Timeout.timeout in your Ruby code or you will see occasional silent corruption and mysterious errors

@ThibG This reminds me of the warning not to use rack-timeout. I am not sure whether this is truly a deal-breaker, or whether the timeouts are encapsulated enough not to break connection pooling.

@Gargron (Member, Author) commented Mar 24, 2019

I have resolved the try_create issue. I shouldn't have been monkey-patching these classes, as other code (Sidekiq itself) uses them. Instead, I created my own classes inheriting from them.
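
A hedged sketch of that shape of fix (the subclass name here is invented): subclass the gem's TimedStack and override the behaviour there, leaving the shared upstream class untouched for other users such as Sidekiq.

```ruby
require 'connection_pool'

# Instead of reopening ConnectionPool::TimedStack globally, override behaviour
# in a subclass that only the request pool uses.
class RequestPoolTimedStack < ConnectionPool::TimedStack
  private

  # Always create a connection when none is idle; the upstream class keeps its
  # fixed-size guard for everyone else (e.g. Sidekiq's own pools).
  def try_create(options = nil)
    object = @create_block.call
    @created += 1
    object
  end
end
```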

@zunda (Contributor) commented Mar 24, 2019

Coool! I've merged 3f8df86 and confirmed that the error is gone.

@Gargron (Member, Author) commented Apr 21, 2019

Before/after using request pools (MAX_REQUEST_POOL_SIZE=60) when the push queue is backlogged:

[graph attachment]

@Gargron (Member, Author) commented Jun 25, 2019

I could use some help debugging "too many open files" errors that seem to appear in this PR after some use.
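
For anyone reproducing this, one quick check (standard Ruby/Linux calls; limits vary per deployment) is to compare the process's file-descriptor limit with the number of descriptors actually open:

```ruby
# Soft/hard file-descriptor limits for the current process.
soft, hard = Process.getrlimit(:NOFILE)
puts "fd limit: soft=#{soft}, hard=#{hard}"

# On Linux, the number of descriptors currently open by this process.
puts "open fds: #{Dir.children('/proc/self/fd').size}" if File.directory?('/proc/self/fd')
```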

@Gargron force-pushed the feature-request-pool branch 4 times, most recently from b094f45 to 7823b8b, on June 25, 2019 16:19
@Gargron (Member, Author) commented Jun 26, 2019

Testing in production reveals a lot of timeout errors due to full pools, so it's not ready yet.

@Gargron (Member, Author) commented Jun 26, 2019

I rewrote the code so that if the maximum pool size is reached, and we try to get a connection for a host that we don't have idle connections ready for, it removes one of the idle connections for a different host and creates a new one in its place.
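
In rough pseudocode terms (names and structure simplified, not the merged implementation), that re-purposing logic looks something like this:

```ruby
require 'http'

# Simplified sketch of a shared pool with re-purposing: when the pool is full and
# no idle connection exists for the requested host, an idle connection for some
# other host is closed and its slot reused.
class SharedPoolSketch
  Slot = Struct.new(:host, :connection, :in_use)

  def initialize(max_size)
    @max_size = max_size
    @slots    = []
    @mutex    = Mutex.new
  end

  def checkout(host)
    @mutex.synchronize do
      # Prefer an idle connection that already points at this host (keep-alive hit).
      slot = @slots.find { |s| s.host == host && !s.in_use }

      # Pool full and no idle connection for this host: evict an idle one for
      # another host instead of making the caller wait. (If everything is busy,
      # this sketch simply grows past the cap.)
      if slot.nil? && @slots.size >= @max_size
        victim = @slots.find { |s| !s.in_use }
        if victim
          victim.connection.close
          @slots.delete(victim)
        end
      end

      slot ||= Slot.new(host, HTTP.persistent(host), false).tap { |s| @slots << s }
      slot.in_use = true
      slot.connection
    end
  end

  def checkin(connection)
    @mutex.synchronize do
      slot = @slots.find { |s| s.connection.equal?(connection) }
      slot.in_use = false if slot
    end
  end
end
```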

@Gargron merged commit 0d9ffe5 into master on Jul 1, 2019
@Gargron deleted the feature-request-pool branch on July 1, 2019 22:34
hiyuki2578 pushed a commit to ProjectMyosotis/mastodon that referenced this pull request Oct 2, 2019
* Add request pool to improve delivery performance

Fix mastodon#7909

* Ensure connection is closed when exception interrupts execution

* Remove Timeout#timeout from socket connection

* Fix infinite retrial loop on HTTP::ConnectionError

* Close sockets on failure, reduce idle time to 90 seconds

* Add MAX_REQUEST_POOL_SIZE option to limit concurrent connections to the same server

* Use a shared pool size, 512 by default, to stay below open file limit

* Add some tests

* Add more tests

* Reduce MAX_IDLE_TIME from 90 to 30 seconds, reap every 30 seconds

* Use a shared pool that returns preferred connection but re-purposes other ones when needed

* Fix wrong connection being returned on subsequent calls within the same thread

* Reduce mutex calls on flushes from 2 to 1 and add test for reaping
messenjahofchrist pushed a commit to Origin-Creative/mastodon that referenced this pull request Jul 30, 2021