Commit
util-core: lock-free async semaphore
Problem

The current implementation of `AsyncSemaphore` is lock-based and has proved too costly under contention. I've also observed this while testing the fiber scheduler implementation.

Solution

Migrate to a lock-free implementation, similar to how semaphores are implemented in Linux, that uses a single atomic counter to indicate either the number of available permits or the number of waiters. The happy path uses a single CAS; if no permits are available, waiters are placed in a concurrent linked queue. Since the queue incurs additional allocations, I've also optimized the code to avoid allocations in a few places to compensate.

Result

The new implementation has better performance and a lower normalized allocation rate (`gc.alloc.rate.norm`) in the `mixed` and `waiters` benchmarks:

baseline

```
[info] Benchmark                                                         Mode  Cnt     Score     Error  Units
[info] AsyncSemaphoreBenchmark.mixed                                     avgt   10  1265.338 ±  97.302  ns/op
[info] AsyncSemaphoreBenchmark.mixed:·gc.alloc.rate                      avgt   10   258.451 ±  14.962  MB/sec
[info] AsyncSemaphoreBenchmark.mixed:·gc.alloc.rate.norm                 avgt   10    50.551 ±   0.590  B/op
[info] AsyncSemaphoreBenchmark.mixed:·gc.churn.Par_Eden_Space            avgt   10   260.352 ± 138.484  MB/sec
[info] AsyncSemaphoreBenchmark.mixed:·gc.churn.Par_Eden_Space.norm       avgt   10    51.108 ±  28.172  B/op
[info] AsyncSemaphoreBenchmark.mixed:·gc.count                           avgt   10    12.000            counts
[info] AsyncSemaphoreBenchmark.mixed:·gc.time                            avgt   10     6.000            ms
[info] AsyncSemaphoreBenchmark.noWaiters                                 avgt   10   806.794 ± 208.589  ns/op
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.alloc.rate                  avgt   10   131.521 ±  35.256  MB/sec
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.alloc.rate.norm             avgt   10    16.000 ±   0.001  B/op
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.churn.Par_Eden_Space        avgt   10   130.410 ± 169.691  MB/sec
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.churn.Par_Eden_Space.norm   avgt   10    16.496 ±  22.444  B/op
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.count                       avgt   10     6.000            counts
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.time                        avgt   10     5.000            ms
[info] AsyncSemaphoreBenchmark.waiters                                   avgt   10  3671.510 ± 773.611  ns/op
[info] AsyncSemaphoreBenchmark.waiters:·gc.alloc.rate                    avgt   10   213.112 ±  22.384  MB/sec
[info] AsyncSemaphoreBenchmark.waiters:·gc.alloc.rate.norm               avgt   10   120.443 ±  11.879  B/op
[info] AsyncSemaphoreBenchmark.waiters:·gc.churn.Par_Eden_Space          avgt   10   215.990 ±   0.763  MB/sec
[info] AsyncSemaphoreBenchmark.waiters:·gc.churn.Par_Eden_Space.norm     avgt   10   123.114 ±  26.471  B/op
[info] AsyncSemaphoreBenchmark.waiters:·gc.count                         avgt   10    10.000            counts
[info] AsyncSemaphoreBenchmark.waiters:·gc.time                          avgt   10     7.000            ms
```

optimized

```
[info] Benchmark                                                         Mode  Cnt     Score     Error  Units
[info] AsyncSemaphoreBenchmark.mixed                                     avgt   10   826.948 ±  33.943  ns/op
[info] AsyncSemaphoreBenchmark.mixed:·gc.alloc.rate                      avgt   10   383.787 ±  14.822  MB/sec
[info] AsyncSemaphoreBenchmark.mixed:·gc.alloc.rate.norm                 avgt   10    48.244 ±   0.177  B/op
[info] AsyncSemaphoreBenchmark.mixed:·gc.churn.Par_Eden_Space            avgt   10   391.142 ±  99.118  MB/sec
[info] AsyncSemaphoreBenchmark.mixed:·gc.churn.Par_Eden_Space.norm       avgt   10    49.164 ±  12.446  B/op
[info] AsyncSemaphoreBenchmark.mixed:·gc.count                           avgt   10    19.000            counts
[info] AsyncSemaphoreBenchmark.mixed:·gc.time                            avgt   10    12.000            ms
[info] AsyncSemaphoreBenchmark.noWaiters                                 avgt   10   735.404 ±  24.172  ns/op
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.alloc.rate                  avgt   10   145.514 ±   4.828  MB/sec
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.alloc.rate.norm             avgt   10    16.000 ±   0.001  B/op
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.churn.Par_Eden_Space        avgt   10   143.113 ± 149.378  MB/sec
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.churn.Par_Eden_Space.norm   avgt   10    15.715 ±  16.429  B/op
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.count                       avgt   10     7.000            counts
[info] AsyncSemaphoreBenchmark.noWaiters:·gc.time                        avgt   10     5.000            ms
[info] AsyncSemaphoreBenchmark.waiters                                   avgt   10  2518.887 ± 781.773  ns/op
[info] AsyncSemaphoreBenchmark.waiters:·gc.alloc.rate                    avgt   10   278.906 ±  49.568  MB/sec
[info] AsyncSemaphoreBenchmark.waiters:·gc.alloc.rate.norm               avgt   10   104.685 ±  17.330  B/op
[info] AsyncSemaphoreBenchmark.waiters:·gc.churn.Par_Eden_Space          avgt   10   281.048 ± 157.356  MB/sec
[info] AsyncSemaphoreBenchmark.waiters:·gc.churn.Par_Eden_Space.norm     avgt   10   104.627 ±  55.264  B/op
[info] AsyncSemaphoreBenchmark.waiters:·gc.count                         avgt   10    13.000            counts
[info] AsyncSemaphoreBenchmark.waiters:·gc.time                          avgt   10     9.000            ms
```

JIRA Issues: CSL-8426, VM-4536

Differential Revision: https://phabricator.twitter.biz/D663434
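
To illustrate the single-counter scheme described under Solution, here is a minimal Java sketch. It is not the util-core code: the class name `LockFreeAsyncSemaphoreSketch` is hypothetical, `CompletableFuture` stands in for util's `Future`, the fast path uses an atomic fetch-and-add rather than an explicit CAS loop, and waiter cancellation and the allocation optimizations mentioned in the commit message are left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the util-core implementation.
final class LockFreeAsyncSemaphoreSketch {
    // state > 0: number of available permits
    // state < 0: number of waiters (negated)
    // state == 0: no permits and no waiters
    private final AtomicInteger state;

    // Only touched on the slow path, when no permit is available.
    private final ConcurrentLinkedQueue<CompletableFuture<Void>> waiters =
        new ConcurrentLinkedQueue<>();

    LockFreeAsyncSemaphoreSketch(int permits) {
        this.state = new AtomicInteger(permits);
    }

    // Returns a future that completes once a permit has been granted.
    CompletableFuture<Void> acquire() {
        // Fast path: one atomic update. A positive previous value
        // means a permit was available and is now ours.
        if (state.getAndDecrement() > 0) {
            return CompletableFuture.completedFuture(null);
        }
        // Slow path: the decrement already registered us as a waiter
        // in the counter; publish the promise for a releaser to find.
        CompletableFuture<Void> waiter = new CompletableFuture<>();
        waiters.add(waiter);
        return waiter;
    }

    // Returns one permit, handing it to the oldest queued waiter if any.
    void release() {
        // A negative previous value means at least one acquirer has
        // decremented the counter and is waiting (or about to enqueue).
        if (state.getAndIncrement() < 0) {
            CompletableFuture<Void> waiter;
            // The waiter may have updated the counter but not yet
            // reached the queue, so spin until it shows up.
            while ((waiter = waiters.poll()) == null) {
                Thread.onSpinWait();
            }
            waiter.complete(null);
        }
    }
}
```

With one permit, a first `acquire()` returns an already completed future, a second returns a pending one, and `release()` completes the pending one. The brief spin in `release()` is the cost of keeping the counter and the queue as two separate structures: the counter is always updated first, so a releaser can see a waiter in the counter a moment before it appears in the queue.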