(feat) add FixedThreadsExecutorGroup #168 #170

Merged
merged 55 commits into from
Jul 5, 2019

Conversation

fengjiachun
Contributor

@fengjiachun fengjiachun commented May 22, 2019

Motivation:

Note: do not merge yet; waiting on test data.

Optimize the append-entries thread model in an attempt to reduce context switches and lock contention.

Modification:

Mainly adds the FixedThreadsExecutorGroup and SingleThreadExecutor classes, which stay compatible with the DefaultEventExecutor previously used in AppendEntriesRequestProcessor.

Result:

Fixes #168 #158
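
For readers unfamiliar with the pattern, here is a minimal sketch of what a fixed group of single-thread executors can look like. It is not the code added by this PR: the class name `FixedGroupSketch`, the round-robin `next()` policy, and the use of `Executors.newSingleThreadExecutor()` are all assumptions made for illustration. The idea it shows is that a caller picks one executor from the group and keeps submitting related work to it, so that work runs in order on one thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names and the round-robin policy are assumptions,
// not the FixedThreadsExecutorGroup implementation from this PR.
final class FixedGroupSketch {
    private final ExecutorService[] children;
    private final AtomicInteger index = new AtomicInteger();

    FixedGroupSketch(int nThreads) {
        children = new ExecutorService[nThreads];
        for (int i = 0; i < nThreads; i++) {
            children[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Pick the next single-thread executor in round-robin order.
    // floorMod keeps the index valid even after the counter overflows.
    ExecutorService next() {
        return children[Math.floorMod(index.getAndIncrement(), children.length)];
    }

    // Stop accepting work and wait briefly for queued tasks to finish.
    void shutdown() {
        for (ExecutorService child : children) {
            child.shutdown();
        }
        try {
            for (ExecutorService child : children) {
                child.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        FixedGroupSketch group = new FixedGroupSketch(4);
        // Bind one logical stream of requests to a single executor.
        ExecutorService bound = group.next();
        for (int i = 0; i < 3; i++) {
            int n = i;
            bound.execute(() -> System.out.println(
                "request " + n + " on " + Thread.currentThread().getName()));
        }
        group.shutdown();
    }
}
```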

@fengjiachun
Contributor Author

fengjiachun commented May 22, 2019

TODO:

  1. Add unit tests
  2. Provide a SingleThreadExecutor implementation based on an MPSC queue

@fengjiachun
Contributor Author

fengjiachun commented May 23, 2019

Added MpscSingleThreadExecutor, a single-thread executor built on the jctools MPSC queue (multi-producer, single-consumer model).
Below is a simple benchmark: 32 threads produce 1,000,000 tasks, and each figure is the time one SingleThreadExecutor takes to execute them.

The three SingleThreadExecutor implementations are:

  • DefaultSingleThreadExecutor: a single-thread ThreadPoolExecutor backed by LinkedBlockingQueue
  • Netty's DefaultEventExecutor
  • MpscSingleThreadExecutor
     * default_single_thread_executor 1222 ms
     * netty_default_event_executor   623 ms
     * mpsc_single_thread_executor    271 ms
     *
     * default_single_thread_executor 1331 ms
     * netty_default_event_executor   616 ms
     * mpsc_single_thread_executor    269 ms
     *
     * default_single_thread_executor 1231 ms
     * netty_default_event_executor   588 ms
     * mpsc_single_thread_executor    270 ms
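
The multi-producer, single-consumer shape described above can be sketched with the JDK alone. This is not MpscSingleThreadExecutor: the class name `SingleConsumerExecutorSketch` is made up, `ConcurrentLinkedQueue` stands in for the jctools MPSC queue, and the park/unpark wake-up scheme is an assumption chosen for brevity.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of a single-consumer executor; not the PR's code.
final class SingleConsumerExecutorSketch implements Executor {
    // Stand-in for an MPSC queue: many producers offer, one worker polls.
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    SingleConsumerExecutorSketch() {
        worker = new Thread(this::runLoop, "single-consumer-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public void execute(Runnable task) {
        queue.offer(task);
        // Wake the worker if it is parked. A tuned implementation would
        // avoid unparking on every submission.
        LockSupport.unpark(worker);
    }

    private void runLoop() {
        while (true) {
            Runnable task = queue.poll();
            if (task != null) {
                try {
                    task.run();
                } catch (Throwable t) {
                    t.printStackTrace(); // keep the worker alive in this sketch
                }
            } else if (!running) {
                return; // queue drained and shutdown requested
            } else {
                LockSupport.park(this); // returns at once if unpark came first
            }
        }
    }

    void shutdown() {
        running = false;
        LockSupport.unpark(worker);
    }

    // Several producer threads submit tasks; returns how many tasks ran.
    static int runDemo(int producers, int tasksPerProducer) {
        SingleConsumerExecutorSketch executor = new SingleConsumerExecutorSketch();
        CountDownLatch done = new CountDownLatch(producers * tasksPerProducer);
        int[] executed = {0}; // written only by the single worker thread
        for (int p = 0; p < producers; p++) {
            new Thread(() -> {
                for (int i = 0; i < tasksPerProducer; i++) {
                    executor.execute(() -> {
                        executed[0]++;
                        done.countDown();
                    });
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdown();
        return executed[0];
    }

    public static void main(String[] args) {
        System.out.println("executed " + runDemo(4, 1000) + " tasks");
    }
}
```

Because only one thread ever runs tasks, the counter in `runDemo` needs no atomic type; the latch provides the visibility guarantee for the final read.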

fengjiachun and others added 25 commits May 23, 2019 23:08
* (feat) refactor ThreadId and replicator

* (feat) Adds javadoc
* (feat) add AdaptiveBufAllocator

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (fix) rename method name

* (fix) minor fix

* (fix) add metric for recyclers (#164)

* (fix) add metric for recyclers

* (fix) add metric for ByteBufferCollector.capacity

* (fix) code format

* (fix) by review comment

* feat/zero copy with replicator (#167)

* (fix) zero copy with replicator

* (fix) support zero copy and add benchmark

* (fix) rename field

* (fix) rm zero copy and unnecessary metric

* (fix) by review comment

* (feat) add unit test AdaptiveBufAllocatorTest

* (feat) add unit test RecyclersTest

* (feat) add unit test RecyclableByteBufferListTest

* (feat) add unit test ByteBufferCollectorTest
These tests were written using Diffblue Cover.
@fengjiachun
Contributor Author

Added several more test scenarios, mainly MpscSingleThreadExecutor running on different queues. The results also show that the jctools MPSC queue is not the only reason MpscSingleThreadExecutor is faster.

  • default_single_thread_executor:
    • A single-thread ThreadPoolExecutor backed by LinkedBlockingQueue; the slowest in this scenario
  • netty_default_event_executor:
    • Netty's DefaultEventExecutor; second to last in this scenario, but still clearly faster than ThreadPoolExecutor
  • mpsc_single_thread_executor:
    • MpscSingleThreadExecutor on its default MPSC queue; the fastest in this scenario
  • mpsc_single_thread_executor_concurrent_linked_queue:
    • MpscSingleThreadExecutor backed by ConcurrentLinkedQueue; tied for second with LinkedTransferQueue in this scenario. Drawback: the queue is unbounded, so no capacity can be set
  • mpsc_single_thread_executor_linked_blocking_queue:
    • MpscSingleThreadExecutor backed by LinkedBlockingQueue; performs about the same as DefaultEventExecutor
  • mpsc_single_thread_executor_linked_transfer_queue:
    • MpscSingleThreadExecutor backed by LinkedTransferQueue; tied for second with ConcurrentLinkedQueue in this scenario. Drawback: the queue is unbounded, so no capacity can be set
     * default_single_thread_executor                      1259 ms
     * netty_default_event_executor                        596 ms
     * mpsc_single_thread_executor                         270 ms
     * mpsc_single_thread_executor_concurrent_linked_queue 324 ms
     * mpsc_single_thread_executor_linked_blocking_queue   535 ms
     * mpsc_single_thread_executor_linked_transfer_queue   322 ms
     *
     * default_single_thread_executor                      1277 ms
     * netty_default_event_executor                        608 ms
     * mpsc_single_thread_executor                         273 ms
     * mpsc_single_thread_executor_concurrent_linked_queue 321 ms
     * mpsc_single_thread_executor_linked_blocking_queue   476 ms
     * mpsc_single_thread_executor_linked_transfer_queue   335 ms
     *
     * default_single_thread_executor                      1235 ms
     * netty_default_event_executor                        619 ms
     * mpsc_single_thread_executor                         265 ms
     * mpsc_single_thread_executor_concurrent_linked_queue 320 ms
     * mpsc_single_thread_executor_linked_blocking_queue   509 ms
     * mpsc_single_thread_executor_linked_transfer_queue   328 ms

@masaimu
Contributor

masaimu commented May 25, 2019

Apart from the typo "NEETY_EXECUTOR" -> "NETTY_EXECUTOR" in the benchmark, I still need to go through the rest in detail 😂

@fengjiachun
Contributor Author

I'll fix that 😸

@masaimu
Contributor

masaimu commented May 25, 2019

An interesting observation: in the benchmark, when I order the contestants as follows, the gaps between their results narrow accordingly:

executors.put("mpsc_single_thread_executor    ", MPSC_EXECUTOR);
executors.put("netty_default_event_executor   ", NEETY_EXECUTOR);
executors.put("default_single_thread_executor ", DEFAULT);
mpsc_single_thread_executor    477 ms
netty_default_event_executor   491 ms
default_single_thread_executor 879 ms

And when the warmup count is raised to 10,000,000, the gaps narrow further:

mpsc_single_thread_executor    194 ms
netty_default_event_executor   242 ms
default_single_thread_executor 465 ms

So I suspect warmup is also influencing the benchmark: when the warmup count is too low, the later a contestant runs, the more it benefits. Even so, MpscSingleThreadExecutor still performs better.

@fengjiachun
Contributor Author

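👍 A very important finding. I went through the code carefully and suspect the main cause is that the producers use a ThreadPoolExecutor with a LinkedBlockingQueue. Because the producers are globally shared and run many times, they indirectly warm up default_single_thread_executor, so the later it runs, the better it performs. I rewrote the benchmark on top of JMH, so outside factors should no longer affect the results.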
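
The two ideas in this exchange (warm up each candidate on its own, and do not let a shared producer pool warm up code on behalf of one candidate) can be sketched in plain Java. This is neither JMH nor the rewritten benchmark from this PR; the class `WarmupHarnessSketch`, its method names, and the round counts are assumptions for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical hand-rolled harness; JMH does this far more rigorously.
final class WarmupHarnessSketch {

    // One round: `producers` fresh threads each submit `tasksPerProducer`
    // no-op tasks; returns the time until the consumer has run all of them.
    static long runRoundNanos(ExecutorService consumer, int producers, int tasksPerProducer) {
        CountDownLatch done = new CountDownLatch(producers * tasksPerProducer);
        long start = System.nanoTime();
        for (int p = 0; p < producers; p++) {
            // Plain threads instead of a shared ThreadPoolExecutor, so the
            // producers do not exercise the code path under test.
            new Thread(() -> {
                for (int i = 0; i < tasksPerProducer; i++) {
                    consumer.execute(done::countDown);
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return System.nanoTime() - start;
    }

    // Warm up one candidate in isolation, then average the measured rounds.
    static long averageMillis(Supplier<ExecutorService> factory, int warmupRounds,
                              int measuredRounds, int producers, int tasksPerProducer) {
        ExecutorService consumer = factory.get();
        try {
            for (int i = 0; i < warmupRounds; i++) {
                runRoundNanos(consumer, producers, tasksPerProducer); // result discarded
            }
            long totalNanos = 0;
            for (int i = 0; i < measuredRounds; i++) {
                totalNanos += runRoundNanos(consumer, producers, tasksPerProducer);
            }
            return totalNanos / measuredRounds / 1_000_000;
        } finally {
            consumer.shutdown();
        }
    }

    public static void main(String[] args) {
        long ms = averageMillis(Executors::newSingleThreadExecutor, 5, 3, 4, 50_000);
        System.out.println("single-thread ThreadPoolExecutor: " + ms + " ms per round");
    }
}
```

Even with per-candidate warmup, a harness like this still shares one JVM across candidates, which is one reason a tool such as JMH with forked runs is the safer choice.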

@fengjiachun
Contributor Author

fengjiachun commented May 25, 2019

JMH results
Each op is one run in which 32 producer threads submit 1,000,000 tasks and a single consumer finishes executing them.

cnt=3

Benchmark                                                                         Mode  Cnt  Score   Error  Units
SingleThreadExecutorBenchmark.defaultSingleThreadPollExecutor                    thrpt    3  1.266 ± 2.822  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutor                           thrpt    3  4.066 ± 4.990  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutorWithConcurrentLinkedQueue  thrpt    3  3.470 ± 0.845  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutorWithLinkedBlockingQueue    thrpt    3  2.643 ± 1.222  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutorWithLinkedTransferQueue    thrpt    3  3.266 ± 1.613  ops/s
SingleThreadExecutorBenchmark.nettyDefaultEventExecutor                          thrpt    3  2.290 ± 0.446  ops/s

cnt=10

Benchmark                                                                         Mode  Cnt  Score   Error  Units
SingleThreadExecutorBenchmark.defaultSingleThreadPollExecutor                    thrpt   10  1.389 ± 0.130  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutor                           thrpt   10  3.646 ± 0.323  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutorWithConcurrentLinkedQueue  thrpt   10  3.386 ± 0.247  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutorWithLinkedBlockingQueue    thrpt   10  2.535 ± 0.153  ops/s
SingleThreadExecutorBenchmark.mpscSingleThreadExecutorWithLinkedTransferQueue    thrpt   10  3.184 ± 0.299  ops/s
SingleThreadExecutorBenchmark.nettyDefaultEventExecutor                          thrpt   10  2.097 ± 0.075  ops/s
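
Since each op is one full producer/consumer run, a throughput score in ops/s converts to time per run as 1000 / score milliseconds. For example, 3.646 ops/s is roughly 274 ms per run and 1.389 ops/s is roughly 720 ms per run. The small helper below (the class name `ThroughputToLatencySketch` is made up) applies that conversion to the cnt=10 scores; the workloads of the earlier hand-rolled timings may not match the JMH setup exactly, so treat any comparison with them as approximate.

```java
// Hypothetical helper for reading the table above; not part of the PR.
final class ThroughputToLatencySketch {

    // Convert a throughput score in ops/s to milliseconds per operation.
    static double millisPerOp(double opsPerSecond) {
        return 1000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        String[] names = {
            "defaultSingleThreadPollExecutor",
            "mpscSingleThreadExecutor",
            "mpscSingleThreadExecutorWithConcurrentLinkedQueue",
            "mpscSingleThreadExecutorWithLinkedBlockingQueue",
            "mpscSingleThreadExecutorWithLinkedTransferQueue",
            "nettyDefaultEventExecutor"
        };
        // Scores copied from the cnt=10 table, in ops/s.
        double[] scores = {1.389, 3.646, 3.386, 2.535, 3.184, 2.097};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-52s %.0f ms/op%n", names[i], millisPerOp(scores[i]));
        }
    }
}
```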

@killme2008
Contributor

@fengjiachun there are merge conflicts

@fengjiachun
Contributor Author

done

@killme2008 killme2008 merged commit 0ff2b1f into master Jul 5, 2019
@killme2008 killme2008 deleted the feat/event_loop branch July 5, 2019 02:01
@fengjiachun fengjiachun mentioned this pull request Aug 15, 2019
4 tasks
Development

Successfully merging this pull request may close these issues.

(feat) Log replication thread model
4 participants