Disable channel auto read when publish rate or publish buffer exceeded #6550
Merged
Conversation
sijie approved these changes on Mar 18, 2020
jiazhai pushed a commit that referenced this pull request on Mar 20, 2020
Disable channel auto read when publish rate or publish buffer exceeded (#6550) (cherry picked from commit ec31d54)
tuteng pushed a commit to AmateurEvents/pulsar that referenced this pull request on Mar 21, 2020
Disable channel auto read when publish rate or publish buffer exceeded (apache#6550) (cherry picked from commit ec31d54)
tuteng pushed a commit that referenced this pull request on Apr 6, 2020
Disable channel auto read when publish rate or publish buffer exceeded (#6550) (cherry picked from commit ec31d54)
tuteng pushed a commit that referenced this pull request on Apr 13, 2020
Disable channel auto read when publish rate or publish buffer exceeded (#6550) (cherry picked from commit ec31d54)
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request on May 18, 2020
Disable channel auto read when publish rate or publish buffer exceeded (apache#6550) (cherry picked from commit ec31d54)
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request on Aug 24, 2020
Disable channel auto read when publish rate or publish buffer exceeded (apache#6550)
Labels: release/2.5.1, type/enhancement
Motivation
Disable channel auto-read when the publish rate or publish buffer is exceeded. Currently, ServerCnx sets channel auto-read to false only when it receives a new message and finds that the publish rate or publish buffer has been exceeded, so each connection always reads one more message than the limit allows. If there are too many ServerCnx instances (too many topics or clients), the publish rate limiting deviates from the configured limit by a large margin. Here is an example that shows the problem.
Enable the publish rate limit in broker.conf:
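```
brokerPublisherThrottlingTickTimeMillis=1
brokerPublisherThrottlingMaxByteRate=10000000
```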
Use pulsar-perf to test publishing to a 100-partition topic:
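```
bin/pulsar-perf produce -s 500000 -r 100000 -t 1 100p
```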
The test result:
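```
10:45:28.844 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 367.8 msg/s --- 1402.9 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 710.008 ms - med: 256.969 - 95pct: 2461.439 - 99pct: 3460.255 - 99.9pct: 4755.007 - 99.99pct: 4755.007 - Max: 4755.007
10:45:38.919 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 456.6 msg/s --- 1741.9 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 2551.341 ms - med: 2347.599 - 95pct: 6852.639 - 99pct: 9630.015 - 99.9pct: 10824.319 - 99.99pct: 10824.319 - Max: 10824.319
10:45:48.959 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 432.0 msg/s --- 1648.0 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 4373.505 ms - med: 3972.047 - 95pct: 11754.687 - 99pct: 15713.663 - 99.9pct: 17638.527 - 99.99pct: 17705.727 - Max: 17705.727
10:45:58.996 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 430.6 msg/s --- 1642.6 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 5993.563 ms - med: 4291.071 - 95pct: 18022.527 - 99pct: 21649.663 - 99.9pct: 24885.375 - 99.99pct: 25335.551 - Max: 25335.551
10:46:09.195 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 403.2 msg/s --- 1538.3 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 7883.304 ms - med: 6184.159 - 95pct: 23625.343 - 99pct: 29524.991 - 99.9pct: 30813.823 - 99.99pct: 31467.775 - Max: 31467.775
10:46:19.314 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 401.1 msg/s --- 1530.1 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 9587.407 ms - med: 6907.007 - 95pct: 28524.927 - 99pct: 34815.999 - 99.9pct: 36759.551 - 99.99pct: 37581.567 - Max: 37581.567
10:46:29.389 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 372.8 msg/s --- 1422.0 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 11984.595 ms - med: 10095.231 - 95pct: 34515.967 - 99pct: 40754.175 - 99.9pct: 43553.535 - 99.99pct: 43603.199 - Max: 43603.199
10:46:39.459 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 374.6 msg/s --- 1429.1 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 12208.459 ms - med: 7807.455 - 95pct: 38799.871 - 99pct: 46936.575 - 99.9pct: 50500.095 - 99.99pct: 50500.095 - Max: 50500.095
10:46:49.537 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 295.6 msg/s --- 1127.5 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 14503.565 ms - med: 10753.087 - 95pct: 45041.407 - 99pct: 54307.327 - 99.9pct: 57786.623 - 99.99pct: 57786.623 - Max: 57786.623
```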
The reason for such a large deviation is that the producer sends batched messages and each ServerCnx reads one more message past the limit.
This PR cannot completely solve the problem, but it alleviates it: when the publish rate is exceeded, the broker sets channel auto-read to false for all topics, which prevents most ServerCnx instances from reading one more message. A sketch of the underlying mechanism follows.
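To make the mechanism concrete, here is a minimal illustrative sketch of the Netty auto-read pattern the broker relies on. The `PublishThrottle` class and its registry/pause/resume methods are hypothetical names invented for this sketch, not Pulsar's actual ServerCnx code; only `Channel.config().setAutoRead(boolean)` is the real Netty API.

```java
import io.netty.channel.Channel;

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch (hypothetical names, not Pulsar's ServerCnx code):
 * a broker-wide registry that pauses reads on every producer connection
 * at once when the publish rate or publish buffer limit is exceeded.
 */
public class PublishThrottle {

    // All active producer connections on this broker.
    private final Set<Channel> channels = ConcurrentHashMap.newKeySet();

    public void register(Channel ch) {
        channels.add(ch);
    }

    public void unregister(Channel ch) {
        channels.remove(ch);
    }

    /** Called by the rate limiter once the limit is exceeded. */
    public void pauseAll() {
        // setAutoRead(false) stops the event loop from reading further
        // bytes off each socket, so TCP backpressure reaches producers.
        channels.forEach(ch -> ch.config().setAutoRead(false));
    }

    /** Called on a later throttling tick once usage drops below the limit. */
    public void resumeAll() {
        channels.forEach(ch -> ch.config().setAutoRead(true));
    }
}
```

Pausing every registered channel as soon as the limit trips, rather than each connection pausing only after its own next read, is what shrinks the per-connection "one more message" overshoot that the test above measures.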
Does this pull request potentially affect one of the following parts (if yes, please highlight the changes):
- Dependencies (does it add or upgrade a dependency): no
- The public API: no
- The schema: no
- The default values of configurations: no
- The wire protocol: no
- The rest endpoints: no
- The admin cli options: no
- Anything that affects deployment: no

Documentation
- Does this pull request introduce a new feature? no