Add benchmarks for kafka #1085

Merged
merged 5 commits into from
Mar 17, 2023

@rukai rukai commented Mar 16, 2023

Adds benchmarks for Kafka using the kafka-producer-perf-test.sh script that ships with Kafka.
The result of kafka_bench.rs is that shotover achieves the same throughput as a direct connection:

shotover-proxy〉cargo run --release --example kafka_bench
   Compiling test-helpers v0.1.0 (/home/rukai2/Projects/Crates/shotover/shotover-proxy/test-helpers)
   Compiling shotover-proxy v0.1.9 (/home/rukai2/Projects/Crates/shotover/shotover-proxy/shotover-proxy)
    Finished release [optimized] target(s) in 22.04s
     Running `target/release/examples/kafka_bench`
shotover   01:21:27.856420Z  INFO shotover_proxy::runner: Starting Shotover 0.1.9
shotover   01:21:27.856442Z  INFO shotover_proxy::runner: configuration=Config { main_log_level: "info,shotover_proxy=info", observability_interface: "0.0.0.0:9001" }
shotover   01:21:27.856449Z  INFO shotover_proxy::runner: topology=Topology { sources: {"kafka_source": Kafka(KafkaConfig { listen_addr: "127.0.0.1:9192", connection_limit: None, hard_connection_limit: None, tls: None, timeout: None })}, chain_config: {"main_chain": [KafkaSinkSingle(KafkaSinkSingleConfig { address: "127.0.0.1:9092", connect_timeout_ms: 3000, read_timeout: None })]}, source_to_chain_mapping: {"kafka_source": "main_chain"} }
shotover   01:21:27.856471Z  INFO shotover_proxy::config::topology: Loaded chains ["main_chain"]
shotover   01:21:27.856483Z  INFO shotover_proxy::sources::kafka: Starting Kafka source on [127.0.0.1:9192]
shotover   01:21:27.856509Z  INFO shotover_proxy::config::topology: Loaded sources [["kafka_source"]] and linked to chains
shotover   01:21:27.856527Z  INFO shotover_proxy::server: accepting inbound connections
Benching Shotover ...
[2023-03-16 12:21:28,611] WARN [Producer clientId=perf-producer-client] Error while fetching metadata with correlation id 1 : {foo=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
643164 records sent, 128632.8 records/sec (122.67 MB/sec), 105.4 ms avg latency, 587.0 ms max latency.
877718 records sent, 175543.6 records/sec (167.41 MB/sec), 0.6 ms avg latency, 11.0 ms max latency.
882304 records sent, 176460.8 records/sec (168.29 MB/sec), 0.4 ms avg latency, 5.0 ms max latency.
897520 records sent, 179504.0 records/sec (171.19 MB/sec), 0.4 ms avg latency, 3.0 ms max latency.
874635 records sent, 174927.0 records/sec (166.82 MB/sec), 0.4 ms avg latency, 3.0 ms max latency.
5000000 records sent, 168361.505825 records/sec (160.56 MB/sec), 13.95 ms avg latency, 587.00 ms max latency, 0 ms 50th, 14 ms 95th, 402 ms 99th, 561 ms 99.9th.
shotover   01:21:58.470692Z  INFO shotover_proxy::runner: received SIGTERM
shotover   01:21:58.470766Z  INFO shotover_proxy::runner: Shotover was shutdown cleanly.

Benching Direct Kafka ...
[2023-03-16 12:22:04,936] WARN [Producer clientId=perf-producer-client] Error while fetching metadata with correlation id 1 : {foo=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
652616 records sent, 130523.2 records/sec (124.48 MB/sec), 96.1 ms avg latency, 538.0 ms max latency.
868250 records sent, 173650.0 records/sec (165.61 MB/sec), 0.4 ms avg latency, 7.0 ms max latency.
872695 records sent, 174539.0 records/sec (166.45 MB/sec), 0.4 ms avg latency, 4.0 ms max latency.
889070 records sent, 177814.0 records/sec (169.58 MB/sec), 0.5 ms avg latency, 7.0 ms max latency.
894402 records sent, 178880.4 records/sec (170.59 MB/sec), 0.5 ms avg latency, 3.0 ms max latency.
5000000 records sent, 168953.166182 records/sec (161.13 MB/sec), 12.95 ms avg latency, 538.00 ms max latency, 0 ms 50th, 5 ms 95th, 381 ms 99th, 510 ms 99.9th.
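
To put a number on "the same throughput", here is a minimal sketch that compares the two summary lines above. The function name `relative_diff_percent` is made up for illustration and is not part of kafka_bench.rs.

```rust
/// Relative throughput difference of `proxied` versus `direct`, as a percentage.
/// Negative values mean the proxied run was slower.
fn relative_diff_percent(proxied: f64, direct: f64) -> f64 {
    (proxied - direct) / direct * 100.0
}

fn main() {
    // Records/sec taken from the two summary lines in the log above.
    let shotover_rps = 168_361.505825; // through shotover
    let direct_rps = 168_953.166182; // direct to Kafka

    let diff = relative_diff_percent(shotover_rps, direct_rps);
    println!("throughput difference: {diff:.2}%");
}
```

With these figures the difference comes out at roughly a third of a percent, which is probably within run-to-run noise. The tail latencies in the same summary lines differ more (14 ms vs 5 ms at the 95th percentile), so throughput alone does not tell the whole story.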

However, I do not really trust these results. I suspect the bencher is single-threaded and therefore isn't putting a proper load on shotover or Kafka.
We can investigate a better benchmarker later, but for now let's land these benches as they are a useful starting point.
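
One possible way to push more load, sketched here under assumptions rather than taken from this PR, is to run several kafka-producer-perf-test.sh processes at once and split the records between them. The helper names (`perf_test_args`, `split_records`, `spawn_producers`), the producer count of 4, and the record size of 1000 bytes are all assumptions; the record size is only inferred from the records/sec to MB/sec ratio in the log. The topic `foo`, the 5000000 records, and the address 127.0.0.1:9192 come from the log above.

```rust
use std::process::{Child, Command};

/// Arguments for one kafka-producer-perf-test.sh run.
/// Record size 1000 is an assumption inferred from the log output.
fn perf_test_args(bootstrap: &str, num_records: u64) -> Vec<String> {
    vec![
        "--topic".to_string(),
        "foo".to_string(),
        "--num-records".to_string(),
        num_records.to_string(),
        "--record-size".to_string(),
        "1000".to_string(),
        "--throughput".to_string(),
        "-1".to_string(), // -1 disables throttling
        "--producer-props".to_string(),
        format!("bootstrap.servers={bootstrap}"),
    ]
}

/// Split `total` records across `producers` processes as evenly as possible.
fn split_records(total: u64, producers: u64) -> Vec<u64> {
    assert!(producers > 0, "need at least one producer");
    let base = total / producers;
    let remainder = total % producers;
    // The first `remainder` producers each take one extra record.
    (0..producers).map(|i| base + u64::from(i < remainder)).collect()
}

/// Start one perf-test process per share; the caller waits on the children.
fn spawn_producers(
    script: &str,
    bootstrap: &str,
    total: u64,
    producers: u64,
) -> std::io::Result<Vec<Child>> {
    split_records(total, producers)
        .into_iter()
        .map(|n| Command::new(script).args(perf_test_args(bootstrap, n)).spawn())
        .collect()
}

fn main() -> std::io::Result<()> {
    let bootstrap = "127.0.0.1:9192"; // shotover's listen_addr in the log above
    match std::env::args().nth(1) {
        // If a script path is passed as the first argument, actually run the producers.
        Some(script) => {
            for mut child in spawn_producers(&script, bootstrap, 5_000_000, 4)? {
                child.wait()?;
            }
        }
        // Otherwise just print the planned commands, so this works without Kafka installed.
        None => {
            for n in split_records(5_000_000, 4) {
                println!(
                    "kafka-producer-perf-test.sh {}",
                    perf_test_args(bootstrap, n).join(" ")
                );
            }
        }
    }
    Ok(())
}
```

Each process prints its own summary line, so the per-process records/sec figures would still need to be added up to get a total. Whether this actually saturates shotover or Kafka would have to be measured.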

The benchmarks also do not measure encoding/decoding time as a simple shotover setup does not trigger encoding/decoding.

@rukai rukai requested a review from conorbros March 16, 2023 23:05
@conorbros

I wonder if it would be a better use of our time to start with a mock Kafka, like we have for Cassandra.

@rukai rukai commented Mar 17, 2023

Yep, I think a mock Kafka would be nice, but ultimately we still need a real Kafka bench as well because it demonstrates performance in a more realistic scenario.

@rukai rukai enabled auto-merge (squash) March 17, 2023 00:55
@github-actions

15 benchmarks regressed. 0 benchmarks improved. Please check the benchmark workflow logs for full details: https://github.com/shotover/shotover-proxy/actions/runs/4443429016

cassandra/protect_local_select_unencrypted
                        time:   [1.0151 ms 1.0674 ms 1.1238 ms]
                        thrpt:  [889.81  elem/s 936.88  elem/s 985.08  elem/s]
                 change:
                        time:   [+33.278% +40.051% +47.491%] (p = 0.00 < 0.05)
                        thrpt:  [-32.199% -28.598% -24.969%]
                        Performance has regressed.
--
shotover   02:26:33.898793Z  INFO shotover_proxy::server: accepting inbound connections
cassandra/tls_insert    time:   [758.12 µs 781.69 µs 809.95 µs]
                        thrpt:  [1.2346 Kelem/s 1.2793 Kelem/s 1.3190 Kelem/s]
                 change:
                        time:   [+20.028% +25.722% +31.298%] (p = 0.00 < 0.05)
                        thrpt:  [-23.837% -20.460% -16.686%]
                        Performance has regressed.
--
cassandra/protect_local_insert_encrypted
                        time:   [1.2557 ms 1.3352 ms 1.4253 ms]
                        thrpt:  [701.60  elem/s 748.98  elem/s 796.36  elem/s]
                 change:
                        time:   [+41.731% +52.074% +65.457%] (p = 0.00 < 0.05)
                        thrpt:  [-39.561% -34.243% -29.444%]
                        Performance has regressed.
--
cassandra/protect_local_select_encrypted
                        time:   [1.1333 ms 1.1943 ms 1.2617 ms]
                        thrpt:  [792.55  elem/s 837.32  elem/s 882.36  elem/s]
                 change:
                        time:   [+32.872% +42.112% +51.104%] (p = 0.00 < 0.05)
                        thrpt:  [-33.820% -29.633% -24.740%]
                        Performance has regressed.
--
cassandra/request_throttling_insert
                        time:   [1.2914 ms 1.3827 ms 1.4742 ms]
                        thrpt:  [678.33  elem/s 723.24  elem/s 774.34  elem/s]
                 change:
                        time:   [+68.643% +79.877% +93.845%] (p = 0.00 < 0.05)
                        thrpt:  [-48.413% -44.407% -40.703%]
                        Performance has regressed.
--
cassandra/request_throttling_select
                        time:   [1.1895 ms 1.2573 ms 1.3280 ms]
                        thrpt:  [753.04  elem/s 795.33  elem/s 840.70  elem/s]
                 change:
                        time:   [+55.578% +64.947% +74.168%] (p = 0.00 < 0.05)
                        thrpt:  [-42.584% -39.374% -35.724%]
                        Performance has regressed.
--
cassandra/request_throttling_execute
                        time:   [740.49 µs 768.58 µs 799.48 µs]
                        thrpt:  [1.2508 Kelem/s 1.3011 Kelem/s 1.3505 Kelem/s]
                 change:
                        time:   [+51.675% +67.872% +96.085%] (p = 0.00 < 0.05)
                        thrpt:  [-49.002% -40.431% -34.070%]
                        Performance has regressed.
--
shotover   02:28:39.389761Z  INFO connection{id=1 source="RedisSource"}: shotover_proxy::transforms::chain: Buffered chain three was shutdown
redis/active_set        time:   [290.71 µs 298.39 µs 306.61 µs]
                        thrpt:  [3.2615 Kelem/s 3.3513 Kelem/s 3.4399 Kelem/s]
                 change:
                        time:   [+20.094% +24.178% +28.746%] (p = 0.00 < 0.05)
                        thrpt:  [-22.328% -19.471% -16.732%]
                        Performance has regressed.
--
  2 (2.00%) high mild
redis/active_get        time:   [333.80 µs 347.05 µs 360.20 µs]
                        thrpt:  [2.7762 Kelem/s 2.8814 Kelem/s 2.9958 Kelem/s]
                 change:
                        time:   [+32.547% +37.020% +41.972%] (p = 0.00 < 0.05)
                        thrpt:  [-29.564% -27.018% -24.555%]
                        Performance has regressed.
--
shotover   02:29:18.687363Z  INFO shotover_proxy::server: accepting inbound connections
redis/cluster_set       time:   [177.83 µs 184.57 µs 192.09 µs]
                        thrpt:  [5.2058 Kelem/s 5.4180 Kelem/s 5.6232 Kelem/s]
                 change:
                        time:   [+36.748% +45.091% +57.326%] (p = 0.00 < 0.05)
                        thrpt:  [-36.438% -31.078% -26.873%]
                        Performance has regressed.
--
  3 (3.00%) high severe
redis/cluster_get       time:   [133.81 µs 136.93 µs 140.08 µs]
                        thrpt:  [7.1388 Kelem/s 7.3030 Kelem/s 7.4733 Kelem/s]
                 change:
                        time:   [+36.022% +39.400% +42.924%] (p = 0.00 < 0.05)
                        thrpt:  [-30.033% -28.264% -26.483%]
                        Performance has regressed.
--
shotover   02:29:48.822522Z  INFO shotover_proxy::server: accepting inbound connections
redis/passthrough_set   time:   [189.20 µs 194.86 µs 201.18 µs]
                        thrpt:  [4.9707 Kelem/s 5.1319 Kelem/s 5.2854 Kelem/s]
                 change:
                        time:   [+30.113% +33.073% +36.141%] (p = 0.00 < 0.05)
                        thrpt:  [-26.547% -24.854% -23.143%]
                        Performance has regressed.
--
  1 (1.00%) high severe
redis/passthrough_get   time:   [192.28 µs 197.19 µs 202.26 µs]
                        thrpt:  [4.9441 Kelem/s 5.0713 Kelem/s 5.2007 Kelem/s]
                 change:
                        time:   [+30.245% +34.257% +38.432%] (p = 0.00 < 0.05)
                        thrpt:  [-27.762% -25.516% -23.222%]
                        Performance has regressed.
--
shotover   02:30:15.653932Z  INFO shotover_proxy::server: accepting inbound connections
redis/single_tls        time:   [226.14 µs 239.65 µs 254.60 µs]
                        thrpt:  [3.9277 Kelem/s 4.1727 Kelem/s 4.4220 Kelem/s]
                 change:
                        time:   [+32.868% +38.696% +44.716%] (p = 0.00 < 0.05)
                        thrpt:  [-30.899% -27.900% -24.738%]
                        Performance has regressed.
--
shotover   02:30:44.420745Z  INFO shotover_proxy::server: accepting inbound connections
redis/cluster_tls       time:   [204.07 µs 214.25 µs 224.65 µs]
                        thrpt:  [4.4514 Kelem/s 4.6675 Kelem/s 4.9003 Kelem/s]
                 change:
                        time:   [+34.314% +40.187% +46.137%] (p = 0.00 < 0.05)
                        thrpt:  [-31.571% -28.667% -25.547%]
                        Performance has regressed.

@rukai rukai merged commit 6f2834e into shotover:main Mar 17, 2023