
Performance testing tool #11

Merged · 51 commits · Oct 27, 2021
Conversation

chinghongfang (Collaborator)

Performance testing tool (phase 1) #7
A tool for quickly setting up performance tests.
Usage:

# Build the project first
./gradlew build
# Set up the cluster
./docker/start_zk.sh
./docker/start_broker.sh zookeeper.connect=192.168.103.26:17095
./docker/start_broker.sh zookeeper.connect=192.168.103.26:17095
./docker/start_broker.sh zookeeper.connect=192.168.103.26:17095
# Run the test
./bin/performance.sh --brokers 192.168.103.26:14334,192.168.103.26:10718,192.168.103.26:14416 --topic createNew --topicConfigs partitions:27,replicationFactor:3 --producers 5 --consumers 2 --records 100000 --recordSize 100000

(Replace the --brokers option with the broker addresses printed after the Docker containers start.)

The features are as described in Performance testing tool (phase 1) #7.

Known issues so far:

  1. The Kafka library must be downloaded first.
    The script currently checks for the Kafka library: if the kafka_2.13-2.8.0 directory is missing, it downloads it.
  2. Producer send rate.
    The producer currently sleeps 1 ms after sending each record; there may be a better way to send.
  3. Where the files live.
    The files are currently placed under the app directory; I am not sure whether that is appropriate.
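Known issue 2 (the fixed 1 ms sleep per record) can be contrasted with rate-based pacing in a small sketch. Everything here is hypothetical illustration, not code from this PR: `send` is a stand-in for the real producer call.

```java
import java.util.concurrent.TimeUnit;

public class PacedSender {
    // Stand-in for the real KafkaProducer#send call (hypothetical).
    static int sent = 0;
    static void send(byte[] payload) { sent++; }

    // Current approach in this PR: a fixed 1 ms sleep after each record
    // caps every producer at roughly 1000 records/second.
    static void sendWithFixedSleep(byte[] payload, int records) {
        for (int i = 0; i < records; i++) {
            send(payload);
            sleep(TimeUnit.MILLISECONDS.toNanos(1));
        }
    }

    // Hypothetical alternative: pace against a target rate. Record i+1 is
    // "due" at start + (i+1)/rate seconds; sleep only when ahead of schedule,
    // so a high target rate is not throttled by a fixed per-record delay.
    static void sendAtRate(byte[] payload, int records, long recordsPerSecond) {
        long start = System.nanoTime();
        for (int i = 0; i < records; i++) {
            send(payload);
            long due = start + TimeUnit.SECONDS.toNanos(1) * (i + 1) / recordsPerSecond;
            long ahead = due - System.nanoTime();
            if (ahead > 0) sleep(ahead);
        }
    }

    private static void sleep(long nanos) {
        try {
            TimeUnit.NANOSECONDS.sleep(nanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this kind of pacing, a very large target rate degenerates into "send at full speed", which matters for the ceiling-measurement discussion later in this thread.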

竫泓 and others added 8 commits September 3, 2021 11:55
The producer sends one record per second; the consumer prints the latency on receipt.
When running bin/performance.sh: if the kafka_2.13-2.8.0 directory is missing, it is downloaded.
Record bytes read/written per second and latency, and print them every second.
Implement the configuration options:
--brokers, --bootstrapServers
--topic
--topicConfigs
--producers
--consumers
The producer thread sleeps for 1 ms every time it sends a record, to avoid records piling up.
Specify the number of producers/consumers, the record size, and the topic configuration.
Print each producer/consumer's input/output rate (MB/second) and latency.
@chia7712 (Contributor) left a comment

Thanks for this patch.

Note that we have introduced the Gradle application plugin, so we can leverage it to run our application.

See https://docs.gradle.org/current/userguide/application_plugin.html for more details.

bin/performance.sh (review thread resolved)
app/src/main/java/org/astraea/performance/Performance.java (review thread resolved)
@chinghongfang (Collaborator, Author)

Thank you for the feedback.
Understood, I will switch to launching via Gradle.
No problem, I will write up the steps and add tests.

@chinghongfang chinghongfang mentioned this pull request Sep 17, 2021
Warm up consumers to make the metrics more accurate.
Split producers and consumers start up from main.
Test consumer and producer creation.
@chinghongfang (Collaborator, Author) left a comment
Fixed the naming issues and rewrote the main program logic.

Updated the code to use ThreadPoolTopicAdmin.
Tests now run against embedded Kafka.

README.md (review thread resolved)
@garyparrot (Collaborator) left a comment

Thanks for the update. Some formatting in README.md seems to be broken.

README.md (two review threads resolved)
Change the unit-test environment from fake producers/consumers to real Kafka
producers/consumers connected to an embedded Kafka cluster.
Increase the consumer `poll()` timeout in the consumer thread to 10 seconds.
1. Make Metric.java a thread-safe object. Add a new function that updates all
data at once to prevent data inconsistency.
2. Consumers with the same group id should subscribe to the same topic, so the
consumers created by the componentFactory share one group id and subscribe to
the same topics. Likewise, the producers created by the componentFactory send
to the same topic.
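The thread-safe Metric described in commit 1 above can be sketched as follows; the names are illustrative, not the actual class in this PR. The key point is that one synchronized method updates every field together, so no reader observes a half-applied record.

```java
public class Metrics {
    private long records = 0;
    private long bytes = 0;
    private long maxLatency = Long.MIN_VALUE;
    private long minLatency = Long.MAX_VALUE;

    // A single synchronized method updates all fields together, so a reader
    // can never see the byte count without the matching latency update.
    public synchronized void put(long latencyMs, int byteSize) {
        records++;
        bytes += byteSize;
        maxLatency = Math.max(maxLatency, latencyMs);
        minLatency = Math.min(minLatency, latencyMs);
    }

    public synchronized double avgBytesPerRecord() {
        return records == 0 ? 0 : (double) bytes / records;
    }

    public synchronized long max() { return maxLatency; }
    public synchronized long min() { return minLatency; }
}
```

Because every public method is synchronized, plain `long` fields suffice here, which is also the point raised about `LongAdder` later in this review.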
@chinghongfang (Collaborator, Author) left a comment

Updates:

  • Metric is thread-safe
  • Metric data consistency
  • The component factory creates consumers/producers using the same topic

@chinghongfang (Collaborator, Author) left a comment

I have fixed the issues mentioned.
I also added an entry point for partitioners, so a custom partitioner can be plugged into the benchmarking producer (see the usage), which makes it easy to test the partitioner from #8.
Feedback is welcome.

@chia7712 (Contributor) left a comment

@chinghongfang Thanks for the update. A few small issues:

Also, please compare the old perf tool with the new tool to see whether latency and throughput differ. Thanks.

static ThreadPool.Executor consumerExecutor(Consumer consumer, Metrics metrics) {
return new ThreadPool.Executor() {
@Override
public void execute() throws InterruptedException {
Contributor:

InterruptedException is not used.

byte[] payload = new byte[param.recordSize];
return new ThreadPool.Executor() {
@Override
public void execute() throws InterruptedException {
Contributor:

InterruptedException is useless

Collaborator (Author):

Thanks. I will check through the code again for unused exceptions.

validateWith = ArgumentUtil.NotEmptyString.class)
String JMXAddress = "0.0.0.0@0";

public Map<String, Object> perfProps() {
Contributor:

This parameter needs to be fed to the producer, consumer, and admin.

Collaborator (Author):

OK, understood. I will revise it.

Properties prop = new Properties();
prop.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
prop.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, partitionerName);
prop.put("jmx_servers", JMXAddress);
Contributor:

We need to consider what should happen here when JMXAddress is absent.

Collaborator (Author):

This parameter is meant for the partitioner, and only a custom partitioner would use it. I'd like to leave the decision to the partitioner itself, i.e. whether it accepts a missing JMXAddress.

Contributor:

I didn't express myself clearly.

Imagine the later scenario: we will probably compare the poison partitioner against other partitioners. In other words, we will set a jmx address for the former but not for the latter, so this code should check whether a jmx address exists in order to decide which partitioner to use.

Of course this can wait until the next PR; in that case we should not introduce the jmx address parameter here, and handle it in the next PR instead.

Collaborator (Author):

Understood. I won't introduce the jmx address here; we'll handle it later.
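The selection logic described in the discussion above (deferred to a later PR) might look like the following sketch. Both class-name constants and the `jmx.address` key are hypothetical placeholders, not code from this repository.

```java
import java.util.Map;

public class PartitionerChooser {
    // Hypothetical class names; the real partitioners belong to later PRs.
    static final String POISON = "org.astraea.partitioner.PoisonPartitioner";
    static final String DEFAULT =
        "org.apache.kafka.clients.producer.internals.DefaultPartitioner";

    // Decide by whether a jmx address was configured at all, instead of
    // threading a dummy "0.0.0.0@0" value through to every partitioner.
    static String choose(Map<String, String> args) {
        return args.containsKey("jmx.address") ? POISON : DEFAULT;
    }
}
```

This keeps the "no jmx address" case explicit rather than forcing each partitioner to interpret a sentinel value.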

private long num;
private long max;
private long min;
private final LongAdder bytes;
Contributor:

Since all public methods already use synchronized, LongAdder seems unnecessary here.

@chinghongfang (Collaborator, Author)

Comparing the latency and throughput of End2EndLatency.java and Performance.java

Measured results with the given parameters:

End2EndLatency:
20 seconds, 3 producers, 1 MByte record size, sending at full speed.
(cleaned-up output)

[producer latency] throughput: 648.51 MB/s avg_latency: 96.14 ms
[consumer latency] throughput: 47.88 MB/s avg_latency: 578.46 ms
[producer latency] throughput: 810.64 MB/s avg_latency: 91.46 ms
[consumer latency] throughput: 221.39 MB/s avg_latency: 1248.29 ms
[producer latency] throughput: 846.88 MB/s avg_latency: 95.08 ms
[consumer latency] throughput: 349.26 MB/s avg_latency: 1500.71 ms
[producer latency] throughput: 869.24 MB/s avg_latency: 96.49 ms
[consumer latency] throughput: 490.89 MB/s avg_latency: 1603.53 ms
[producer latency] throughput: 826.33 MB/s avg_latency: 95.95 ms
[consumer latency] throughput: 485.94 MB/s avg_latency: 1540.38 ms
[producer latency] throughput: 720.11 MB/s avg_latency: 115.27 ms
[consumer latency] throughput: 426.87 MB/s avg_latency: 1538.52 ms
[producer latency] throughput: 629.82 MB/s avg_latency: 126.6 ms
[consumer latency] throughput: 419.57 MB/s avg_latency: 1429.3 ms
[producer latency] throughput: 643.69 MB/s avg_latency: 127.42 ms
[consumer latency] throughput: 481.62 MB/s avg_latency: 1299.04 ms
[producer latency] throughput: 674.71 MB/s avg_latency: 122.13 ms
[consumer latency] throughput: 465.49 MB/s avg_latency: 1264.74 ms
[producer latency] throughput: 610.45 MB/s avg_latency: 122.13 ms

End2EndLatency
Send latency is stable, at 90–120 ms. End-to-end latency is much larger, at 1200–1500 ms. Ran for 20 seconds.

Performance:
3 producers, 10000 records, 1 MByte record size, sending 1000 records/second/producer.
(cleaned-up output)

[producer latency] throughput: 770.569MB/second average latency: 2.542ms
[consumer latency] throughput: 707.626MB/second average latency: 84.503ms
[producer latency] throughput: 842.094MB/second average latency: 2.442ms
[consumer latency] throughput: 902.175MB/second average latency: 51.894ms
[producer latency] throughput: 926.971MB/second average latency: 2.294ms
[consumer latency] throughput: 789.642MB/second average latency: 56.278ms
[producer latency] throughput: 905.991MB/second average latency: 2.242ms
[consumer latency] throughput: 965.117MB/second average latency: 74.231ms
[producer latency] throughput: 926.018MB/second average latency: 2.196ms
[consumer latency] throughput: 1007.079MB/second average latency: 72.079ms
[producer latency] throughput: 988.960MB/second average latency: 2.127ms
[consumer latency] throughput: 987.052MB/second average latency: 59.402ms
[producer latency] throughput: 968.933MB/second average latency: 2.091ms
[consumer latency] throughput: 970.839MB/second average latency: 50.905ms
[producer latency] throughput: 967.979MB/second average latency: 2.062ms
[consumer latency] throughput: 966.071MB/second average latency: 44.628ms
[producer latency] throughput: 988.007MB/second average latency: 2.032ms
[consumer latency] throughput: 988.959MB/second average latency: 39.673ms
[producer latency] throughput: 930.786MB/second average latency: 2.023ms
[consumer latency] throughput: 931.739MB/second average latency: 35.992ms
[producer latency] throughput: 46.730MB/second average latency: 2.018ms
[consumer latency] throughput: 46.730MB/second average latency: 35.823ms
[producer latency] throughput: 158.310MB/second average latency: 2.424ms
[consumer latency] throughput: 158.310MB/second average latency: 35.688ms
[producer latency] throughput: 51.498MB/second average latency: 2.810ms
[consumer latency] throughput: 51.498MB/second average latency: 35.901ms
[producer latency] throughput: 54.359MB/second average latency: 3.070ms
[consumer latency] throughput: 54.359MB/second average latency: 35.973ms
[producer latency] throughput: 11.444MB/second average latency: 3.070ms
[consumer latency] throughput: 11.444MB/second average latency: 36.005ms

Performance
Send latency is likewise stable, at 2–3 ms. End-to-end latency is again larger than send latency, around 30–70 ms. Produce/consume ran for about 15 seconds.

At similar throughput, the former (End2End) shows higher latency. I suspect it reaches the consumers' consumption-rate ceiling, so records pile up, while the latter (Performance) has not yet reached that ceiling.
After raising the send rate of the latter (Performance), latency also climbs above 1000 ms, so once records pile up the two tools show similar latency.

From the charts:
The former's (End2End) receive/send curves are more separated, presumably because consumers take longer to set up, so they are not ready when producers start sending.
The latter's (Performance) receive/send curves are closer together because it waits for the consumers to be created.

Feature comparison:

End2EndLatency:

  • Sends at full speed
  • Duration can be specified
  • Automatic topic configuration
  • Sends random records

Performance:

  • Sends at a fixed interval
  • Record count can be specified
  • Topic configuration can be specified
  • Sends fixed records

@chia7712 (Contributor)

Sends at a fixed interval

Just to confirm: with interval-based sending, how do we make sure we measure the cluster's speed ceiling?

@chia7712 (Contributor)

Automatic topic configuration

The old version allows either specifying the topic or generating one automatically. I'd suggest the new version do the same; it's more convenient.

Record count can be specified

Formal testing relies more on duration, and time-based validation matters (e.g. confirming that speed stays stable after running for an hour). So I'd suggest a duration setting be mandatory; record count is secondary, though nice to have.

Sends fixed records

I'd suggest random content, so the measured speed is not distorted when testing compression.
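The point about random payloads can be demonstrated with a small sketch using `java.util.zip.Deflater` (illustrative only, not part of the PR): a constant payload compresses to almost nothing, so a benchmark that sends fixed records would overstate throughput whenever compression is enabled.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class PayloadCompression {
    // Compress the whole input in one pass and return the compressed size.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        // Buffer is generously sized so one deflate() call finishes the stream.
        byte[] out = new byte[input.length * 2 + 64];
        int size = deflater.deflate(out);
        deflater.end();
        return size;
    }

    public static void main(String[] args) {
        byte[] fixed = new byte[100_000];   // all zeros, like a fixed payload
        byte[] random = new byte[100_000];
        new Random(42).nextBytes(random);   // effectively incompressible content
        // The fixed payload shrinks to a tiny fraction of its size, while the
        // random one barely shrinks, so only random payloads exercise the
        // broker and network realistically under compression.
        System.out.println(compressedSize(fixed) + " vs " + compressedSize(random));
    }
}
```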

@chinghongfang (Collaborator, Author)

With interval-based sending, how do we make sure we measure the cluster's speed ceiling?

We can't be sure the ceiling is reached.
I got the goal wrong here: I treated this as simulating a workload, when it should measure the ceiling.

The old version allows either specifying the topic or generating one automatically. I'd suggest the new version do the same; it's more convenient.

OK; that is better than using a fixed default value.

Formal testing relies more on duration, and time-based validation matters (e.g. confirming that speed stays stable after running for an hour). So I'd suggest a duration setting be mandatory; record count is secondary, though nice to have.

Understood, I'll revise it.

I'd suggest random content, so the measured speed is not distorted when testing compression.

I see! At the time I was focused on minimizing data-processing time and missed the difference compression makes.

@chia7712 (Contributor)

@chinghongfang The items discussed above can be handled in follow-up PRs; record them in an existing issue first, or open new ones.

I'd like to merge this one first and then remove the old tool.

@chinghongfang (Collaborator, Author)

OK, I'll deal with the failed tests first.

@chia7712 (Contributor)

@chinghongfang Please paste the follow-up topics we discussed above into issues, otherwise we will definitely forget @@

@chia7712 chia7712 merged commit f5a9631 into opensource4you:main Oct 27, 2021
This was referenced Oct 27, 2021
@chinghongfang chinghongfang deleted the performance branch November 1, 2021 04:53