Add experiment report #1304

Merged on Dec 21, 2022 (3 commits)

Collaborator

@chinghongfang chinghongfang commented Dec 18, 2022

After completing #1236, this PR records the experiment results and corrects the erroneous calculation described in #1236.

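When BuiltInPartitioner chooses a partition, it computes the probabilities from the maximum queue length. I misremembered this and wrote that it uses the total queue length, so the earlier example is re-explained here: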


The Built-In Partitioner derives the probability of choosing each partition from the length of that partition's batch queue in the accumulator.

This is the partition layout we configured:

| # of partitions | B1 | B2 | B3 |
| --- | --- | --- | --- |
| testing-2 | 16 | 16 | 4 |

Here is an example. Suppose B1 and B2 are both congested, so the accumulator holds a backlog of 100 messages for each partition on B1 and B2, while nothing is queued for the partitions on B3.

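Next, compute the probability of each partition being chosen:
B1 has 16 partitions, each with 100 messages queued in the accumulator
B2 has 16 partitions, each with 100 messages queued in the accumulator
B3 has 4 partitions, with nothing queued in the accumulator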

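Maximum queue length: 100
Each partition's selection weight is (maximum queue length + 1) minus that partition's own queue length.
So each partition on B1 has a selection weight of (101-100) = 1
Each partition on B2 has a selection weight of (101-100) = 1
Each partition on B3 has a selection weight of (101-0) = 101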
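The weighting rule above can be turned into a random pick in several ways. The following is a minimal sketch of one common approach, a cumulative weight table searched with a binary search. The function names `build_cumulative_table` and `pick_partition` are hypothetical and are not taken from Kafka's source or from this thread. The queue lengths are the ones assumed in this example.

```python
import bisect
import itertools
import random


def build_cumulative_table(queue_lengths):
    """Turn per-partition queue lengths into a cumulative weight table.

    Weight of a partition = (max queue length + 1) - its own queue length,
    as described in this comment. Hypothetical helper, not Kafka's API.
    """
    max_plus_one = max(queue_lengths) + 1
    weights = [max_plus_one - q for q in queue_lengths]
    return list(itertools.accumulate(weights))


def pick_partition(cumulative, draw):
    """Map an integer draw in [0, cumulative[-1]) to a partition index."""
    return bisect.bisect_right(cumulative, draw)


# Layout from this example: 32 congested partitions (B1 + B2), 4 idle ones (B3).
queue_lengths = [100] * 32 + [0] * 4
cumulative = build_cumulative_table(queue_lengths)

# A uniform draw over the total weight selects a partition
# with probability proportional to its weight.
draw = random.randrange(cumulative[-1])
chosen = pick_partition(cumulative, draw)
print(f"draw={draw} -> partition index {chosen}")
```

In this sketch the indices 0 to 31 stand for the B1 and B2 partitions and 32 to 35 for the B3 partitions, so almost every draw lands on a B3 partition.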

1*16+1*16+101*4 = 436
So the probability of choosing a B1 partition is 16/436
The probability of choosing a B2 partition is 16/436
The probability of choosing a B3 partition is 404/436
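The arithmetic above can be checked with a short script. This is only a sketch of the calculation as described in this comment, not the partitioner's actual code. The dictionary names are illustrative.

```python
from fractions import Fraction

# Values taken from the example in this comment.
partitions_per_broker = {"B1": 16, "B2": 16, "B3": 4}
queue_length = {"B1": 100, "B2": 100, "B3": 0}

# Weight per partition = (max queue length + 1) - own queue length.
max_plus_one = max(queue_length.values()) + 1
weight = {b: max_plus_one - q for b, q in queue_length.items()}

# Total weight over all partitions of all brokers.
total = sum(weight[b] * n for b, n in partitions_per_broker.items())

# Probability that the chosen partition lives on each broker.
probability = {
    b: Fraction(weight[b] * n, total) for b, n in partitions_per_broker.items()
}

for b, n in partitions_per_broker.items():
    print(f"{b}: {weight[b] * n}/{total} = {float(probability[b]):.4f}")
```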


For BuiltInPartitioner, the distribution of topic partitions across brokers does affect the probability, but it is not the only factor.
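To illustrate this point, here is a small sketch that applies the same weighting rule to three backlog scenarios on the same partition layout. Only the first scenario comes from this comment. The other two are hypothetical inputs chosen for contrast, and `broker_probabilities` is an illustrative helper, not part of Kafka or Astraea.

```python
from fractions import Fraction


def broker_probabilities(partitions_per_broker, queue_length):
    """Probability that the chosen partition is on each broker.

    Uses weight = (max queue length + 1) - own queue length per partition.
    """
    max_plus_one = max(queue_length.values()) + 1
    weighted = {
        b: (max_plus_one - queue_length[b]) * n
        for b, n in partitions_per_broker.items()
    }
    total = sum(weighted.values())
    return {b: Fraction(w, total) for b, w in weighted.items()}


layout = {"B1": 16, "B2": 16, "B3": 4}

# Scenario from this comment: B1 and B2 congested, B3 idle.
example = broker_probabilities(layout, {"B1": 100, "B2": 100, "B3": 0})

# Hypothetical: every broker equally backlogged, so only partition counts matter.
even = broker_probabilities(layout, {"B1": 100, "B2": 100, "B3": 100})

# Hypothetical: only B3 congested.
reverse = broker_probabilities(layout, {"B1": 0, "B2": 0, "B3": 100})

for name, result in [("example", example), ("even", even), ("reverse", reverse)]:
    print(name, {b: round(float(p), 4) for b, p in result.items()})
```

With equal backlogs the probabilities follow the partition counts alone. Once the backlogs differ, queue length dominates the result.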

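Also, to answer the earlier question: repeated experiments show that B3 is not always the only broker with lower throughput (in this run, B2 had the lower throughput). My current guess is that this is a "dynamic equilibrium" formed among the BuiltInPartitioner instances.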

@chia7712
Contributor

> (101-100) = 1 selection weight

Where does 101 come from?

> 116+116+101*4 = 436

Where does 116 come from?

@chinghongfang
Collaborator Author

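> Where does 101 come from?

It is the max queue length + 1. I will explain this clearly in the document.

> Where does 116 come from?

Sorry, that was a typo: I left out the escape characters. The correct expression is 1*16+1*16+101*4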

Contributor

@chia7712 chia7712 left a comment


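Remember to add the report to https://github.com/skiptests/astraea/tree/main/docs/dispatcher

Also add the revision used for the test. The two earlier reports did not include a revision, so please see whether you can fill those in, if you still remember them.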

Contributor

@chia7712 chia7712 left a comment


LGTM

@chinghongfang chinghongfang linked an issue Dec 21, 2022 that may be closed by this pull request
@chinghongfang chinghongfang merged commit 3d72d7b into opensource4you:main Dec 21, 2022
@chinghongfang chinghongfang deleted the experiment branch January 10, 2023 07:43
Successfully merging this pull request may close these issues.

Kafka Producer metrics sometimes report NaN statistics