fix(cluster): do not use channel created for a different address #12305
Conversation
Test Results

952 files (-101)   952 suites (-101)   1h 7m 12s ⏱️ (-44m 52s)

Results for commit 7c7dccc. ± Comparison against base commit 8b3bdc9.

This pull request removes 1213 and adds 449 tests. Note that renamed tests count towards both.

♻️ This comment has been updated with latest results.
When only the IP is used to find a channel pool, we observed that a channel pool created for one node can be re-used by another node that got its old IP. This could lead to unexpected situations where messages are sent to the wrong node. To fix this, we now use both the address and the IP to find a channel pool. In a setup where restarts and IP re-assignment are common, this is safer.
If a new message is sent while the channel is being closed, it will use the existing channel and the test fails. So wait until the channel is closed before sending the new message.
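For context, a minimal sketch of what waiting for the close could look like in a test, assuming a Netty Channel is at hand; the names and structure are illustrative, not the PR's actual test code:

```java
import io.netty.channel.Channel;

final class ChannelTestUtil {
  // Hypothetical helper: block until the channel is fully closed before the
  // test sends the next message, so the pool cannot hand back the closing channel.
  static void awaitClosed(final Channel channel) throws InterruptedException {
    channel.close().syncUninterruptibly(); // request the close
    channel.closeFuture().await();         // returns only once the close has completed
  }
}
```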
Force-pushed from a36cd30 to 675b006
Weird that the backport flagged a checkstyle issue 🤔
👍 Sorry for the slow review
- ❓ I'm a little bit off topic again, but is there a risk we leave some garbage in the channel pool now? I know we remove the channel from the pool once it's closed, but I honestly can't tell if we can guarantee that channels are always closed, or that the listener is always called. I think this is why Netty has the IdleStateHandler you can add to the pipeline 🤷 Not a blocker, as this may not be a new problem anyway 😄
- 🔧 I'm not super keen on having to create a tuple every time we want to get a channel, so on every request. At the same time, I don't know if that really has an impact, so 🤷 Maybe interesting to check TLAB allocations one day via Pyroscope.
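For reference, this is roughly how Netty's IdleStateHandler is usually wired into a pipeline to reap idle channels; the initializer, handler, and timeout values below are illustrative examples, not code from this PR:

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

final class IdleAwareInitializer extends ChannelInitializer<SocketChannel> {
  @Override
  protected void initChannel(final SocketChannel ch) {
    // Fire an IdleStateEvent if nothing was read or written for 60 seconds
    ch.pipeline().addLast(new IdleStateHandler(0, 0, 60));
    ch.pipeline().addLast(new ChannelDuplexHandler() {
      @Override
      public void userEventTriggered(final ChannelHandlerContext ctx, final Object evt)
          throws Exception {
        if (evt instanceof IdleStateEvent && ((IdleStateEvent) evt).state() == IdleState.ALL_IDLE) {
          // Closing the channel lets the pool's close listener evict the stale entry
          ctx.close();
        } else {
          super.userEventTriggered(ctx, evt);
        }
      }
    });
  }
}
```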
final List<CompletableFuture<Channel>> channelPool = getChannelPool(address);
final InetAddress inetAddress = address.address();
if (inetAddress == null) {
  final CompletableFuture<Channel> failedFuture = new OrderedFuture<>();
❓ I always wonder why we use OrderedFuture 🤔
I'm not sure. This part of the code used OrderedFuture, so I used it for consistency.
Is it maybe because we want to execute all the "listeners" which listen for the future completion in the order in which they have been registered? 🤔 And the normal CompletableFuture doesn't guarantee this? 🤔
Technically yes, but I never understood why this is such a big advantage 🤷 We could likely write everything without this constraint
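To make the ordering point concrete: with a plain CompletableFuture, dependent actions registered before completion are not guaranteed to run in registration order (current JDKs tend to run them in reverse), which is presumably the guarantee OrderedFuture adds. A minimal, self-contained illustration:

```java
import java.util.concurrent.CompletableFuture;

public final class CallbackOrderDemo {
  public static void main(final String[] args) {
    final CompletableFuture<String> future = new CompletableFuture<>();
    // Register three dependent actions before the future is completed.
    future.thenAccept(v -> System.out.println("listener 1: " + v));
    future.thenAccept(v -> System.out.println("listener 2: " + v));
    future.thenAccept(v -> System.out.println("listener 3: " + v));
    // CompletableFuture makes no guarantee about the order in which the
    // listeners above run; in practice this typically prints 3, 2, 1.
    future.complete("done");
  }
}
```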
Setup

Deployed to measurement-4671156218

camunda-platform:
  zeebe:
    image:
      repository: gcr.io/zeebe-io/zeebe
      tag: dd-12173-channel-pool-benchmark-7c7dccc
  zeebe-gateway:
    image:
      repository: gcr.io/zeebe-io/zeebe
      tag: dd-12173-channel-pool-benchmark-7c7dccc
  global:
    image:
      tag: dd-12173-channel-pool-benchmark-7c7dccc

Measurement before

Process Instance Execution Time: p99=1.421 p90=0.317 p50=0.081

Chaos injection

Deployed chaos network-latency-5

Measurement after

Process Instance Execution Time: p99=4.078 p90=2.180 p50=0.857

Details

See https://github.com/camunda/zeebe/actions/runs/4671156218
If the channel is never closed, it is not removed from the pool. But I don't think that happens very often, right?
I would assume it doesn't have a significant impact. The benchmark also doesn't show any impact.
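As a sketch of the mechanism being referred to above, i.e. removing a pooled channel only once its close future completes; the names here are hypothetical, not the actual Zeebe/Atomix pool code:

```java
import io.netty.channel.Channel;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class PoolEviction {
  // Hypothetical helper: once the channel closes, drop its future from the pool
  // so the slot can be repopulated on the next request.
  static void evictOnClose(
      final List<CompletableFuture<Channel>> channelPool,
      final CompletableFuture<Channel> entry,
      final Channel channel) {
    channel.closeFuture().addListener(ignored -> {
      synchronized (channelPool) {
        channelPool.remove(entry);
      }
    });
  }
}
```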
bors merge
Build succeeded:
Successfully created backport PR for |
Successfully created backport PR for |
Successfully created backport PR for |
Description
When only the IP is used to find a channel pool, we observed that a channel pool created for one node can be re-used by another node that got its old IP. This could lead to unexpected situations where messages are sent to the wrong node.
To fix this, we now use both the address and the IP to find a channel pool. In a setup where restarts and IP re-assignment are common, this is safer.
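Conceptually, the fix amounts to keying the pool on both the configured address and the resolved IP rather than on the IP alone. A minimal sketch of such a composite key, using hypothetical names (the PR's actual types and pool layout differ):

```java
import java.net.InetAddress;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class ChannelPools<C> {
  // Hypothetical composite key (Java 16+ record): the node's configured address
  // plus the concrete IP it currently resolves to.
  record PoolKey(String host, int port, InetAddress resolvedIp) {}

  private final Map<PoolKey, List<CompletableFuture<C>>> pools = new ConcurrentHashMap<>();

  List<CompletableFuture<C>> getChannelPool(final String host, final int port, final InetAddress ip) {
    return pools.computeIfAbsent(new PoolKey(host, port, ip), key -> new CopyOnWriteArrayList<>());
  }
}
```

With such a key, a node that inherits another node's old IP no longer resolves to the old pool, because its configured address differs.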
Related issues
closes #12173
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
Backports are created by adding the corresponding labels (e.g. backport stable/1.3) to the PR; in case that fails you need to create backports manually.
Testing:
Documentation:
Other teams:
If the change impacts another team, an issue has been created for this team, explaining what they need to do to support this change.
Please refer to our review guidelines.