From 51eed4933f05d3e6efa3778c39766dd865f2cd4f Mon Sep 17 00:00:00 2001
From: Christopher Zell
Date: Thu, 23 Feb 2023 14:43:32 +0100
Subject: [PATCH] docs: fix writing

---
 .../index.md | 30 +++++++++----------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/chaos-days/blog/2023-02-23-Recursive-call-activity/index.md b/chaos-days/blog/2023-02-23-Recursive-call-activity/index.md
index a130c2a6e..bb0e864b1 100644
--- a/chaos-days/blog/2023-02-23-Recursive-call-activity/index.md
+++ b/chaos-days/blog/2023-02-23-Recursive-call-activity/index.md
@@ -15,31 +15,31 @@ authors: zell

Long time no see. Happy to do my first chaos day this year. In the last week, we have implemented interesting features, which I would like to experiment with. [Batch processing](https://github.com/camunda/zeebe/issues/11416) was one of them.

-**TL;DR;** Chaos experiment failed. :boom: Batch processing doesn't seem to respect the configured limit, which cause issues on processing and influences health of the system. We found a bug :muscle:
+**TL;DR;** Chaos experiment failed. :boom: Batch processing doesn't seem to respect the configured limit, which causes issues with processing and influences the health of the system. We found a bug :muscle:

## Chaos Experiment

-In today's chaos experiment we want to experiment with [Batch processing](https://github.com/camunda/zeebe/issues/11416) and how it can handle error conditions, like deploying an endless recursive process-model.
+In today's chaos experiment, we want to experiment with [Batch processing](https://github.com/camunda/zeebe/issues/11416) and how it can handle error conditions, such as deploying an endlessly recursive process model.

![recursive process](call.png)

### Expected

-When we deploy such process model, and create an instance of it, we expect that the execution is done endless. In normal process models with batch processing the execution of a process instance is done until a wait-state is reached. In this process model there exist no wait state. In order to handle such case, we have implemented a batch limit, which can be configured via [maxCommandsInBatch](https://github.com/camunda/zeebe/blob/main/dist/src/main/config/broker.standalone.yaml.template#L695). This configuration is by default set to 100 commands. Meaning the stream processor will process 100 commands until it stops, to make room for other things.
+When we deploy such a process model and create an instance of it, we expect the execution to run endlessly. In normal process models with batch processing, the execution of a process instance continues until a wait state is reached. In this process model, there is no wait state. To handle such cases, we have implemented a batch limit, which can be configured via [maxCommandsInBatch](https://github.com/camunda/zeebe/blob/main/dist/src/main/config/broker.standalone.yaml.template#L695). This configuration is set to 100 commands by default, meaning the stream processor processes 100 commands and then stops, to make room for other things.

-We expect that our limit handling steps in during the execution and we are able to execute also other instances or, cancel the problematic process instance. Furthermore, we expect to stay healthy, we should be able to update our health check continuously.
+We expect that our limit handling steps in during the execution and that we can also execute other instances or cancel the problematic process instance. Furthermore, we expect the cluster to stay healthy; we should be able to update our health check continuously.
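+
+For reference, the `maxCommandsInBatch` setting mentioned above would look roughly like this in the broker configuration (a sketch based on the linked template; the exact nesting and property names may differ between versions):
+
+```yaml
+# Sketch only, see the linked broker.standalone.yaml.template for the
+# authoritative layout. 100 commands is the documented default.
+zeebe:
+  broker:
+    processing:
+      # Maximum number of commands the stream processor handles in one batch
+      # before it stops to make room for other work.
+      maxCommandsInBatch: 100
+```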

### Actual

-Before we can start with our experiment we need to start our benchmark Zeebe cluster. This has become easier now, since I have written the last post. Previously we had to use the scripts and Makefile in the [zeebe/benchmark sub-directory](https://github.com/camunda/zeebe/tree/main/benchmarks/setup).
+Before we can start with our experiment, we need to start our benchmark Zeebe cluster. This has become easier since I wrote the last post. Previously we had to use the scripts and Makefile in the [zeebe/benchmark sub-directory](https://github.com/camunda/zeebe/tree/main/benchmarks/setup).

-We have now provide new [Benchmark Helm charts](https://github.com/zeebe-io/benchmark-helm), based on our Camunda Platform Helm charts. They allow us to deploy a new zeebe benchmark setup via:
+We now provide new [Benchmark Helm charts](https://github.com/zeebe-io/benchmark-helm), based on our Camunda Platform Helm charts. They allow us to deploy a new Zeebe benchmark setup via:

```shell
kubectl create namespace zell-chaos # create a new namespace
-kubens zell-chaos # change context to new namespace
+kubens zell-chaos # change context to the new namespace
# deploy zeebe benchmark cluster - without starter and worker
helm install zell-chaos \
zeebe-benchmark/zeebe-benchmark \
@@ -48,7 +48,7 @@ helm install zell-chaos \
```

-In order to deploy the model we can use [zbchaos v1.0.0](https://github.com/zeebe-io/zeebe-chaos/releases/tag/zbchaos-v1.0.0).
+To deploy the model, we can use [zbchaos v1.0.0](https://github.com/zeebe-io/zeebe-chaos/releases/tag/zbchaos-v1.0.0).

```shell
@@ -57,14 +57,14 @@ $ zbchaos deploy process --processModelPath call.bpmn
Deployed given process model call.bpmn, under key 2251799813685249!
```

-*Note: Looks like we have some left-over debug log, which we should remove.*
+*Note: Looks like we have some left-over debug logs, which we should remove.*

To create an instance we can use:

```shell
$ zbchaos verify instance-creation --bpmnProcessId super
{1 LEADER -1 10 msg false 1 LEADER -1 2 LEADER -1 1677157569058 false false true false false 30 false -1 super 30 }
-The steady-state was successfully verified!
+The steady state was successfully verified!
```

After creating the instance we can observe the behavior of Zeebe via [Grafana](https://grafana.dev.zeebe.io/).

@@ -75,7 +75,7 @@ We can see that the processing starts immediately quite high and is continuously

**We have two instances running, one on partition three and one on partition one.**

-_One interesting fact is that the topology request rate is also up to 0.400 per second, so potentially every 2.5 second we send an topology request to the gateway. But there is no application deployed which does this. [I have recently found out again](https://github.com/camunda/zeebe/pull/11599#discussion_r1109846523), that we have the Zeebe client usage in the gateway to request the topology. Might be worth to investigate whether this is an issue._
+_One interesting fact is that the topology request rate is also up to 0.400 per second, so potentially every 2.5 seconds we send a topology request to the gateway. But there is no application deployed that does this. [I have recently found out again](https://github.com/camunda/zeebe/pull/11599#discussion_r1109846523) that we use the Zeebe client in the gateway to request the topology. It might be worth investigating whether this is an issue._

After observing this cluster for a while we can see that after around five minutes the cluster fails.
The processing for the partitions breaks down to 1/10 of what was processed before. A bit later it looks like it tries to come back but then fails again.

@@ -90,18 +90,18 @@ Partition-1 failed, marking it as unhealthy: Broker-2{status=HEALTHY}
Detected 'UNHEALTHY' components. The current health status of components: [Partition-2{status=HEALTHY}, Partition-1{status=UNHEALTHY, issue=HealthIssue[message=null, throwable=null, cause=Broker-2-StreamProcessor-1{status=UNHEALTHY, issue=HealthIssue[message=actor appears blocked, throwable=null, cause=null]}]}, Partition-3{status=HEALTHY}]
```

-Interesting insights we can get in our new Batch processing metrics. We see that at the beginning we use our limit of 100 commands per batch, but soon was we start with the recursion we use an enormous high batch processing command count.
+We can get interesting insights from our new batch processing metrics. We see that at the beginning we use our limit of 100 commands per batch, but as soon as we start with the recursion, we use an enormously high batch processing command count.

![fail-batchprocessing.png](fail-batchprocessing.png)

-The new sequence metric show similar results, so there must be a problem with not respecting the limit.
+The new sequence metric shows similar results, so there must be a problem with the limit not being respected.

![sequencer](sequencer.png)

-With this I mark this chaos experiment as failed. We need to investigate this further and fix the related issue.:boom:
+With this, I mark this chaos experiment as failed. We need to investigate this further and fix the related issue. :boom:

## Found Bugs

* zbchaos logs debug messages on normal usage
-* Every 2.5 second we send an topology request, which is shown in the metrics
+* Every 2.5 seconds we send a topology request, which is shown in the metrics
* Batch processing doesn't respect the limit
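+
+As a small follow-up note for the investigation: next to the Grafana dashboards, the raw batch processing metrics can also be pulled directly from a broker, roughly like this (a sketch; the pod name, monitoring port, and endpoint path are assumptions based on the usual benchmark setup):
+
+```shell
+# Port-forward the monitoring port of one broker (the pod name is just an example).
+kubectl port-forward pod/zell-chaos-zeebe-0 9600:9600 &
+
+# Scrape the Prometheus endpoint and filter for the batch processing metrics
+# (endpoint path and metric names may differ per Zeebe version).
+curl -s localhost:9600/actuator/prometheus | grep -i batch
+```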