Regression in deploying large payloads #13233

korthout · 2023-06-29T08:17:56Z

We've introduced a regression on deploying large payloads on multi-partition clusters with

Switch Deployment:Create processor to new record distribution behavior #11661

Before that pull request, a deployed resource was written in two follow-up events:

Deployment:CREATED
Process:CREATED (or DecisionRequirements:CREATED depending on resource)

The specialized DeploymentDistribution would take the resource from the Deployment:CREATED to distribute it to the other partitions.

With #11456, a new event CommandDistribution:STARTED was introduced to store the command for distribution. For deployments, this contains the entire Deployment incl. the resource. This event is only appended on multi-partition clusters. But, when it is appended, it further reduces the maximum payload size because the follow-up events reach the MAX_BATCH_SIZE restriction with a lower payload size.

Note The MAX_BATCH_SIZE is a limitation that originated from the MAX_MESSAGE_SIZE configuration setting but nowadays is only defined by the LogStreamBuilder's MAX_FRAGMENT_SIZE: 4MB.

This regression lowers the maximum payload size of deployments from ~2MB down to ~1.4MB.

The regression does not exist on single partition clusters as these do not require the distribution of the deployment.

Suggested solution:

don't write the resource in the Deployment:CREATED event, only in the Process:CREATED and the DecisionRequirements:CREATED events (estimate: x-small)

Of course, we'd need to inform users that we're no longer writing the resource for that event in the Update Guide. As this would be breaking user space. The documentation should clarify that the resource is available in the Process:CREATED and the DecisionRequirements:CREATED instead.

The text was updated successfully, but these errors were encountered:

korthout · 2023-06-29T08:37:54Z

There are several solution ideas:

Allow writing arbitrarily large Raft entries/batches #11513
- Does not seem trivial to implement
- Mostly a ZDP effort
don't write the resource in the Deployment:CREATED event, only in the Process:CREATED and the DecisionRequirements:CREATED events (estimate: x-small)
- would require alignment with Operate and Optimize to make sure they consume the resource from Process:CREATED

korthout · 2023-06-29T11:56:05Z

@sdorokhova Can you verify for me that Operate consumes the Process:CREATED event to get the BPMN model and not the Deployment:CREATED event? Likewise, does it consume the DecisionRequirements:CREATED event for DMN models?

@RomanJRW Can you verify the same as above, but for Optimize?

sdorokhova · 2023-06-29T15:50:03Z

Hi @korthout ,
we read processes from process index and decision requirements from decision-requirements index. You can check here all the indices we're reading from. Is this what you were asking about?

korthout · 2023-06-29T15:59:43Z

@sdorokhova Perfect! That's exactly what I mean. I should've mentioned that it concerns the ES indices zeebe-record-process and zeebe-record-decision-requirements.

I was hoping you weren't reading the BPMN/DMN resources from the zeebe-record-deployment index.

13239: Deployment payload size regression tests r=korthout a=korthout ## Description  Adds a regression test for the maximum deployment payload size currently possible on a 3-partition cluster. This also cleans up the other tests related to the size of the deployment's payload in an effort to clarify what they truly verify: - `CreateLargeDeploymentTest.shouldRejectDeployIfResourceIsTooLarge` - tests that the incoming request's maxMessageSize is tested - is a regression test against #12591 - `DeploymentRejectionTest.shouldRejectDeploymentIfResourceIsTooLarge` - verifies that the deployment command is rejected when its resources fit the max message size but still exceed the batch record size (due to follow-up records) - `CreateDeploymentTest.shouldRejectDeployIfResourceIsTooLarge` - verifies that the client's request is rejected when its resources fit the max message size but still exceed the batch record size (due to follow-up records) > **Note** This pull request explicitly targets `stable/8.2` as all these tests succeed there, but some will not succeed on `main` due to #13233. ## Related issues  relates to #13233 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>

megglos · 2023-07-03T08:25:43Z

@korthout do I understand this correctly that this regression is only present in 8.3? And the changes that happened in 8.2 like #11660 have not caused a regression?

korthout · 2023-07-03T08:48:23Z

@megglos Correct, the regression only exists on main and if left unfixed will only be present in the upcoming 8.3. The reason is that deployment distribution has been switched over only for 8.3(-alpha*), not for 8.2.

This test should no longer be ignored when the regression is fixed. See: #13233

13291: [Backport stable/8.1] Deployment payload size regression tests r=remcowesterhoud a=korthout ## Description  Backport of - #13239 There were conflicts, please see the commit messages. I also had to adjust a test case for this specific branch. ## Related issues  relates to #13233 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>

13292: [Backport stable/8.0] Deployment payload size regression tests r=remcowesterhoud a=korthout ## Description  Backport of - #13239 There were conflicts, please see the commit messages. This backport misses the commit bdcb0d9 because the file `too_large_process.bpmn` did not exist on this branch. I also had to ignore / remove some test cases because the related bug was not fixed / feature was not implemented on this branch. And I had to adjust a test case for this specific branch. ## Related issues  relates to #13233 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>

13290: [Forwardport main] Deployment payload size regression tests r=korthout a=korthout ## Description  Forwardport of - #13239 There were conflicts, please see the commit messages. ## Related issues  relates to #13233 Co-authored-by: Nico Korthout <nico.korthout@camunda.com>

RomanJRW · 2023-07-10T13:32:00Z

Hey @korthout - apologies for slow response, I have had FTO. I can confirm the same for Optimize, that we read from zeebe-record-process and not the deployment index

korthout · 2023-07-11T15:50:33Z

Now that both Operate and Optimize have confirmed that they don't access the resource from the zeebe-record-deployment index, I think it's fine to take this solution:

don't write the resource in the Deployment:CREATED event, only in the Process:CREATED and the DecisionRequirements:CREATED events (estimate: x-small)

Of course, we'd need to inform users that we're no longer writing the resource for that event in the Update Guide. As this would be breaking user space. The documentation should clarify that the resource is available in the Process:CREATED and the DecisionRequirements:CREATED instead.

related to 13140

korthout mentioned this issue Jun 29, 2023

[EPIC] Generalized command distribution #11456

Closed

3 tasks

korthout self-assigned this Jun 29, 2023

korthout mentioned this issue Jun 29, 2023

Deployment payload size regression tests #13239

Merged

14 tasks

ChrisKujawa added the version:8.2.8 Marks an issue as being completely or in parts released in 8.2.8 label Jul 3, 2023

This was referenced Jul 3, 2023

[Forwardport main] Deployment payload size regression tests #13290

Merged

[Backport stable/8.1] Deployment payload size regression tests #13291

Merged

[Backport stable/8.0] Deployment payload size regression tests #13292

Merged

korthout added a commit that referenced this issue Jul 3, 2023

test: ignore test that highlights regression

b275ef0

This test should no longer be ignored when the regression is fixed. See: #13233

korthout added a commit that referenced this issue Jul 4, 2023

test: ignore test that highlights regression

0c8c446

This test should no longer be ignored when the regression is fixed. See: #13233

ChrisKujawa added the version:8.1.14 Marks an issue as being completely or in parts released in 8.1.14 label Jul 6, 2023

korthout added good first issue Marks an issue as simple enough for first time contributors onboarding labels Jul 11, 2023

nicpuppa self-assigned this Jul 13, 2023

nicpuppa unassigned korthout Jul 13, 2023

korthout mentioned this issue Jul 19, 2023

When submitting a payload lower than the configured maxMessageSize, the request fails with EXCEEDED_BATCH_RECORD_SIZE #12873

Open

ghost closed this as completed in 269dd99 Jul 20, 2023

megglos added the version:8.3.0-alpha4 Marks an issue as being completely or in parts released in 8.3.0-alpha4 label Aug 2, 2023

megglos added the version:8.3.0 Marks an issue as being completely or in parts released in 8.3.0 label Oct 5, 2023

github-merge-queue bot pushed a commit that referenced this issue Aug 9, 2024

ci: fix broken assign dri command workflow (#13233)

d70fe89

related to 13140

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression in deploying large payloads #13233

Regression in deploying large payloads #13233

korthout commented Jun 29, 2023 •

edited

Loading

korthout commented Jun 29, 2023 •

edited

Loading

korthout commented Jun 29, 2023

sdorokhova commented Jun 29, 2023

korthout commented Jun 29, 2023

megglos commented Jul 3, 2023 •

edited

Loading

korthout commented Jul 3, 2023

RomanJRW commented Jul 10, 2023

korthout commented Jul 11, 2023

Regression in deploying large payloads #13233

Regression in deploying large payloads #13233

Comments

korthout commented Jun 29, 2023 • edited Loading

korthout commented Jun 29, 2023 • edited Loading

korthout commented Jun 29, 2023

sdorokhova commented Jun 29, 2023

korthout commented Jun 29, 2023

megglos commented Jul 3, 2023 • edited Loading

korthout commented Jul 3, 2023

RomanJRW commented Jul 10, 2023

korthout commented Jul 11, 2023

korthout commented Jun 29, 2023 •

edited

Loading

korthout commented Jun 29, 2023 •

edited

Loading

megglos commented Jul 3, 2023 •

edited

Loading