-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stable/8.0] Allow retried DMN resource distribution #9883
Conversation
@oleschoenburg Do we want to forward port this to Note, we can always retro-actively forward port this. |
A critical bug was found related to redistributing DMN resources. When the distribution isn't acknowledged, it is retried until it is. This retry also occurs on the restart of the deployment partition, if the deployment distribution hasn't been completed for that partition yet. This retry results in an additional distribute deployment command on the other partition. Once processed it writes the DISTRIBUTED event which when applied inserts the decision and drg into the state. The second time the event is written again and applied again, in which case the insert fails due to consistency checks of the insert method. This test doesn't check the implementation of the insert (using the insert method), but checks the behavior. If the insert fails then it also doesn't write the DISTRIBUTED event, as the exception will make sure the entire transaction is rolled back. This test verifies that the DISTRIBUTED event is correctly written to the log and so without errors. It actually checks that this event is written twice, after the command was written twice. After this we can be sure that the retried distribution of the dmn resource in the deployment was succesfully processed. Note that this solution differs from the alternative: rejecting the retried DISTRIBUTE command if the resources are already known. However, this solution is more complex and not suitable for patching a critical bug. We can have a look at that solution in future versions. Co-authored-by: Nico korthout <nico.korthout@camunda.com>
This fixes critical a critical bug, where a decision and/or drg can be inserted twice due to retry logic for inter-partition distribution of deployments. To counter act, this allows an overwrite of existing values. See #9877 and DeploymentClusteredTest.shouldDistributeDmnResourceOnRetry.
db84c51
to
0e60752
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @korthout ⛑️
Nice that you found a way to write this test cleanly 🙈
I'd say we can cherry-pick this on main, there shouldn't be any conflicts.
bors merge |
Successfully created backport PR #9887 for |
9885: Support logging partition id in compact record logger r=remcowesterhoud a=korthout ## Description <!-- Please explain the changes you made here. --> The compact record logger could already log the encoded partition id of keys when multiple partitions were used, but sometimes we write records on other partitions without a key (-1), or we write records on other partitions with a key that encodes a different partition. This PR adds support for printing the partition id at the start of each log line. The partition id is omitted if the records don't contain any reference to other partitions (neither `partitionId` nor encoded in the `key`). I ran into this while working on a test case for #9877. ## Related issues <!-- Which issues are closed by this PR or are related --> relates #9877 9887: [Backport main] [stable/8.0] Allow retried DMN resource distribution r=korthout a=github-actions[bot] # Description Backport of #9883 to `main`. relates to #9877 Co-authored-by: Nico Korthout <nico.korthout@camunda.com> Co-authored-by: Ole Schönburg <ole.schoenburg@gmail.com>
Description
This patches a critical bug related to the retry mechanism of deployment distribution. If the retried deployment distribution contains a DMN resource, then this would trigger the consistency checks.
This patch is only available for
stable/8.0
, as we want to provide a different solution onmain
. See #9877 (comment).Related issues
closes #9877
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
backport stable/1.3
) to the PR, in case that fails you need to create backports manually.Testing:
Documentation:
Please refer to our review guidelines.