ZeebeDbInconsistentException in ColumnFamily DMN_DECISION_REQUIREMENTS #9115
Comments
@oleschoenburg / @npepinpe Please have a look at this one. Both in terms of the bug and whether or not the consistency checks should be enabled |
@npepinpe Which area would you use for data integrity? |
BTW First occurrence was for a paying customer on what looks like a test system. |
Hm, the checks should be enabled only on the trial plan 🤔 |
It is a trial plan |
Then yes, all trial users should get the checks as a means to test them via progressive rollout. The goal is to make sure there are no false positives and no big performance impact before rolling them out to everyone. |
Regarding label, each of them has a description, so pick which one seems to fit best (e.g. |
Looks like someone deployed DMN resources but the decision requirements key already exists. If this were valid, that would mean that |
It is interesting that the error only happens on partition 2 but not on partition 1. I assume that it is related to the distribution of deployments. Having a quick look at |
Based on the logs, it looks like the distribution of the deployment may have timed out, been retried, and then failed:
|
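To illustrate the suspected sequence, here is a minimal sketch of how a retried distribution could trip an insert precondition. It assumes the consistency check amounts to "insert fails if the key already exists"; `StrictColumnFamily` and `RetriedDistributionDemo` are hypothetical names for illustration only and are not Zeebe classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a column family with insert preconditions enabled. */
final class StrictColumnFamily {
  private final Map<Long, String> entries = new HashMap<>();

  /** Inserts a new entry and fails if the key is already present. */
  void insert(final long key, final String value) {
    if (entries.containsKey(key)) {
      throw new IllegalStateException("Key " + key + " already exists");
    }
    entries.put(key, value);
  }

  int size() {
    return entries.size();
  }
}

final class RetriedDistributionDemo {
  /** Applies the same deployment twice and reports whether the second attempt was rejected. */
  static boolean secondApplyIsRejected() {
    final StrictColumnFamily decisionRequirements = new StrictColumnFamily();
    // First distribution attempt reaches the partition and is stored.
    decisionRequirements.insert(1L, "decision-requirements");
    try {
      // The sender timed out and retried, so the same key arrives again.
      decisionRequirements.insert(1L, "decision-requirements");
      return false;
    } catch (final IllegalStateException e) {
      return true;
    }
  }
}
```

With the check disabled, the second insert would silently overwrite the entry instead of throwing, which would match the exception only showing up where the checks are enabled.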
I think we could temporarily disable the checks for this one cluster, restart the brokers, wait for the deployment distribution to finish and then re-enable the checks. This would make this cluster healthy again (and not corrupt any data) but not solve the underlying issue so we might see the same exception again soon. |
9121: Prevent duplicate key insertion for DMN r=remcowesterhoud a=remcowesterhoud
## Description
To make sure we keep our data consistent we should make sure we don't store duplicate values into the state. The DMN resources were missing the required checks to prevent this. We would always try to insert the resources, regardless of whether they were duplicates. This change filters out the duplicate records and guarantees we only store the non-duplicates.
## Related issues
closes #9115
Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
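The filtering described in the PR above can be sketched roughly as follows. This is not the code from #9121: it assumes duplicates are identified by key alone, and `DuplicateFilter`, `DecisionRequirements` and `withoutDuplicates` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateFilter {
  /** Hypothetical, simplified view of a decision requirements record. */
  record DecisionRequirements(long key, String id) {}

  /**
   * Returns only the incoming records whose key is neither stored already
   * nor repeated earlier in the same batch (assumption: key identifies a duplicate).
   */
  static List<DecisionRequirements> withoutDuplicates(
      final Set<Long> storedKeys, final List<DecisionRequirements> incoming) {
    final Set<Long> seen = new HashSet<>(storedKeys);
    final List<DecisionRequirements> result = new ArrayList<>();
    for (final DecisionRequirements record : incoming) {
      // Set.add returns false when the key is already known, so the record is skipped.
      if (seen.add(record.key())) {
        result.add(record);
      }
    }
    return result;
  }
}
```

Only the returned records would then be passed to the state for insertion, so a strict insert never sees a key it already holds.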
Cool! Let's do a patch release tomorrow 👍 |
9125: [Backport stable/8.0] fix(broker): do not log transition failure due to term mismatch as error r=deepthidevaki a=github-actions[bot]
# Description
Backport of #9122 to `stable/8.0`. relates to #9040
9133: [Backport stable/8.0] Prevent duplicate key insertion for DMN r=remcowesterhoud a=github-actions[bot]
# Description
Backport of #9121 to `stable/8.0`. relates to #9115
Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Co-authored-by: Remco Westerhoud <remco@westerhoud.nl>
Apparently versions 8.0.1 and 8.0.2 show the same symptoms. At least https://console.cloud.google.com/errors/detail/CPDM9-CV9Nvk3wE;service=zeebe;time=P7D?project=camunda-cloud-240911 shows that the same exception is thrown on newer versions. |
I saved the state from an 8.0.2 cluster where this has happened again in case it's useful for root-causing: https://drive.google.com/file/d/1EO6a_zBeTR5bJvc-GXYRYasY972_9z7B/view?usp=sharing |
This is most likely related to #9337. It's interesting that this is happening in Zeebe now, as I couldn't find a way to reproduce it. But I have found the most likely root cause and will fix it when I'm back from holiday. |
9458: [Backport stable/8.0] Support deploying multiple DMN files at once r=remcowesterhoud a=backport-action # Description Backport of #9432 to `stable/8.0`. relates to camunda/zeebe-process-test#357 #9337 #9115 Co-authored-by: Remco Westerhoud <remco@westerhoud.nl> Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
9458: [Backport stable/8.0] Support deploying multiple DMN files at once r=saig0 a=backport-action # Description Backport of #9432 to `stable/8.0`. relates to camunda/zeebe-process-test#357 #9337 #9115 Co-authored-by: Remco Westerhoud <remco@westerhoud.nl> Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
fixed by #9887 |
Describe the bug
Found in error logs
https://console.cloud.google.com/errors/detail/CPDM9-CV9Nvk3wE;service=zeebe;time=P7D?project=camunda-cloud-240911
https://console.cloud.google.com/errors/detail/CLWTn7vY7pS04QE;service=zeebe;time=P7D?project=camunda-cloud-240911
Expected behavior
Log/Stacktrace
Full Stacktrace
Environment: