-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad changeset due to untouched reciprocal history entry #7893
Comments
➤ On 2024-03-13, danieltabacaru commented: I've tried reproducing the bug with no success unfortunately. I inspected the realm and it indeed contains the right data it receives at T2. I then used the history command tool to inspect the reciprocal history: the entry is empty as if clear didn't discard the insertion! It seems to be an instance of #6367, but the fix for it should be in the SDK you're using. Can you confirm there is no other change sent by the server that updates intListField? This being said, your soak tests use core 13.14.0 which is 9 months old. You could try using a newer version. |
➤ On 2024-03-15, danieltabacaru commented: [~sean.brandenburg@mongodb.com] Is there any way I could run this test locally and try to reproduce it? |
➤ On 2024-03-18, Sean Brandenburg commented: The soak tests can be run locally, but I don't know if there's a seeded way to re-run these tests (and I think generally they're not very reproducible since there's a variety of timing related things that can be different between runs) I took a look at the history dump and it does look like the only changes that made it into the merge window with the local changes are the ones detailed in the ticket (the clear followed by some inserts and the next insert that triggers the prior size mismatch) I've just attached relevant server history for this object to this ticket. We can upgrade our sdk version for the soak tests and keep and eye on if this recurs, but it does seem like it could be related to the ticket the fix you've linked here |
➤ On 2024-03-18, danieltabacaru commented: Even if it's not always reproducible, it could be valuable to run them locally (that is if I can't figure out what's going on otherwise) |
➤ On 2024-03-18, Sean Brandenburg commented: Got it, yeah I've admittedly never run them locally and it doesn't look like we have any docs on doing so, but it should be possible (though not super simple) to run them by following the procedure we use to set up the soak tests in the "local" env in this evergreen file. The most straightforward thing to do would probably be to run the local variant of these tests on evergreen (PBS-Local-Fast) and sub in your own build of the realm sync state binary here (if your goal is to get more verbose/custom logging) |
➤ On 2024-06-13, Sean Brandenburg commented: [~daniel.tabacaru@mongodb.com] As an FYI, we upgraded our soak tests to use the most recent JS SDK and are still hitting this issue |
➤ On 2024-06-13, danieltabacaru commented: thanks for the heads up (didn't really expect to go away). just a wild guess, are compensating writes in any way involved? either way, could we try to write a simpler test to repro it? I didn't have any luck with client tests. |
➤ On 2024-06-26, Kiro Morkos commented: [~daniel.tabacaru@mongodb.com] no compensating writes (this is happening in PBS which doesn't have compensating writes, and the QBS variants that run into this all have open permissions+truepredicate queries, so no compensating writes there either). We can try reproducing in a simpler test. If you have a hunch of where things might be going wrong that would help us focus in on which operations might be problematic and should be reproduced. |
➤ On 2024-06-27, danieltabacaru commented: [~sean.brandenburg@mongodb.com] The history_dump does not seem to contain the changes at T1. Any chance we can recover those? |
➤ On 2024-07-08, Sean Brandenburg commented: [~daniel.tabacaru@mongodb.com] I was having trouble getting the tooling for decoding the raw changeset working, but here's the raw history entry for CV 4996 at T1 {
"_id": {
"$oid": "65c629e0298546c714dc17a7"
},
"appId": {
"$oid": "65c626aec6dd2a420f9a8ce4"
},
"appPartitionId": {
"$oid": "65c626b9908fcdd4021b995e"
},
"timestamp": {
"$numberLong": "1707485664"
},
"originFileIdent": {
"$numberLong": "3"
},
"version": {
"$numberLong": "6779"
},
"version_salt": {
"$numberLong": "3112504645168333980"
},
"changeset": {
"$binary": {
"base64": "NDk5NiA2Nzc5IDI4NzQxNTIwNjY1NSAzIDE3NyA/AAxjb2xsUmVxdWlyZWQ/AQ1saW5rTGlzdEZpZWxkPwIKaW50TGlzdEZsZD8DEmNvbGxSZXF1aXJlZExpbmtlZD8EBV9wS2V5CgAK1bW9jQnmioAnq9XFAQECBQoACsxX9H9YB0/dnaGiaQEBFyAEAApfYsfOjCoEtHk3zgkCARYBm8L7jIvM6QwbBAMKvwICgu+KgwqULk7oBAADEmhpd3F3bHVva3lnZXR3a2ZtagA=",
"subType": "00"
}
},
"history_size": {
"$numberLong": "9695159"
},
"clientVersion": {
"$numberLong": "4996"
},
"workerId": "baas-sync-57b4949f4-tpdxd"
} |
➤ On 2024-07-09, danieltabacaru commented: [~sean.brandenburg@mongodb.com] Thanks. |
➤ On 2024-07-09, Sean Brandenburg commented: Ah yep, you're right. I was looking at the fileIdent we were merging against. Here's the entry for fileIdent 1. {
"_id": {
"$oid": "65c62a35c142b06dd099c9c5"
},
"appId": {
"$oid": "65c626aec6dd2a420f9a8ce4"
},
"appPartitionId": {
"$oid": "65c626b9908fcdd4021b995e"
},
"timestamp": {
"$numberLong": "1707485749"
},
"originFileIdent": {
"$numberLong": "1"
},
"version": {
"$numberLong": "7440"
},
"version_salt": {
"$numberLong": "2015044307439901013"
},
"changeset": {
"$binary": {
"base64": "",
"subType": "00"
}
},
"history_size": {
"$numberLong": "10510097"
},
"clientVersion": {
"$numberLong": "4996"
},
"workerId": "baas-sync-57b4949f4-45qvk"
} Also this reproduces 100% of the time in the qbsv2 soak tests so Kiro and I are going to take a look at a fresh run today and will update here with any findings |
➤ On 2024-07-10, Sean Brandenburg commented: [~daniel.tabacaru@mongodb.com] Kiro and I were able to get a repro with the apply to state tool working (In QBS). I've attached the client logs with parsed instructions and a file containing the messages for the repro [^device_2.parsed.log] [^repro_messages.txt] |
➤ On 2024-07-11, Kiro Morkos commented: I was able to narrow down the repro even further (only 5 downloads + 2 uploads) - [^messages_668f2ef6e8ed03bf530574a7.txt] Taking a look at the contents of mainline vs reciprocal history in the realm file shows the bug pretty clearly I think: [ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681 ]
> ~/dev/realm-core/build/src/realm/sync/tools/realm-hist-dbg ./gotest_668f2ef6e8ed03bf530574a6 -V -f changeset
# Version 6
InternStrings 0="collRequired", 1="intListFld"
Update path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[3], value=Int(2923615466627780), prior_size=14
# Version 7
InternStrings 0="collRequired", 1="intListFld"
Update path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[5], value=Int(5405916949283399), prior_size=14
# Version 8
InternStrings
3 history entries (version 5 -> 8)
[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681 ]
> ~/dev/realm-core/build/src/realm/sync/tools/realm-hist-dbg ./gotest_668f2ef6e8ed03bf530574a6 -V -f changeset -r 0
# Version 6
InternStrings 0="collRequired", 1="intListFld"
Update path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[3], value=Int(2923615466627780), prior_size=12
# Version 7
InternStrings 0="collRequired", 1="intListFld"
Update path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[5], value=Int(5405916949283399), prior_size=14
# Version 8
InternStrings
3 history entries (version 5 -> 8)
[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681 ]
> Note that the prior size of the first local change is properly adjusted down, but the prior size of the second one is not. However, during the first merge it appears that the in-memory changeset is properly merged against based on the log, the dirty reciprocal history is just not persisted it seems. This explains why in every repro scenario it's always the second incoming download that hits the issue. Output of the repro for posterity - > realm-apply-to-state -r $(mktemp realmXXX) -i ./messages_668f2ef6e8ed03bf530574a7.txt --verbose
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1, client: 0} upload: {server: 0, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1, clientVersion: 0, origin: 666 [changeset with 2 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1102, client: 0} upload: {server: 1102, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1102, clientVersion: 0, origin: 7 [changeset with 13 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1146, client: 0} upload: {server: 1146, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1146, clientVersion: 0, origin: 7 [changeset with 2 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - found upload changeset: 1146 8 300550592903 0 56
Realm.Sync.Client.Changeset - Decoded changeset: [changeset with 1 instructions]
Realm - integrated local changesets as version 6
Realm.Sync.Client.Changeset - found upload changeset: 1146 9 300550593403 0 56
Realm.Sync.Client.Changeset - Decoded changeset: [changeset with 1 instructions]
Realm - integrated local changesets as version 7
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1183, client: 0} upload: {server: 1183, client: 0}, latest: 18351976}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1183, clientVersion: 0, origin: 7 [changeset with 2 instructions]
Realm - Scanning incoming changeset [1/1] (2 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [1/2] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [2/2] (1 instructions)
Realm.Sync.Client.Changeset - Indexing incoming changeset [1/1] (2 instructions)
Realm.Sync.Client.Changeset - Finished changeset indexing (incoming: 1 changeset(s) / 2 instructions, local: 2 changeset(s) / 2 instructions, conflict group(s): 1)
Realm.Sync.Client.Changeset - Transforming local changeset [1/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Transforming local changeset [2/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Finished transforming 2 local changesets through 1 incoming changesets (2 vs 2 instructions, in 1 conflict groups)
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1189, client: 0} upload: {server: 1189, client: 0}, latest: 18351976}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1189, clientVersion: 0, origin: 4 [changeset with 1 instructions]
Realm - Scanning incoming changeset [1/1] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [1/2] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [2/2] (1 instructions)
Realm.Sync.Client.Changeset - Indexing incoming changeset [1/1] (1 instructions)
Realm.Sync.Client.Changeset - Finished changeset indexing (incoming: 1 changeset(s) / 1 instructions, local: 2 changeset(s) / 2 instructions, conflict group(s): 1)
Realm.Sync.Client.Changeset - Transforming local changeset [1/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Transforming local changeset [2/2] through 1 incoming changeset(s) with 1 conflict group(s)
terminate called after throwing an instance of 'realm::sync::IntegrationException'
what(): Failed to transform received changeset: Merge error: left.prior_size == right.prior_size (instruction target: collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[11]). Please contact support.
Aborted |
➤ On 2024-07-12, danieltabacaru commented: Thanks a lot guys! This is very helpful. I hope to get to the bottom of it now. I'll keep you in the loop. |
➤ PM Bot commented: Jira ticket: RCORE-1977 |
➤ danieltabacaru commented: I managed to repro it and a fix is on its way. The issue is due to a subtle bug when saving empty reciprocal changesets (while at least one other non-empty reciprocal changeset at a previous version needs to be saved too). Once again, thanks a lot guys. |
➤ Kiro Morkos commented: Nice!! Good to hear our theory was pretty close! The messages I shared are being applied properly when I run the tool, but it's possible I'm using a different version. It seems like you already have your own repro now, but if ours would still be useful I can look into it. Let me know |
➤ danieltabacaru commented: No need to look further into it. The bug manifests before applying that download message. It was mostly to confirm if the assertion failure is expected or not. There are cases when there will be no crash, but with an empty realm there is no way the local client version will get to 360 before applying that message. |
➤ Kiro Morkos commented: Ah okay yea you're right, the header on that last changeset is incorrect, but the bug is manifesting before the invariant triggers an assertion failure. |
Our soak tests ran into a bad changeset in PBS performing the following series of events
T1: client version 4996: local ArrayInsert to obj.intListField
T2: client receives SV 6658: Clear obj.intListField
T3: client receive SV 6723: ArrayInsert to obj.list
Based on this it looks like the client is running OT and getting the correct output (as it gets applied to state properly), but the OT output is not being reflected in reciprocal history
EVG Task
device_4.realm
anddevice_4.log
insubsoakers.tgz
attached to this evergreen taskintListField
Relevant changesets from server history dump (Dump can be
mongorestore
d from __realm_sync.gz)The text was updated successfully, but these errors were encountered: