Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad changeset due to untouched reciprocal history entry #7893

Closed
sync-by-unito bot opened this issue Jul 16, 2024 · 20 comments · Fixed by #7955
Closed

Bad changeset due to untouched reciprocal history entry #7893

sync-by-unito bot opened this issue Jul 16, 2024 · 20 comments · Fixed by #7955
Assignees

Comments

@sync-by-unito
Copy link

sync-by-unito bot commented Jul 16, 2024

Our soak tests ran into a bad changeset in PBS performing the following series of events

T1: client version 4996: local ArrayInsert to obj.intListField

T2: client receives SV 6658: Clear obj.intListField

  • Expectation is that ArrayInsert @ 4996 is discarded when it merges against the Clear
  • Reciprocal history does not reflect this update, but realm data does

T3: client receive SV 6723: ArrayInsert to obj.list

  • Results in a bad changeset due to a prior size mismatch
  • Theory is because reciprocal history @ 4996 does not reflect the discarded ArrayInsert

Based on this it looks like the client is running OT and getting the correct output (as it gets applied to state properly), but the OT output is not being reflected in reciprocal history

EVG Task

  • the relevant files for this repro are device_4.realm and device_4.log in subsoakers.tgz attached to this evergreen task
  • Object Primary Key: oid("0ca58574b6f0f0572e")
  • Array Field: intListField

Relevant changesets from server history dump (Dump can be mongorestored from __realm_sync.gz)

// Changeset that the local change at T1 merged against
{ timestamp: 287415144155, _id: 65c629de298546c714dc15e2, createdAt: '2024-02-09 13:34:22 +0000 UTC', fileIdent: 3, clientVersion: 4736, serverVersion: 6658, instructions: [
         Update { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="_pKey"), payload: "hiwqwluokygetwkfmj", isDefault: false, priorSize: 0 },
         Clear { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld"), priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.0"), payload: 3255124513315405, priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.1"), payload: 1152914988180245, priorSize: 1 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.2"), payload: 8652981199148631, priorSize: 2 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.3"), payload: 5257082114615562, priorSize: 3 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.4"), payload: 4962132771478012, priorSize: 4 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.5"), payload: 610258687391357, priorSize: 5 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.6"), payload: 8030180275848615, priorSize: 6 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.7"), payload: 2692238070770941, priorSize: 7 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.8"), payload: 8389870175223855, priorSize: 8 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.9"), payload: 2550864430468186, priorSize: 9 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.10"), payload: 2448326381062568, priorSize: 10 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.11"), payload: 3920266420108281, priorSize: 11 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.12"), payload: 4748114750730776, priorSize: 12 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.13"), payload: 5290562076489202, priorSize: 13 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.14"), payload: 4763018075988447, priorSize: 14 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.15"), payload: 5558702512205115, priorSize: 15 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.16"), payload: 1470857233029320, priorSize: 16 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.17"), payload: 2414691363089715, priorSize: 17 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.18"), payload: 6136840945697697, priorSize: 18 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.19"), payload: 6240611872508867, priorSize: 19 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.20"), payload: 696600421386660, priorSize: 20 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.21"), payload: 3759166984809251, priorSize: 21 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.22"), payload: 6200379315555408, priorSize: 22 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.23"), payload: 3740818814444825, priorSize: 23 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.24"), payload: 5052040814781635, priorSize: 24 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a
205"), fullPath="intListFld.25"), payload: 8632950791858863, priorSize: 25 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.26"), payload: 268386300614666, priorSize: 26 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.27"), payload: 7300284751995718, priorSize: 27 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.28"), payload: 4891235888437641, priorSize: 28 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.29"), payload: 4931806891077047, priorSize: 29 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.30"), payload: 59838987277831, priorSize: 30 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.31"), payload: 1391073917123324, priorSize: 31 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.32"), payload: 6926012470475492, priorSize: 32 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.33"), payload: 2566972138449680, priorSize: 33 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.34"), payload: 8381966378538948, priorSize: 34 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.35"), payload: 5199522972991873, priorSize: 35 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.36"), payload: 3893666273634272, priorSize: 36 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.37"), payload: 7564985999791688, priorSize: 37 },
intListFld.26"), payload: 268386300614666, priorSize: 26 },
         Update { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="_pKey"), payload: "hiwqwluokygetwkfmj", isDefault: false, priorSize: 0 },
         Clear { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld"), priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.0"), payload: 3255124513315405, priorSize: 0 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.1"), payload: 1152914988180245, priorSize: 1 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.2"), payload: 8652981199148631, priorSize: 2 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.3"), payload: 5257082114615562, priorSize: 3 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.4"), payload: 4962132771478012, priorSize: 4 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.5"), payload: 610258687391357, priorSize: 5 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.6"), payload: 8030180275848615, priorSize: 6 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.7"), payload: 2692238070770941, priorSize: 7 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.8"), payload: 8389870175223855, priorSize: 8 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.9"), payload: 2550864430468186, priorSize: 9 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.10"), payload: 2448326381062568, priorSize: 10 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.11"), payload: 3920266420108281, priorSize: 11 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.12"), payload: 4748114750730776, priorSize: 12 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.13"), payload: 5290562076489202, priorSize: 13 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.14"), payload: 4763018075988447, priorSize: 14 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.15"), payload: 5558702512205115, priorSize: 15 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.16"), payload: 1470857233029320, priorSize: 16 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.17"), payload: 2414691363089715, priorSize: 17 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.18"), payload: 6136840945697697, priorSize: 18 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.19"), payload: 6240611872508867, priorSize: 19 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.20"), payload: 696600421386660, priorSize: 20 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.21"), payload: 3759166984809251, priorSize: 21 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.22"), payload: 6200379315555408, priorSize: 22 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.23"), payload: 3740818814444825, priorSize: 23 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.24"), payload: 5052040814781635, priorSize: 24 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.25"), payload: 8632950791858863, priorSize: 25 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.26"), payload: 268386300614666, priorSize: 26 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.27"), payload: 7300284751995718, priorSize: 27 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.28"), payload: 4891235888437641, priorSize: 28 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.29"), payload: 4931806891077047, priorSize: 29 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.30"), payload: 59838987277831, priorSize: 30 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.31"), payload: 1391073917123324, priorSize: 31 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.32"), payload: 6926012470475492, priorSize: 32 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.33"), payload: 2566972138449680, priorSize: 33 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.34"), payload: 8381966378538948, priorSize: 34 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.35"), payload: 5199522972991873, priorSize: 35 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.36"), payload: 3893666273634272, priorSize: 36 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.37"), payload: 7564985999791688, priorSize: 37 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.38"), payload: 8428617618974834, priorSize: 38 },
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.39"), payload: 4842348580420880, priorSize: 39 },
         Update { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="linkField"), payload: RealmLink(table: "collRequiredLinked", key: ObjectID("01bfab61b060cd39cd577edf")), isDefault: false, priorSize: 0 },
  ]
}

// Changeset that triggered integration error
{ timestamp: 287415176158, _id: 65c629df298546c714dc16d1, createdAt: '2024-02-09 13:34:23 +0000 UTC', fileIdent: 3, clientVersion: 4873, serverVersion: 6723, instructions: [
         ArrayInsert { (table="collRequired", ID=ObjectID("7305fdbf077271adc079a205"), fullPath="intListFld.11"), payload: 7637604215384347, priorSize: 40 },
  ]
} 
Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-03-13, danieltabacaru commented:

I've tried reproducing the bug with no success unfortunately. I inspected the realm and it indeed contains the right data it receives at T2. I then used the history command tool to inspect the reciprocal history: the entry is empty as if clear didn't discard the insertion! It seems to be an instance of #6367, but the fix for it should be in the SDK you're using.

Can you confirm there is no other change sent by the server that updates intListField?

This being said, your soak tests use core 13.14.0 which is 9 months old. You could try using a newer version.

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-03-15, danieltabacaru commented:

[~sean.brandenburg@mongodb.com] Is there any way I could run this test locally and try to reproduce it?

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-03-18, Sean Brandenburg commented:

The soak tests can be run locally, but I don't know if there's a seeded way to re-run these tests (and I think generally they're not very reproducible since there's a variety of timing related things that can be different between runs)

I took a look at the history dump and it does look like the only changes that made it into the merge window with the local changes are the ones detailed in the ticket (the clear followed by some inserts and the next insert that triggers the prior size mismatch)

I've just attached relevant server history for this object to this ticket.

We can upgrade our sdk version for the soak tests and keep and eye on if this recurs, but it does seem like it could be related to the ticket the fix you've linked here

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-03-18, danieltabacaru commented:

Even if it's not always reproducible, it could be valuable to run them locally (that is if I can't figure out what's going on otherwise)

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-03-18, Sean Brandenburg commented:

Got it, yeah I've admittedly never run them locally and it doesn't look like we have any docs on doing so, but it should be possible (though not super simple) to run them by following the procedure we use to set up the soak tests in the "local" env in this evergreen file.

The most straightforward thing to do would probably be to run the local variant of these tests on evergreen (PBS-Local-Fast) and sub in your own build of the realm sync state binary here (if your goal is to get more verbose/custom logging)

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-06-13, Sean Brandenburg commented:

[~daniel.tabacaru@mongodb.com] As an FYI, we upgraded our soak tests to use the most recent JS SDK and are still hitting this issue

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-06-13, danieltabacaru commented:

thanks for the heads up (didn't really expect to go away). just a wild guess, are compensating writes in any way involved? either way, could we try to write a simpler test to repro it? I didn't have any luck with client tests.

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-06-26, Kiro Morkos commented:

[~daniel.tabacaru@mongodb.com] no compensating writes (this is happening in PBS which doesn't have compensating writes, and the QBS variants that run into this all have open permissions+truepredicate queries, so no compensating writes there either).

We can try reproducing in a simpler test. If you have a hunch of where things might be going wrong that would help us focus in on which operations might be problematic and should be reproduced.

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-06-27, danieltabacaru commented:

[~sean.brandenburg@mongodb.com] The history_dump does not seem to contain the changes at T1. Any chance we can recover those?

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-07-08, Sean Brandenburg commented:

[~daniel.tabacaru@mongodb.com] I was having trouble getting the tooling for decoding the raw changeset working, but here's the raw history entry for CV 4996 at T1

{
  "_id": {
    "$oid": "65c629e0298546c714dc17a7"
  },
  "appId": {
    "$oid": "65c626aec6dd2a420f9a8ce4"
  },
  "appPartitionId": {
    "$oid": "65c626b9908fcdd4021b995e"
  },
  "timestamp": {
    "$numberLong": "1707485664"
  },
  "originFileIdent": {
    "$numberLong": "3"
  },
  "version": {
    "$numberLong": "6779"
  },
  "version_salt": {
    "$numberLong": "3112504645168333980"
  },
  "changeset": {
    "$binary": {
      "base64": "NDk5NiA2Nzc5IDI4NzQxNTIwNjY1NSAzIDE3NyA/AAxjb2xsUmVxdWlyZWQ/AQ1saW5rTGlzdEZpZWxkPwIKaW50TGlzdEZsZD8DEmNvbGxSZXF1aXJlZExpbmtlZD8EBV9wS2V5CgAK1bW9jQnmioAnq9XFAQECBQoACsxX9H9YB0/dnaGiaQEBFyAEAApfYsfOjCoEtHk3zgkCARYBm8L7jIvM6QwbBAMKvwICgu+KgwqULk7oBAADEmhpd3F3bHVva3lnZXR3a2ZtagA=",
      "subType": "00"
    }
  },
  "history_size": {
    "$numberLong": "9695159"
  },
  "clientVersion": {
    "$numberLong": "4996"
  },
  "workerId": "baas-sync-57b4949f4-tpdxd"
} 

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-07-09, danieltabacaru commented:

[~sean.brandenburg@mongodb.com] Thanks.
Unfortunately I didn't have much like either decoding the changeset. Shouldn't the file ident be different than 3 (checking the log files I think it should be 1)?

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-07-09, Sean Brandenburg commented:

Ah yep, you're right. I was looking at the fileIdent we were merging against. Here's the entry for fileIdent 1.

{
  "_id": {
    "$oid": "65c62a35c142b06dd099c9c5"
  },
  "appId": {
    "$oid": "65c626aec6dd2a420f9a8ce4"
  },
  "appPartitionId": {
    "$oid": "65c626b9908fcdd4021b995e"
  },
  "timestamp": {
    "$numberLong": "1707485749"
  },
  "originFileIdent": {
    "$numberLong": "1"
  },
  "version": {
    "$numberLong": "7440"
  },
  "version_salt": {
    "$numberLong": "2015044307439901013"
  },
  "changeset": {
    "$binary": {
      "base64": "",
      "subType": "00"
    }
  },
  "history_size": {
    "$numberLong": "10510097"
  },
  "clientVersion": {
    "$numberLong": "4996"
  },
  "workerId": "baas-sync-57b4949f4-45qvk"
} 

Also this reproduces 100% of the time in the qbsv2 soak tests so Kiro and I are going to take a look at a fresh run today and will update here with any findings

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-07-10, Sean Brandenburg commented:

[~daniel.tabacaru@mongodb.com] Kiro and I were able to get a repro with the apply to state tool working (In QBS). I've attached the client logs with parsed instructions and a file containing the messages for the repro

[^device_2.parsed.log]

[^repro_messages.txt]

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-07-11, Kiro Morkos commented:

I was able to narrow down the repro even further (only 5 downloads + 2 uploads) - [^messages_668f2ef6e8ed03bf530574a7.txt]

Taking a look at the contents of mainline vs reciprocal history in the realm file shows the bug pretty clearly I think:

[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681  ] 
> ~/dev/realm-core/build/src/realm/sync/tools/realm-hist-dbg ./gotest_668f2ef6e8ed03bf530574a6 -V -f changeset
# Version 6
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[3], value=Int(2923615466627780), prior_size=14
# Version 7
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[5], value=Int(5405916949283399), prior_size=14
# Version 8
InternStrings   
3 history entries (version 5 -> 8)
[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681  ] 
> ~/dev/realm-core/build/src/realm/sync/tools/realm-hist-dbg ./gotest_668f2ef6e8ed03bf530574a6 -V -f changeset -r 0
# Version 6
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[3], value=Int(2923615466627780), prior_size=12
# Version 7
InternStrings   0="collRequired", 1="intListFld"
Update          path=collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[5], value=Int(5405916949283399), prior_size=14
# Version 8
InternStrings   
3 history entries (version 5 -> 8)
[ 09:09 PM kmorkos@W-PW0959CH: /tmp/gotest_1651888681  ] 
> 

Note that the prior size of the first local change is properly adjusted down, but the prior size of the second one is not. However, during the first merge it appears that the in-memory changeset is properly merged against based on the log, the dirty reciprocal history is just not persisted it seems. This explains why in every repro scenario it's always the second incoming download that hits the issue.

Output of the repro for posterity -

> realm-apply-to-state -r $(mktemp realmXXX) -i ./messages_668f2ef6e8ed03bf530574a7.txt --verbose
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1, client: 0} upload: {server: 0, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1, clientVersion: 0, origin: 666 [changeset with 2 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1102, client: 0} upload: {server: 1102, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1102, clientVersion: 0, origin: 7 [changeset with 13 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1146, client: 0} upload: {server: 1146, client: 0}, latest: 2189552}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1146, clientVersion: 0, origin: 7 [changeset with 2 instructions]
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - found upload changeset: 1146 8 300550592903 0 56
Realm.Sync.Client.Changeset - Decoded changeset: [changeset with 1 instructions]
Realm - integrated local changesets as version 6
Realm.Sync.Client.Changeset - found upload changeset: 1146 9 300550593403 0 56
Realm.Sync.Client.Changeset - Decoded changeset: [changeset with 1 instructions]
Realm - integrated local changesets as version 7
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1183, client: 0} upload: {server: 1183, client: 0}, latest: 18351976}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1183, clientVersion: 0, origin: 7 [changeset with 2 instructions]
Realm - Scanning incoming changeset [1/1] (2 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [1/2] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [2/2] (1 instructions)
Realm.Sync.Client.Changeset - Indexing incoming changeset [1/1] (2 instructions)
Realm.Sync.Client.Changeset - Finished changeset indexing (incoming: 1 changeset(s) / 2 instructions, local: 2 changeset(s) / 2 instructions, conflict group(s): 1)
Realm.Sync.Client.Changeset - Transforming local changeset [1/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Transforming local changeset [2/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Finished transforming 2 local changesets through 1 incoming changesets (2 vs 2 instructions, in 1 conflict groups)
Realm.Sync.Client.Changeset - Integrated 1 changesets out of 1
Realm.Sync.Client.Changeset - decoding download message. {download: {server: 1189, client: 0} upload: {server: 1189, client: 0}, latest: 18351976}
Realm.Sync.Client.Changeset - found download changeset: serverVersion: 1189, clientVersion: 0, origin: 4 [changeset with 1 instructions]
Realm - Scanning incoming changeset [1/1] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [1/2] (1 instructions)
Realm.Sync.Client.Changeset - Scanning local changeset [2/2] (1 instructions)
Realm.Sync.Client.Changeset - Indexing incoming changeset [1/1] (1 instructions)
Realm.Sync.Client.Changeset - Finished changeset indexing (incoming: 1 changeset(s) / 1 instructions, local: 2 changeset(s) / 2 instructions, conflict group(s): 1)
Realm.Sync.Client.Changeset - Transforming local changeset [1/2] through 1 incoming changeset(s) with 1 conflict group(s)
Realm.Sync.Client.Changeset - Transforming local changeset [2/2] through 1 incoming changeset(s) with 1 conflict group(s)
terminate called after throwing an instance of 'realm::sync::IntegrationException'
  what():  Failed to transform received changeset: Merge error: left.prior_size == right.prior_size (instruction target: collRequired[ObjectId{517c5129a9090f09b83dff2b}].intListFld[11]). Please contact support.
Aborted

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ On 2024-07-12, danieltabacaru commented:

Thanks a lot guys! This is very helpful. I hope to get to the bottom of it now. I'll keep you in the loop.

Copy link
Author

sync-by-unito bot commented Jul 16, 2024

➤ PM Bot commented:

Jira ticket: RCORE-1977

Copy link
Author

sync-by-unito bot commented Jul 17, 2024

➤ danieltabacaru commented:

I managed to repro it and a fix is on its way. The issue is due to a subtle bug when saving empty reciprocal changesets (while at least one other non-empty reciprocal changeset at a previous version needs to be saved too). Once again, thanks a lot guys.
[~kiro.morkos@mongodb.com] fyi, your repro cannot be fully applied by the tool ({}Assertion failed: changeset.last_integrated_remote_version <= local_version with (changeset.last_integrated_remote_version, local_version) = [360, 9]{}). May you copy-pasted from a bigger file?

Copy link
Author

sync-by-unito bot commented Jul 17, 2024

➤ Kiro Morkos commented:

Nice!! Good to hear our theory was pretty close!

The messages I shared are being applied properly when I run the tool, but it's possible I'm using a different version. It seems like you already have your own repro now, but if ours would still be useful I can look into it. Let me know

Copy link
Author

sync-by-unito bot commented Jul 17, 2024

➤ danieltabacaru commented:

No need to look further into it. The bug manifests before applying that download message. It was mostly to confirm if the assertion failure is expected or not. There are cases when there will be no crash, but with an empty realm there is no way the local client version will get to 360 before applying that message.

Copy link
Author

sync-by-unito bot commented Jul 17, 2024

➤ Kiro Morkos commented:

Ah okay yea you're right, the header on that last changeset is incorrect, but the bug is manifesting before the invariant triggers an assertion failure.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant