
Check for overlapping tags #441

Merged (1 commit), Apr 22, 2017

Conversation

cypherdare (Contributor) commented Jul 8, 2016

Addresses #440.
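For context, the check rejects a class in which two fields share the same tag value. A self-contained sketch of the idea (this is not Kryo's actual implementation; the `Tag` annotation here is a stand-in for `TaggedFieldSerializer.Tag`, and the class and method names are hypothetical):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class TagOverlapCheck {

    // Stand-in for Kryo's TaggedFieldSerializer.Tag annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Tag {
        int value();
    }

    // Example of the bug the check catches: two fields share tag 1.
    static class Broken {
        @Tag(0) int a;
        @Tag(1) long b;
        @Tag(1) String c; // accidental duplicate of b's tag
    }

    // Fails fast if any two tagged fields of the class share a tag value.
    static void checkForOverlappingTags(Class<?> type) {
        Map<Integer, Field> seen = new HashMap<>();
        for (Field field : type.getDeclaredFields()) {
            Tag tag = field.getAnnotation(Tag.class);
            if (tag == null) continue;
            Field previous = seen.put(tag.value(), field);
            if (previous != null)
                throw new IllegalArgumentException("Fields " + previous.getName() + " and "
                        + field.getName() + " both have tag value " + tag.value());
        }
    }

    public static void main(String[] args) {
        try {
            checkForOverlappingTags(Broken.class);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at registration time is what turns the silent data corruption described in #440 into an immediate, debuggable error.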

cypherdare changed the title from "Check for overlapping tags (addresses #440)" to "Check for overlapping tags" on Jul 8, 2016
magro (Collaborator) commented Aug 27, 2016

Thanks for the PR, and sorry for the delay (I was on vacation with bad connectivity).

Can you add a test for the added check?

cypherdare (Contributor, Author) commented

@magro Sorry to take so long. I got burned by this issue again and spent three hours before discovering it. So I remembered that I had this PR waiting for clean-up. Should be ready now.

magro merged commit c3ad9fb into EsotericSoftware:master on Apr 22, 2017
magro (Collaborator) commented Apr 22, 2017

Great, thanks!

magro (Collaborator) commented Apr 22, 2017

@cypherdare One follow-up question: what's the recommendation for users whose existing code base starts throwing this exception after the Kryo update (assuming the underlying issue was not noticed before)? We should at least mention this in the release notes, and maybe add an actionable hint to the exception message.

cypherdare deleted the overlappingTagsCheck branch on April 24, 2017, 00:56
cypherdare (Contributor, Author) commented Apr 24, 2017

@magro
So, thinking through ways in which someone may have released something with overlapping tags without noticing the error:

  1. The overlapping tags were for fields that use the same number of bytes when written. Reading the data is safe, but one or both of the two fields ends up with an incorrect value, so this conceivably might not have been caught in testing. If the read order and write order are the same, then the second of the two fields will at least be correct. The order might have changed if field names were changed.
  2. The overlapping tags were for the last two fields written, and the object doesn't appear as part of a larger object graph (because if it did, subsequent data in the stream would be read from the wrong location and an error would have occurred). In this case, the data in both fields is probably corrupted.
  3. There is an extremely small chance that something could have made it through testing without either of the above occurring.
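Situation 1 can be illustrated with a toy simulation (this is not Kryo's wire format; the tags and values are made up). Every chunk written under the duplicated tag is routed to the single field the reader resolves for that tag, so the last-written value wins and the other field keeps its default:

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapReadDemo {

    // Returns the value a reader would leave in the single field it resolves
    // for the given tag: every chunk written under that tag lands in the same
    // place, so the last-written value wins.
    static int valueReadForTag(List<int[]> stream, int tag) {
        int value = 0; // the field's default
        for (int[] chunk : stream)
            if (chunk[0] == tag) value = chunk[1];
        return value;
    }

    public static void main(String[] args) {
        // Simulated stream of (tag, value) chunks; fields b and c collide on tag 1.
        List<int[]> stream = new ArrayList<>();
        stream.add(new int[] {0, 10}); // field a, tag 0
        stream.add(new int[] {1, 20}); // field b, tag 1
        stream.add(new int[] {1, 30}); // field c, also tag 1

        // Field c's value (30) survives; b's value (20) is silently lost.
        System.out.println("tag 1 reads back as " + valueReadForTag(stream, 1));
    }
}
```

This is why the failure is so hard to notice in situation 1: nothing throws, the stream position stays correct, and only one of the two colliding fields carries a wrong value.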

We might conceivably be able to recover data for situation 1. If the field names have not changed, the two fields were probably written in alphabetical order, so when reading, we could map multiple appearances of the same tag ID sequentially to their original fields. Someone could pass in a Map<Integer, List> linking old tag values to new tag values as a configuration. That is an awful lot of complexity to add to the TaggedFieldSerializer class for a small possibility. Perhaps a RecoveryTaggedFieldSerializer class could be written and put in a Gist for those who want to attempt this recovery. But I don't know that it's worth adequately testing a solution like this when we don't know whether anyone has been affected.

For situation 1, if we can assume the fields were written in alphabetical order*, the simpler solution is to give one of the two fields a new unique tag value and leave the other alone. The field that keeps the tag value should be the one that comes last alphabetically. During reading, that field will be assigned once with the wrong value and then a second time with the correct value. The class should be updated to withstand the loss of the first field's data.

*I'm not 100% sure this can be assumed. It looks to me like fields are naturally in alphabetical order. We started sorting them by tag ID at some point, but fields with the same ID probably didn't switch places during sorting. If we can't assume this, we still need to keep the tag on one of the fields to prevent errors during reading (like losing our place in the stream), but the class must be updated to assume that both fields might have incorrect values.

For situation 2, I think both fields need new IDs, and the original tag ID should be deprecated, which can be done by creating a new, unused field, tagging it with that ID, and marking it deprecated. Old data should still be readable, and the application can then safely write the data as part of a larger graph.
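A sketch of what a remediated class for situation 2 might look like (the field names are hypothetical, and the `Tag` annotation here is a self-contained stand-in for Kryo's `TaggedFieldSerializer.Tag`):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class RetiredTagDemo {

    // Stand-in for Kryo's TaggedFieldSerializer.Tag annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Tag {
        int value();
    }

    // After remediation: both previously colliding fields get fresh tags, and
    // the old tag 1 is parked on a deprecated, never-used placeholder so old
    // bytes still resolve to a field instead of desynchronizing the stream.
    static class Record {
        @Tag(0) int a;
        @Tag(2) long b;   // previously @Tag(1)
        @Tag(3) String c; // previously also @Tag(1)
        @Deprecated
        @Tag(1) byte[] retiredTag1; // reserves the retired tag value
    }

    public static void main(String[] args) throws Exception {
        Field retired = Record.class.getDeclaredField("retiredTag1");
        System.out.println("tag " + retired.getAnnotation(Tag.class).value()
                + " reserved by deprecated field: "
                + retired.isAnnotationPresent(Deprecated.class));
    }
}
```

The placeholder exists only so that tag 1 is never reused for new data while old streams containing it can still be consumed.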

For situation 3, I think the same should be done as for situation 2, but there should also be error checking on the object graph after reading it if attempting to read old data.

magro (Collaborator) commented Apr 25, 2017

@cypherdare Wow, thanks for this elaborate analysis! When preparing the release notes I'll add a link to your comment here, ok?

cypherdare (Contributor, Author) commented

@magro No problem. There could be errors here, but if someone encounters this issue, we can probably reason out a solution.
