
Check for overlapping tags #441

Merged (1 commit), Apr 22, 2017

Conversation

cypherdare (Contributor) commented Jul 8, 2016

Addresses #440.
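For context, the check rejects a class in which two fields share the same tag value. A self-contained sketch of the idea (this is not Kryo's actual implementation; the `Tag` annotation here is a stand-in for `TaggedFieldSerializer.Tag`, and the class and method names are hypothetical):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class TagOverlapCheck {

    // Stand-in for Kryo's TaggedFieldSerializer.Tag annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Tag {
        int value();
    }

    // Example of the bug the check catches: two fields share tag 1.
    static class Broken {
        @Tag(0) int a;
        @Tag(1) long b;
        @Tag(1) String c; // accidental duplicate of b's tag
    }

    // Fails fast if any two tagged fields of the class share a tag value.
    static void checkForOverlappingTags(Class<?> type) {
        Map<Integer, Field> seen = new HashMap<>();
        for (Field field : type.getDeclaredFields()) {
            Tag tag = field.getAnnotation(Tag.class);
            if (tag == null) continue;
            Field previous = seen.put(tag.value(), field);
            if (previous != null)
                throw new IllegalArgumentException("Fields " + previous.getName() + " and "
                        + field.getName() + " both have tag value " + tag.value());
        }
    }

    public static void main(String[] args) {
        try {
            checkForOverlappingTags(Broken.class);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at registration time is what turns the silent data corruption described in #440 into an immediate, debuggable error.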

cypherdare changed the title from "Check for overlapping tags (addresses #440)" to "Check for overlapping tags" on Jul 8, 2016
magro (Collaborator) commented Aug 27, 2016

Thanks for the PR, and sorry for the delay (I was on vacation with bad connectivity).

Can you add a test for the added check?

cypherdare (Contributor, Author) commented

@magro Sorry to take so long. I got burned by this issue again and spent three hours before discovering it. So I remembered that I had this PR waiting for clean-up. Should be ready now.

magro merged commit c3ad9fb into EsotericSoftware:master on Apr 22, 2017
magro (Collaborator) commented Apr 22, 2017

Great, thanks!

magro (Collaborator) commented Apr 22, 2017

@cypherdare One follow-up question: what's the recommendation for users whose existing code base starts throwing this exception after the Kryo update (assuming the underlying issue was not noticed before)? We should at least mention this in the release notes, and maybe add an actionable hint to the exception message.

cypherdare deleted the overlappingTagsCheck branch on April 24, 2017, 00:56
cypherdare (Contributor, Author) commented Apr 24, 2017

@magro
So, thinking through ways in which someone may have released something with overlapping tags without noticing the error:

  1. The overlapping tags were for fields that use the same number of bytes when written. Reading the data is safe, but one or both of the two fields ends up with an incorrect value, so this conceivably might not have been caught in testing. If the read order and write order are the same, then the second of the two fields will at least be correct. The order might have changed if field names were changed.
  2. The overlapping tags were for the last two fields written, and the object doesn't appear as part of a larger object graph (because if it did, subsequent data in the stream would be read from the wrong location and an error would have occurred). In this case, the data in both fields is probably corrupted.
  3. There is an extremely small chance that something could have made it through testing without either of the above occurring.
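Situation 1 can be illustrated with a toy simulation (this is not Kryo's wire format; the tags and values are made up). Every chunk written under the duplicated tag is routed to the single field the reader resolves for that tag, so the last-written value wins and the other field keeps its default:

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapReadDemo {

    // Returns the value a reader would leave in the single field it resolves
    // for the given tag: every chunk written under that tag lands in the same
    // place, so the last-written value wins.
    static int valueReadForTag(List<int[]> stream, int tag) {
        int value = 0; // the field's default
        for (int[] chunk : stream)
            if (chunk[0] == tag) value = chunk[1];
        return value;
    }

    public static void main(String[] args) {
        // Simulated stream of (tag, value) chunks; fields b and c collide on tag 1.
        List<int[]> stream = new ArrayList<>();
        stream.add(new int[] {0, 10}); // field a, tag 0
        stream.add(new int[] {1, 20}); // field b, tag 1
        stream.add(new int[] {1, 30}); // field c, also tag 1

        // Field c's value (30) survives; b's value (20) is silently lost.
        System.out.println("tag 1 reads back as " + valueReadForTag(stream, 1));
    }
}
```

This is why the failure is so hard to notice in situation 1: nothing throws, the stream position stays correct, and only one of the two colliding fields carries a wrong value.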

We might conceivably be able to recover data for situation 1. If the field names have not changed, the two fields were probably written in alphabetical order, so when reading, we could map multiple appearances of the same tag ID sequentially to their original fields. Someone could pass in a Map<Integer, List> linking old tag values to new tag values as a configuration. That is an awful lot of complexity to add to the TaggedFieldSerializer class for a small possibility. Perhaps a RecoveryTaggedFieldSerializer class could be written and put in a Gist for those who want to attempt this recovery. But I don't know that it's worth adequately testing a solution like this when we don't know whether anyone has been affected.

For situation 1, if we can assume the fields were written in alphabetical order*, the simpler solution is to give one of the two fields a new unique tag value and leave the other alone. The field that keeps the tag value should be the one that comes last alphabetically. During reading, that field will be assigned once with the wrong value and then a second time with the correct value. The class should be updated to withstand the loss of the first field's data.

*I'm not 100% sure this can be assumed. It looks to me like fields are naturally in alphabetical order. We started sorting them by tag ID at some point, but fields with the same ID probably didn't switch places during sorting. If we can't assume this, we still need to keep the tag on one of the fields to prevent errors during reading (like losing our place in the stream), but the class must be updated to assume that both fields might have incorrect values.

For situation 2, I think both fields need new IDs, and the original tag ID should be deprecated, which can be done by creating a new, unused field, tagging it with that ID, and marking it deprecated. Old data should still be readable, and the application can then safely write the data as part of a larger graph.
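A sketch of what a remediated class for situation 2 might look like (the field names are hypothetical, and the `Tag` annotation here is a self-contained stand-in for Kryo's `TaggedFieldSerializer.Tag`):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class RetiredTagDemo {

    // Stand-in for Kryo's TaggedFieldSerializer.Tag annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Tag {
        int value();
    }

    // After remediation: both previously colliding fields get fresh tags, and
    // the old tag 1 is parked on a deprecated, never-used placeholder so old
    // bytes still resolve to a field instead of desynchronizing the stream.
    static class Record {
        @Tag(0) int a;
        @Tag(2) long b;   // previously @Tag(1)
        @Tag(3) String c; // previously also @Tag(1)
        @Deprecated
        @Tag(1) byte[] retiredTag1; // reserves the retired tag value
    }

    public static void main(String[] args) throws Exception {
        Field retired = Record.class.getDeclaredField("retiredTag1");
        System.out.println("tag " + retired.getAnnotation(Tag.class).value()
                + " reserved by deprecated field: "
                + retired.isAnnotationPresent(Deprecated.class));
    }
}
```

The placeholder exists only so that tag 1 is never reused for new data while old streams containing it can still be consumed.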

For situation 3, I think the same should be done as for situation 2, but there should also be error checking on the object graph after reading it if attempting to read old data.

magro (Collaborator) commented Apr 25, 2017

@cypherdare Wow, thanks for this elaborate analysis! When preparing the release notes I'll add a link to your comment here, ok?

cypherdare (Contributor, Author) commented

@magro No problem. There could be errors here, but if someone encounters this issue, we can probably reason out a solution.
