Fix: This should not happen. Please open an issue on GitHub. #914
Hello, this PR prevents the well-known error
HiveError: This should not happen. Please open an issue on GitHub.
from occurring. It also repairs boxes that have already become unrecoverable.
This PR has two commits:
When / How does this error happen?
Short answer
The problem happens when Hive opens a corrupted but still recoverable box. The recovery process works incorrectly: if Hive then puts something in the "recovered" box, the box becomes unrecoverable. It can still be used for writes and reads until it is closed, but whenever Hive tries to open this unrecoverable box again, the above error happens.
Long answer + example code
Demonstration example:
Let me break down the output of the example:
1) open db / write 2 frames / close / dump
These raw bytes contain two frames:
[27, 0, 0, 0, 1, 6, 102, 114, 97, 109, 101, 49, 4, 6, 0, 0, 0, 102, 114, 97, 109, 101, 49, 13, 235, 12, 102]
- 27, 0, 0, 0, means the frame's length is 27
- 1, means FrameKeyType is asciiStringT
- 6, means the key frame1 has length 6
- 102, 114, 97, 109, 101, 49, is the key frame1 itself, represented as UTF-16 code units
- 4, means FrameValueType is stringT
- 6, 0, 0, 0, means the value frame1 has length 6
- 102, 114, 97, 109, 101, 49, is the value frame1 itself, represented as UTF-16 code units
- 13, 235, 12, 102 is the CRC check
[27, 0, 0, 0, 1, 6, 102, 114, 97, 109, 101, 50, 4, 6, 0, 0, 0, 102, 114, 97, 109, 101, 50, 71, 104, 155, 136]
The second frame stores frame2, so the 102, 114, 97, 109, 101, 49, parts become 102, 114, 97, 109, 101, 50,
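The field breakdown above can be checked mechanically. The following is a minimal Python sketch, not Hive's Dart code: the function name parse_frame is hypothetical, the little-endian reading of the length fields is inferred from the 27, 0, 0, 0 bytes, and the checksum is only extracted, not verified.

```python
import struct


def parse_frame(raw: bytes) -> dict:
    """Split one frame into the fields listed above (the CRC is not verified)."""
    (length,) = struct.unpack_from("<I", raw, 0)   # 4-byte frame length (assumed little-endian)
    key_type = raw[4]                              # 1 is asciiStringT in the example
    key_len = raw[5]                               # key length in bytes
    key_end = 6 + key_len
    key = raw[6:key_end].decode("ascii")
    value_type = raw[key_end]                      # 4 is stringT in the example
    (value_len,) = struct.unpack_from("<I", raw, key_end + 1)
    value_start = key_end + 5
    # The example values are ASCII-only, so a plain ASCII decode is enough here.
    value = raw[value_start:value_start + value_len].decode("ascii")
    crc = list(raw[length - 4:length])             # trailing 4 bytes of the frame
    return {
        "length": length,
        "key_type": key_type,
        "key": key,
        "value_type": value_type,
        "value": value,
        "crc": crc,
    }


frame1 = bytes([27, 0, 0, 0, 1, 6, 102, 114, 97, 109, 101, 49, 4,
                6, 0, 0, 0, 102, 114, 97, 109, 101, 49, 13, 235, 12, 102])
parsed = parse_frame(frame1)
print(parsed)
```

The field sizes add up to 4 + 1 + 1 + 6 + 1 + 4 + 6 + 4 = 27 bytes, which matches the length stored at the start of the frame.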
2) corrupting database / dump
As you can see, the 102, 114, 97, 109, 101, 50, part is changed to 102, 114, 97, 109, 101, 51, which would mean the value is frame3 instead of frame2. However, the CRC check will detect that this frame is corrupted.
3) open db / dump
It seems that Hive has successfully recovered the box.
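To illustrate what recovery has to compute, here is a minimal Python sketch (not Hive's code) that scans frames and stops at the first one whose checksum fails. It assumes the frame layout from step 1 and uses zlib.crc32 as a stand-in checksum, which is not claimed to match Hive's actual CRC. The helper names make_frame and find_recovery_offset are hypothetical.

```python
import struct
import zlib


def make_frame(key: str, value: str) -> bytes:
    """Build a frame in the step 1 layout, with zlib.crc32 as a stand-in checksum."""
    k = key.encode("ascii")
    v = value.encode("ascii")
    body = bytes([1, len(k)]) + k + bytes([4]) + struct.pack("<I", len(v)) + v
    length = 4 + len(body) + 4                     # length field + body + checksum
    head = struct.pack("<I", length) + body
    return head + struct.pack("<I", zlib.crc32(head))


def find_recovery_offset(data: bytes) -> int:
    """Return the offset just past the last frame whose checksum matches."""
    offset = 0
    while offset + 8 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        end = offset + length
        if length < 8 or end > len(data):
            break                                  # implausible length or truncated frame
        (stored,) = struct.unpack_from("<I", data, end - 4)
        if zlib.crc32(data[offset:end - 4]) != stored:
            break                                  # corrupted frame: stop scanning here
        offset = end
    return offset


box = bytearray(make_frame("frame1", "frame1") + make_frame("frame2", "frame2"))
box[49] = 51  # same corruption as in step 2: the value's last byte goes from '2' to '3'
recovery_offset = find_recovery_offset(bytes(box))
print(recovery_offset)  # 27
```

Everything before the returned offset is intact, so a recovering reader can keep the first frame and discard the rest.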
4) write 1 frame / close / dump
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
This part is the main problem: according to the current Hive implementation, it results in an unrecoverable box.
Note: here we can still write as many frames as we want and read them back, since the keyStore caches them.
5) open database / close / dump
HiveError: This should not happen. Please open an issue on GitHub.
What happened?
The main problem is that when Hive tries to recover the recoverable box, it does the following:
The above snippet forgets to set writeRaf's position to recoveryOffset. The writeOffset is set appropriately; it lets frames identify their start offset, which is useful, for instance, for sorting frames so that deleted ones can be determined, or for reading lazy frames from the filesystem.
Closes #263
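The effect of a stale write position can be reproduced outside Hive. The following Python sketch is an analogy, not Hive's Dart code: it assumes the write handle sits at the old end of the file when the corrupted tail is truncated, and the function names and placeholder byte values are hypothetical. Without a seek back to the recovery offset, the next write lands past the new end of file and the gap is filled with zeros on typical filesystems.

```python
import os
import tempfile


def append_after_recovery(path: str, recovery_offset: int,
                          new_frame: bytes, fix: bool) -> bytes:
    """Truncate the corrupted tail, write one frame, and return the file contents."""
    with open(path, "r+b") as raf:
        raf.seek(0, os.SEEK_END)         # assumption: handle is at the old end of file
        raf.truncate(recovery_offset)    # drops the corrupted tail; position is unchanged
        if fix:
            raf.seek(recovery_offset)    # the missing step: move the write position back
        raf.write(new_frame)
    with open(path, "rb") as f:
        return f.read()


def demo(fix: bool) -> bytes:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            # Two 27-byte placeholder "frames"; the second one plays the corrupted frame.
            f.write(b"\x01" * 27 + b"\x02" * 27)
        return append_after_recovery(path, 27, b"\x03" * 27, fix)
    finally:
        os.remove(path)


buggy = demo(fix=False)
fixed = demo(fix=True)
print(len(buggy), list(buggy[27:54]))
print(len(fixed), list(fixed[27:54]))
```

In the buggy variant the file grows to 81 bytes with a zero-filled region between the intact frame and the new one, which mirrors the dump in step 4. With the seek in place, the new frame directly follows the last valid frame.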