-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28456 HBase Restore restores old data if data for the same timestamp is in different hfiles #5775
Conversation
e788c26
to
cdef073
Compare
…stamp is in different hfiles
cdef073
to
5c744b1
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -165,7 +165,7 @@ protected static byte[] combineTableNameSuffix(byte[] tableName, byte[] suffix) | |||
* package-private for internal usage for jobs like WALPlayer which need to use features of | |||
* ExtendedCell. | |||
*/ | |||
static final String EXTENDED_CELL_SERIALIZATION_ENABLED_KEY = | |||
public static final String EXTENDED_CELL_SERIALIZATION_ENABLED_KEY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is becoming part of the public API. It's worth updating the comment about its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndimiduk I decided to annotate that as IA.Private. I know we generally prefer not to do that for fields/methods, but for better or worse there's already a strong convention of doing it in this class (11 methods annotated IA.Private). It's probably worth a larger refactor/cleanup of HFileOutputFormat2
Let me know if you strongly disagree with this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good by me.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStoreFile.java
Outdated
Show resolved
Hide resolved
@@ -423,6 +424,54 @@ public String toString() { | |||
+ (isReference() ? "->" + getReferredToFile(this.getPath()) + "-" + reference : ""); | |||
} | |||
|
|||
/** | |||
* Cells in a bulkloaded file don't have a sequenceId since they don't go through memstore. When a | |||
* bulkload file is committed, the current memstore ts is stamped onto the file name as the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes! Seems like we need a seqId field in the file header instead of relying on the file name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason why we don't do this already is that a bulkloaded HFile is typically created outside the cluster. At that point, we don't know what the memstore seq id is at. Only at the time of bulkload file commit, when we've locked and flushed the memstore, do we know the memstore seq id.
Since hfiles are immutable, we can't open it at that point and change a metadata. So we need to add it to the filename I guess. I agree relying on the file name seems brittle. A more robust solution might be tricky given the above tho
🎊 +1 overall
This message was automatically generated. |
|
||
import static org.apache.hadoop.hbase.backup.BackupInfo.BackupState.COMPLETE; | ||
import static org.apache.hadoop.hbase.backup.BackupType.FULL; | ||
import static org.junit.Assert.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid star imports.
import java.io.IOException; | ||
import java.nio.ByteBuffer; | ||
import java.time.Instant; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
@@ -109,6 +114,9 @@ public boolean nextKeyValue() throws IOException, InterruptedException { | |||
return false; | |||
} | |||
value = scanner.getCell(); | |||
if (value != null && bulkloadSeqId.isPresent()) { | |||
PrivateCellUtil.setSequenceId(value, bulkloadSeqId.getAsLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not use bulkloadSeqId.ifPresent because PrivateCellUtil.setSequenceId will throw exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea setSequenceId throws IOException. Figured it looks cleaner this way
…stamp is in different hfiles (#5775) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
…stamp is in different hfiles (#5775) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
…stamp is in different hfiles (#5775) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
… for the same timestamp is in different hfiles (apache#5775) Signed-off-by: Nick Dimiduk <ndimiduk@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
_SeqId_
handling