-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-22890 Verify the files when RegionServer is starting and BucketCache is in file mode #528
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -242,6 +244,11 @@ public int compare(BlockCacheKey a, BlockCacheKey b) { | |||
/** In-memory bucket size */ | |||
private float memoryFactor; | |||
|
|||
private String[] filePaths; | |||
static final String FILE_VERIFY_ALGORITHM = "hbase.bucketcache.file.verify.algorithm"; | |||
static final String DEFAULT_FILE_VERIFY_ALGORITHM = "MD5"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set them private, unless you use them somewhere outside package.
@@ -288,14 +294,15 @@ public BucketCache(String ioEngineName, long capacity, int blockSize, int[] buck | |||
this.ramCache = new ConcurrentHashMap<BlockCacheKey, RAMQueueEntry>(); | |||
|
|||
this.backingMap = new ConcurrentHashMap<BlockCacheKey, BucketEntry>((int) blockNumCapacity); | |||
|
|||
this.algorithm = conf.get(FILE_VERIFY_ALGORITHM,DEFAULT_FILE_VERIFY_ALGORITHM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, space ,DEFAULT_FILE_VERIFY_ALGORITHM
} catch (NoSuchAlgorithmException e) { | ||
LOG.error("No such algorithm : " + algorithm + "! Failed to persist data on exit",e); | ||
} catch (Exception e) { | ||
LOG.error("persist to file error"+e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, space style "+e)
@@ -1056,6 +1069,7 @@ private void retrieveFromFile(int[] bucketSizes) throws IOException, BucketAlloc | |||
"Attempt to restore non-persistent cache mappings!"); | |||
fis = new FileInputStream(persistencePath); | |||
ois = new ObjectInputStream(fis); | |||
ois.readUTF(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It breaks backward compatibility. For an upgrading rs, it should read an old persistent file.
public static String getPreFilesKey(String persistencePath) throws IOException { | ||
FileInputStream fis = null; | ||
ObjectInputStream ois = null; | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use try-with-resource style
init(); | ||
} | ||
|
||
public void init() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method should be private?
import java.security.MessageDigest; | ||
import java.security.NoSuchAlgorithmException; | ||
|
||
public class FileIOEngineUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class can be final
public class FileIOEngineUtils { | ||
|
||
private FileIOEngineUtils() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless new line
private FileIOEngineUtils() { | ||
|
||
} | ||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a new line.
* @throws IOException something happened like file not exists | ||
* @throws NoSuchAlgorithmException no such algorithm | ||
*/ | ||
public static String getFilesKey(String[] filePaths, String algorithmName) throws IOException, NoSuchAlgorithmException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is too long to match code style.
private String[] filePaths; | ||
static final String FILE_VERIFY_ALGORITHM = "hbase.bucketcache.file.verify.algorithm"; | ||
static final String DEFAULT_FILE_VERIFY_ALGORITHM = "MD5"; | ||
private String algorithm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java doc for this new member, what is this algorithm for?
@@ -242,6 +244,11 @@ public int compare(BlockCacheKey a, BlockCacheKey b) { | |||
/** In-memory bucket size */ | |||
private float memoryFactor; | |||
|
|||
private String[] filePaths; | |||
static final String FILE_VERIFY_ALGORITHM = "hbase.bucketcache.file.verify.algorithm"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename this parameters, "hbase.bucketcache.persistent.file.integrity.check.algorithm" for your reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a full review, but there're many code style problems(space, new line), please fix.
BTW, there one place breaking backward compatibility, it's a design problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me. And please raise a ticket to forward-port master, and branch-2.x.
💔 -1 overall
This message was automatically generated. |
@@ -267,6 +318,98 @@ void refreshFileConnection(int accessFileNum, IOException ioe) throws IOExceptio | |||
} | |||
} | |||
|
|||
@Override | |||
public byte[] readPersistenceChecksum(String persistencePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be called readPersistentCheckSum(). But do we need this API. Instead can we just do this internally under verifyCheckSum() or verifyFileIntegrity()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been modified. Thanks for giving advice.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed. LGTM. @ramkrish86 and @anoopsjohn ... is it good by you fellows? I see @Reidddddd gave it his blessing already.
private final String[] filePaths; | ||
private final FileChannel[] fileChannels; | ||
private final RandomAccessFile[] rafs; | ||
private final ReentrantLock[] channelLocks; | ||
|
||
private final long sizePerFile; | ||
private final long capacity; | ||
private final String algorithmName; | ||
private boolean isOldVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/isOldVersion/oldVersion/
isOldVersion is name you'd use for the method that returns this boolean.
|
||
/** | ||
* Whether the persistence file is old version, it's for back compatibility | ||
* @return true if the persistence file is old version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might say what 'old version' means... means it does not support this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been modified. Thanks!
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
private static final Log LOG = LogFactory.getLog(FileIOEngine.class); | ||
public static final String FILE_DELIMITER = ","; | ||
private static final DuFileCommand du = new DuFileCommand(new String[] {"du", ""}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upper case for static final member.
@Override | ||
public byte[] calculateChecksum() | ||
throws IOException, NoSuchAlgorithmException { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary empty line.
💔 -1 overall
This message was automatically generated. |
if (!ioEngine.isPersistent()) { | ||
throw new IOException("Attempt to persist non-persistent cache mappings!"); | ||
} | ||
fos = new FileOutputStream(persistencePath, false); | ||
oos = new ObjectOutputStream(fos); | ||
if (ioEngine instanceof PersistentIOEngine) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are in this persistToFile() , it means it is PersistentIOEngine. May be an assert and direct casting is better way than if check.
fis = new FileInputStream(persistencePath); | ||
ois = new ObjectInputStream(fis); | ||
// for backward compatibility | ||
if (ioEngine instanceof PersistentIOEngine && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above comment. See above
if (!ioEngine.isPersistent()) throw new IOException().
Just after that line itself you can do the typecast.
@@ -1078,9 +1097,8 @@ private void retrieveFromFile(int[] bucketSizes) throws IOException, BucketAlloc | |||
bucketAllocator = allocator; | |||
deserialiserMap = deserMap; | |||
backingMap = backingMapFromFile; | |||
blockNumber.set(backingMap.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? Is it related to this jira directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When retrieve successfully from file, the "Block Count" in WebUI would be 0 if blockNumber is not changed. But it's have blocks actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So this is an existing bug. Its a one liner change. Still can be done as another bug jira may be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
ois.read(PBMagic); | ||
int length = ois.readInt(); | ||
byte[] persistenceChecksum = new byte[length]; | ||
ois.read(persistenceChecksum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we are reading persistentChecksum twice in this flow. At FileIOE create time as part of verify call and here too. Here we are doing as a skip way. So why can't we do it here only? We have verifyFileIntegrity() in PersistentIOEngine interface and we can call that from here? It looks bit odd. The oldVersion check can be done here also based on he PBMagic matching.
isOldVersion() API itself not needed in the FileIOE. We process the persisted meta info here and based on that recreate the backingMap etc here in BC. So knowing whether checksum also persisted and if so verify that all can be here. I mean the actual verify imp can be in FileIOE but the call to that should be from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about your thoughts. If we want to read it once, we should pass ObjectInputStream object "ois" to the verifyFileIntegrity() method. If it's an old version persistent file, the ois object should be reset, but reset() method is not support. We can recreate an ObjectInputStream without using try-with-resource statement, but this may be a bit unsightly......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be we need to close and reopen for the read. My biggest worry is where we do this verify. See my below comment. Now we are doing it while creation of FileIOE. If the verify fails, we are not allowing the FileIOE to be created and do its future work. My point is this. We should create the FileIOE. And then the Bucket Cache is trying to retrieve the already cached data from persistent store and for that its recreating the cache meta. At that step we should be doing the verification right. First see whether the checksum for verify is present already and if so verify. If verify ok and then try to recreate the cache meta data. Or else just forget abt that existing persisted cache data and may be do the necessary cleanup. All these work of Bucket Cache. It can ask the FileIOE to do actual verify. But should be initiated by the BucketCache. You get my mind clearly now? Sorry for not saying in detail at 1st step itself.
Ya may be we need close the file and reopen in case of old style with no pb magic. Or else consider it as 4 bytes and read next 4 bytes and make out the 8 bytes long number. But that may be having challenges wrt the platform. I dont think it is an issue to just reopen the file if no pbmagic. Comments. @Reidddddd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got your point already. In fact, if the verification fails, FileIOE can still be created. If verification fails, we would throw IOException, then cache the IOException and do some cleanup, but the creation of FileIOE will continue. Below is the code for the cache:
catch (IOException ioex) { LOG.error("File verification failed because of ", ioex); // delete cache files and backingMap persistent file. deleteCacheDataFile(); new File(persistentPath).delete(); }
However, I totally agree with what you said, I will modify it immediately.
if (!Bytes.equals(persistentChecksum, calculateChecksum)) { | ||
LOG.warn("The persistent checksum is " + Bytes.toString(persistentChecksum) + | ||
", but the calculate checksum is " + Bytes.toString(calculateChecksum)); | ||
throw new IOException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if the checksum do not match, we can still continue with RS operation. We can not regain the old cached data. But now as this throw IOE happens while construction of the FileIOEngine, we can no longer use the IOEngine itself. That is wrong. One more reason not to do this verify as part of constructor but at a later time as part of retrieve from persisted meta data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IOException will be cached, and then do the delete operation. So the IOEngine can use after that.
} | ||
} else { | ||
// not configure persistent path | ||
deleteCacheDataFile(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where we will create the cache files again then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache files would be created in init() method. rafs[i] = new RandomAccessFile(filePath, "rw");
will create new file if it not exist.
/** | ||
* Delete bucketcache files | ||
*/ | ||
void deleteCacheDataFile(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even this also not needed in Interface at least as of now. If we verify the checksum in BucketCache (as above comment) and then decide to do this delete, ya then it is needed. But in flow when and where we create it again?
Hello, @Reidddddd @anoopsjohn , I have created a new pull request. HBASE-22890 |
HBASE 22890