-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-26064 Introduce a StoreFileTracker to abstract the stor… #3460
Conversation
@wchevreuil PTAL. I will commit this to a feature branch first as there are still lots of other place to integrate, such as MergeTableRegionsProcedure/SplitTableRegionProcedure, snapshot, major compaction tool and so on(just search the usage of HRegionFileSystem.getStorefiles). This is another direction on how to make it possible to write directly to data directory for compaction and flush. And I think this is a more general way to support different persistency ways, either with a file or with a region, and easier to migrate betewwn different persisent ways(or even without persistency) Feel free to comment anything here. I will put up a design doc soon. Thanks. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a quick look. Had a few questions
@@ -49,6 +49,7 @@ | |||
private final ColumnFamilyDescriptor family; | |||
private final Path familyStoreDirectoryPath; | |||
private final RegionCoprocessorHost coprocessorHost; | |||
private final boolean primary; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly does primary mean here? Primary versus read replica?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primary region replica.
* callers to not call it inside a lock which may block normal read/write requests. | ||
*/ | ||
@InterfaceAudience.Private | ||
public interface StoreFileTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's talk a bit about your plans here.
What is the reasoning for introducing a new StoreFileTracker that duplicates some of the APIs from the SFM? Why not handle internal to the SFM (+ add a method either in the SFM or HStore/StoreContect about should write to tmp)? The SFM should have the source of truth for the right files to be included. If we do add this separate StoreFileTracker, I think it should be called internal to the SFM methods. We should not require callers of the SFM to have to call two separate interfaces for one logical operation (for example updating the files that are present in the SFM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a very simple question, as there is already a StripeStoreFileManager, how do you plan to implement a seperated SFM to support tracking store files for users who have enabled stripe compaction?
/** | ||
* Load the store files list when opening a region. | ||
*/ | ||
List<StoreFileInfo> loadStoreFiles() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on moving this out of HStore. Let's discuss adding it to the SFM interface versus the StoreFileTracker. I believe in our implementation we added it to the SFM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see HBASE-26065, where I added clear javadoc for StoreFileManager that it only holds in memory state. This is very important as we will call the update methods of SFM inside the write lock of a HStore, which will block all the normal read write requests. It is not a good idea to do IO inside these methods, that's another reason why I do not think we should add the persistent logic in SFM. If we want to do this, we need to move the lock into SFM to make it more fine-grained, which I think will be much more complicated...
public final class StoreFileTrackerFactory { | ||
|
||
public static StoreFileTracker create(StoreContext ctx) { | ||
return new DefaultStoreFileTracker(ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a NoopStoreFileTracker if the StoreContext tells us we aren't the primary? Then you don't have to add the isPrimary() checks everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a Noop... At least it needs to support loading store files... It just means for secondary replicas you should not try to update the store file list...
🎊 +1 overall
This message was automatically generated. |
https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit?usp=sharing I've finish the first version of the design doc. PTAL. Thanks. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Moved StoreFileTracker to StoreEngine, per @z-york and @saintstack 's suggestion on the design doc. And the root reason is I found that there is already a DateTieredStoreEngine, where we just make use DefaultStoreFileManager, so when switching StoreEngine, it does not mean that you need to change all the components. So I think make all StoreEngine use the same StoreFileTracker abstraction is also workable here. And I also moved the createWriterInTmp method to StoreFileTracker, to hide the requireCreateInTmp method. Thanks. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Sorry, hit close button by mistake thinking I was closing a comment of mine. Had just reopened it. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@saintstack @wchevreuil @z-york I've modified the patch. PTAL. Now most storefile relate operations are moved to StoreEngine, and we will not expose StoreFileTracker to upper layer any more. In HStore we just need to call the methods in StoreEngine, and inside StoreEngine we will update StoreFileTracker and then StoreFileManager. The store lock is also moved into StoreEngine to allow we do fine-grained lock in StoreEngine. Thanks. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Ping @wchevreuil @saintstack @z-york Any other concerns on this approach? If worth a try, I will merge this to a feature branch and start to implement file based tracking. Thanks. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
I think this is good to go, @Apache9 . Are you planning to merge this on a separate feature branch, or the one created in HBASE-24749? It might conflict with the works I had made there for flushes and compactions, but I think this refactoring basically dismisses those. The splits/merges one (still open), may still apply. |
Better on another feature branch, I think this is another abstraction way comparing to HBASE-24749. We could work together on the following issues on how to better implement merge and split, with this abstraction way, and also how to integrate the region based tracking way which has already been implemented on branch HBASE-24749. If these things all work, then I think we could unify these two feature branches. If not, I think we need to go back to the design doc to discuss again. The goal of this issue is to make the store file tracking way pluggable so we will all be happy. What do you think? |
Actually, yeah, let's do on a new feature branch, I guess it would be easier to cherry-pick things from HBASE-24749 that may still be relevant into this new one, and apply this refactory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure to merge into the new feature branch (this PR is targeting master).
Pushed to branch HBASE-26067. Closed this PR. |
* <p/> | ||
* Here, since we always need to update SFM and SFT almost at the same time, we introduce methods in | ||
* StoreEngine directly to update them both, so upper layer just need to update StoreEngine once, to | ||
* reduce the possible misuse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good
…e file tracking logic