-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28836 Parallelize the file archival to improve the split times #6243
base: master
Are you sure you want to change the base?
Changes from 1 commit
be6e570
2f405e9
b33878a
1e21b43
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -439,7 +439,7 @@ private static List<File> resolveAndArchive(FileSystem fs, Path baseArchiveDir, | |
|
||
List<File> failures = new ArrayList<>(); | ||
String startTime = Long.toString(start); | ||
for (File file : toArchive) { | ||
toArchive.parallelStream().forEach(file -> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also if toArchive is small, there are chances that making it parallel can take more time than doing the same sequentially. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This wouldn't be the case as it is an IO operation (and diff wouldn't be noticable). So parallelizing it would improve it over what we already have. In pricinple i agree about what you are saying There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's another point to consider There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is the access to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are IO operations in the code block so usually it is not safe to use parallelStream... AFAIK parallelStream will make use of a common thread pool(ForkJoinPool.commonPool IIRC) which could be used by lots of other operations in java, so it is easy to introduce strange dead locks. If we want to do it in parallel, we'd better introduce our own thread pool. |
||
// if its a file archive it | ||
try { | ||
LOG.trace("Archiving {}", file); | ||
|
@@ -463,7 +463,7 @@ private static List<File> resolveAndArchive(FileSystem fs, Path baseArchiveDir, | |
LOG.warn("Failed to archive {}", file, e); | ||
failures.add(file); | ||
} | ||
} | ||
}); | ||
return failures; | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to check the difference between the parallelStream() and stream().parallel(). Java doc say that for parallelStream "
It is allowable for this method to return a sequential stream.
"