Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-26169 Fix MapReduceBackupCopyJob.BackupDistCp.getKey() concatenates strings #3562

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@
import java.lang.reflect.Method;
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

Expand Down Expand Up @@ -331,12 +333,14 @@ protected Path createInputFileListing(Job job) throws IOException {
private Text getKey(Path path) {
int level = conf.getInt(NUMBER_OF_LEVELS_TO_PRESERVE_KEY, 1);
int count = 0;
String relPath = "";
Deque<String> paths = new LinkedList<>();
StringBuilder relPath = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that obvious to me that using StringBuilder in this way is faster (looking at the jira, I assumed this is the purpose of the change) than simple string concatenation, because StringBuilder.insert() may involve multiple array copy.

How about instantiating StringBuilder with a capacity (=level) as parameter so that we don't need to re-allocate & copy arrays?

Copy link
Author

@skyguard1 skyguard1 Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the length of path.getName() is not fixed, how to determine the initial capacity?
Unless the Path is traversed again and the length of the name is calculated, the performance will obviously not improve. Splicing strings in a loop will generate a new string each time. In my opinion, using StringBuilder will improve performance, thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a possible way is to store the path name in a list and then concat the list of strings from tail to head?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is it acceptable to apply for the space of the collection and the time cost of traversing the collection again?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache9 I changed to use a collection to store strings, PTAL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. Thank you.

while (count++ < level) {
relPath = Path.SEPARATOR + path.getName() + relPath;
paths.addFirst(Path.SEPARATOR + path.getName());
path = path.getParent();
}
return new Text(relPath);
paths.forEach(relPath::append);
return new Text(relPath.toString());
}

private List<Path> getSourceFiles() throws NoSuchFieldException, SecurityException,
Expand Down