-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27217 Revisit the DumpReplicationQueues tool #4810
HBASE-27217 Revisit the DumpReplicationQueues tool #4810
Conversation
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@Apache9 sir. Could you take a look? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, PTAL.
Thanks @2005hithlj !
} | ||
} | ||
|
||
public String dumpReplicationViaTable(Connection connection) throws ReplicationException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public?
if (queueId.isRecovered()) { | ||
wals = AbstractFSWALProvider | ||
.getArchivedWALFiles(connection.getConfiguration(), queueId.getSourceServerName().get(), | ||
URLEncoder.encode(queueId.getSourceServerName().get().toString(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need encode here?
sb.append(" PeerID: " + queueId.getPeerId() + "\n"); | ||
sb.append(" Recovered: " + queueId.isRecovered() + "\n"); | ||
if (queueId.getSourceServerName().isPresent()) { | ||
sb.append(" Dead RegionServers: " + queueId.getSourceServerName().get() + "\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually here we only record the first region server instead of the all the region servers, so maybe we should change the wording here in the future...
// long position = queueStorage.getWALPosition(regionserver, queueInfo.getPeerId(), wal); | ||
// sb.append(" Replication position for " + wal + ": " | ||
// + (position > 0 ? position : "0" + " (not started or nothing to replicate)") + "\n"); | ||
long position = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we should filter out the wals before the offset?
position = offset.getOffset(); | ||
} | ||
} | ||
sb.append(" Replication position for " + (walGroup != null ? walGroup + "/" + wal : wal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that, -1 in ReplicationGroupOffset means this file has already been fully replicated.
82a795a
to
67a94a6
Compare
60f3ae0
to
77e8bcb
Compare
@@ -229,21 +238,45 @@ private int dumpReplicationQueues(DumpOptions opts) throws Exception { | |||
LOG.info("Found [--distributed], will poll each RegionServer."); | |||
Set<String> peerIds = | |||
peers.stream().map((peer) -> peer.getPeerId()).collect(Collectors.toSet()); | |||
System.out.println(dumpQueues(zkw, peerIds, opts.isHdfs())); | |||
System.out.println(dumpQueues(zkw, connection, peerIds, opts.isHdfs())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need zkw?
for (ReplicationQueueId queueId : queueIds) { | ||
List<String> wals = null; | ||
if (queueId.isRecovered()) { | ||
wals = AbstractFSWALProvider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use queueId.getServerWALsBelongTo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can really solve my problem.
queueId.getSourceServerName().get().toString()) | ||
.stream().map(Path::toString).collect(Collectors.toList()); | ||
} else { | ||
wals = AbstractFSWALProvider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this enough to get all the wals for a live region server?
StringBuilder sb = new StringBuilder(); | ||
ReplicationQueueStorage queueStorage = | ||
ReplicationStorageFactory.getReplicationQueueStorage(connection, getConf()); | ||
Set<ServerName> liveRegionServers = ZKUtil.listChildrenNoWatch(zkw, zkw.getZNodePaths().rsZNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be done through Admin interface, so we do not need to depend on zk?
queueId.getServerName().toString()) | ||
.stream().map(Path::toString).collect(Collectors.toList()); | ||
} | ||
Collections.sort(wals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String compare is enough for sorting? Not sure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use TS_COMPARATOR?
...er/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
} | ||
} | ||
|
||
private String dumpReplicationViaTable(Connection connection) throws ReplicationException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked the code in ZKDump.getReplicationZnodesDump, I do not think this is a direct mapping of the code there?
In ZKDump.getReplicationZnodesDump, first we dump the replication peer data, while here we do not dump it.
And when dumping the replication queue data, we will iterate all rsZNode and print them all, but here we first print all the peers and region servers, and then print all the wal files? Seems a bit strange...
And for dumping hfile refs, we will print hfile refs for each peer in ZKDump.getReplicationZnodesDump, but here we just print them all, without the peer information...
queueId.getServerName().toString()) | ||
.stream().map(Path::toString).collect(Collectors.toList()); | ||
} | ||
Collections.sort(wals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use TS_COMPARATOR?
" Replication position for " + (walGroup != null ? walGroup + "/" + wal : wal) + ": "); | ||
// Position is -1, which means that the file has already been fully replicated, | ||
// the logic here is different from the previous version. | ||
if (position == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we arrive here but position is -1? I think it will be filtered out in the above ReplicationOffsetUtil.shouldReplicate check?
@@ -83,7 +83,7 @@ public void testDumpReplicationReturnsWalSorted() throws Exception { | |||
Set<String> peerIds = new HashSet<>(); | |||
peerIds.add("1"); | |||
dumpQueues.setConf(config); | |||
String dump = dumpQueues.dumpQueues(zkWatcherMock, peerIds, false); | |||
String dump = dumpQueues.dumpQueues(null, peerIds, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing null is enough here? We do not need to pass a Connection instance?
67a94a6
to
6d4e7fd
Compare
6d4e7fd
to
56f0241
Compare
I've done a rebase on HBASE-27109, please also rebase here. Thanks. @2005hithlj |
@Apache9 OK sir, and I will have a new commit for the comments. |
dee333c
to
36ed2ca
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
...er/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
// only for test | ||
if (tmpWals.isEmpty() && !wals.contains(offset.getWal())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
...er/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java
Show resolved
Hide resolved
7d1cbb3
to
c5196d7
Compare
HBASE-27218 has been merged, so you need a rebase here... |
OK, sir. |
e90d2c9
to
e9cbb69
Compare
🎊 +1 overall
This message was automatically generated. |
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
https://issues.apache.org/jira/browse/HBASE-27217