HBASE-27763 Recover WAL encounter KeeperErrorCode = NoNode cause Regi… #5177

Open
wants to merge 1 commit into
base: master
from

Conversation

gottagogottagoGxj

…onServer crash

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 🆗 | reexec | 0m 24s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 54s | master passed |
| +1 💚 | compile | 2m 34s | master passed |
| +1 💚 | checkstyle | 0m 36s | master passed |
| +1 💚 | spotless | 0m 43s | branch has no errors when running spotless:check. |
| +1 💚 | spotbugs | 1m 30s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 3m 35s | the patch passed |
| +1 💚 | compile | 2m 31s | the patch passed |
| +1 💚 | javac | 2m 31s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 34s | hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | hadoopcheck | 13m 22s | Patch does not cause any errors with Hadoop 3.2.4 3.3.4. |
| -1 ❌ | spotless | 0m 36s | patch has 53 errors when running spotless:check, run spotless:apply to fix. |
| -1 ❌ | spotbugs | 1m 41s | hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 40m 13s | |

| Reason | Tests |
|:-------|:------|
| FindBugs | module:hbase-server |
| | Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be atomic in org.apache.hadoop.hbase.replication.regionserver.RecoveredReplicationSource.startShipperWorks() At RecoveredReplicationSource.java:[line 180] |

| Subsystem | Report/Notes |
|:----------|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #5177 |
| Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
| uname | Linux d9c601b0220d 5.4.0-1093-aws #102~18.04.2-Ubuntu SMP Wed Dec 7 00:31:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a711059 |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| checkstyle | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
| spotless | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/patch-spotless.txt |
| spotbugs | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/new-spotbugs-hbase-server.html |
| Max. process+thread count | 82 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/console |
| versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
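
For readers unfamiliar with the SpotBugs finding reported above ("Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be atomic"), the following is a minimal sketch of the general pattern, not the code in this patch. `WorkerRegistry`, `registerRacy`, `registerAtomic`, and `raceAtomic` are hypothetical names made up for illustration; the only real API used is `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a map of per-WAL-group workers; not an HBase class.
final class WorkerRegistry {
  private final ConcurrentMap<String, String> workers = new ConcurrentHashMap<>();
  private final AtomicInteger created = new AtomicInteger();

  // The kind of sequence SpotBugs flags: get() and put() are each thread-safe,
  // but another thread can insert the same key between the two calls.
  String registerRacy(String group) {
    String existing = workers.get(group);
    if (existing == null) {
      existing = "worker-" + created.incrementAndGet();
      workers.put(group, existing);
    }
    return existing;
  }

  // One atomic call: the factory runs at most once per key.
  String registerAtomic(String group) {
    return workers.computeIfAbsent(group, g -> "worker-" + created.incrementAndGet());
  }

  int createdCount() {
    return created.get();
  }

  // Starts `threads` tasks that all register the same group at once and
  // returns how many workers were actually created.
  static int raceAtomic(int threads) {
    WorkerRegistry registry = new WorkerRegistry();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    try {
      for (int i = 0; i < threads; i++) {
        pool.execute(() -> {
          try {
            start.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          registry.registerAtomic("regiongroup-1");
        });
      }
      start.countDown();
      pool.shutdown();
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
    return registry.createdCount();
  }

  public static void main(String[] args) {
    System.out.println("workers created: " + raceAtomic(8));
  }
}
```

Whether `computeIfAbsent` is the right fix at RecoveredReplicationSource.java line 180 depends on what the flagged sequence does there; the sketch only shows why SpotBugs treats separate `get`/`put` calls as non-atomic.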

@Apache9
Contributor

Apache9 commented Apr 13, 2023

Mind explaining in more detail how this change fixes the NoNode exception?

@thangTang
Contributor

thangTang commented Mar 21, 2024

Hi @gottagogottagoGxj, I'd appreciate it if you could explain this ticket in more detail and share your HBase version.

It seems I hit this issue too, on HBase 2.4.11.

Here is my log:

2024-03-21 16:19:43,379 WARN  [ReplicationExecutor-0.replicationSource,xxxxx,1705567104078.replicationSource.shipper000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1,xxxxx,1705567104078] regionserver.ReplicationSourceShipper: com.shopee.di.foundation.hbase.KafkaInterClusterReplicationEndpoint threw unknown exception:
java.util.ConcurrentModificationException
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1221)
        at org.apache.hadoop.hbase.replication.regionserver.MetricsSource.updateTableLevelMetrics(MetricsSource.java:112)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:215)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:117)
2024-03-21 16:19:43,405 ERROR [ReplicationExecutor-0.replicationSource,xxxxx,1705567104078.replicationSource.shipper000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1,xxxxx,1705567104078] regionserver.HRegionServer: ***** ABORTING region server ip-10-80-163-145.idata-server.shopee.io,16020,1704705566934: Failed to operate on replication queue *****
org.apache.hadoop.hbase.replication.ReplicationException: Failed to set log position (serverName=xxxxx,1704705566934, queueId=xxxxx,1705567104078, fileName=000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1.1711008927746, position=130724689)
        at org.apache.hadoop.hbase.replication.ZKReplicationQueueStorage.setWALPosition(ZKReplicationQueueStorage.java:255)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.lambda$logPositionAndCleanOldLogs$8(ReplicationSourceManager.java:552)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.interruptOrAbortWhenFail(ReplicationSourceManager.java:500)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:551)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceInterface.logPositionAndCleanOldLogs(ReplicationSourceInterface.java:206)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.updateLogPosition(ReplicationSourceShipper.java:264)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:203)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:117)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1925)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1830)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:658)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1534)
        at org.apache.hadoop.hbase.replication.ZKReplicationQueueStorage.setWALPosition(ZKReplicationQueueStorage.java:245)
        ... 7 more

*Sensitive information such as server names and IPs has been redacted.

Thank you.
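
The first exception in the log above is a `ConcurrentModificationException` thrown from `HashMap.computeIfAbsent`. As a hedged illustration of that mechanism only (it is not a reproduction of the HBase code path, and it assumes Java 9 or later, where `HashMap.computeIfAbsent` checks for modifications made while the mapping function runs), the sketch below triggers the same check deterministically by modifying the map inside the mapping function, which stands in for a second thread touching the map. `ComputeIfAbsentDemo` and its methods are hypothetical names; the `ConcurrentHashMap` plus `LongAdder` variant is one common thread-safe alternative, not necessarily what HBase uses.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class; not part of HBase.
final class ComputeIfAbsentDemo {

  // Returns true if HashMap.computeIfAbsent detects that the map was
  // structurally modified while the mapping function was running.
  static boolean hashMapDetectsModification() {
    Map<String, Long> metrics = new HashMap<>();
    try {
      metrics.computeIfAbsent("tableA", k -> {
        metrics.put("tableB", 0L); // stands in for a concurrent writer
        return 1L;
      });
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Several threads update per-table counters through a ConcurrentHashMap.
  // Returns the sum of all counters once every thread has finished.
  static long countConcurrently(int threads, int perThread) {
    ConcurrentHashMap<String, LongAdder> metrics = new ConcurrentHashMap<>();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          metrics.computeIfAbsent("table-" + (i % 4), k -> new LongAdder()).increment();
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return metrics.values().stream().mapToLong(LongAdder::sum).sum();
  }

  public static void main(String[] args) {
    System.out.println("HashMap detected modification: " + hashMapDetectsModification());
    System.out.println("total increments: " + countConcurrently(4, 1000));
  }
}
```

With real threads the `HashMap` failure is timing-dependent, which would be consistent with it showing up only occasionally in a shipper thread. Whether it is related to the later `NoNode` abort is not established by the log alone.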

4 participants