
feat:propagate onerror event to statelistener #918

Merged
merged 1 commit on Jan 9, 2023

Conversation

yuyang0423
Contributor

Motivation:

We see the error log below after a crash. The node came back up but failed to start successfully.

How about propagating the onError event to the state listener, so that upper-layer users know what happened?

2022-12-26 14:24:11.750 [parallel-kv-executor #2] INFO  TaskHandler:61 - Task handler on start: parallel-kv-executor #2.
2022-12-26 14:24:11.751 [parallel-kv-executor #3] INFO  TaskHandler:61 - Task handler on start: parallel-kv-executor #3.
2022-12-26 14:24:11.757 [main] INFO  NamedThreadFactory:82 - Creates new Thread[get_batching #0,5,main].
2022-12-26 14:24:11.759 [main] INFO  NamedThreadFactory:82 - Creates new Thread[get_batching_only_safe #0,5,main].
2022-12-26 14:24:11.763 [main] INFO  NamedThreadFactory:82 - Creates new Thread[put_batching #0,5,main].
2022-12-26 14:24:11.764 [main] INFO  DefaultRheaKVStore:295 - [DefaultRheaKVStore] start successfully, options: RheaKVStoreOptions{clusterId=0, clusterName=' test_election', placementDriverOptions=PlacementDriverOptions{fake=true, cliOptions=null, pdRpcOptions=null, pdGroupId='null', regionRouteTableOptionsList=null, initialServerList='192.168.80.192:8082', initialPdServerList='null'}, storeEngineOptions=StoreEngineOptions{storageType=RocksDB, rocksDBOptions=RocksDBOptions{sync=false, disableWAL=true, fastSnapshot=false, asyncSnapshot=false, openStatisticsCollector=true, statisticsCallbackIntervalSeconds=0, dbPath='/opt/election/db/db_-1_8082'}, memoryDBOptions=null, raftDataPath='/opt/election', serverAddress=192.168.80.192:8082, commonNodeOptions=NodeOptions{electionTimeoutMs=5000, electionPriority=-1, decayPriorityGap=10, leaderLeaseTimeRatio=90, snapshotIntervalSecs=3600, snapshotLogIndexMargin=0, catchupMargin=1000, initialConf=, fsm=null, logUri='null', raftMetaUri='null', snapshotUri='null', filterBeforeCopyRemote=false, disableCli=false, sharedTimerPool=false, timerPoolSize=12, cliRpcThreadPoolSize=4, raftRpcThreadPoolSize=24, enableMetrics=false, snapshotThrottle=null, sharedElectionTimer=false, sharedVoteTimer=false, sharedStepDownTimer=false, sharedSnapshotTimer=false, serviceFactory=com.alipay.sofa.jraft.core.DefaultJRaftServiceFactory@64021427, applyTaskMode=NonBlocking, raftOptions=RaftOptions{maxByteCountPerRpc=131072, fileCheckHole=false, maxEntriesSize=1024, maxBodySize=524288, maxAppendBufferSize=262144, maxElectionDelayMs=3000, electionHeartbeatFactor=10, applyBatch=32, sync=true, syncMeta=false, openStatistics=true, replicatorPipeline=true, maxReplicatorInflightMsgs=256, disruptorBufferSize=16384, disruptorPublishEventWaitTimeoutSecs=10, enableLogEntryChecksum=false, readOnlyOptions=ReadOnlySafe, maxReadIndexLag=-1, stepDownWhenVoteTimedout=true, startUpOldStorage=false}} RpcOptions{rpcConnectTimeoutMs=1000, rpcDefaultTimeout=5000, 
rpcInstallSnapshotTimeout=300000, rpcProcessorThreadPoolSize=80, enableRpcChecksum=false, metricRegistry=null}, regionEngineOptionsList=[RegionEngineOptions{regionId=-1, startKey='null', startKeyBytes=null, endKey='null', endKeyBytes=null, raftGroupId=' test_election--1', raftDataPath='/opt/election/raft_data_region_-1_8082', nodeOptions=NodeOptions{electionTimeoutMs=5000, electionPriority=-1, decayPriorityGap=10, leaderLeaseTimeRatio=90, snapshotIntervalSecs=3600, snapshotLogIndexMargin=0, catchupMargin=1000, initialConf=192.168.80.192:8082, fsm=com.alipay.sofa.jraft.rhea.storage.KVStoreStateMachine@fbe70d8, logUri='/opt/election/raft_data_region_-1_8082/log', raftMetaUri='/opt/election/raft_data_region_-1_8082/meta', snapshotUri='/opt/election/raft_data_region_-1_8082/snapshot', filterBeforeCopyRemote=false, disableCli=false, sharedTimerPool=false, timerPoolSize=12, cliRpcThreadPoolSize=4, raftRpcThreadPoolSize=24, enableMetrics=true, snapshotThrottle=null, sharedElectionTimer=false, sharedVoteTimer=false, sharedStepDownTimer=false, sharedSnapshotTimer=false, serviceFactory=com.alipay.sofa.jraft.core.DefaultJRaftServiceFactory@64021427, applyTaskMode=NonBlocking, raftOptions=RaftOptions{maxByteCountPerRpc=131072, fileCheckHole=false, maxEntriesSize=1024, maxBodySize=524288, maxAppendBufferSize=262144, maxElectionDelayMs=3000, electionHeartbeatFactor=10, applyBatch=32, sync=true, syncMeta=false, openStatistics=true, replicatorPipeline=true, maxReplicatorInflightMsgs=256, disruptorBufferSize=16384, disruptorPublishEventWaitTimeoutSecs=10, enableLogEntryChecksum=false, readOnlyOptions=ReadOnlySafe, maxReadIndexLag=-1, stepDownWhenVoteTimedout=true, startUpOldStorage=false}} RpcOptions{rpcConnectTimeoutMs=1000, rpcDefaultTimeout=5000, rpcInstallSnapshotTimeout=300000, rpcProcessorThreadPoolSize=80, enableRpcChecksum=false, metricRegistry=com.codahale.metrics.MetricRegistry@7d2c345d}, serverAddress=192.168.80.192:8082, initialServerList='192.168.80.192:8082', 
metricsReportPeriod=300}], initialServerList='192.168.80.192:8082', heartbeatOptions=null, useSharedRpcExecutor=false, readIndexCoreThreads=16, leaderStateTriggerCoreThreads=4, snapshotCoreThreads=1, snapshotMaxThreads=32, cliRpcCoreThreads=16, raftRpcCoreThreads=32, kvRpcCoreThreads=32, metricsReportPeriod=300, leastKeysOnSplit=10000}, initialServerList='192.168.80.192:8082', onlyLeaderRead=true, rpcOptions=RpcOptions{callbackExecutorCorePoolSize=16, callbackExecutorMaximumPoolSize=32, callbackExecutorQueueCapacity=512, rpcTimeoutMillis=5000}, failoverRetries=2, futureTimeoutMillis=30000, useParallelKVExecutor=true, batchingOptions=BatchingOptions{allowBatching=true, batchSize=100, bufSize=8192, maxWriteBytes=32768, maxReadBytes=1024}, useParallelCompress=false, compressThreads=4, deCompressThreads=5}.
2022-12-26 14:24:13.246 [Rpc-netty-server-worker-10-thread-1] WARN  BoltRaftRpcFactory:82 - JRaft SET bolt.rpc.dispatch-msg-list-in-default-executor to be false for replicator pipeline optimistic.
2022-12-26 14:24:13.279 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] INFO  LocalRaftMetaStorage:132 - Save raft meta, path=/opt/election/raft_data_region_-1_8082/meta, term=4, votedFor=0.0.0.0:0, cost time=18 ms
2022-12-26 14:24:13.281 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] WARN  NodeImpl:1951 - Node < test_election--1/192.168.80.192:8082> reject term_unmatched AppendEntriesRequest from 192.168.80.136:8082, term=4, prevLogIndex=1437, prevLogTerm=4, localPrevLogTerm=0, lastLogIndex=1234, entriesSize=0.
2022-12-26 14:24:13.281 [JRaft-FSMCaller-Disruptor-0] INFO  StateMachineAdapter:89 - onStartFollowing: LeaderChangeContext [leaderId=192.168.80.136:8082, term=4, status=Status[ENEWLEADER<10011>: Raft node receives message from new leader with higher term.]].
2022-12-26 14:24:13.282 [JRaft-FSMCaller-Disruptor-0] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-raft-state-trigger #0,5,main].
2022-12-26 14:24:13.298 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] WARN  NodeImpl:1951 - Node < test_election--1/192.168.80.192:8082> reject term_unmatched AppendEntriesRequest from 192.168.80.136:8082, term=4, prevLogIndex=1234, prevLogTerm=3, localPrevLogTerm=2, lastLogIndex=1234, entriesSize=0.
2022-12-26 14:24:13.320 [JRaft-LogManager-Disruptor-0] WARN  LogManagerImpl:533 - Truncating storage to lastIndexKept=1233.
2022-12-26 14:24:13.324 [JRaft-LogManager-Disruptor-0] ERROR RocksDBLogStorage:629 - Fail to truncateSuffix 1233 in data path:/opt/election/raft_data_region_-1_8082/log.
org.rocksdb.RocksDBException: end key comes before start key
        at org.rocksdb.RocksDB.deleteRange(Native Method) ~[rocksdbjni-6.22.1.1.jar!/:?]
        at org.rocksdb.RocksDB.deleteRange(RocksDB.java:1539) ~[rocksdbjni-6.22.1.1.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.RocksDBLogStorage.truncateSuffix(RocksDBLogStorage.java:624) [jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:534) [jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:488) [jraft-core-1.3.11.jar!/:?]
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) [disruptor-3.3.7.jar!/:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2022-12-26 14:24:13.326 [JRaft-FSMCaller-Disruptor-0] ERROR StateMachineAdapter:72 - Encountered an error=Status[EIO<1014>: Failed operation in LogStorage] on StateMachine com.alipay.sofa.jraft.rhea.storage.KVStoreStateMachine, it's highly recommended to implement this method as raft stops working since some error occurs, you should figure out the cause and repair or remove this node.
com.alipay.sofa.jraft.error.RaftException: ERROR_TYPE_LOG
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:570) ~[jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl.access$1200(LogManagerImpl.java:75) ~[jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:555) ~[jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:488) ~[jraft-core-1.3.11.jar!/:?]
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) [disruptor-3.3.7.jar!/:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2022-12-26 14:24:13.327 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] INFO  Recyclers:52 - -Djraft.recyclers.maxCapacityPerThread: 4096.
2022-12-26 14:24:13.327 [JRaft-FSMCaller-Disruptor-0] WARN  NodeImpl:2498 - Node < test_election--1/192.168.80.192:8082> got error: Error [type=ERROR_TYPE_LOG, status=Status[EIO<1014>: Failed operation in LogStorage]].
2022-12-26 14:24:13.327 [JRaft-FSMCaller-Disruptor-0] WARN  FSMCallerImpl:348 - FSMCaller already in error status, ignore new error.
com.alipay.sofa.jraft.error.RaftException: ERROR_TYPE_LOG
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:570) ~[jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl.access$1200(LogManagerImpl.java:75) ~[jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:555) ~[jraft-core-1.3.11.jar!/:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:488) ~[jraft-core-1.3.11.jar!/:?]
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) [disruptor-3.3.7.jar!/:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2022-12-26 14:24:13.332 [JRaft-FSMCaller-Disruptor-0] INFO  StateMachineAdapter:84 - onStopFollowing: LeaderChangeContext [leaderId=192.168.80.136:8082, term=4, status=Status[EBADNODE<10009>: Raft node(leader or candidate) is in error.]].
2022-12-26 14:24:13.332 [JRaft-FSMCaller-Disruptor-0] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-raft-state-trigger #1,5,main].
2022-12-26 14:24:13.660 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] WARN  NodeImpl:1893 - Node < test_election--1/192.168.80.192:8082> is not in active state, currTerm=4.
2022-12-26 14:24:13.838 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] WARN  NodeImpl:1893 - Node < test_election--1/192.168.80.192:8082> is not in active state, currTerm=4.
2022-12-26 14:24:14.164 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] WARN  NodeImpl:1893 - Node < test_election--1/192.168.80.192:8082> is not in active state, currTerm=4.
2022-12-26 14:24:14.341 [ test_election--1/PeerPair[192.168.80.192:8082 -> 192.168.80.136:8082]-AppendEntriesThread0] WARN  NodeImpl:1893 - Node < test_election--1/192.168.80.192:8082> is not in active state, currTerm=4.
2022-12-26 14:24:14.391 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #0,5,main].
2022-12-26 14:24:14.410 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #1,5,main].
2022-12-26 14:24:14.426 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #2,5,main].
2022-12-26 14:24:14.444 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #3,5,main].
2022-12-26 14:24:14.459 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #4,5,main].
2022-12-26 14:24:14.475 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #5,5,main].
2022-12-26 14:24:14.490 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #6,5,main].
2022-12-26 14:24:14.506 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #7,5,main].
2022-12-26 14:24:14.522 [Rpc-netty-server-worker-10-thread-2] INFO  NamedThreadFactory:82 - Creates new Thread[rheakv-cli-rpc-executor #8,5,main].

Modification:

Add an onError callback to the StateListener.
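A minimal sketch of the idea, assuming a simplified listener interface (the names and signatures here are illustrative, not the actual jraft-rhea API): declaring onError as a default method lets existing listener implementations keep compiling while new ones can react to fatal raft errors such as the ERROR_TYPE_LOG failure above.

```java
import java.util.ArrayList;
import java.util.List;

public class StateListenerDemo {

    // Simplified stand-in for a raft state listener (hypothetical shape).
    interface StateListener {
        void onLeaderStart(long newTerm);
        void onLeaderStop(long oldTerm);

        // The new hook: propagate fatal raft errors to upper-layer users.
        // A default body keeps old implementations source-compatible.
        default void onError(String errorType, String status) {
        }
    }

    static final List<String> events = new ArrayList<>();

    public static void main(String[] args) {
        StateListener listener = new StateListener() {
            @Override
            public void onLeaderStart(long newTerm) {
                events.add("leaderStart:" + newTerm);
            }

            @Override
            public void onLeaderStop(long oldTerm) {
                events.add("leaderStop:" + oldTerm);
            }

            @Override
            public void onError(String errorType, String status) {
                // The application can now alert, restart, or remove the node
                // instead of discovering the failure only in the logs.
                events.add("error:" + errorType);
            }
        };

        // Simulate the node reporting the log-storage failure from the log above.
        listener.onError("ERROR_TYPE_LOG", "EIO<1014>: Failed operation in LogStorage");
        System.out.println(events);
    }
}
```

With this in place, the crash scenario in the motivation would surface as an onError callback rather than only as an ERROR-level log line.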

Result:

@yuyang0423 yuyang0423 changed the title propagate onerror event to statelistener feat:propagate onerror event to statelistener Dec 27, 2022
Contributor

@fengjiachun fengjiachun left a comment


LGTM

@skippyb1

@yuyang0423 Could you explain how the org.rocksdb.RocksDBException: end key comes before start key exception was produced?

3 participants