Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/importccl: TestMultiNodeExportStmt failed #43247

Closed
cockroach-teamcity opened this issue Dec 17, 2019 · 14 comments
Closed

ccl/importccl: TestMultiNodeExportStmt failed #43247

cockroach-teamcity opened this issue Dec 17, 2019 · 14 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

(ccl/importccl).TestMultiNodeExportStmt failed on master@f2aec3eebbedf65911a8697375015ac11e6a7f45:

Fatal error:

F191217 16:47:26.335308 282160 storage/replica_init.go:276  [n5,s5,r9/2:/Table/1{3-4}] range descriptor ID (50) does not match replica's range ID (9)

Stack:

goroutine 282160 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa4ab401, 0x0, 0x0, 0xda07a7)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xbf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9adda40, 0xc000000004, 0x922b1bf, 0x17, 0x114, 0xc004078ba0, 0x59)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:214 +0xc1c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6bae660, 0xc001c02900, 0xc000000004, 0x2, 0x5d3267c, 0x3f, 0xc001bba330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6bae660, 0xc001c02900, 0x1, 0x4, 0x5d3267c, 0x3f, 0xc001bba330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).setDesc(0xc005657600, 0x6bae660, 0xc001c02900, 0xc0030ac770)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_init.go:276 +0x907
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleDescResult(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:263
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc0056576d0, 0x6bae660, 0xc001c02900, 0xc009258fa0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1118 +0xa6f
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc0056576d0, 0x6beb5a0, 0xc006b81008, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1001 +0xb60
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x149c02613f50, 0xc0056578e8, 0xc001bbb048, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x129
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc001bbb578, 0x6bae660, 0xc000f7e9f0, 0x6beb660, 0xc005657888, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:281 +0x272
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc001bbb578, 0x6bae660, 0xc000f7e9f0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:247 +0x103
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc005657600, 0x6bae660, 0xc000f7e9f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:721 +0x10fa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc005657600, 0x6bae660, 0xc000f7e9f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:395 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc000a58e00, 0x6bae660, 0xc002182a20, 0x9)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_raft.go:487 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc004146f30, 0x6bae660, 0xc002182a20)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:226 +0x31a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6bae660, 0xc002182a20)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:166 +0x56
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0027fc550, 0xc001d03860, 0xc0027fc530)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:190 +0xc4
Log preceding fatal error

W191217 16:47:24.574846 281285 storage/store_raft.go:496  [n4,s4,r23/2:/Table/2{7-8}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.584368 280757 storage/store_raft.go:496  [n3,s3,r23/3:/Table/2{7-8}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.590741 282160 storage/store_raft.go:496  [n5,s5,r9/2:/Table/1{3-4}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.627515 281327 storage/store_raft.go:496  [n4,s4,r49/2:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.630993 279371 storage/store_raft.go:496  [n1,s1,r9/1:/Table/1{3-4}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
I191217 16:47:24.652476 326734 storage/replica_command.go:1706  [n4,s4,r49/2:/{Table/53/1/1…-Max}] change replicas (add [(n3,s3):5VOTER_INCOMING] remove [(n2,s2):1VOTER_DEMOTING]): existing descriptor r49:/{Table/53/1/160-Max} [(n2,s2):1, (n4,s4):2, (n1,s1):4, (n3,s3):5LEARNER, next=6, gen=22, sticky=9223372036.854775807,2147483647]
W191217 16:47:24.653464 279374 storage/store_raft.go:496  [n1,s1,r49/4:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.655842 280735 storage/store_raft.go:496  [n3,s3,r49/5:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.657441 280381 storage/store_raft.go:496  [n2,s2,r49/1:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.660380 280388 storage/store_raft.go:496  [n2,s2,r9/4:/Table/1{3-4}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:24.662253 280644 storage/store_raft.go:496  [n3,s3,r2/5:/System/NodeLiveness{-Max}] handle raft ready: 0.8s [applied=2, batches=1, state_assertions=0]
W191217 16:47:24.667083 279378 storage/store_raft.go:496  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.7s [applied=2, batches=1, state_assertions=0]
W191217 16:47:24.673389 281329 storage/store_raft.go:496  [n4,s4,r2/4:/System/NodeLiveness{-Max}] handle raft ready: 0.7s [applied=2, batches=1, state_assertions=0]
W191217 16:47:24.678066 279368 storage/store_raft.go:496  [n1,s1,r3/1:/System/{NodeLive…-tsd}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
I191217 16:47:24.678771 333236 storage/replica_raft.go:247  [n5,s5,r9/2:/Table/1{3-4}] proposing SIMPLE(l5) ADD_REPLICA[(n1,s1):5LEARNER]: after=[(n3,s3):4 (n4,s4):2 (n5,s5):3 (n1,s1):5LEARNER] next=6
W191217 16:47:24.691402 282096 storage/store_raft.go:496  [n5,s5,r2/2:/System/NodeLiveness{-Max}] handle raft ready: 0.8s [applied=2, batches=1, state_assertions=0]
W191217 16:47:24.961990 280337 storage/store_raft.go:496  [n2,s2,r2/3:/System/NodeLiveness{-Max}] handle raft ready: 1.1s [applied=2, batches=1, state_assertions=0]
W191217 16:47:25.166190 279442 storage/node_liveness.go:559  [n1,liveness-hb] slow heartbeat took 2.0s
W191217 16:47:25.288747 280730 storage/store_raft.go:496  [n3,s3,r50/4:/Table/53/1/{20-40}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
I191217 16:47:25.325157 333772 storage/raft_snapshot_queue.go:169  [n2,raftsnapshot,s2,r4/2:/System{/tsd-tse}] skipping snapshot; replica is likely a learner in the process of being added: (n1,s1):5LEARNER
I191217 16:47:25.349035 332736 storage/store_snapshot.go:977  [n2,replicate,s2,r4/2:/System{/tsd-tse}] sending LEARNER snapshot f8d6ebc2 at applied index 504
W191217 16:47:25.473145 281279 storage/store_raft.go:496  [n4,s4,r50/2:/Table/53/1/{20-40}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.476041 282119 storage/store_raft.go:496  [n5,s5,r50/3:/Table/53/1/{20-40}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.546945 281317 storage/store_raft.go:496  [n4,s4,r2/4:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.552378 280639 storage/store_raft.go:496  [n3,s3,r2/5:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.578265 279364 storage/store_raft.go:496  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.589089 282169 storage/store_raft.go:496  [n5,s5,r2/2:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.615483 282160 storage/store_raft.go:496  [n5,s5,r9/2:/Table/1{3-4}] handle raft ready: 0.7s [applied=0, batches=0, state_assertions=0]
I191217 16:47:25.772094 333881 storage/raft_snapshot_queue.go:169  [n2,raftsnapshot,s2,r4/2:/System{/tsd-tse}] skipping snapshot; replica is likely a learner in the process of being added: (n1,s1):5LEARNER
W191217 16:47:25.789131 280337 storage/store_raft.go:496  [n2,s2,r2/3:/System/NodeLiveness{-Max}] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.813104 279372 storage/store_raft.go:496  [n1,s1,r1/1:/{Min-System/NodeL…}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.856392 282117 storage/store_raft.go:496  [n5,s5,r1/2:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
I191217 16:47:25.890453 281352 server/status/runtime.go:498  [n4] runtime stats: 2.3 GiB RSS, 2011 goroutines, 100 MiB/65 MiB/205 MiB GO alloc/idle/total, 229 MiB/285 MiB CGO alloc/total, 846.8 CGO/sec, 229.3/23.6 %(u/s)time, 0.4 %gc (7x), 2.3 MiB/2.3 MiB (r/w)net
W191217 16:47:25.906539 280641 storage/store_raft.go:496  [n3,s3,r1/4:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.915445 280326 storage/store_raft.go:496  [n2,s2,r1/5:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 16:47:25.958242 281321 storage/store_raft.go:496  [n4,s4,r1/3:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 16:47:26.000558 279368 storage/store_raft.go:496  [n1,s1,r3/1:/System/{NodeLive…-tsd}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 16:47:26.062801 281358 storage/node_liveness.go:559  [n4,liveness-hb] slow heartbeat took 2.7s
W191217 16:47:26.191796 282138 storage/store_raft.go:496  [n5,s5,r7/5:/Table/1{1-2}] handle raft ready: 0.8s [applied=0, batches=0, state_assertions=0]
W191217 16:47:26.227317 280743 storage/store_raft.go:496  [n3,s3,r49/5:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
W191217 16:47:26.266267 281318 storage/store_raft.go:496  [n4,s4,r7/3:/Table/1{1-2}] handle raft ready: 0.8s [applied=0, batches=0, state_assertions=0]
W191217 16:47:26.304410 279413 storage/store_raft.go:496  [n1,s1,r7/6:/Table/1{1-2}] handle raft ready: 0.8s [applied=0, batches=0, state_assertions=0]
W191217 16:47:26.288328 279399 storage/store_raft.go:496  [n1,s1,r49/4:/{Table/53/1/1…-Max}] handle raft ready: 0.6s [applied=2, batches=1, state_assertions=0]
W191217 16:47:26.298672 280369 storage/store_raft.go:496  [n2,s2,r7/4:/Table/1{1-2}] handle raft ready: 0.8s [applied=0, batches=0, state_assertions=0]
W191217 16:47:26.301634 280345 storage/store_raft.go:496  [n2,s2,r49/1:/{Table/53/1/1…-Max}] handle raft ready: 0.6s [applied=2, batches=1, state_assertions=0]
W191217 16:47:26.311807 280778 storage/store_raft.go:496  [n3,s3,r7/2:/Table/1{1-2}] handle raft ready: 0.8s [applied=0, batches=0, state_assertions=0]
W191217 16:47:26.321983 281323 storage/store_raft.go:496  [n4,s4,r2/4:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:26.325091 279358 storage/store_raft.go:496  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:26.327961 280734 storage/store_raft.go:496  [n3,s3,r2/5:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 16:47:26.331549 282166 storage/store_raft.go:496  [n5,s5,r2/2:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]

Repro

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS=-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 17, 2019
@cockroach-teamcity cockroach-teamcity added this to the 20.1 milestone Dec 17, 2019
@cockroach-teamcity
Copy link
Member Author

(ccl/importccl).TestMultiNodeExportStmt failed on master@cb8cd590d7ed8666ab1aac053ee0e05cfe503a3d:

Fatal error:

F191217 17:15:37.283443 278475 storage/replica_init.go:276  [n1,s1,r9/1:/Table/1{3-4}] range descriptor ID (59) does not match replica's range ID (9)

Stack:

goroutine 278475 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa4ac401, 0x0, 0x0, 0xda07a7)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xbf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9adea40, 0xc000000004, 0x922bbb7, 0x17, 0x114, 0xc0008c0060, 0x59)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:214 +0xc1c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6baefa0, 0xc004534ff0, 0xc000000004, 0x2, 0x5d329bc, 0x3f, 0xc005838330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6baefa0, 0xc004534ff0, 0x1, 0x4, 0x5d329bc, 0x3f, 0xc005838330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).setDesc(0xc0009a2d00, 0x6baefa0, 0xc004534ff0, 0xc004cf8310)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_init.go:276 +0x907
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleDescResult(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:263
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc0009a2dd0, 0x6baefa0, 0xc004534ff0, 0xc0063ee640, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1118 +0xa6f
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc0009a2dd0, 0x6bebee0, 0xc00610e008, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1001 +0xb60
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x1490a4a0d478, 0xc0009a2fe8, 0xc005839048, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x129
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc005839578, 0x6baefa0, 0xc00199cb70, 0x6bebfa0, 0xc0009a2f88, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:281 +0x272
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc005839578, 0x6baefa0, 0xc00199cb70, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:247 +0x103
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc0009a2d00, 0x6baefa0, 0xc00199cb70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:721 +0x10fa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc0009a2d00, 0x6baefa0, 0xc00199cb70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:395 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc00074c000, 0x6baefa0, 0xc004636960, 0x9)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_raft.go:487 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc001c70000, 0x6baefa0, 0xc004636960)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:226 +0x31a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6baefa0, 0xc004636960)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:166 +0x56
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc00033c0e0, 0xc0039cedc0, 0xc00033c0c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:190 +0xc4
Log preceding fatal error

W191217 17:15:32.141445 280562 storage/store_raft.go:496  [n4,s4,r29/6:/Table/53{-/1/40}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.182272 280558 storage/store_raft.go:496  [n4,s4,r2/4:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.183679 281355 storage/store_raft.go:496  [n5,s5,r2/3:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.185415 280557 storage/store_raft.go:496  [n4,s4,r1/2:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=0, batches=0, state_assertions=0]
W191217 17:15:32.198040 279965 storage/store_raft.go:496  [n3,s3,r2/5:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.200383 279631 storage/store_raft.go:496  [n2,s2,r29/5:/Table/53{-/1/40}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.224037 279950 storage/store_raft.go:496  [n3,s3,r34/3:/Table/53/1/{60-80}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.225097 278656 storage/store_raft.go:496  [n1,s1,r1/1:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=0, batches=0, state_assertions=0]
W191217 17:15:32.243226 281271 storage/store_raft.go:496  [n5,s5,r1/4:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=0, batches=0, state_assertions=0]
W191217 17:15:32.268029 279943 storage/store_raft.go:496  [n3,s3,r1/5:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=0, batches=0, state_assertions=0]
W191217 17:15:32.299570 279634 storage/store_raft.go:496  [n2,s2,r1/3:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=0, batches=0, state_assertions=0]
W191217 17:15:32.309685 279643 storage/store_raft.go:496  [n2,s2,r2/2:/System/NodeLiveness{-Max}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.322178 280572 storage/store_raft.go:496  [n4,s4,r34/4:/Table/53/1/{60-80}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.553928 281267 storage/store_raft.go:496  [n5,s5,r3/2:/System/{NodeLive…-tsd}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.595920 279953 storage/store_raft.go:496  [n3,s3,r29/4:/Table/53{-/1/40}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.602091 281422 storage/node_liveness.go:559  [n5,liveness-hb] slow heartbeat took 1.7s
W191217 17:15:32.617736 279983 storage/store_raft.go:496  [n3,s3,r3/5:/System/{NodeLive…-tsd}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.636516 278557 storage/store_raft.go:496  [n1,s1,r3/1:/System/{NodeLive…-tsd}] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.638121 280556 storage/store_raft.go:496  [n4,s4,r3/3:/System/{NodeLive…-tsd}] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
W191217 17:15:32.646567 279624 storage/store_raft.go:496  [n2,s2,r3/4:/System/{NodeLive…-tsd}] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
I191217 17:15:32.725865 325913 storage/replica_raft.go:247  [n3,s3,r35/3:/Table/53/1/{40-60}] proposing SIMPLE(l4) ADD_REPLICA[(n5,s5):4LEARNER]: after=[(n2,s2):1 (n4,s4):2 (n3,s3):3 (n5,s5):4LEARNER] next=5
W191217 17:15:32.823281 279643 storage/store_raft.go:496  [n2,s2,r2/2:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 17:15:33.036150 279927 storage/node_liveness.go:559  [n3,liveness-hb] slow heartbeat took 1.3s
W191217 17:15:33.148880 279951 storage/store_raft.go:496  [n3,s3,r35/3:/Table/53/1/{40-60}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 17:15:33.257441 279605 storage/store_raft.go:496  [n2,s2,r35/1:/Table/53/1/{40-60}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 17:15:33.261929 280566 storage/store_raft.go:496  [n4,s4,r35/2:/Table/53/1/{40-60}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
I191217 17:15:33.962849 326920 storage/raft_snapshot_queue.go:169  [n3,raftsnapshot,s3,r35/3:/Table/53/1/{40-60}] skipping snapshot; replica is likely a learner in the process of being added: (n5,s5):4LEARNER
I191217 17:15:34.004115 325913 storage/store_snapshot.go:977  [n3,replicate,s3,r35/3:/Table/53/1/{40-60}] sending LEARNER snapshot 583256b7 at applied index 19
I191217 17:15:34.007046 325913 storage/store_snapshot.go:1020  [n3,replicate,s3,r35/3:/Table/53/1/{40-60}] streamed snapshot to (n5,s5):4LEARNER: kv pairs: 26, log entries: 0, rate-limit: 8.0 MiB/sec, 0.15s
I191217 17:15:34.190857 326951 storage/replica_raftstorage.go:798  [n5,s5,r35/4:{-}] applying LEARNER snapshot [id=583256b7 index=19]
I191217 17:15:34.551123 326951 storage/replica_raftstorage.go:819  [n5,s5,r35/4:/Table/53/1/{40-60}] applied LEARNER snapshot [total=360ms ingestion=4@355ms id=583256b7 index=19]
I191217 17:15:35.040179 278526 gossip/gossip.go:566  [n1] gossip status (ok, 5 nodes)
gossip client (0/3 cur/max conns)
gossip server (3/3 cur/max conns, infos 3816/1215 sent/received, bytes 886300B/310479B sent/received)
  2: 127.0.0.1:40021 (3m58s)
  3: 127.0.0.1:34417 (3m57s)
  4: 127.0.0.1:39343 (3m56s)
I191217 17:15:35.200001 278530 server/status/runtime.go:498  [n1] runtime stats: 2.1 GiB RSS, 1986 goroutines, 120 MiB/40 MiB/204 MiB GO alloc/idle/total, 244 MiB/301 MiB CGO alloc/total, 883.7 CGO/sec, 224.1/20.7 %(u/s)time, 0.5 %gc (6x), 2.5 MiB/2.5 MiB (r/w)net
I191217 17:15:35.206847 327033 storage/replica_command.go:1706  [n4,replicate,s4,r35/2:/Table/53/1/{40-60}] change replicas (add [] remove [(n5,s5):4LEARNER]): existing descriptor r35:/Table/53/1/{40-60} [(n2,s2):1, (n4,s4):2, (n3,s3):3, (n5,s5):4LEARNER, next=5, gen=23, sticky=9223372036.854775807,2147483647]
W191217 17:15:36.154722 278648 storage/store_raft.go:496  [n1,s1,r4/1:/System{/tsd-tse}] handle raft ready: 0.8s [applied=0, batches=0, state_assertions=0]
I191217 17:15:36.343395 327169 storage/replica_command.go:1706  [n3,s3,r37/3:/Table/53/1/1{20-40}] change replicas (add [(n1,s1):4LEARNER] remove []): existing descriptor r37:/Table/53/1/1{20-40} [(n2,s2):1, (n5,s5):2, (n3,s3):3, next=4, gen=19, sticky=9223372036.854775807,2147483647]
I191217 17:15:36.353667 281320 storage/compactor/compactor.go:325  [n5,s5,compactor] purging suggested compaction for range /Table/53 - /Table/53/1/60 that contains live data
I191217 17:15:36.872193 279670 gossip/gossip.go:566  [n2] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:39761 (4m0s: infos 423/1247 sent/received, bytes 90654B/309771B sent/received)
gossip server (0/3 cur/max conns, infos 423/1247 sent/received, bytes 90654B/309771B sent/received)
I191217 17:15:36.873689 327272 storage/replica_command.go:406  [n3,s3,r29/4:/Table/53{-/1/40}] initiating a split of this range at key /Table/53/1/20 [r38] (manual)
I191217 17:15:36.906770 327233 storage/replica_raft.go:247  [n1,s1,r9/1:/Table/1{3-4}] proposing SIMPLE(l4) ADD_REPLICA[(n5,s5):4LEARNER]: after=[(n2,s2):1 (n4,s4):2 (n3,s3):3 (n5,s5):4LEARNER] next=5
W191217 17:15:37.044380 278648 storage/store_raft.go:496  [n1,s1,r4/1:/System{/tsd-tse}] handle raft ready: 0.8s [applied=3, batches=1, state_assertions=0]
W191217 17:15:37.100644 280558 storage/store_raft.go:496  [n4,s4,r4/2:/System{/tsd-tse}] handle raft ready: 0.6s [applied=3, batches=1, state_assertions=0]

Repro

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS=-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/importccl).TestMultiNodeExportStmt failed on master@fd0282914f25a71eaeb28df266016d984a8c41f1:

Fatal error:

F191217 20:04:20.876024 278983 storage/replica_init.go:276  [n4,s4,r9/2:/Table/1{3-4}] range descriptor ID (35) does not match replica's range ID (9)

Stack:

goroutine 278983 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa4ae401, 0x0, 0x0, 0xda07a7)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xbf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9ae0ac0, 0xc000000004, 0x922dac7, 0x17, 0x114, 0xc0025bd320, 0x59)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:214 +0xc1c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6bb06e0, 0xc009742420, 0xc000000004, 0x2, 0x5d33c31, 0x3f, 0xc006564330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6bb06e0, 0xc009742420, 0x1, 0x4, 0x5d33c31, 0x3f, 0xc006564330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).setDesc(0xc001ddc900, 0x6bb06e0, 0xc009742420, 0xc00577bea0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_init.go:276 +0x907
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleDescResult(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:263
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc001ddc9d0, 0x6bb06e0, 0xc009742420, 0xc0052d6a00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1118 +0xa6f
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc001ddc9d0, 0x6bed680, 0xc001394008, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1001 +0xb60
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x14af5820c3b0, 0xc001ddcbe8, 0xc006565048, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x129
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc006565578, 0x6bb06e0, 0xc004e28480, 0x6bed740, 0xc001ddcb88, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:281 +0x272
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc006565578, 0x6bb06e0, 0xc004e28480, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:247 +0x103
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc001ddc900, 0x6bb06e0, 0xc004e28480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:721 +0x10fa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc001ddc900, 0x6bb06e0, 0xc004e28480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:395 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc003f1a300, 0x6bb06e0, 0xc006efdf80, 0x9)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_raft.go:487 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc000e88750, 0x6bb06e0, 0xc006efdf80)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:226 +0x31a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6bb06e0, 0xc006efdf80)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:166 +0x56
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0014ec120, 0xc0031d7400, 0xc0014ec110)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:190 +0xc4
Log preceding fatal error

W191217 20:04:18.714827 278324 storage/store_raft.go:496  [n3,s3,r33/3:/Table/53/1/1{00-40}] handle raft ready: 1.1s [applied=1, batches=1, state_assertions=1]
W191217 20:04:18.721138 276930 storage/store_raft.go:496  [n1,s1,r33/1:/Table/53/1/1{00-40}] handle raft ready: 1.1s [applied=1, batches=1, state_assertions=1]
I191217 20:04:18.733276 320920 storage/replica_raftstorage.go:798  [n4,s4,r59/4:{-}] applying LEARNER snapshot [id=e531386e index=18]
I191217 20:04:18.900306 316201 storage/replica_command.go:1706  [n1,s1,r61/1:/Table/53/1/1{40-60}] change replicas (add [(n5,s5):4LEARNER] remove []): existing descriptor r61:/Table/53/1/1{40-60} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=18, sticky=9223372036.854775807,2147483647]
W191217 20:04:18.921761 276999 storage/store_raft.go:496  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=2, batches=1, state_assertions=0]
I191217 20:04:19.261377 320920 storage/replica_raftstorage.go:819  [n4,s4,r59/4:/Table/53/1/{80-100}] applied LEARNER snapshot [total=527ms ingestion=4@222ms id=e531386e index=18]
W191217 20:04:19.277970 278415 storage/node_liveness.go:559  [n3,liveness-hb] slow heartbeat took 1.2s
W191217 20:04:19.340832 276931 storage/replica_application_result.go:153  [n1,s1,r35/1:/Table/53/1/{40-60}] failed to repropose with new lease index: [NotLeaseHolderError] r35: replica (n1,s1):1 not lease holder; current lease is repl=(n1,s1):1 seq=3 start=1576613055.784410670,0 epo=1 pro=1576613055.784426032,0
W191217 20:04:19.385395 277039 storage/store_raft.go:496  [n1,s1,r9/1:/Table/1{3-4}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.421008 278984 storage/store_raft.go:496  [n4,s4,r9/2:/Table/1{3-4}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.428108 279619 storage/store_raft.go:496  [n5,s5,r9/3:/Table/1{3-4}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.458052 278337 storage/store_raft.go:496  [n3,s3,r9/4:/Table/1{3-4}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.465748 276927 storage/store_raft.go:496  [n1,s1,r1/1:/{Min-System/NodeL…}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.475659 277936 storage/store_raft.go:496  [n2,s2,r9/5:/Table/1{3-4}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.543420 279018 storage/store_raft.go:496  [n4,s4,r1/4:/{Min-System/NodeL…}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.561783 278341 storage/store_raft.go:496  [n3,s3,r1/5:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.564994 277940 storage/store_raft.go:496  [n2,s2,r1/2:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.570241 279716 storage/store_raft.go:496  [n5,s5,r1/3:/{Min-System/NodeL…}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.631487 278973 storage/store_raft.go:496  [n4,s4,r2/3:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.648594 277994 storage/store_raft.go:496  [n2,s2,r2/5:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.648894 278350 storage/store_raft.go:496  [n3,s3,r35/3:/Table/53/1/{40-60}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.661895 278316 storage/store_raft.go:496  [n3,s3,r2/2:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.664676 277990 storage/store_raft.go:496  [n2,s2,r35/2:/Table/53/1/{40-60}] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.666115 278374 storage/store_raft.go:496  [n3,s3,r60/3:/{Table/53/1/1…-Max}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.673609 279703 storage/store_raft.go:496  [n5,s5,r2/4:/System/NodeLiveness{-Max}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.684094 277989 storage/store_raft.go:496  [n2,s2,r60/2:/{Table/53/1/1…-Max}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.712065 277995 storage/store_raft.go:496  [n2,s2,r29/2:/Table/53{-/1/40}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.745183 276997 storage/store_raft.go:496  [n1,s1,r29/1:/Table/53{-/1/40}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.747528 278295 storage/store_raft.go:496  [n3,s3,r29/3:/Table/53{-/1/40}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
W191217 20:04:19.752441 276992 storage/store_raft.go:496  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
I191217 20:04:19.804227 319760 storage/replica_command.go:1706  [n1,replicate,s1,r59/1:/Table/53/1/{80-100}] change replicas (add [(n4,s4):4VOTER_INCOMING] remove [(n2,s2):2VOTER_DEMOTING]): existing descriptor r59:/Table/53/1/{80-100} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n4,s4):4LEARNER, next=5, gen=19, sticky=9223372036.854775807,2147483647]
W191217 20:04:20.088018 279601 storage/store_raft.go:496  [n5,s5,r9/3:/Table/1{3-4}] handle raft ready: 0.6s [applied=4, batches=1, state_assertions=0]
W191217 20:04:20.116461 278358 storage/store_raft.go:496  [n3,s3,r9/4:/Table/1{3-4}] handle raft ready: 0.7s [applied=4, batches=1, state_assertions=0]
W191217 20:04:20.130833 277041 storage/store_raft.go:496  [n1,s1,r9/1:/Table/1{3-4}] handle raft ready: 0.7s [applied=4, batches=1, state_assertions=0]
W191217 20:04:20.136702 278984 storage/store_raft.go:496  [n4,s4,r9/2:/Table/1{3-4}] handle raft ready: 0.7s [applied=4, batches=1, state_assertions=0]
W191217 20:04:20.153616 277936 storage/store_raft.go:496  [n2,s2,r9/5:/Table/1{3-4}] handle raft ready: 0.7s [applied=4, batches=1, state_assertions=0]
W191217 20:04:20.275800 277978 storage/store_raft.go:496  [n2,s2,r61/2:/Table/53/1/1{40-60}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:20.284769 278295 storage/store_raft.go:496  [n3,s3,r29/3:/Table/53{-/1/40}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W191217 20:04:20.324775 278319 storage/store_raft.go:496  [n3,s3,r60/3:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=1]
W191217 20:04:20.350971 277989 storage/store_raft.go:496  [n2,s2,r60/2:/{Table/53/1/1…-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=1]
W191217 20:04:20.395222 278355 storage/store_raft.go:496  [n3,s3,r61/3:/Table/53/1/1{40-60}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:20.427314 277013 storage/store_raft.go:496  [n1,s1,r29/1:/Table/53{-/1/40}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:20.473688 277932 storage/store_raft.go:496  [n2,s2,r29/2:/Table/53{-/1/40}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W191217 20:04:20.721611 278339 storage/store_raft.go:496  [n3,s3,r9/4:/Table/1{3-4}] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
W191217 20:04:20.722262 277979 storage/store_raft.go:496  [n2,s2,r9/5:/Table/1{3-4}] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
W191217 20:04:20.724880 279701 storage/store_raft.go:496  [n5,s5,r9/3:/Table/1{3-4}] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
W191217 20:04:20.733282 276916 storage/store_raft.go:496  [n1,s1,r9/1:/Table/1{3-4}] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
W191217 20:04:20.735852 278984 storage/store_raft.go:496  [n4,s4,r9/2:/Table/1{3-4}] handle raft ready: 0.6s [applied=2, batches=1, state_assertions=0]
W191217 20:04:20.739805 277023 storage/store_raft.go:496  [n1,s1,r2/1:/System/NodeLiveness{-Max}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
I191217 20:04:20.751863 321080 storage/replica_raft.go:247  [n4,s4,r9/2:/Table/1{3-4}] proposing SIMPLE(l4) ADD_REPLICA[(n5,s5):4LEARNER]: after=[(n1,s1):1 (n2,s2):2 (n3,s3):3 (n5,s5):4LEARNER] next=5

Repro

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS=-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/importccl).TestMultiNodeExportStmt failed on master@14dfb3d191ad3e4a4198358cefd183f7d0ae0504:

Fatal error:

F191219 15:59:51.091512 306801 storage/replica_init.go:273  [n4,s4,r9/2:/Table/1{3-4}] range descriptor ID (59) does not match replica's range ID (9)

Stack:

goroutine 306801 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa561e01, 0x0, 0x0, 0xda19e7)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xbf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9b94440, 0xc000000004, 0x92e07e2, 0x17, 0x111, 0xc007300180, 0x59)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:214 +0xc1c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6c2f5a0, 0xc0058bbf50, 0xc000000004, 0x2, 0x5da2663, 0x3f, 0xc0014b8330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6c2f5a0, 0xc0058bbf50, 0x1, 0x4, 0x5da2663, 0x3f, 0xc0014b8330, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).setDesc(0xc007d27200, 0x6c2f5a0, 0xc0058bbf50, 0xc0007283f0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_init.go:273 +0x907
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleDescResult(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:263
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc007d272d0, 0x6c2f5a0, 0xc0058bbf50, 0xc000e50be0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1118 +0xa6f
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc007d272d0, 0x6c6cb20, 0xc0059d9008, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:1001 +0xb60
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x14c43f468ad8, 0xc007d274e8, 0xc0014b9048, 0x0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x129
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc0014b9578, 0x6c2f5a0, 0xc00173fb60, 0x6c6cbe0, 0xc007d27488, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:281 +0x272
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc0014b9578, 0x6c2f5a0, 0xc00173fb60, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:247 +0x103
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc007d27200, 0x6c2f5a0, 0xc00173fb60, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:721 +0x10fa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc007d27200, 0x6c2f5a0, 0xc00173fb60, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:395 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc00572d100, 0x6c2f5a0, 0xc001de3f50, 0x9)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_raft.go:487 +0x1a0
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc001455ef0, 0x6c2f5a0, 0xc001de3f50)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:226 +0x31a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x6c2f5a0, 0xc001de3f50)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:166 +0x56
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc00032fae0, 0xc00180e780, 0xc00032fad0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x160
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:190 +0xc4
Log preceding fatal error

gossip server (1/3 cur/max conns, infos 1316/1062 sent/received, bytes 366014B/230235B sent/received)
  5: 127.0.0.1:39599 (2m56s)
I191219 15:59:46.210616 306281 server/status/runtime.go:498  [n3] runtime stats: 1.9 GiB RSS, 1980 goroutines, 120 MiB/47 MiB/205 MiB GO alloc/idle/total, 215 MiB/265 MiB CGO alloc/total, 1016.2 CGO/sec, 270.4/24.6 %(u/s)time, 0.4 %gc (8x), 1.7 MiB/1.7 MiB (r/w)net
I191219 15:59:46.241091 342755 storage/replica_raft.go:247  [n3,s3,r49/2:/{Table/53/1/1…-Max}] proposing SIMPLE(l4) ADD_REPLICA[(n5,s5):4LEARNER]: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3 (n5,s5):4LEARNER] next=5
I191219 15:59:46.461494 343004 storage/replica_command.go:1706  [n1,replicate,s1,r59/1:/Table/53/1/{80-100}] change replicas (add [(n5,s5):4LEARNER] remove []): existing descriptor r59:/Table/53/1/{80-100} [(n1,s1):1, (n3,s3):2, (n4,s4):3, next=4, gen=18, sticky=9223372036.854775807,2147483647]
I191219 15:59:46.507213 338611 storage/replica_command.go:1706  [n1,s1,r59/1:/Table/53/1/{80-100}] change replicas (add [(n2,s2):4LEARNER] remove []): existing descriptor r59:/Table/53/1/{80-100} [(n1,s1):1, (n3,s3):2, (n4,s4):3, next=4, gen=18, sticky=9223372036.854775807,2147483647]
I191219 15:59:46.619015 342755 storage/store_snapshot.go:977  [n3,s3,r49/2:/{Table/53/1/1…-Max}] sending LEARNER snapshot 58e6eade at applied index 15
I191219 15:59:46.622960 343065 storage/raft_snapshot_queue.go:169  [n3,raftsnapshot,s3,r49/2:/{Table/53/1/1…-Max}] skipping snapshot; replica is likely a learner in the process of being added: (n5,s5):4LEARNER
I191219 15:59:46.645530 342755 storage/store_snapshot.go:1020  [n3,s3,r49/2:/{Table/53/1/1…-Max}] streamed snapshot to (n5,s5):4LEARNER: kv pairs: 46, log entries: 0, rate-limit: 8.0 MiB/sec, 0.06s
I191219 15:59:46.678560 342995 storage/replica_raftstorage.go:798  [n5,s5,r49/4:{-}] applying LEARNER snapshot [id=58e6eade index=15]
I191219 15:59:46.952451 342995 storage/replica_raftstorage.go:819  [n5,s5,r49/4:/{Table/53/1/1…-Max}] applied LEARNER snapshot [total=272ms ingestion=4@130ms id=58e6eade index=15]
I191219 15:59:47.139678 342755 storage/replica_command.go:1706  [n3,s3,r49/2:/{Table/53/1/1…-Max}] change replicas (add [(n5,s5):4VOTER_INCOMING] remove [(n4,s4):3VOTER_DEMOTING]): existing descriptor r49:/{Table/53/1/160-Max} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n5,s5):4LEARNER, next=5, gen=18, sticky=9223372036.854775807,2147483647]
I191219 15:59:47.467481 343004 storage/replica_raft.go:247  [n1,s1,r59/1:/Table/53/1/{80-100}] proposing SIMPLE(l4) ADD_REPLICA[(n5,s5):4LEARNER]: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3 (n5,s5):4LEARNER] next=5
I191219 15:59:47.742000 343004 storage/store_snapshot.go:977  [n1,replicate,s1,r59/1:/Table/53/1/{80-100}] sending LEARNER snapshot 56c198ca at applied index 14
I191219 15:59:47.744313 343004 storage/store_snapshot.go:1020  [n1,replicate,s1,r59/1:/Table/53/1/{80-100}] streamed snapshot to (n5,s5):4LEARNER: kv pairs: 26, log entries: 0, rate-limit: 8.0 MiB/sec, 0.11s
I191219 15:59:47.754836 343353 storage/raft_snapshot_queue.go:169  [n1,raftsnapshot,s1,r59/1:/Table/53/1/{80-100}] skipping snapshot; replica is likely a learner in the process of being added: (n5,s5):4LEARNER
I191219 15:59:47.771859 343415 storage/replica_raftstorage.go:798  [n5,s5,r59/4:{-}] applying LEARNER snapshot [id=56c198ca index=14]
I191219 15:59:47.803656 343415 storage/replica_raftstorage.go:819  [n5,s5,r59/4:/Table/53/1/{80-100}] applied LEARNER snapshot [total=31ms ingestion=4@18ms id=56c198ca index=14]
I191219 15:59:47.820433 306838 gossip/gossip.go:566  [n4] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:44169 (3m1s: infos 246/974 sent/received, bytes 57911B/256635B sent/received)
gossip server (0/3 cur/max conns, infos 246/974 sent/received, bytes 57911B/256635B sent/received)
I191219 15:59:48.036627 306842 server/status/runtime.go:498  [n4] runtime stats: 1.9 GiB RSS, 1983 goroutines, 116 MiB/51 MiB/205 MiB GO alloc/idle/total, 211 MiB/257 MiB CGO alloc/total, 1266.3 CGO/sec, 298.9/27.6 %(u/s)time, 0.4 %gc (9x), 1.8 MiB/1.8 MiB (r/w)net
I191219 15:59:48.047147 343004 storage/replica_command.go:1706  [n1,replicate,s1,r59/1:/Table/53/1/{80-100}] change replicas (add [(n5,s5):4VOTER_INCOMING] remove [(n4,s4):3VOTER_DEMOTING]): existing descriptor r59:/Table/53/1/{80-100} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n5,s5):4LEARNER, next=5, gen=19, sticky=9223372036.854775807,2147483647]
I191219 15:59:48.320857 343374 storage/replica_command.go:1706  [n3,s3,r35/2:/Table/53/1/{40-60}] change replicas (add [(n2,s2):4LEARNER] remove []): existing descriptor r35:/Table/53/1/{40-60} [(n1,s1):1, (n3,s3):2, (n4,s4):3, next=4, gen=18, sticky=9223372036.854775807,2147483647]
I191219 15:59:48.379167 342755 storage/replica_raft.go:247  [n3,s3,r49/2:/{Table/53/1/1…-Max}] proposing ENTER_JOINT(r3 l3 v4) ADD_REPLICA[(n5,s5):4VOTER_INCOMING], REMOVE_REPLICA[(n4,s4):3VOTER_DEMOTING]: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3VOTER_DEMOTING (n5,s5):4VOTER_INCOMING] next=5
I191219 15:59:48.487502 342755 storage/replica_command.go:1706  [n3,s3,r49/2:/{Table/53/1/1…-Max}] change replicas (add [] remove []): existing descriptor r49:/{Table/53/1/160-Max} [(n1,s1):1, (n3,s3):2, (n4,s4):3VOTER_DEMOTING, (n5,s5):4VOTER_INCOMING, next=5, gen=19, sticky=9223372036.854775807,2147483647]
I191219 15:59:48.834572 342755 storage/replica_raft.go:247  [n3,s3,r49/2:/{Table/53/1/1…-Max}] proposing LEAVE_JOINT: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3LEARNER (n5,s5):4] next=5
I191219 15:59:48.961827 342755 storage/replica_command.go:1706  [n3,s3,r49/2:/{Table/53/1/1…-Max}] change replicas (add [] remove [(n4,s4):3LEARNER]): existing descriptor r49:/{Table/53/1/160-Max} [(n1,s1):1, (n3,s3):2, (n4,s4):3LEARNER, (n5,s5):4, next=5, gen=20, sticky=9223372036.854775807,2147483647]
I191219 15:59:49.033923 343511 storage/replica_command.go:1706  [n4,replicate,s4,r59/3:/Table/53/1/{80-100}] change replicas (add [] remove [(n5,s5):4LEARNER]): existing descriptor r59:/Table/53/1/{80-100} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n5,s5):4LEARNER, next=5, gen=19, sticky=9223372036.854775807,2147483647]
I191219 15:59:49.090585 343374 storage/replica_raft.go:247  [n3,s3,r35/2:/Table/53/1/{40-60}] proposing SIMPLE(l4) ADD_REPLICA[(n2,s2):4LEARNER]: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3 (n2,s2):4LEARNER] next=5
I191219 15:59:49.310302 343505 storage/raft_snapshot_queue.go:169  [n3,raftsnapshot,s3,r35/2:/Table/53/1/{40-60}] skipping snapshot; replica is likely a learner in the process of being added: (n2,s2):4LEARNER
I191219 15:59:49.349941 343374 storage/store_snapshot.go:977  [n3,s3,r35/2:/Table/53/1/{40-60}] sending LEARNER snapshot 7355f8a8 at applied index 17
I191219 15:59:49.386127 343374 storage/store_snapshot.go:1020  [n3,s3,r35/2:/Table/53/1/{40-60}] streamed snapshot to (n2,s2):4LEARNER: kv pairs: 26, log entries: 0, rate-limit: 8.0 MiB/sec, 0.16s
I191219 15:59:49.429949 343717 storage/replica_raftstorage.go:798  [n2,s2,r35/4:{-}] applying LEARNER snapshot [id=7355f8a8 index=17]
I191219 15:59:49.542489 342755 storage/replica_raft.go:247  [n3,s3,r49/2:/{Table/53/1/1…-Max}] proposing SIMPLE(r3) REMOVE_REPLICA[(n4,s4):3LEARNER]: after=[(n1,s1):1 (n3,s3):2 (n5,s5):4] next=5
I191219 15:59:49.644872 343717 storage/replica_raftstorage.go:819  [n2,s2,r35/4:/Table/53/1/{40-60}] applied LEARNER snapshot [total=214ms ingestion=4@194ms id=7355f8a8 index=17]
I191219 15:59:49.658478 306707 storage/store_remove_replica.go:129  [n4,s4,r49/3:/{Table/53/1/1…-Max}] removing replica r49/3
I191219 15:59:49.724806 343374 storage/replica_command.go:1706  [n3,s3,r35/2:/Table/53/1/{40-60}] change replicas (add [(n2,s2):4VOTER_INCOMING] remove [(n4,s4):3VOTER_DEMOTING]): existing descriptor r35:/Table/53/1/{40-60} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n2,s2):4LEARNER, next=5, gen=19, sticky=9223372036.854775807,2147483647]
I191219 15:59:50.079943 307656 gossip/gossip.go:566  [n5] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  3: 127.0.0.1:42291 (3m0s: infos 151/1061 sent/received, bytes 37611B/253750B sent/received)
gossip server (0/3 cur/max conns, infos 157/1095 sent/received, bytes 38942B/270222B sent/received)
I191219 15:59:50.145180 343511 storage/replica_raft.go:247  [n4,s4,r59/3:/Table/53/1/{80-100}] proposing SIMPLE(r4) REMOVE_REPLICA[(n5,s5):4LEARNER]: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3] next=5
I191219 15:59:50.194004 343809 storage/replica_command.go:406  [n3,s3,r32/2:/Table/53/1/1{00-60}] initiating a split of this range at key /Table/53/1/140 [r50] (manual)
I191219 15:59:50.329055 343511 storage/replica_command.go:1706  [n4,replicate,s4,r59/3:/Table/53/1/{80-100}] change replicas (add [(n2,s2):5LEARNER] remove []): existing descriptor r59:/Table/53/1/{80-100} [(n1,s1):1, (n3,s3):2, (n4,s4):3, next=5, gen=20, sticky=9223372036.854775807,2147483647]
I191219 15:59:50.344990 338611 storage/replica_command.go:406  [n1,s1,r49/1:/{Table/53/1/1…-Max}] initiating a split of this range at key /Table/53/1/180 [r60] (manual)
I191219 15:59:50.369516 307590 storage/store_remove_replica.go:129  [n5,s5,r59/4:/Table/53/1/{80-100}] removing replica r59/4
I191219 15:59:50.388359 307643 server/status/runtime.go:498  [n5] runtime stats: 1.9 GiB RSS, 1989 goroutines, 123 MiB/41 MiB/205 MiB GO alloc/idle/total, 215 MiB/264 MiB CGO alloc/total, 1364.8 CGO/sec, 314.1/31.3 %(u/s)time, 0.3 %gc (9x), 1.9 MiB/1.9 MiB (r/w)net
I191219 15:59:50.994034 343998 storage/replica_raft.go:247  [n4,s4,r9/2:/Table/1{3-4}] proposing ENTER_JOINT(r3 l3 v4) ADD_REPLICA[(n5,s5):4VOTER_INCOMING], REMOVE_REPLICA[(n4,s4):3VOTER_DEMOTING]: after=[(n1,s1):1 (n3,s3):2 (n4,s4):3VOTER_DEMOTING (n5,s5):4VOTER_INCOMING] next=5

Repro

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@tbg tbg added the branch-master Failures and bugs on the master branch. label Jan 22, 2020
@cockroach-teamcity
Copy link
Member Author

(ccl/importccl).TestMultiNodeExportStmt failed on master@2c8a695ee37bb14d8a937976cf3584267abe3053:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x26ee981]

Stack:

goroutine 356345 [running]:
github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).CreateAndStartJob(0xc0012f2210, 0x69cdbc0, 0xc0046a7bc0, 0xc00500b500, 0xc003ed2ff0, 0x2d, 0xc0045652c0, 0x59, 0x5cf8722, 0x4, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:218 +0x4d1
github.com/cockroachdb/cockroach/pkg/sql.(*createStatsNode).startJob(0xc002d05880, 0x69cdbc0, 0xc0046a7bc0, 0xc00500b500, 0xc003fbcd60, 0xc003fbc390)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/create_stats.go:122 +0x282
github.com/cockroachdb/cockroach/pkg/sql.(*createStatsNode).startExec.func1(0xc002d05880, 0x69cdbc0, 0xc0046a7bc0, 0xc00358d680, 0xc003fbc398)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/create_stats.go:78 +0x90
created by github.com/cockroachdb/cockroach/pkg/sql.(*createStatsNode).startExec
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/create_stats.go:77 +0x122

Log preceding fatal error

I200127 17:55:20.995277 328344 storage/store_remove_replica.go:129  [n1,s1,r40/1:/Table/53/1/{20-40}] removing replica r40/1
I200127 17:55:25.507139 328081 gossip/gossip.go:567  [n1] gossip status (ok, 5 nodes)
gossip client (0/3 cur/max conns)
gossip server (3/3 cur/max conns, infos 2349/570 sent/received, bytes 662722B/199962B sent/received)
  2: 127.0.0.1:37891 (1m59s)
  3: 127.0.0.1:44679 (1m59s)
  4: 127.0.0.1:38455 (1m58s)
gossip connectivity
  n4 [sentinel];
  n2 -> n1; n3 -> n1; n4 -> n1; n5 -> n3;
I200127 17:55:25.565142 328405 server/status/runtime.go:498  [n1] runtime stats: 2.1 GiB RSS, 1585 goroutines, 124 MiB/103 MiB/222 MiB GO alloc/idle/total, 184 MiB/240 MiB CGO alloc/total, 875.5 CGO/sec, 296.6/27.7 %(u/s)time, 0.3 %gc (6x), 1.9 MiB/1.9 MiB (r/w)net
I200127 17:55:26.074112 328193 util/stop/stopper.go:539  quiescing
I200127 17:55:26.110496 356452 util/stop/stopper.go:539  quiescing
I200127 17:55:26.112528 356451 util/stop/stopper.go:539  quiescing
I200127 17:55:26.114986 328217 kv/transport_race.go:108  transport race promotion: ran 113 iterations on up to 18146 requests
W200127 17:55:26.127360 329704 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200127 17:55:26.128091 356485 util/stop/stopper.go:539  quiescing
E200127 17:55:26.148031 329382 jobs/registry.go:376  error while adopting jobs: failed querying for jobs: adopt-job: node unavailable; try another peer
I200127 17:55:26.148935 356486 util/stop/stopper.go:539  quiescing
I200127 17:55:26.159489 356474 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:39387: node unavailable; try another peer
I200127 17:55:26.149148 356487 util/stop/stopper.go:539  quiescing
I200127 17:55:26.163325 356502 kv/transport_race.go:108  transport race promotion: ran 2 iterations on up to 3 requests
E200127 17:55:26.167178 329094 jobs/registry.go:376  error while adopting jobs: failed querying for jobs: adopt-job: node unavailable; try another peer
W200127 17:55:26.169627 330275 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.170280 331025 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.171299 331304 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.172176 329794 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.172895 330364 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 4: rpc error: code = Unavailable desc = transport is closing:
W200127 17:55:26.174133 329960 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
W200127 17:55:26.190410 331165 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
W200127 17:55:26.191553 336521 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 4: rpc error: code = Unavailable desc = transport is closing:
W200127 17:55:26.193565 338583 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 2: rpc error: code = Unavailable desc = transport is closing:
W200127 17:55:26.194919 338459 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
W200127 17:55:26.196152 336608 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 4: rpc error: code = Unavailable desc = transport is closing:
I200127 17:55:26.200360 356345 rpc/nodedialer/nodedialer.go:160  [intExec=create-stats,n1,txn=627e4f0d] unable to connect to n4: failed to connect to n4 at 127.0.0.1:38455: context canceled
I200127 17:55:26.212698 356345 rpc/nodedialer/nodedialer.go:160  [intExec=create-stats,n1,txn=627e4f0d] unable to connect to n2: failed to connect to n2 at 127.0.0.1:37891: context canceled
W200127 17:55:26.214313 328980 jobs/registry.go:911  job 524541381321490433: canceling due to liveness failure
W200127 17:55:26.215172 338584 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.216534 338426 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.219797 330816 internal/client/txn.go:566  [n5,liveness-hb] failure aborting transaction: node unavailable; try another peer; abort caused by: failed to send RPC: sending to all 1 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W200127 17:55:26.220809 330816 storage/node_liveness.go:471  [n5,liveness-hb] failed node liveness heartbeat: failed to send RPC: sending to all 1 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W200127 17:55:26.221140 330435 gossip/gossip.go:1518  [n5] no incoming or outgoing connections
W200127 17:55:26.225165 340910 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200127 17:55:26.222601 338412 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200127 17:55:26.230344 356536 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:38455: context canceled
I200127 17:55:26.236043 356533 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:39387: context canceled
I200127 17:55:26.240557 356524 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:38455: context canceled
I200127 17:55:26.242448 356535 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:37891: context canceled
I200127 17:55:26.243692 356345 rpc/nodedialer/nodedialer.go:160  [intExec=create-stats,n1,txn=627e4f0d] unable to connect to n3: failed to connect to n3 at 127.0.0.1:44679: context canceled
W200127 17:55:26.244652 356345 internal/client/txn.go:566  [intExec=create-stats] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=failed to connect to n3 at 127.0.0.1:44679: context canceled [exhausted])

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@andreimatei
Copy link
Contributor

This last sigsegv in jobs.Registry happened to me to.

@spaskob
Copy link
Contributor

spaskob commented Jan 29, 2020

#44441
is supposed to fix this
can you confirm you can repro it ?

@andreimatei
Copy link
Contributor

I probably didn't have it. Feel free to unassign yourself from this I guess.
Btw, I think it's a common pattern to put a Touches #43247 in your commit in situations like this. Then I would have seen it.

@spaskob
Copy link
Contributor

spaskob commented Jan 29, 2020 via email

@andreimatei
Copy link
Contributor

This issue is not among the ones you've listed. The syntax you used was messed up because it didn't close the issues, but any #x results in pinging that issue.
Also note that you don't want to close this issue because it has other failures unrelated to your thing.

@spaskob
Copy link
Contributor

spaskob commented Jan 29, 2020 via email

@cockroach-teamcity
Copy link
Member Author

(ccl/importccl).TestMultiNodeExportStmt failed on master@e671a4ef97cbc6cf5d22f8f322fd45733d302094:

Fatal error:

F200214 01:26:27.901755 335319 kv/txn_interceptor_span_refresher.go:231  [n3,merge,s3,r9/5:/Table/1{3-4},txn=ed4ba15a] unexpected WriteTooOld request. ba: [txn: ed4ba15a], EndTxn(commit:false tsflex:false) [/Min]  (txn: "merge" meta={id=ed4ba15a key=/Local/Range/Table/13/RangeDescriptor pri=0.03558537 epo=0 ts=1581643580.019490690,2 min=1581643577.585278048,0 seq=10} rw=true stat=PENDING rts=1581643577.585278048,0 wto=true max=1581643578.085278048,0)

Stack:

goroutine 335319 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa0e3801, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xc6
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9715c80, 0xc000000004, 0x8e503f1, 0x24, 0xe7, 0xc0035f4000, 0x17d)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:211 +0xbe3
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6cea540, 0xc002fa58f0, 0x4, 0x2, 0x5fb8d39, 0x30, 0xc007cc7468, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6cea540, 0xc002fa58f0, 0x1, 0x4, 0x5fb8d39, 0x30, 0xc007cc7468, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).sendLockedWithRefreshAttempts(0xc0014719c0, 0x6cea540, 0xc002fa58f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:231 +0x606
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).SendLocked(0xc0014719c0, 0x6cea540, 0xc002fa58f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:179 +0x27d
github.com/cockroachdb/cockroach/pkg/kv.(*txnPipeliner).SendLocked(0xc001471910, 0x6cea540, 0xc002fa58f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_pipeliner.go:208 +0x1ea
github.com/cockroachdb/cockroach/pkg/kv.(*txnSeqNumAllocator).SendLocked(0xc0014718f0, 0x6cea540, 0xc002fa58f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_seq_num_allocator.go:105 +0x2c3
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeater).SendLocked(0xc001471850, 0x6cea540, 0xc002fa58f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeater.go:168 +0xff
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc001471680, 0x6cea540, 0xc002fa58f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:493 +0x6b7
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).sendUsingSender(0xc00514bb80, 0x6cea500, 0xc00148f680, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:754 +0x174
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Send(0xc005117320, 0x6cea500, 0xc00148f680, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:870 +0x1d2
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).rollback(0xc005117320, 0x6cea500, 0xc00148f680, 0x8)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:668 +0x5fc
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).CleanupOnError(0xc005117320, 0x6cea500, 0xc00148f680, 0x6c6a060, 0xc002b459c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:562 +0x9e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).AdminMerge(0xc0051a8900, 0x6cea500, 0xc00148f680, 0xc00023b738, 0x1, 0x8, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_command.go:772 +0x3a6
github.com/cockroachdb/cockroach/pkg/storage.(*mergeQueue).process(0xc00034f400, 0x6cea500, 0xc00148f680, 0xc0051a8900, 0xc00583c780, 0x5f39d6e, 0xd)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/merge_queue.go:353 +0x170b
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica.func1(0x6cea500, 0xc00148f680, 0xdf8475800, 0x6cea500)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:951 +0x24e
github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x6cea500, 0xc00148f680, 0xc0053a8ca0, 0x1d, 0xdf8475800, 0xc005b99d98, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:135 +0xde
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica(0xc0013bbe40, 0x6cea540, 0xc006a1a150, 0x6d51780, 0xc0051a8900, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:910 +0x3aa
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processLoop.func1.2(0x6cea540, 0xc006a1a0f0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:838 +0x116
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc0004f4c80, 0x6cea540, 0xc006a1a0f0, 0xc003a75c20, 0x29, 0x0, 0x0, 0xc006a1a120)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:322 +0x163
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:317 +0x14b

Log preceding fatal error

I200214 01:26:27.813882 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (merge): node unavailable; try another peer
W200214 01:26:27.814483 303138 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.814814 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (split): node unavailable; try another peer
I200214 01:26:27.815726 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I200214 01:26:27.816292 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (replicaGC): node unavailable; try another peer
I200214 01:26:27.816887 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (raftlog): node unavailable; try another peer
I200214 01:26:27.817496 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (raftsnapshot): node unavailable; try another peer
I200214 01:26:27.818031 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (consistencyChecker): node unavailable; try another peer
I200214 01:26:27.818709 295801 storage/queue.go:577  [n1,s1] rate limited in MaybeAdd (timeSeriesMaintenance): node unavailable; try another peer
W200214 01:26:27.818904 297331 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.820117 298176 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
E200214 01:26:27.821228 339653 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
W200214 01:26:27.822406 297419 gossip/gossip.go:1518  [n4] no incoming or outgoing connections
I200214 01:26:27.823440 339533 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:42209: context canceled
I200214 01:26:27.812992 295725 storage/compactor/compactor.go:325  [n1,s1,compactor] purging suggested compaction for range /Table/27 - /Table/28 that contains live data
I200214 01:26:27.824760 295725 storage/compactor/compactor.go:325  [n1,s1,compactor] purging suggested compaction for range /Table/28 - /Table/29 that contains live data
W200214 01:26:27.827732 297182 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
I200214 01:26:27.828630 338660 rpc/nodedialer/nodedialer.go:160  [n3,merge,s3,r9/5:/Table/1{3-4},txn=ed4ba15a] unable to connect to n1: failed to connect to n1 at 127.0.0.1:37079: context canceled
W200214 01:26:27.829480 321814 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.830993 321715 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.831984 338660 rpc/nodedialer/nodedialer.go:160  [n3,merge,s3,r9/5:/Table/1{3-4},txn=ed4ba15a] unable to connect to n2: failed to connect to n2 at 127.0.0.1:44155: context canceled
W200214 01:26:27.814128 300815 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.818194 339450 storage/intentresolver/intent_resolver.go:633  failed to cleanup transaction intents: failed to resolve intents: aborted during DistSender.Send: context canceled
W200214 01:26:27.835295 298348 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.832080 298202 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.836714 339309 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n5,txn=85426195] unable to connect to n3: failed to connect to n3 at 127.0.0.1:42209: context canceled
W200214 01:26:27.840257 297371 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.844103 298111 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.845401 299338 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.846746 339282 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n1,txn=0c4e85fd] unable to connect to n2: failed to connect to n2 at 127.0.0.1:44155: context canceled
I200214 01:26:27.850358 339282 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n1,txn=0c4e85fd] unable to connect to n4: failed to connect to n4 at 127.0.0.1:45163: context canceled
W200214 01:26:27.855058 334866 internal/client/txn.go:566  [n1,merge,s1,r8/1:/Table/1{2-3}] failure aborting transaction: node unavailable; try another peer; abort caused by: failed to send RPC: sending to all 5 replicas failed; last error: <nil> failed to connect to n4 at 127.0.0.1:45163: context canceled
W200214 01:26:27.831174 298792 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.846379 339408 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n3,txn=da2b33ec] unable to connect to n1: failed to connect to n1 at 127.0.0.1:37079: context canceled
W200214 01:26:27.866113 298221 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.859518 301364 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.850706 339386 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n2,txn=0f695e38] unable to connect to n1: failed to connect to n1 at 127.0.0.1:37079: context canceled
W200214 01:26:27.866970 298181 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.867385 298867 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 01:26:27.867659 298841 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
I200214 01:26:27.870000 339675 storage/queue.go:1188  [n1,merge] purgatory is now empty
I200214 01:26:27.870482 339658 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n2: failed to connect to n2 at 127.0.0.1:44155: context canceled
E200214 01:26:27.871477 296566 jobs/registry.go:407  error while adopting jobs: failed querying for jobs: adopt-job: failed to send RPC: sending to all 5 replicas failed; last error: <nil> failed to connect to n3 at 127.0.0.1:42209: context canceled
E200214 01:26:27.860406 298595 jobs/registry.go:407  error while adopting jobs: failed querying for jobs: adopt-job: failed to send RPC: sending to all 5 replicas failed; last error: <nil> failed to connect to n3 at 127.0.0.1:42209: context canceled
I200214 01:26:27.878616 339657 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:37079: context canceled
I200214 01:26:27.880259 338660 rpc/nodedialer/nodedialer.go:160  [n3,merge,s3,r9/5:/Table/1{3-4},txn=ed4ba15a] unable to connect to n4: failed to connect to n4 at 127.0.0.1:45163: context canceled
I200214 01:26:27.882859 339676 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n5: failed to connect to n5 at 127.0.0.1:45421: context canceled
I200214 01:26:27.885005 339386 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n2,txn=0f695e38] unable to connect to n4: failed to connect to n4 at 127.0.0.1:45163: context canceled
I200214 01:26:27.885286 339678 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:42209: context canceled
W200214 01:26:27.889703 333162 sql/stats/automatic_stats.go:458  failed to create statistics on table 53: create-stats: job-update: failed to send RPC: sending to all 5 replicas failed; last error: <nil> failed to connect to n3 at 127.0.0.1:42209: context canceled

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Copy link
Member Author

(ccl/importccl).TestMultiNodeExportStmt failed on master@e671a4ef97cbc6cf5d22f8f322fd45733d302094:

Fatal error:

F200214 06:43:17.595307 327955 kv/txn_interceptor_span_refresher.go:231  [n2,merge,s2,r10/2:/Table/1{4-5},txn=950b9c5c] unexpected WriteTooOld request. ba: [txn: 950b9c5c], EndTxn(commit:false tsflex:false) [/Min]  (txn: "merge" meta={id=950b9c5c key=/Local/Range/Table/14/RangeDescriptor pri=0.00189980 epo=0 ts=1581662589.117457838,2 min=1581662578.074607211,0 seq=9} rw=true stat=PENDING rts=1581662578.074607211,0 wto=true max=1581662578.574607211,0)

Stack:

goroutine 327955 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xa0e3801, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xc6
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x9715c80, 0xc000000004, 0x8e50439, 0x24, 0xe7, 0xc00419cf00, 0x17d)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:211 +0xbe3
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x6cea580, 0xc001782480, 0x4, 0x2, 0x5fb8d79, 0x30, 0xc002397468, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x291
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x6cea580, 0xc001782480, 0x1, 0x4, 0x5fb8d79, 0x30, 0xc002397468, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:155
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).sendLockedWithRefreshAttempts(0xc002eaa7c0, 0x6cea580, 0xc001782480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:231 +0x606
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).SendLocked(0xc002eaa7c0, 0x6cea580, 0xc001782480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:179 +0x27d
github.com/cockroachdb/cockroach/pkg/kv.(*txnPipeliner).SendLocked(0xc002eaa710, 0x6cea580, 0xc001782480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_pipeliner.go:208 +0x1ea
github.com/cockroachdb/cockroach/pkg/kv.(*txnSeqNumAllocator).SendLocked(0xc002eaa6f0, 0x6cea580, 0xc001782480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_seq_num_allocator.go:105 +0x2c3
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeater).SendLocked(0xc002eaa650, 0x6cea580, 0xc001782480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeater.go:168 +0xff
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc002eaa480, 0x6cea580, 0xc001782480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:493 +0x6b7
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).sendUsingSender(0xc00392ee80, 0x6cea540, 0xc00234a480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:754 +0x174
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Send(0xc000ccf290, 0x6cea540, 0xc00234a480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:870 +0x1d2
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).rollback(0xc000ccf290, 0x6cea540, 0xc00234a480, 0x8)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:668 +0x5fc
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).CleanupOnError(0xc000ccf290, 0x6cea540, 0xc00234a480, 0x6c6a0a0, 0xc001e94ea0)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:562 +0x9e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).AdminMerge(0xc005854900, 0x6cea540, 0xc00234a480, 0xc00353b6a8, 0x1, 0x8, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_command.go:772 +0x3a6
github.com/cockroachdb/cockroach/pkg/storage.(*mergeQueue).process(0xc005a239a0, 0x6cea540, 0xc00234a480, 0xc005854900, 0xc00363f180, 0x5f39dae, 0xd)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/merge_queue.go:353 +0x170b
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica.func1(0x6cea540, 0xc00234a480, 0xdf8475800, 0x6cea540)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:951 +0x24e
github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x6cea540, 0xc00234a480, 0xc002f4d6c0, 0x1e, 0xdf8475800, 0xc001ccfd98, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:135 +0xde
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processReplica(0xc0013aa840, 0x6cea580, 0xc00161da10, 0x6d517c0, 0xc005854900, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:910 +0x3aa
github.com/cockroachdb/cockroach/pkg/storage.(*baseQueue).processLoop.func1.2(0x6cea580, 0xc00161d9b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/queue.go:838 +0x116
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc001b6b9a0, 0x6cea580, 0xc00161d9b0, 0xc0043293b0, 0x29, 0x0, 0x0, 0xc00161d9e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:322 +0x163
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:317 +0x14b

Log preceding fatal error

I200214 06:43:17.476707 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (gc): node unavailable; try another peer
I200214 06:43:17.477600 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (merge): node unavailable; try another peer
I200214 06:43:17.478020 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (split): node unavailable; try another peer
I200214 06:43:17.478456 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I200214 06:43:17.478930 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (replicaGC): node unavailable; try another peer
I200214 06:43:17.479236 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (raftlog): node unavailable; try another peer
I200214 06:43:17.467754 334934 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:32799: context canceled
I200214 06:43:17.480512 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (raftsnapshot): node unavailable; try another peer
I200214 06:43:17.480963 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (consistencyChecker): node unavailable; try another peer
I200214 06:43:17.481194 334981 storage/queue.go:1188  [n5,merge] purgatory is now empty
I200214 06:43:17.481302 296809 storage/queue.go:577  [n5,s5] rate limited in MaybeAdd (timeSeriesMaintenance): node unavailable; try another peer
E200214 06:43:17.496583 295054 jobs/registry.go:407  error while adopting jobs: failed querying for jobs: adopt-job: failed to send RPC: sending to all 5 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W200214 06:43:17.503609 294273 storage/store_raft.go:496  [n1,s1,r11/6:/Table/1{5-8}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W200214 06:43:17.504961 295145 gossip/gossip.go:1518  [n2] no incoming or outgoing connections
W200214 06:43:17.505719 297235 storage/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 06:43:17.506321 294325 ts/db.go:194  [n1,ts-poll] error writing time series data: failed to send RPC: sending to all 3 replicas failed; last error: <nil> failed to connect to n5 at 127.0.0.1:33413: context canceled
W200214 06:43:17.506851 302320 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 5: rpc error: code = Unavailable desc = transport is closing:
W200214 06:43:17.507150 294325 ts/db.go:197  [n1,ts-poll] node unavailable; try another peer
I200214 06:43:17.510423 334956 storage/queue.go:1188  [n1,merge] purgatory is now empty
I200214 06:43:17.513431 334915 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:32799: context canceled
W200214 06:43:17.516473 295576 storage/store_raft.go:496  [n3,s3,r11/2:/Table/1{5-8}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W200214 06:43:17.520835 295873 gossip/gossip.go:1518  [n4] no incoming or outgoing connections
W200214 06:43:17.521423 294286 storage/store_raft.go:496  [n1,s1,r1/1:/{Min-System/NodeL…}] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
I200214 06:43:17.522534 334450 rpc/nodedialer/nodedialer.go:160  [intExec=adopt-job,n2,txn=31a28664] unable to connect to n3: failed to connect to n3 at 127.0.0.1:39019: context canceled
W200214 06:43:17.524275 296108 storage/store_raft.go:496  [n4,s4,r11/3:/Table/1{5-8}] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
W200214 06:43:17.526238 296484 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 4: EOF:
W200214 06:43:17.527091 297086 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 4: rpc error: code = Unavailable desc = transport is closing:
I200214 06:43:17.527856 334920 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n4: failed to connect to n4 at 127.0.0.1:33305: context canceled
W200214 06:43:17.544460 296367 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 1: EOF:
W200214 06:43:17.545531 295694 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 1: EOF:
E200214 06:43:17.547281 295714 jobs/registry.go:407  error while adopting jobs: failed querying for jobs: adopt-job: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W200214 06:43:17.556559 297160 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 2: EOF:
W200214 06:43:17.558102 297750 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 5: EOF:
E200214 06:43:17.566409 297076 jobs/registry.go:407  error while adopting jobs: failed querying for jobs: adopt-job: node unavailable; try another peer
I200214 06:43:17.568330 334897 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n3: failed to connect to n3 at 127.0.0.1:39019: context canceled
I200214 06:43:17.568678 334914 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n5: failed to connect to n5 at 127.0.0.1:33413: context canceled
E200214 06:43:17.570430 334959 vendor/google.golang.org/grpc/pickfirst.go:61  pickfirstBalancer: failed to NewSubConn: rpc error: code = Canceled desc = grpc: the client connection is closing
W200214 06:43:17.576142 295246 storage/store_raft.go:496  [n2,s2,r11/4:/Table/1{5-8}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
W200214 06:43:17.576552 297148 storage/raft_transport.go:637  [n4] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 06:43:17.581044 296745 storage/store_raft.go:496  [n5,s5,r9/2:/Table/1{3-4}] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
I200214 06:43:17.582906 334919 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n1: failed to connect to n1 at 127.0.0.1:32799: context canceled
W200214 06:43:17.583538 297016 server/node.go:807  [n5,summaries] error recording status summaries: failed to send RPC: sending to all 5 replicas failed; last error: <nil> failed to connect to n3 at 127.0.0.1:39019: context canceled
I200214 06:43:17.584177 334918 rpc/nodedialer/nodedialer.go:160  [ct-client] unable to connect to n5: failed to connect to n5 at 127.0.0.1:33413: context canceled
W200214 06:43:17.584767 297454 storage/raft_transport.go:637  [n5] while processing outgoing Raft queue to node 1: EOF:
W200214 06:43:17.586241 301565 storage/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W200214 06:43:17.587190 295052 jobs/registry.go:1055  job 529505291143380993: canceling due to liveness failure
W200214 06:43:17.587604 296756 storage/store_raft.go:496  [n5,s5,r1/2:/{Min-System/NodeL…}] handle raft ready: 0.6s [applied=2, batches=1, state_assertions=0]
W200214 06:43:17.587905 301688 storage/raft_transport.go:637  [n3] while processing outgoing Raft queue to node 2: EOF:
E200214 06:43:17.593555 295316 jobs/registry.go:407  error while adopting jobs: failed querying for jobs: adopt-job: failed to send RPC: sending to all 5 replicas failed; last error: <nil> failed to connect to n1 at 127.0.0.1:32799: context canceled
W200214 06:43:17.594128 334769 internal/client/txn.go:566  failure aborting transaction: node unavailable; try another peer; abort caused by: failed to fetch protectedts metadata: failed to read metadata: protectedts-GetMetadata: context canceled

More

Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestMultiNodeExportStmt PKG=./pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@petermattis
Copy link
Collaborator

I just saw acceptance/bank/node-restart fail on CI with: F200214 21:00:04.728340 322 kv/txn_interceptor_span_refresher.go:231 [n2,liveness-hb,txn=e964cb05] unexpected WriteTooOld request. ba: [txn: e964cb05], EndTxn(commit:false tsflex:true) [/Min] (txn: "unnamed" meta={id=e964cb05 key=/System/NodeLiveness/2 pri=0.00846973 epo=0 ts=1581714002.514443562,1 min=1581714001.528023810,0 seq=2} rw=true stat=PENDING rts=1581714001.528023810,0 wto=true max=1581714002.028023810,0)

See https://teamcity.cockroachdb.com/repository/download/Cockroach_UnitTests_Roachtest/1750448:id/acceptance/bank/node-restart/run_1/2.logs/cockroach.stderr.log

@andreimatei This seems to be up your alley.

@andreimatei
Copy link
Contributor

andreimatei commented Feb 14, 2020 via email

@spaskob spaskob removed their assignment Mar 3, 2020
@dt dt closed this as completed Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

No branches or pull requests

6 participants