storage: TestReplicateQueueDownReplicate failed under stress #28368

Closed
cockroach-teamcity opened this issue Aug 8, 2018 · 12 comments
Labels
A-kv-client Relating to the KV client and the KV interface. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

@cockroach-teamcity (Member)

SHA: https://github.com/cockroachdb/cockroach/commits/bf76db84cb64dc90f65d8b2e129c75028127cda2

Parameters:

TAGS=
GOFLAGS=-race

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=823111&tab=buildLog
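
The repeated `SucceedsSoon: range not found on store {5 5}` lines in the log below appear to come from the test retrying its replica-placement check until the stress deadline runs out. As a point of reference, here is a minimal sketch of that retry pattern, assuming the repository's `testutils.SucceedsSoon` helper; the test name and the condition stub are hypothetical illustrations, not the real body of `TestReplicateQueueDownReplicate`:

```go
package storage_test

import (
	"errors"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
)

// TestDownReplicateRetrySketch is an illustrative sketch only: it shows
// the retry pattern that appears to produce the repeated
// "SucceedsSoon: range not found on store {5 5}" lines in the log below.
func TestDownReplicateRetrySketch(t *testing.T) {
	attempts := 0
	// testutils.SucceedsSoon retries the closure until it returns nil or
	// a timeout elapses, at which point it fails the test.
	testutils.SucceedsSoon(t, func() error {
		attempts++
		if attempts < 3 {
			// Hypothetical stand-in for the real condition check (whether
			// the range has reached the expected replica placement) not
			// yet holding.
			return errors.New("range not found on store {5 5}")
		}
		return nil
	})
}
```

Under `-race` stress the cluster is slow enough (note the multi-second snapshot applications and slow liveness heartbeats below) that this condition never stabilizes before the test times out.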

I180808 05:51:45.863575 137781 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:51:45.878481 142415 storage/replica_raftstorage.go:520  [raftsnapshot,n3,s3,r17/2:/Table/2{0-1}] generated Raft snapshot dea4595f at index 50
I180808 05:51:45.880794 142367 storage/replica_raftstorage.go:743  [n3,s3,r1/?:/{Min-System/}] applied preemptive snapshot in 850ms [clear=0ms batch=0ms entries=741ms commit=108ms]
I180808 05:51:45.887999 142334 storage/store.go:2557  [replicaGC,n1,s1,r23/1:{m-/Table/Syste…}] removing replica
I180808 05:51:46.055654 142348 storage/replica_command.go:745  [replicate,n1,s1,r1/1:/{Min-System/}] change replicas (ADD_REPLICA (n3,s3):4): read existing descriptor r1:/{Min-System/} [(n1,s1):1, (n5,s5):2, (n4,s4):3, next=4]
I180808 05:51:46.067188 142415 storage/store_snapshot.go:605  [raftsnapshot,n3,s3,r17/2:/Table/2{0-1}] streamed snapshot to (n4,s4):5: kv pairs: 15, log entries: 40, rate-limit: 8.0 MiB/sec, 145ms
I180808 05:51:46.071012 142500 storage/replica_raftstorage.go:737  [n4,s4,r17/5:{-}] applying Raft snapshot at index 50 (id=dea4595f, encoded size=11835, 1 rocksdb batches, 40 log entries)
I180808 05:51:46.182262 142485 storage/store.go:2557  [replicaGC,n3,s3,r1/?:/{Min-System/}] removing replica
I180808 05:51:46.300297 142334 storage/replica.go:844  [replicaGC,n1,s1,r23/1:{m-/Table/Syste…}] removed 8 (0+8) keys in 339ms [clear=0ms commit=339ms]
I180808 05:51:46.333926 142485 storage/replica.go:844  [replicaGC,n3,s3,r1/?:/{Min-System/}] removed 33 (24+9) keys in 151ms [clear=0ms commit=150ms]
I180808 05:51:46.394244 142474 storage/store.go:2557  [replicaGC,n1,s1,r12/1:/Table/1{5-6}] removing replica
I180808 05:51:46.637542 138751 gossip/gossip.go:518  [n3] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:35831 (1m1s: infos 26/889 sent/received, bytes 8046B/130855B sent/received)
gossip server (0/3 cur/max conns, infos 26/889 sent/received, bytes 8046B/130855B sent/received)
I180808 05:51:46.691896 142242 storage/replica_raftstorage.go:743  [n4,s4,r16/4:/Table/{19-20}] applied Raft snapshot in 2354ms [clear=0ms batch=0ms entries=2018ms commit=312ms]
I180808 05:51:46.699179 138754 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1243 goroutines, 63 MiB/37 MiB/134 MiB GO alloc/idle/total, 174 MiB/212 MiB CGO alloc/total, 1329.56cgo/sec, 0.97/0.06 %(u/s)time, 0.03 %gc (4x)
W180808 05:51:46.746264 138333 server/node.go:873  [n2,summaries] health alerts detected: {Alerts:[{StoreID:2 Category:METRICS Description:queue.replicate.process.failure Value:1}]}
W180808 05:51:46.786225 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.6s
I180808 05:51:46.851655 142474 storage/replica.go:844  [replicaGC,n1,s1,r12/1:/Table/1{5-6}] removed 8 (0+8) keys in 385ms [clear=0ms commit=385ms]
I180808 05:51:46.974310 138772 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:51:47.147544 141970 storage/replica.go:3645  [n5,s5,r10/3:/Table/1{3-4}] proposing ADD_REPLICA((n3,s3):5): updated=[(n4,s4):4 (n2,s2):2 (n5,s5):3 (n3,s3):5] next=6
I180808 05:51:48.245060 142522 storage/replica_command.go:745  [replicate,n5,s5,r10/3:/Table/1{3-4}] change replicas (REMOVE_REPLICA (n2,s2):2): read existing descriptor r10:/Table/1{3-4} [(n4,s4):4, (n2,s2):2, (n5,s5):3, (n3,s3):5, next=6]
I180808 05:51:48.422463 138913 gossip/gossip.go:518  [n4] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  1: 127.0.0.1:35831 (1m1s: infos 41/622 sent/received, bytes 12424B/95780B sent/received)
gossip server (1/3 cur/max conns, infos 1132/713 sent/received, bytes 169383B/131291B sent/received)
  5: 127.0.0.1:42739 (57s)
W180808 05:51:48.527613 137728 storage/store.go:3742  [n1,s1] handle raft ready: 0.7s [processed=0]
I180808 05:51:48.715645 139108 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1244 goroutines, 93 MiB/14 MiB/134 MiB GO alloc/idle/total, 174 MiB/212 MiB CGO alloc/total, 1290.63cgo/sec, 0.97/0.06 %(u/s)time, 0.03 %gc (4x)
I180808 05:51:48.906413 139110 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:51:49.140029 137766 storage/store.go:3742  [n1,s1] handle raft ready: 0.5s [processed=0]
W180808 05:51:49.778806 139117 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 2.7s
I180808 05:51:49.898285 138507 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180808 05:51:50.056383 142607 rpc/nodedialer/nodedialer.go:92  [n1] connection to n3 established
I180808 05:51:50.191691 142659 storage/replica_raftstorage.go:520  [raftsnapshot,n5,s5,r10/3:/Table/1{3-4}] generated Raft snapshot 162971b7 at index 188
I180808 05:51:50.499491 142659 storage/store_snapshot.go:605  [raftsnapshot,n5,s5,r10/3:/Table/1{3-4}] streamed snapshot to (n3,s3):5: kv pairs: 453, log entries: 43, rate-limit: 8.0 MiB/sec, 201ms
I180808 05:51:50.519090 142648 storage/replica_raftstorage.go:737  [n3,s3,r10/5:{-}] applying Raft snapshot at index 188 (id=162971b7, encoded size=80970, 1 rocksdb batches, 43 log entries)
I180808 05:51:50.659909 139442 gossip/gossip.go:518  [n5] gossip status (ok, 5 nodes)
gossip client (1/3 cur/max conns)
  4: 127.0.0.1:40097 (1m0s: infos 91/1113 sent/received, bytes 35511B/160413B sent/received)
gossip server (0/3 cur/max conns, infos 92/1138 sent/received, bytes 35762B/171264B sent/received)
I180808 05:51:50.863864 139295 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1256 goroutines, 82 MiB/20 MiB/134 MiB GO alloc/idle/total, 175 MiB/213 MiB CGO alloc/total, 1269.51cgo/sec, 0.95/0.06 %(u/s)time, 0.02 %gc (3x)
W180808 05:51:50.921679 137798 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 3.5s
I180808 05:51:51.366095 142500 storage/replica_raftstorage.go:743  [n4,s4,r17/5:/Table/2{0-1}] applied Raft snapshot in 5295ms [clear=0ms batch=0ms entries=5289ms commit=5ms]
I180808 05:51:51.382412 139297 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:51:51.714793 137764 storage/store.go:3742  [n1,s1] handle raft ready: 1.7s [processed=0]
I180808 05:51:51.944314 138808 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
W180808 05:51:52.045060 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 2.4s
I180808 05:51:52.093253 142716 rpc/nodedialer/nodedialer.go:92  [n1] connection to n4 established
I180808 05:51:52.136138 142381 storage/replica_raftstorage.go:743  [n5,s5,r23/?:{m-/Table/Syste…}] applied preemptive snapshot in 6493ms [clear=0ms batch=0ms entries=6491ms commit=0ms]
I180808 05:51:52.764798 142483 storage/replica_command.go:745  [n4,s4,r23/3:{m-/Table/Syste…}] change replicas (ADD_REPLICA (n5,s5):5): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n2,s2):4, (n3,s3):2, (n4,s4):3, next=5]
I180808 05:51:52.923117 142648 storage/replica_raftstorage.go:743  [n3,s3,r10/5:/Table/1{3-4}] applied Raft snapshot in 2404ms [clear=0ms batch=0ms entries=2401ms commit=1ms]
W180808 05:51:53.058819 137798 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 1.1s
W180808 05:51:53.073527 138452 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 5.1s
W180808 05:51:53.167189 139117 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.7s
W180808 05:51:53.450401 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 4.2s
I180808 05:51:54.016200 142370 storage/replica.go:3645  [n2,s2,r16/3:/Table/{19-20}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n4,s4):4 (n5,s5):2 (n2,s2):3] next=5
W180808 05:51:54.379380 139246 storage/store.go:3742  [n5,s5] handle raft ready: 0.5s [processed=0]
I180808 05:51:54.483408 138971 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n1 established
I180808 05:51:54.531882 142799 rpc/nodedialer/nodedialer.go:92  [n1] connection to n5 established
I180808 05:51:54.697492 142775 storage/store.go:2557  [replicaGC,n5,s5,r23/?:{m-/Table/Syste…}] removing replica
I180808 05:51:54.709011 142453 storage/replica.go:3645  [n3,s3,r17/2:/Table/2{0-1}] proposing REMOVE_REPLICA((n2,s2):3): updated=[(n5,s5):4 (n3,s3):2 (n4,s4):5] next=6
I180808 05:51:54.909050 138331 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1232 goroutines, 58 MiB/43 MiB/135 MiB GO alloc/idle/total, 173 MiB/214 MiB CGO alloc/total, 1064.50cgo/sec, 0.96/0.04 %(u/s)time, 0.04 %gc (4x)
I180808 05:51:54.922402 142775 storage/replica.go:844  [replicaGC,n5,s5,r23/?:{m-/Table/Syste…}] removed 8 (0+8) keys in 224ms [clear=0ms commit=224ms]
I180808 05:51:55.007611 139349 storage/store.go:3617  [n1,s1,r16/1:/Table/{19-20}] added to replica GC queue (peer suggestion)
I180808 05:51:55.008259 139349 storage/store.go:3617  [n1,s1,r16/1:/Table/{19-20}] added to replica GC queue (peer suggestion)
I180808 05:51:55.008661 139349 storage/store.go:3617  [n1,s1,r16/1:/Table/{19-20}] added to replica GC queue (peer suggestion)
I180808 05:51:55.195560 142865 storage/replica_raftstorage.go:520  [replicate,n2,s2,r16/3:/Table/{19-20}] generated preemptive snapshot 6bffb5ad at index 41
I180808 05:51:55.252185 142865 storage/store_snapshot.go:605  [replicate,n2,s2,r16/3:/Table/{19-20}] streamed snapshot to (n3,s3):?: kv pairs: 14, log entries: 31, rate-limit: 2.0 MiB/sec, 55ms
I180808 05:51:55.258491 142826 storage/replica_raftstorage.go:737  [n3,s3,r16/?:{-}] applying preemptive snapshot at index 41 (id=6bffb5ad, encoded size=9283, 1 rocksdb batches, 31 log entries)
W180808 05:51:55.370595 138452 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 2.3s
I180808 05:51:55.416449 142522 storage/replica.go:3645  [n5,s5,r10/3:/Table/1{3-4}] proposing REMOVE_REPLICA((n2,s2):2): updated=[(n4,s4):4 (n3,s3):5 (n5,s5):3] next=6
I180808 05:51:55.433303 138333 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:51:55.483219 137779 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1238 goroutines, 67 MiB/36 MiB/135 MiB GO alloc/idle/total, 173 MiB/214 MiB CGO alloc/total, 1094.46cgo/sec, 0.96/0.04 %(u/s)time, 0.03 %gc (3x)
I180808 05:51:55.715567 141357 storage/store.go:3617  [n2,s2,r17/3:/Table/2{0-1}] added to replica GC queue (peer suggestion)
I180808 05:51:55.716290 141357 storage/store.go:3617  [n2,s2,r17/3:/Table/2{0-1}] added to replica GC queue (peer suggestion)
I180808 05:51:55.716754 141357 storage/store.go:3617  [n2,s2,r17/3:/Table/2{0-1}] added to replica GC queue (peer suggestion)
W180808 05:51:55.722875 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.0s
I180808 05:51:55.868148 137781 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:51:55.990019 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.8s
I180808 05:51:56.410341 141544 storage/store.go:3617  [n2,s2,r10/2:/Table/1{3-4}] added to replica GC queue (peer suggestion)
I180808 05:51:56.411055 141544 storage/store.go:3617  [n2,s2,r10/2:/Table/1{3-4}] added to replica GC queue (peer suggestion)
I180808 05:51:56.411391 141544 storage/store.go:3617  [n2,s2,r10/2:/Table/1{3-4}] added to replica GC queue (peer suggestion)
I180808 05:51:56.411563 141544 storage/store.go:3617  [n2,s2,r10/2:/Table/1{3-4}] added to replica GC queue (peer suggestion)
I180808 05:51:56.640270 138754 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1234 goroutines, 83 MiB/21 MiB/135 MiB GO alloc/idle/total, 173 MiB/214 MiB CGO alloc/total, 1063.35cgo/sec, 0.95/0.05 %(u/s)time, 0.03 %gc (3x)
I180808 05:51:56.668398 142348 storage/replica.go:3645  [n1,s1,r1/1:/{Min-System/}] proposing ADD_REPLICA((n3,s3):4): updated=[(n1,s1):1 (n5,s5):2 (n4,s4):3 (n3,s3):4] next=5
I180808 05:51:56.734271 138748 storage/compactor/compactor.go:329  [n3,s3,compactor] purging suggested compaction for range /Min - /System/"" that contains live data
I180808 05:51:56.734728 138748 storage/compactor/compactor.go:329  [n3,s3,compactor] purging suggested compaction for range /Table/13 - /Table/14 that contains live data
I180808 05:51:56.916075 138772 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:51:57.522463 142900 storage/store.go:2557  [replicaGC,n1,s1,r16/1:/Table/{19-20}] removing replica
I180808 05:51:57.537395 142902 storage/store.go:2557  [replicaGC,n2,s2,r17/3:/Table/2{0-1}] removing replica
I180808 05:51:57.809883 142997 storage/replica_range_lease.go:554  [replicate,n1,s1,r1/1:/{Min-System/}] transferring lease to s4
I180808 05:51:57.953888 142900 storage/replica.go:844  [replicaGC,n1,s1,r16/1:/Table/{19-20}] removed 8 (0+8) keys in 386ms [clear=0ms commit=386ms]
I180808 05:51:57.972117 142826 storage/replica_raftstorage.go:743  [n3,s3,r16/?:/Table/{19-20}] applied preemptive snapshot in 2713ms [clear=0ms batch=0ms entries=2677ms commit=35ms]
I180808 05:51:57.903870 142902 storage/replica.go:844  [replicaGC,n2,s2,r17/3:/Table/2{0-1}] removed 8 (0+8) keys in 214ms [clear=0ms commit=214ms]
W180808 05:51:58.054827 138452 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 1.1s
I180808 05:51:58.131164 142865 storage/replica_command.go:745  [replicate,n2,s2,r16/3:/Table/{19-20}] change replicas (ADD_REPLICA (n3,s3):5): read existing descriptor r16:/Table/{19-20} [(n4,s4):4, (n5,s5):2, (n2,s2):3, next=5]
I180808 05:51:58.231578 142483 storage/replica.go:3645  [n4,s4,r23/3:{m-/Table/Syste…}] proposing ADD_REPLICA((n5,s5):5): updated=[(n2,s2):4 (n3,s3):2 (n4,s4):3 (n5,s5):5] next=6
I180808 05:51:58.526859 139108 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1225 goroutines, 66 MiB/36 MiB/135 MiB GO alloc/idle/total, 172 MiB/214 MiB CGO alloc/total, 1097.26cgo/sec, 0.96/0.04 %(u/s)time, 0.03 %gc (3x)
I180808 05:51:58.959726 139110 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:51:59.254449 137530 storage/store.go:3742  [n1,s1] handle raft ready: 0.6s [processed=0]
I180808 05:51:59.441652 142997 storage/replica_range_lease.go:617  [replicate,n1,s1,r1/1:/{Min-System/}] done transferring lease to s4: <nil>
I180808 05:51:59.474359 143012 storage/store.go:2557  [replicaGC,n2,s2,r10/2:/Table/1{3-4}] removing replica
E180808 05:51:59.487676 143052 storage/queue.go:778  [replicate,n4,s4,r1/3:/{Min-System/}] no removable replicas from range that needs a removal: [raft progress unknown]
I180808 05:51:59.561244 143018 storage/store.go:2557  [replicaGC,n3,s3,r16/?:/Table/{19-20}] removing replica
I180808 05:51:59.566079 142976 storage/replica_raftstorage.go:520  [raftsnapshot,n1,s1,r1/1:/{Min-System/}] generated Raft snapshot 5ee7ab3c at index 245
I180808 05:51:59.669681 142962 storage/replica_raftstorage.go:520  [replicate,n1,s1,r9/1:/Table/1{2-3}] generated preemptive snapshot 80662695 at index 41
I180808 05:51:59.703314 143084 storage/replica_command.go:745  [replicate,n4,s4,r23/3:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n5,s5):5): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n2,s2):4, (n3,s3):2, (n4,s4):3, (n5,s5):5, next=6]
I180808 05:51:59.802877 142962 storage/store_snapshot.go:605  [replicate,n1,s1,r9/1:/Table/1{2-3}] streamed snapshot to (n3,s3):?: kv pairs: 65, log entries: 31, rate-limit: 2.0 MiB/sec, 131ms
I180808 05:51:59.814024 143036 storage/replica_raftstorage.go:737  [n3,s3,r9/?:{-}] applying preemptive snapshot at index 41 (id=80662695, encoded size=24276, 1 rocksdb batches, 31 log entries)
I180808 05:51:59.886233 143018 storage/replica.go:844  [replicaGC,n3,s3,r16/?:/Table/{19-20}] removed 8 (0+8) keys in 324ms [clear=0ms commit=324ms]
I180808 05:51:59.915007 143012 storage/replica.go:844  [replicaGC,n2,s2,r10/2:/Table/1{3-4}] removed 461 (452+9) keys in 439ms [clear=0ms commit=439ms]
W180808 05:52:00.254772 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.0s
I180808 05:52:00.530548 143066 storage/replica_raftstorage.go:520  [replicate,n3,s3,r17/2:/Table/2{0-1}] generated preemptive snapshot bb932ddd at index 59
I180808 05:52:00.715190 139295 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1235 goroutines, 60 MiB/41 MiB/135 MiB GO alloc/idle/total, 173 MiB/214 MiB CGO alloc/total, 1055.71cgo/sec, 0.97/0.05 %(u/s)time, 0.03 %gc (4x)
I180808 05:52:00.738379 143066 storage/store_snapshot.go:605  [replicate,n3,s3,r17/2:/Table/2{0-1}] streamed snapshot to (n2,s2):?: kv pairs: 15, log entries: 49, rate-limit: 2.0 MiB/sec, 145ms
W180808 05:52:00.834236 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 2.2s
I180808 05:52:00.884656 143158 storage/replica_raftstorage.go:737  [n2,s2,r17/?:{-}] applying preemptive snapshot at index 59 (id=bb932ddd, encoded size=13866, 1 rocksdb batches, 49 log entries)
W180808 05:52:00.970983 139110 server/node.go:873  [n4,summaries] health alerts detected: {Alerts:[{StoreID:4 Category:METRICS Description:queue.replicate.process.failure Value:1}]}
I180808 05:52:01.158059 143124 storage/replica_raftstorage.go:520  [raftsnapshot,n4,s4,r23/3:{m-/Table/Syste…}] generated Raft snapshot 64ee7dca at index 50
I180808 05:52:01.192595 143124 storage/store_snapshot.go:605  [raftsnapshot,n4,s4,r23/3:{m-/Table/Syste…}] streamed snapshot to (n5,s5):5: kv pairs: 12, log entries: 40, rate-limit: 8.0 MiB/sec, 33ms
I180808 05:52:01.260584 143156 storage/replica_raftstorage.go:520  [replicate,n5,s5,r10/3:/Table/1{3-4}] generated preemptive snapshot 44573294 at index 201
I180808 05:52:01.303449 143122 storage/replica_raftstorage.go:737  [n5,s5,r23/5:{-}] applying Raft snapshot at index 50 (id=64ee7dca, encoded size=9917, 1 rocksdb batches, 40 log entries)
I180808 05:52:01.388969 139297 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:52:01.502873 143156 storage/store_snapshot.go:605  [replicate,n5,s5,r10/3:/Table/1{3-4}] streamed snapshot to (n2,s2):?: kv pairs: 468, log entries: 56, rate-limit: 2.0 MiB/sec, 183ms
I180808 05:52:01.533414 143103 storage/replica_raftstorage.go:737  [n2,s2,r10/?:{-}] applying preemptive snapshot at index 201 (id=44573294, encoded size=91472, 1 rocksdb batches, 56 log entries)
W180808 05:52:01.572933 139117 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.1s
W180808 05:52:02.025426 137798 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 1.0s
I180808 05:52:02.373068 142865 storage/replica.go:3645  [n2,s2,r16/3:/Table/{19-20}] proposing ADD_REPLICA((n3,s3):5): updated=[(n4,s4):4 (n5,s5):2 (n2,s2):3 (n3,s3):5] next=6
I180808 05:52:02.759082 143195 storage/replica_command.go:745  [replicate,n2,s2,r16/3:/Table/{19-20}] change replicas (REMOVE_REPLICA (n5,s5):2): read existing descriptor r16:/Table/{19-20} [(n4,s4):4, (n5,s5):2, (n2,s2):3, (n3,s3):5, next=6]
I180808 05:52:03.466519 143251 storage/replica_raftstorage.go:520  [raftsnapshot,n2,s2,r16/3:/Table/{19-20}] generated Raft snapshot 15f8f352 at index 47
I180808 05:52:03.531717 143251 storage/store_snapshot.go:605  [raftsnapshot,n2,s2,r16/3:/Table/{19-20}] streamed snapshot to (n3,s3):5: kv pairs: 15, log entries: 37, rate-limit: 8.0 MiB/sec, 64ms
I180808 05:52:03.573492 143201 storage/replica_raftstorage.go:737  [n3,s3,r16/5:{-}] applying Raft snapshot at index 47 (id=15f8f352, encoded size=11289, 1 rocksdb batches, 37 log entries)
I180808 05:52:03.581131 143036 storage/replica_raftstorage.go:743  [n3,s3,r9/?:/Table/1{2-3}] applied preemptive snapshot in 3767ms [clear=0ms batch=0ms entries=3709ms commit=57ms]
I180808 05:52:03.754210 142976 storage/store_snapshot.go:605  [raftsnapshot,n1,s1,r1/1:/{Min-System/}] streamed snapshot to (n3,s3):4: kv pairs: 130, log entries: 31, rate-limit: 8.0 MiB/sec, 4187ms
I180808 05:52:03.758483 143169 storage/store.go:2557  [replicaGC,n3,s3,r9/?:/Table/1{2-3}] removing replica
I180808 05:52:03.916317 142962 storage/replica_command.go:745  [replicate,n1,s1,r9/1:/Table/1{2-3}] change replicas (ADD_REPLICA (n3,s3):4): read existing descriptor r9:/Table/1{2-3} [(n1,s1):1, (n2,s2):2, (n4,s4):3, next=4]
W180808 05:52:03.957277 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 1.3s
I180808 05:52:03.977577 143169 storage/replica.go:844  [replicaGC,n3,s3,r9/?:/Table/1{2-3}] removed 61 (55+6) keys in 218ms [clear=51ms commit=167ms]
W180808 05:52:04.344427 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.2s
I180808 05:52:04.565114 143084 storage/replica.go:3645  [n4,s4,r23/3:{m-/Table/Syste…}] proposing REMOVE_REPLICA((n5,s5):5): updated=[(n2,s2):4 (n3,s3):2 (n4,s4):3] next=6
I180808 05:52:04.692090 143122 storage/replica_raftstorage.go:743  [n5,s5,r23/5:{m-/Table/Syste…}] applied Raft snapshot in 3377ms [clear=0ms batch=0ms entries=3316ms commit=55ms]
I180808 05:52:04.890823 138331 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1228 goroutines, 80 MiB/24 MiB/135 MiB GO alloc/idle/total, 172 MiB/214 MiB CGO alloc/total, 1210.29cgo/sec, 0.96/0.06 %(u/s)time, 0.02 %gc (3x)
I180808 05:52:04.914279 143201 storage/replica_raftstorage.go:743  [n3,s3,r16/5:/Table/{19-20}] applied Raft snapshot in 1288ms [clear=0ms batch=0ms entries=1199ms commit=88ms]
I180808 05:52:05.026124 143103 storage/replica_raftstorage.go:743  [n2,s2,r10/?:/Table/1{3-4}] applied preemptive snapshot in 3492ms [clear=0ms batch=0ms entries=3190ms commit=301ms]
I180808 05:52:05.037328 137557 storage/replicate_queue_test.go:218  SucceedsSoon: range not found on store {5 5}
I180808 05:52:05.070556 140495 storage/store.go:3617  [n5,s5,r23/5:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180808 05:52:05.070988 140495 storage/store.go:3617  [n5,s5,r23/5:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180808 05:52:05.071288 140495 storage/store.go:3617  [n5,s5,r23/5:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180808 05:52:05.071438 140495 storage/store.go:3617  [n5,s5,r23/5:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180808 05:52:05.071585 140495 storage/store.go:3617  [n5,s5,r23/5:{m-/Table/Syste…}] added to replica GC queue (peer suggestion)
I180808 05:52:05.136747 143289 storage/replica_raftstorage.go:520  [n4,s4,r23/3:{m-/Table/Syste…}] generated preemptive snapshot e3ede43f at index 57
I180808 05:52:05.226739 143156 storage/replica_command.go:745  [replicate,n5,s5,r10/3:/Table/1{3-4}] change replicas (ADD_REPLICA (n2,s2):6): read existing descriptor r10:/Table/1{3-4} [(n4,s4):4, (n3,s3):5, (n5,s5):3, next=6]
I180808 05:52:05.233032 143158 storage/replica_raftstorage.go:743  [n2,s2,r17/?:/Table/2{0-1}] applied preemptive snapshot in 4348ms [clear=0ms batch=0ms entries=4345ms commit=2ms]
I180808 05:52:05.342935 143278 storage/replica_command.go:745  [replicate,n4,s4,r1/3:/{Min-System/}] change replicas (REMOVE_REPLICA (n1,s1):1): read existing descriptor r1:/{Min-System/} [(n1,s1):1, (n5,s5):2, (n4,s4):3, (n3,s3):4, next=5]
I180808 05:52:05.365612 143280 storage/replica_raftstorage.go:520  [raftsnapshot,n4,s4,r1/3:/{Min-System/}] generated Raft snapshot 4cf4ca80 at index 255
I180808 05:52:05.648387 143229 storage/replica_raftstorage.go:737  [n5,s5,r23/5:{m-/Table/Syste…}] applying preemptive snapshot at index 57 (id=e3ede43f, encoded size=12091, 1 rocksdb batches, 47 log entries)
I180808 05:52:05.368268 143289 storage/store_snapshot.go:605  [n4,s4,r23/3:{m-/Table/Syste…}] streamed snapshot to (n5,s5):?: kv pairs: 13, log entries: 47, rate-limit: 2.0 MiB/sec, 230ms
I180808 05:52:05.503701 143066 storage/replica_command.go:745  [replicate,n3,s3,r17/2:/Table/2{0-1}] change replicas (ADD_REPLICA (n2,s2):6): read existing descriptor r17:/Table/2{0-1} [(n5,s5):4, (n3,s3):2, (n4,s4):5, next=6]
I180808 05:52:05.614655 138333 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:52:05.855412 137779 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1233 goroutines, 94 MiB/14 MiB/135 MiB GO alloc/idle/total, 174 MiB/214 MiB CGO alloc/total, 1131.37cgo/sec, 0.96/0.06 %(u/s)time, 0.02 %gc (4x)
I180808 05:52:05.920835 137781 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:52:06.025337 139382 storage/store.go:3742  [n5,s5] handle raft ready: 0.5s [processed=0]
I180808 05:52:06.144238 143233 storage/replica_raftstorage.go:737  [n3,s3,r1/4:{-}] applying Raft snapshot at index 255 (id=4cf4ca80, encoded size=15797, 1 rocksdb batches, 41 log entries)
W180808 05:52:06.229055 139117 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.2s
I180808 05:52:06.054117 143280 storage/store_snapshot.go:605  [raftsnapshot,n4,s4,r1/3:/{Min-System/}] streamed snapshot to (n3,s3):4: kv pairs: 131, log entries: 41, rate-limit: 8.0 MiB/sec, 355ms
I180808 05:52:06.249412 143229 storage/replica_raftstorage.go:743  [n5,s5,r23/5:{m-/Table/Syste…}] applied preemptive snapshot in 601ms [clear=0ms batch=0ms entries=537ms commit=63ms]
I180808 05:52:06.308721 143233 storage/replica_raftstorage.go:743  [n3,s3,r1/4:/{Min-System/}] applied Raft snapshot in 164ms [clear=0ms batch=0ms entries=162ms commit=1ms]
I180808 05:52:06.499058 143249 storage/store.go:2557  [replicaGC,n2,s2,r10/?:/Table/1{3-4}] removing replica
I180808 05:52:06.732927 138754 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1235 goroutines, 95 MiB/12 MiB/135 MiB GO alloc/idle/total, 173 MiB/215 MiB CGO alloc/total, 1095.65cgo/sec, 0.95/0.06 %(u/s)time, 0.02 %gc (4x)
I180808 05:52:06.821695 143289 storage/replica_command.go:745  [n4,s4,r23/3:{m-/Table/Syste…}] change replicas (ADD_REPLICA (n5,s5):6): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n2,s2):4, (n3,s3):2, (n4,s4):3, next=6]
I180808 05:52:06.951147 138772 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:52:07.097063 143249 storage/replica.go:844  [replicaGC,n2,s2,r10/?:/Table/1{3-4}] removed 460 (452+8) keys in 552ms [clear=0ms commit=552ms]
I180808 05:52:07.297548 137562 rpc/nodedialer/nodedialer.go:92  [intExec=adopt-job] connection to n2 established
I180808 05:52:07.411341 143387 rpc/nodedialer/nodedialer.go:92  [n2] connection to n1 established
I180808 05:52:07.588744 143339 storage/store.go:2557  [replicaGC,n2,s2,r17/?:/Table/2{0-1}] removing replica
I180808 05:52:07.988572 143195 storage/replica.go:3645  [n2,s2,r16/3:/Table/{19-20}] proposing REMOVE_REPLICA((n5,s5):2): updated=[(n4,s4):4 (n3,s3):5 (n2,s2):3] next=6
I180808 05:52:08.200782 143316 storage/store.go:2557  [replicaGC,n5,s5,r23/5:{m-/Table/Syste…}] removing replica
W180808 05:52:08.539062 137798 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 3.1s
I180808 05:52:08.722544 139108 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 1229 goroutines, 95 MiB/12 MiB/135 MiB GO alloc/idle/total, 173 MiB/215 MiB CGO alloc/total, 1016.63cgo/sec, 0.95/0.06 %(u/s)time, 0.02 %gc (4x)
W180808 05:52:08.754285 139408 storage/store.go:3742  [n5,s5] handle raft ready: 0.9s [processed=0]
I180808 05:52:08.831825 143339 storage/replica.go:844  [replicaGC,n2,s2,r17/?:/Table/2{0-1}] removed 7 (0+7) keys in 342ms [clear=0ms commit=342ms]
I180808 05:52:08.888632 143316 storage/replica.go:844  [replicaGC,n5,s5,r23/5:{m-/Table/Syste…}] removed 8 (0+8) keys in 687ms [clear=0ms commit=686ms]
I180808 05:52:08.943429 139110 server/status/recorder.go:652  [n4,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:52:09.261645 139003 storage/store.go:3742  [n4,s4] handle raft ready: 0.5s [processed=0]
W180808 05:52:09.443742 139408 storage/store.go:3742  [n5,s5] handle raft ready: 0.7s [processed=0]
I180808 05:52:09.897789 141516 storage/store.go:3617  [n5,s5,r16/2:/Table/{19-20}] added to replica GC queue (peer suggestion)
W180808 05:52:09.947233 138452 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 4.0s
I180808 05:52:10.771013 139295 server/status/runtime.go:424  [n5] runtime stats: 1.5 GiB RSS, 1235 goroutines, 83 MiB/20 MiB/135 MiB GO alloc/idle/total, 179 MiB/219 MiB CGO alloc/total, 1000.21cgo/sec, 0.95/0.05 %(u/s)time, 0.02 %gc (3x)
I180808 05:52:11.069171 142962 storage/replica.go:3645  [n1,s1,r9/1:/Table/1{2-3}] proposing ADD_REPLICA((n3,s3):4): updated=[(n1,s1):1 (n2,s2):2 (n4,s4):3 (n3,s3):4] next=5
I180808 05:52:11.382587 139297 server/status/recorder.go:652  [n5,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:52:11.602303 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 4.3s
W180808 05:52:11.773779 137757 storage/store.go:3742  [n1,s1] handle raft ready: 0.5s [processed=0]
I180808 05:52:11.935171 143066 storage/replica.go:3645  [n3,s3,r17/2:/Table/2{0-1}] proposing ADD_REPLICA((n2,s2):6): updated=[(n5,s5):4 (n3,s3):2 (n4,s4):5 (n2,s2):6] next=7
I180808 05:52:11.936282 143289 storage/replica.go:3645  [n4,s4,r23/3:{m-/Table/Syste…}] proposing ADD_REPLICA((n5,s5):6): updated=[(n2,s2):4 (n3,s3):2 (n4,s4):3 (n5,s5):6] next=7
I180808 05:52:12.186877 138721 storage/node_liveness.go:817  [n3,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (context done during DistSender.Send: context deadline exceeded)
W180808 05:52:12.187344 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 4.5s
W180808 05:52:12.187428 138721 storage/node_liveness.go:494  [n3,hb] failed node liveness heartbeat: context deadline exceeded
I180808 05:52:12.205273 143545 storage/replica_range_lease.go:554  [replicate,n1,s1,r9/1:/Table/1{2-3}] transferring lease to s4
I180808 05:52:12.623155 143278 storage/replica.go:3645  [n4,s4,r1/3:/{Min-System/}] proposing REMOVE_REPLICA((n1,s1):1): updated=[(n3,s3):4 (n5,s5):2 (n4,s4):3] next=5
I180808 05:52:12.811035 143489 storage/replica_raftstorage.go:520  [raftsnapshot,n1,s1,r9/1:/Table/1{2-3}] generated Raft snapshot 718f5a59 at index 47
I180808 05:52:12.866835 143605 storage/replica_command.go:745  [replicate,n3,s3,r17/2:/Table/2{0-1}] change replicas (REMOVE_REPLICA (n5,s5):4): read existing descriptor r17:/Table/2{0-1} [(n5,s5):4, (n3,s3):2, (n4,s4):5, (n2,s2):6, next=7]
I180808 05:52:13.012276 143489 storage/store_snapshot.go:605  [raftsnapshot,n1,s1,r9/1:/Table/1{2-3}] streamed snapshot to (n3,s3):4: kv pairs: 67, log entries: 37, rate-limit: 8.0 MiB/sec, 200ms
I180808 05:52:13.025093 143651 storage/replica_raftstorage.go:737  [n3,s3,r9/4:{-}] applying Raft snapshot at index 47 (id=718f5a59, encoded size=26739, 1 rocksdb batches, 37 log entries)
W180808 05:52:13.094030 139117 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 3.6s
W180808 05:52:13.103012 137798 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 3.1s
I180808 05:52:13.196176 139038 storage/replica_proposal.go:214  [n4,s4,r9/3:/Table/1{2-3}] new range lease repl=(n4,s4):3 seq=4 start=1533707532.205545851,0 epo=1 pro=1533707532.205564477,0 following repl=(n1,s1):1 seq=3 start=1533707435.338447836,0 epo=1 pro=1533707445.419393441,0
I180808 05:52:13.214760 143551 storage/replica_raftstorage.go:520  [raftsnapshot,n4,s4,r23/3:{m-/Table/Syste…}] generated Raft snapshot 6398e105 at index 62
W180808 05:52:13.310367 138452 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 2.9s
I180808 05:52:13.365471 143545 storage/replica_range_lease.go:617  [replicate,n1,s1,r9/1:/Table/1{2-3}] done transferring lease to s4: <nil>
I180808 05:52:13.372501 143551 storage/store_snapshot.go:605  [raftsnapshot,n4,s4,r23/3:{m-/Table/Syste…}] streamed snapshot to (n5,s5):6: kv pairs: 14, log entries: 52, rate-limit: 8.0 MiB/sec, 73ms
I180808 05:52:13.383343 143655 storage/replica_raftstorage.go:737  [n5,s5,r23/6:{-}] applying Raft snapshot at index 62 (id=6398e105, encoded size=13687, 1 rocksdb batches, 52 log entries)
I180808 05:52:13.383920 143554 storage/replica_raftstorage.go:520  [replicate,n1,s1,r6/1:{/System/tse-m}] generated preemptive snapshot 2f64ad57 at index 33
I180808 05:52:13.407552 143599 storage/replica_raftstorage.go:520  [raftsnapshot,n3,s3,r17/2:/Table/2{0-1}] generated Raft snapshot 166a44db at index 66
I180808 05:52:13.530027 143599 storage/store_snapshot.go:605  [raftsnapshot,n3,s3,r17/2:/Table/2{0-1}] streamed snapshot to (n2,s2):6: kv pairs: 17, log entries: 56, rate-limit: 8.0 MiB/sec, 121ms
I180808 05:52:13.532590 143562 storage/replica_raftstorage.go:737  [n2,s2,r17/6:{-}] applying Raft snapshot at index 66 (id=166a44db, encoded size=16370, 1 rocksdb batches, 56 log entries)
I180808 05:52:13.543762 143554 storage/store_snapshot.go:605  [replicate,n1,s1,r6/1:{/System/tse-m}] streamed snapshot to (n2,s2):?: kv pairs: 12, log entries: 23, rate-limit: 2.0 MiB/sec, 149ms
I180808 05:52:13.547770 143701 storage/replica_raftstorage.go:737  [n2,s2,r6/?:{-}] applying preemptive snapshot at index 33 (id=2f64ad57, encoded size=8563, 1 rocksdb batches, 23 log entries)
I180808 05:52:14.034422 143156 storage/replica.go:3645  [n5,s5,r10/3:/Table/1{3-4}] proposing ADD_REPLICA((n2,s2):6): updated=[(n4,s4):4 (n3,s3):5 (n5,s5):3 (n2,s2):6] next=7
I180808 05:52:14.187783 139726 storage/store.go:3617  [n1,s1,r1/1:/{Min-System/}] added to replica GC queue (peer suggestion)
I180808 05:52:14.188758 139726 storage/store.go:3617  [n1,s1,r1/1:/{Min-System/}] added to replica GC queue (peer suggestion)
I180808 05:52:14.189489 139726 storage/store.go:3617  [n1,s1,r1/1:/{Min-System/}] added to replica GC queue (peer suggestion)
I180808 05:52:14.311189 143715 storage/store.go:2557  [replicaGC,n1,s1,r1/1:/{Min-System/}] removing replica
I180808 05:52:14.370355 143618 storage/replica_command.go:745  [replicate,n4,s4,r23/3:{m-/Table/Syste…}] change replicas (REMOVE_REPLICA (n5,s5):6): read existing descriptor r23:{m-/Table/SystemConfigSpan/Start} [(n2,s2):4, (n3,s3):2, (n4,s4):3, (n5,s5):6, next=7]
W180808 05:52:14.411761 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.7s
I180808 05:52:14.668371 143715 storage/replica.go:844  [replicaGC,n1,s1,r1/1:/{Min-System/}] removed 34 (24+10) keys in 356ms [clear=0ms commit=356ms]
I180808 05:52:14.700537 143512 storage/store.go:2557  [replicaGC,n5,s5,r16/2:/Table/{19-20}] removing replica
I180808 05:52:14.890886 138331 server/status/runtime.go:424  [n2] runtime stats: 1.5 GiB RSS, 1243 goroutines, 58 MiB/44 MiB/135 MiB GO alloc/idle/total, 180 MiB/220 MiB CGO alloc/total, 950.80cgo/sec, 0.95/0.05 %(u/s)time, 0.02 %gc (4x)
I180808 05:52:14.904395 138409 storage/compactor/compactor.go:374  [n2,s2,compactor] processing compaction #1-9/9 (/System/""-/Max) for 101 KiB (reasons: size=false used=true avail=false)
I180808 05:52:14.975382 143512 storage/replica.go:844  [replicaGC,n5,s5,r16/2:/Table/{19-20}] removed 8 (0+8) keys in 272ms [clear=0ms commit=272ms]
I180808 05:52:15.261718 138409 storage/compactor/compactor.go:390  [n2,s2,compactor] processed compaction #1-9/9 (/System/""-/Max) for 101 KiB in 0.4s
I180808 05:52:15.453442 138333 server/status/recorder.go:652  [n2,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:52:15.529042 137779 server/status/runtime.go:424  [n1] runtime stats: 1.5 GiB RSS, 1243 goroutines, 67 MiB/36 MiB/135 MiB GO alloc/idle/total, 166 MiB/206 MiB CGO alloc/total, 998.37cgo/sec, 0.95/0.05 %(u/s)time, 0.02 %gc (3x)
W180808 05:52:15.575632 139117 storage/node_liveness.go:558  [n4,hb] slow heartbeat took 1.6s
W180808 05:52:15.604753 139032 storage/store.go:3742  [n4,s4] handle raft ready: 0.9s [processed=0]
I180808 05:52:15.705667 143755 storage/replica_range_lease.go:554  [replicate,n5,s5,r10/3:/Table/1{3-4}] transferring lease to s4
I180808 05:52:15.808877 137781 server/status/recorder.go:652  [n1,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
W180808 05:52:16.503045 137798 storage/node_liveness.go:558  [n1,hb] slow heartbeat took 2.0s
I180808 05:52:16.583090 138754 server/status/runtime.go:424  [n3] runtime stats: 1.5 GiB RSS, 1254 goroutines, 86 MiB/18 MiB/135 MiB GO alloc/idle/total, 175 MiB/214 MiB CGO alloc/total, 1047.99cgo/sec, 0.98/0.05 %(u/s)time, 0.02 %gc (3x)
W180808 05:52:16.985104 138661 storage/closedts/provider/provider.go:142  [ct-closer] unable to move closed timestamp forward: not live
W180808 05:52:16.996373 139053 storage/store.go:3742  [n4,s4] handle raft ready: 0.6s [processed=0]
I180808 05:52:17.024499 138772 server/status/recorder.go:652  [n3,summaries] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180808 05:52:17.054950 139051 storage/replica_proposal.go:214  [n4,s4,r10/4:/Table/1{3-4}] new range lease repl=(n4,s4):4 seq=5 start=1533707535.706034414,0 epo=1 pro=1533707535.706075386,0 following repl=(n5,s5):3 seq=4 start=1533707486.361902558,0 epo=1 pro=1533707486.362031892,0
W180808 05:52:17.198969 138452 storage/node_liveness.go:558  [n2,hb] slow heartbeat took 2.3s
I180808 05:52:17.332023 143755 storage/replica_range_lease.go:617  [replicate,n5,s5,r10/3:/Table/1{3-4}] done transferring lease to s4: <nil>
I180808 05:52:17.350376 143694 storage/replica_raftstorage.go:520  [raftsnapshot,n5,s5,r10/3:/Table/1{3-4}] generated Raft snapshot fddb3db3 at index 222
I180808 05:52:17.488006 143694 storage/store_snapshot.go:605  [raftsnapshot,n5,s5,r10/3:/Table/1{3-4}] streamed snapshot to (n2,s2):6: kv pairs: 510, log entries: 77, rate-limit: 8.0 MiB/sec, 80ms
I180808 05:52:17.589299 143787 storage/replica_raftstorage.go:737  [n2,s2,r10/6:{-}] applying Raft snapshot at index 222 (id=fddb3db3, encoded size=115855, 1 rocksdb batches, 77 log entries)
W180808 05:52:17.686283 137707 storage/store.go:3742  [n1,s1] handle raft ready: 0.9s [processed=0]
I180808 05:52:18.025822 137557 storage/replicate_queue_test.go:218  SucceedsSoon: range not found on store {5 5}
I180808 05:52:18.034708 143728 util/stop/stopper.go:537  quiescing; tasks left:
3      [async] storage.RaftTransport: processing snapshot
1      ts.poller: poll
1      node.Node: writing summary
I180808 05:52:18.036191 143727 util/stop/stopper.go:537  quiescing; tasks left:
8      node.Node: batch
1      ts.poller: poll
1      node.Node: writing summary
1      [async] transport racer
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
I180808 05:52:18.037309 143729 util/stop/stopper.go:537  quiescing; tasks left:
2      node.Node: batch
2      [async] kv.TxnCoordSender: heartbeat loop
1      ts.poller: poll
1      node.Node: writing summary
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.039051 143730 util/stop/stopper.go:537  quiescing; tasks left:
2      storage.intentResolver: processing intents
2      [async] storage.pendingLeaseRequest: requesting lease
10     node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.replica consistency checker: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.raftlog: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
I180808 05:52:18.072228 143891 util/stop/stopper.go:537  quiescing; tasks left:
2      [async] kv.TxnCoordSender: heartbeat loop
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.080749 143727 util/stop/stopper.go:537  quiescing; tasks left:
7      node.Node: batch
1      ts.poller: poll
1      node.Node: writing summary
1      [async] transport racer
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
I180808 05:52:18.110127 137520 kv/transport_race.go:67  transport race promotion: ran 35 iterations on up to 3178 requests
W180808 05:52:18.124290 140493 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 4: EOF:
I180808 05:52:18.127667 143730 util/stop/stopper.go:537  quiescing; tasks left:
2      storage.intentResolver: processing intents
2      [async] storage.pendingLeaseRequest: requesting lease
10     node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.replica consistency checker: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
W180808 05:52:18.159383 138849 gossip/gossip.go:1325  [n4] no incoming or outgoing connections
I180808 05:52:18.159706 143729 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: writing summary
1      node.Node: batch
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.RaftTransport: processing snapshot
1      [async] kv.TxnCoordSender: heartbeat loop
W180808 05:52:18.173748 143860 internal/client/txn.go:611  [intExec=adopt-job] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
E180808 05:52:18.175180 138454 jobs/registry.go:285  error while adopting jobs: adopt-job: node unavailable; try another peer
W180808 05:52:18.194234 138900 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 3: EOF:
W180808 05:52:18.204864 143817 internal/client/txn.go:611  [n5,s5,r17/4:/Table/2{0-1}] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=context canceled [exhausted])
I180808 05:52:18.205320 143817 storage/node_liveness.go:817  [n5,s5,r17/4:/Table/2{0-1}] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=context canceled [exhausted])
I180808 05:52:18.217871 143730 util/stop/stopper.go:537  quiescing; tasks left:
2      storage.intentResolver: processing intents
10     node.Node: batch
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.replica consistency checker: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] kv.TxnCoordSender: heartbeat loop
I180808 05:52:18.221275 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] storage.RaftTransport: processing snapshot
1      [async] kv.TxnCoordSender: heartbeat loop
W180808 05:52:18.226777 138879 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = context canceled:
W180808 05:52:18.244351 141324 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = context canceled:
E180808 05:52:18.246660 143799 storage/queue.go:778  [replicaGC,n1,s1,r5/1:/System/ts{d-e}] failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W180808 05:52:18.291874 139853 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180808 05:52:18.234340 140970 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = context canceled:
E180808 05:52:18.322163 143707 storage/consistency_queue.go:125  [replica consistency checker,n4,s4,r9/3:/Table/1{2-3}] result is ambiguous (server shutdown)
E180808 05:52:18.322613 143707 storage/queue.go:778  [replica consistency checker,n4,s4,r9/3:/Table/1{2-3}] result is ambiguous (server shutdown)
W180808 05:52:18.249333 141048 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 4: EOF:
W180808 05:52:18.260195 140585 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180808 05:52:18.260615 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicate: processing replica
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] storage.RaftTransport: processing snapshot
W180808 05:52:18.232705 139728 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 1: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180808 05:52:18.266756 138632 gossip/gossip.go:1325  [n3] no incoming or outgoing connections
E180808 05:52:18.331541 143745 storage/queue.go:778  [replicaGC,n5,s5,r2/3:/System/{-NodeLive…}] failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
I180808 05:52:18.335842 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.replicate: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] storage.RaftTransport: processing snapshot
W180808 05:52:18.268542 141323 storage/raft_transport.go:574  [n3] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180808 05:52:18.270897 138721 internal/client/txn.go:611  [n3,hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [exhausted])
I180808 05:52:18.336742 138721 storage/node_liveness.go:817  [n3,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [exhausted])
W180808 05:52:18.281883 143566 storage/intent_resolver.go:642  [n4,s4] failed to push during intent resolution: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
E180808 05:52:18.340367 143878 storage/queue.go:778  [replicaGC,n4,s4,r6/3:{/System/tse-m}] failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180808 05:52:18.341204 143618 internal/client/txn.go:611  [replicate,n4,s4,r23/3:{m-/Table/Syste…}] failure aborting transaction: node unavailable; try another peer; abort caused by: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
E180808 05:52:18.341703 143618 storage/queue.go:778  [replicate,n4,s4,r23/3:{m-/Table/Syste…}] change replicas of r23 failed: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
W180808 05:52:18.343658 139200 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 1: EOF:
W180808 05:52:18.355163 139840 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 5: EOF:
W180808 05:52:18.356458 139347 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 2: EOF:
W180808 05:52:18.283679 138291 gossip/gossip.go:1325  [n2] no incoming or outgoing connections
W180808 05:52:18.366339 143605 internal/client/txn.go:611  [replicate,n3,s3,r17/2:/Table/2{0-1}] failure aborting transaction: node unavailable; try another peer; abort caused by: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
E180808 05:52:18.366708 143605 storage/queue.go:778  [replicate,n3,s3,r17/2:/Table/2{0-1}] change replicas of r17 failed: failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W180808 05:52:18.370344 141514 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180808 05:52:18.372387 139480 internal/client/txn.go:611  [n5,hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (error=context canceled [exhausted])
I180808 05:52:18.544533 139480 storage/node_liveness.go:817  [n5,hb] retrying liveness update after storage.errRetryLiveness: result is ambiguous (error=context canceled [exhausted])
I180808 05:52:18.391471 143727 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: writing summary
1      node.Node: batch
1      [async] storage.replicate: processing replica
1      [async] storage.raftsnapshot: processing replica
W180808 05:52:18.391786 143616 storage/intent_resolver.go:710  [n4,s4] failed to cleanup transaction intents: could not GC completed transaction anchored at /Local/Range/Min/RangeDescriptor: result is ambiguous (server shutdown)
W180808 05:52:18.392302 141348 storage/raft_transport.go:574  [n5] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
W180808 05:52:18.393392 139268 gossip/gossip.go:1325  [n5] no incoming or outgoing connections
W180808 05:52:18.396991 138332 ts/db.go:195  [n2,ts-poll] error writing time series data: failed to send RPC: sending to all 1 replicas failed; last error: <nil> rpc error: code = Unavailable desc = transport is closing
W180808 05:52:18.397458 140598 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 4: rpc error: code = Unavailable desc = transport is closing:
W180808 05:52:18.398528 138333 server/node.go:844  [n2,summaries] error recording status summaries: failed to send RPC: sending to all 1 replicas failed; last error: <nil> rpc error: code = Unavailable desc = transport is closing
I180808 05:52:18.398771 143728 util/stop/stopper.go:537  quiescing; tasks left:
3      [async] storage.RaftTransport: processing snapshot
1      ts.poller: poll
W180808 05:52:18.548886 143690 storage/intent_resolver.go:710  [n5,s5] failed to cleanup transaction intents: failed to resolve intents: failed to send RPC: sending to all 3 replicas failed; last error: <nil> context canceled
E180808 05:52:18.404773 143741 storage/queue.go:778  [replicaGC,n3,s3,r5/2:/System/ts{d-e}] failed to send RPC: sending to all 3 replicas failed; last error: (err: node unavailable; try another peer) <nil>
W180808 05:52:18.407564 141542 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = grpc: the client connection is closing:
E180808 05:52:18.549871 143551 storage/queue.go:778  [raftsnapshot,n4,s4,r23/3:{m-/Table/Syste…}] snapshot failed: (n5,s5):6: remote failed to apply snapshot: rpc error: code = Canceled desc = grpc: the client connection is closing
W180808 05:52:18.550150 138332 ts/db.go:198  [n2,ts-poll] node unavailable; try another peer
I180808 05:52:18.550531 143728 util/stop/stopper.go:537  quiescing; tasks left:
3      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.551989 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] storage.replicate: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.pendingLeaseRequest: requesting lease
1      [async] storage.RaftTransport: processing snapshot
W180808 05:52:18.407916 141443 storage/raft_transport.go:574  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180808 05:52:18.409743 143554 storage/replicate_queue.go:241  [replicate,n1,s1,r6/1:{/System/tse-m}] snapshot failed: (n2,s2):?: remote failed to apply snapshot: rpc error: code = Canceled desc = grpc: the client connection is closing
W180808 05:52:18.411268 139632 storage/raft_transport.go:574  [n1] while processing outgoing Raft queue to node 4: rpc error: code = Canceled desc = grpc: the client connection is closing:
I180808 05:52:18.413921 143729 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: writing summary
1      [async] storage.replicaGC: processing replica
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.554207 143729 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: writing summary
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.416245 143730 util/stop/stopper.go:537  quiescing; tasks left:
2      node.Node: batch
1      ts.poller: poll
1      storage.intentResolver: processing intents
1      [async] storage.raftsnapshot: processing replica
1      [async] kv.TxnCoordSender: heartbeat loop
I180808 05:52:18.542335 139108 server/status/runtime.go:424  [n4] runtime stats: 1.5 GiB RSS, 787 goroutines, 71 MiB/33 MiB/135 MiB GO alloc/idle/total, 138 MiB/195 MiB CGO alloc/total, 1121.49cgo/sec, 0.95/0.05 %(u/s)time, 0.03 %gc (3x)
W180808 05:52:18.556000 137781 server/node.go:844  [n1,summaries] error recording status summaries: node unavailable; try another peer
I180808 05:52:18.556354 143727 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
1      [async] storage.replicate: processing replica
1      [async] storage.raftsnapshot: processing replica
W180808 05:52:18.542614 140490 storage/raft_transport.go:574  [n4] while processing outgoing Raft queue to node 5: rpc error: code = Canceled desc = context canceled:
I180808 05:52:18.556680 143730 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
W180808 05:52:18.585199 138772 server/node.go:873  [n3,summaries] health alerts detected: {Alerts:[{StoreID:0 Category:METRICS Description:liveness.heartbeatfailures Value:1} {StoreID:3 Category:METRICS Description:queue.replicagc.process.failure Value:1} {StoreID:3 Category:METRICS Description:queue.replicate.process.failure Value:1}]}
W180808 05:52:18.621562 143817 internal/client/txn.go:611  [n5,s5,r17/4:/Table/2{0-1}] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
E180808 05:52:18.621967 143817 storage/replica_range_lease.go:294  [n5,s5,r17/4:/Table/2{0-1}] node unavailable; try another peer
I180808 05:52:18.624079 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] storage.raftsnapshot: processing replica
1      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.624886 143730 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
W180808 05:52:18.626132 138721 internal/client/txn.go:611  [n3,hb] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
W180808 05:52:18.626493 138721 storage/node_liveness.go:558  [n3,hb] slow heartbeat took 1.9s
W180808 05:52:18.719993 138721 storage/node_liveness.go:494  [n3,hb] failed node liveness heartbeat: node unavailable; try another peer
W180808 05:52:18.634722 139480 internal/client/txn.go:611  [n5,hb] failure aborting transaction: kv/txn_interceptor_intent_collector.go:102: cannot commit a read-only transaction; abort caused by: node unavailable; try another peer
W180808 05:52:18.720535 139480 storage/node_liveness.go:558  [n5,hb] slow heartbeat took 2.5s
W180808 05:52:18.720614 139480 storage/node_liveness.go:494  [n5,hb] failed node liveness heartbeat: node unavailable; try another peer
W180808 05:52:18.637190 138772 gossip/infostore.go:303  [n3] node unavailable; try another peer
E180808 05:52:18.637879 138306 storage/replica_proposal.go:189  [n2,s2,r9/2:/Table/1{2-3}] could not run async checksum computation (ID = 3fb85960-13d4-43fe-b33e-14d2bc78443a): node unavailable; try another peer
E180808 05:52:18.646057 143694 storage/queue.go:778  [raftsnapshot,n5,s5,r10/3:/Table/1{3-4}] snapshot failed: (n2,s2):6: remote failed to apply snapshot: rpc error: code = Canceled desc = grpc: the client connection is closing
I180808 05:52:18.648893 143727 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
1      [async] storage.raftsnapshot: processing replica
I180808 05:52:18.724913 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] storage.RaftTransport: processing snapshot
E180808 05:52:18.725218 143599 storage/queue.go:778  [raftsnapshot,n3,s3,r17/2:/Table/2{0-1}] snapshot failed: (n2,s2):6: remote failed to apply snapshot: rpc error: code = Canceled desc = grpc: the client connection is closing
E180808 05:52:18.725551 137704 storage/replica_proposal.go:189  [n1,s1,r9/1:/Table/1{2-3}] could not run async checksum computation (ID = 3fb85960-13d4-43fe-b33e-14d2bc78443a): node unavailable; try another peer
I180808 05:52:18.725802 143729 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: writing summary
1      [async] storage.RaftTransport: processing snapshot
E180808 05:52:18.726884 143489 storage/queue.go:778  [raftsnapshot,n1,s1,r9/1:/Table/1{2-3}] snapshot failed: (n3,s3):4: remote failed to apply snapshot: rpc error: code = Canceled desc = grpc: the client connection is closing
E180808 05:52:18.727179 139033 storage/replica_proposal.go:189  [n4,s4,r9/3:/Table/1{2-3}] could not run async checksum computation (ID = 3fb85960-13d4-43fe-b33e-14d2bc78443a): node unavailable; try another peer
I180808 05:52:18.727477 143727 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      node.Node: batch
W180808 05:52:18.732530 138772 server/node.go:844  [n3,summaries] error recording status summaries: node unavailable; try another peer
I180808 05:52:18.732643 143729 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
1      [async] storage.RaftTransport: processing snapshot
W180808 05:52:18.757656 137707 gossip/infostore.go:303  [n1] node unavailable; try another peer
I180808 05:52:18.773390 143701 storage/replica_raftstorage.go:743  [n2,s2,r6/?:{/System/tse-m}] applied preemptive snapshot in 5225ms [clear=0ms batch=0ms entries=5146ms commit=63ms]
W180808 05:52:18.774314 143701 storage/store.go:1700  [n2,s2,r6/?:{/System/tse-m}] unable to gossip on capacity change: node unavailable; try another peer
I180808 05:52:18.775529 143728 util/stop/stopper.go:537  quiescing; tasks left:
2      [async] storage.RaftTransport: processing snapshot
I180808 05:52:18.824395 143651 storage/replica_raftstorage.go:743  [n3,s3,r9/4:/Table/1{2-3}] applied Raft snapshot in 5799ms [clear=0ms batch=0ms entries=5797ms commit=1ms]
W180808 05:52:18.825252 143651 storage/store.go:1700  [n3,s3,r9/4:/Table/1{2-3}] unable to gossip on capacity change: node unavailable; try another peer
W180808 05:52:18.831920 143651 rpc/nodedialer/nodedialer.go:89  [n3] unable to connect to n1: unable to dial n1: breaker open
I180808 05:52:18.837138 143729 util/stop/stopper.go:537  quiescing; tasks left:
1      ts.poller: poll
W180808 05:52:18.886962 139110 server/node.go:844  [n4,summaries] error recording status summaries: node unavailable; try another peer
W180808 05:52:18.916636 139432 rpc/nodedialer/nodedialer.go:89  [n5] unable to connect to n4: unable to dial n4: breaker open
I180808 05:52:18.941116 138748 storage/compactor/compactor.go:329  [n3,s3,compactor] purging suggested compaction for range /Table/12 - /Table/13 that contains live data
W180808 05:52:19.010389 139296 ts/db.go:195  [n5,ts-poll] error writing time series data: node unavailable; try another peer
I180808 05:52:19.053281 143562 storage/replica_raftstorage.go:743  [n2,s2,r17/6:/Table/2{0-1}] applied Raft snapshot in 5520ms [clear=0ms batch=0ms entries=5516ms commit=3ms]
I180808 05:52:19.055909 143728 util/stop/stopper.go:537  quiescing; tasks left:
1      [async] storage.RaftTransport: processing snapshot
I180808 05:52:19.063168 143655 storage/replica_raftstorage.go:743  [n5,s5,r23/6:{m-/Table/Syste…}] applied Raft snapshot in 5679ms [clear=0ms batch=0ms entries=5668ms commit=10ms]
W180808 05:52:19.066185 139076 rpc/nodedialer/nodedialer.go:89  [n4] unable to connect to n3: context canceled
W180808 05:52:19.097035 139296 ts/db.go:198  [n5,ts-poll] node unavailable; try another peer
I180808 05:52:19.097404 143891 util/stop/stopper.go:537  quiescing; tasks left:
1      [async] storage.RaftTransport: processing snapshot
W180808 05:52:19.097702 143655 storage/store.go:1700  [n5,s5,r23/6:{m-/Table/Syste…}] unable to gossip on capacity change: node unavailable; try another peer
W180808 05:52:19.188925 137780 ts/db.go:195  [n1,ts-poll] error writing time series data: node unavailable; try another peer
W180808 05:52:19.189366 137780 ts/db.go:198  [n1,ts-poll] node unavailable; try another peer
I180808 05:52:19.189608 143727 util/stop/stopper.go:537  quiescing; tasks left:
1      node.Node: batch
I180808 05:52:19.312269 143787 storage/replica_raftstorage.go:743  [n2,s2,r10/6:/Table/1{3-4}] applied Raft snapshot in 1723ms [clear=0ms batch=0ms entries=1661ms commit=60ms]
W180808 05:52:19.322134 139432 rpc/nodedialer/nodedialer.go:89  [n5] unable to connect to n3: context canceled
W180808 05:52:19.455613 138741 rpc/nodedialer/nodedialer.go:89  [n3] unable to connect to n2: unable to dial n2: breaker open
W180808 05:52:19.456795 138741 rpc/nodedialer/nodedialer.go:89  [n3] unable to connect to n4: unable to dial n4: breaker open
W180808 05:52:19.560033 139076 rpc/nodedialer/nodedialer.go:89  [n4] unable to connect to n5: context canceled
W180808 05:52:19.560738 139076 rpc/nodedialer/nodedialer.go:89  [n4] unable to connect to n2: context canceled
W180808 05:52:19.863422 139109 ts/db.go:195  [n4,ts-poll] error writing time series data: node unavailable; try another peer
W180808 05:52:19.900766 139109 ts/db.go:198  [n4,ts-poll] node unavailable; try another peer
W180808 05:52:19.915194 137769 rpc/nodedialer/nodedialer.go:89  [n1] unable to connect to n2: context canceled
W180808 05:52:19.933823 137769 rpc/nodedialer/nodedialer.go:89  [n1] unable to connect to n5: unable to dial n5: breaker open
W180808 05:52:19.915884 139432 rpc/nodedialer/nodedialer.go:89  [n5] unable to connect to n2: context canceled
W180808 05:52:20.079588 139109 ts/db.go:198  [n4,ts-poll] node unavailable; try another peer
W180808 05:52:20.083846 138771 ts/db.go:195  [n3,ts-poll] error writing time series data: node unavailable; try another peer
W180808 05:52:20.109921 138771 ts/db.go:198  [n3,ts-poll] node unavailable; try another peer
--- FAIL: TestReplicateQueueDownReplicate (105.35s)
	soon.go:49: condition failed to evaluate within 45s: range not found on store {5 5}
		goroutine 137557 [running]:
		runtime/debug.Stack(0xa7a358200, 0xc4216579b0, 0x39e2080)
			/usr/local/go/src/runtime/debug/stack.go:24 +0xb5
		github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon(0x3a3f2e0, 0xc420ea6000, 0xc4217c5c70)
			/go/src/github.com/cockroachdb/cockroach/pkg/testutils/soon.go:50 +0x172
		github.com/cockroachdb/cockroach/pkg/storage_test.TestReplicateQueueDownReplicate(0xc420ea6000)
			/go/src/github.com/cockroachdb/cockroach/pkg/storage/replicate_queue_test.go:218 +0x341
		testing.tRunner(0xc420ea6000, 0x33251c8)
			/usr/local/go/src/testing/testing.go:777 +0x16e
		created by testing.(*T).Run
			/usr/local/go/src/testing/testing.go:824 +0x565

cockroach-teamcity added this to the 2.1 milestone Aug 8, 2018
cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Aug 8, 2018
vivekmenezes removed their assignment Aug 8, 2018
@tbg

tbg commented Aug 14, 2018

seeing "range not found on store {5 5}" repeatedly from this code, presumably while handling the server whose nodeID is 5.

		testutils.SucceedsSoon(t, func() error {
			_, err := tc.AddReplicas(testKey, roachpb.ReplicationTarget{
				NodeID:  nodeID,
				StoreID: server.GetFirstStoreID(),
			})
			if testutils.IsError(err, allowedErrs) {
				return nil
			}
			return err
		})

Looking at AddReplicas, this makes sense: that code upreplicates and then waits for the replica to appear. But the replica might have been GC'ed in the meantime, for example here:

I180808 05:52:14.700537 143512 storage/store.go:2557  [replicaGC,n5,s5,r16/2:/Table/{19-20}] removing replica

So this looks like a problem in AddReplicas.
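
For illustration, here is a minimal sketch of a caller-side workaround, reusing the names from the snippet above (t, tc, testKey, nodeID, server, allowedErrs); the extra "range not found" alternative in the regexp is my addition, not part of the test's actual allowedErrs:

		testutils.SucceedsSoon(t, func() error {
			_, err := tc.AddReplicas(testKey, roachpb.ReplicationTarget{
				NodeID:  nodeID,
				StoreID: server.GetFirstStoreID(),
			})
			// Sketch only: also tolerate "range not found", since the
			// replicate queue may have removed the replica that AddReplicas
			// just added before anyone gets to observe it.
			if testutils.IsError(err, allowedErrs+"|range not found") {
				return nil
			}
			return err
		})

An alternative is to relax the verification inside AddReplicas itself, as discussed below.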

tbg added the A-kv-client Relating to the KV client and the KV interface. label Aug 14, 2018
tbg self-assigned this Aug 14, 2018
@petermattis

It is possible this was fixed by #28877. I'm going to try reproducing before and after that change.

@petermattis

Nope, this still happens on current master with make stressrace PKG=./storage/ TESTS=TestReplicateQueueDownReplicate

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/9ee43b32c83db6418070f92d9d698c1890d80b9e

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueDownReplicate PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=861628&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/3cdd221e01048154f705e1efb130afb42338b462

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueDownReplicate PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=862681&tab=buildLog

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/d57a1f0f985b95226e38f20a2afa006b9e48e92e

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestReplicateQueueDownReplicate PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=868416&tab=buildLog

tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2018
AddReplicas was verifying that a replica had indeed been added, but
there's no guarantee that the replicate queue wouldn't have removed
it in the meantime. Attempt to work around this somewhat. The real
solution is not to provide that guarantee, but some tests likely
rely on it (and the failure is extremely rare, i.e. the new for
loop basically never runs).

Observed in cockroachdb#28368.

Release note: None
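
As a rough illustration of the for loop that commit message alludes to, here is a self-contained sketch; it is not the code from the referenced change, and waitForReplica together with its lookup callback are hypothetical names:

package testclustersketch

import (
	"fmt"
	"time"
)

// waitForReplica polls lookup until the target store reports a replica for
// the range, or the retry budget runs out. A single check right after the
// replication change is not enough, because the replicate queue may remove
// the freshly added replica again before the caller observes it.
func waitForReplica(lookup func() bool, attempts int, backoff time.Duration) error {
	for i := 0; i < attempts; i++ {
		if lookup() {
			return nil
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("replica did not appear after %d attempts", attempts)
}

As the commit message notes, the failure is extremely rare, so such a loop should almost never need more than its first iteration.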
@tbg

tbg commented Sep 20, 2018

Hmm, reproducing my earlier observation is difficult, as this test completely slams the CPUs and times out under stressrace.

I opened #30455, but will close this issue since it no longer appears to reproduce in the nightlies.

tbg closed this as completed Sep 20, 2018
craig bot pushed a commit that referenced this issue Sep 20, 2018
30405: roachtest: mark acceptance as stable r=petermattis a=tschottdorf

All of its subtests are already stable, but while running a test locally I
noticed that the top-level test was marked as passing as unstable. I'm not
sure, but this might mean that the top-level test would not actually fail?
Either way, it's better to mark it as stable explicitly.

We should also give some thought to how diverging notions of Stable between a
subtest and its top-level test are treated; I'm not sure this is well-defined.

Release note: None

30446: opt: fix panic when srf used with GROUP BY r=rytaft a=rytaft

Instead of panicking, we now throw an appropriate error.

Fixes #30412

Release note (bug fix): Fixed a panic that occurred when a
generator function such as unnest was used in the SELECT list
in the presence of GROUP BY.

30450: roachtest: remove now-unnecessary hack r=petermattis a=tschottdorf

Closes #27717.

Release note: None

30451: storage: give TestReplicateRemovedNodeDisruptiveElection more time r=petermattis a=tschottdorf

Perhaps:

Fixes #27253.

Release note: None

30452: storage: de-flake TestReplicaIDChangePending r=petermattis a=tschottdorf

setReplicaID refreshes the proposal and was thus writing synchronously to the
commandProposed chan. This channel could already have filled up due to an
earlier reproposal, deadlocking the test.

Fixes #28132.

Release note: None

30455: testcluster: improve AddReplicas check r=petermattis a=tschottdorf

AddReplicas was verifying that a replica had indeed been added, but
there's no guarantee that the replicate queue wouldn't have removed
it in the meantime. Attempt to work around this somewhat. The real
solution is not to provide that guarantee, but some tests likely
rely on it (and the failure is extremely rare, i.e. the new for
loop basically never runs).

Observed in #28368.

Release note: None

30456: storage: unskip TestClosedTimestampCanServe for non-race r=petermattis a=tschottdorf

Fixes #28607.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this issue Sep 21, 2018
AddReplicas was verifying that a replica had indeed been added, but
there's no guarantee that the replicate queue wouldn't have removed
it in the meantime. Attempt to work around this somewhat. The real
solution is not to provide that guarantee, but some tests likely
rely on it (and the failure is extremely rare, i.e. the new for
loop basically never runs).

Observed in cockroachdb#28368.

Release note: None