Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: kv/contention/nodes=4 failed #50864

Closed
cockroach-teamcity opened this issue Jul 1, 2020 · 4 comments
Closed

roachtest: kv/contention/nodes=4 failed #50864

cockroach-teamcity opened this issue Jul 1, 2020 · 4 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

(roachtest).kv/contention/nodes=4 failed on master@3a03f3843a8cdf04f82c52753c61cf01b0d2ddcd:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/kv/contention/nodes=4/run_1
	ts_util.go:48,ts_util.go:102,kv.go:259,cluster.go:2445,errgroup.go:57: Post "http://35.238.255.11:26258/ts/query": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

	cluster.go:2467,kv.go:262,test_runner.go:757: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.registerKVContention.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/kv.go:262
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2425
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5420
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:190
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *errors.errorString

More

Artifacts: /kv/contention/nodes=4
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 1, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Jul 1, 2020
@cockroach-teamcity
Copy link
Member Author

(roachtest).kv/contention/nodes=4 failed on master@3e0de239121813ea4d47873388a2828a66d9edf7:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/kv/contention/nodes=4/run_1
	ts_util.go:130,kv.go:259,cluster.go:2445,errgroup.go:57: spent 75.438596% of time below target of 100.000000 txn/s, wanted no more than 5.000000%

	cluster.go:2467,kv.go:262,test_runner.go:757: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.registerKVContention.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/kv.go:262
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2425
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5420
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:190
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *errors.errorString

More

Artifacts: /kv/contention/nodes=4
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@darinpp
Copy link
Contributor

darinpp commented Jul 7, 2020

Hits the 10m timeout with this message repeating:
refreshing table: 0 lease failed: descriptor not found

@irfansharif
Copy link
Contributor

Second failure.

_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
  901.0s        0          561.0          632.4     14.2   1342.2   3355.4   7516.2 write
  902.0s        0          704.0          632.4     24.1   1006.6   2818.6   7516.2 write
  903.0s        0          760.8          632.6     22.0    503.3   1946.2   7516.2 write
  904.0s        0          679.1          632.6     14.2    973.1   1543.5  10200.5 write
  905.0s        0          351.1          632.3      9.4   1040.2   1610.6   9663.7 write
  906.0s        0          637.9          632.3     17.8   1677.7   3221.2   7784.6 write
  907.0s        0          549.9          632.2     13.1    738.2   2147.5  10200.5 write
  908.0s        0          543.1          632.1     13.1   1342.2   4160.7  11274.3 write
  909.0s        0          477.0          632.0     14.7   1275.1   2013.3   6979.3 write
  910.0s        0           92.0          631.4     13.1   1543.5   6979.3   8589.9 write
  911.0s        0            8.0          630.7    436.2   2281.7   2281.7   2281.7 write
  912.0s        0           11.0          630.0    469.8   3355.4   3489.7   3489.7 write
  913.0s        0            7.0          629.3    604.0   4026.5   4026.5   4026.5 write
  914.0s        0            9.0          628.7    637.5   7516.2   7516.2   7516.2 write
  915.0s        0            5.0          628.0    570.4    637.5    637.5    637.5 write
  916.0s        0            4.0          627.3    453.0    704.6    704.6    704.6 write
  917.0s        0            2.0          626.6    520.1    520.1    520.1    520.1 write
  918.0s        0            2.0          625.9    536.9    570.4    570.4    570.4 write
  919.0s        0            0.0          625.3      0.0      0.0      0.0      0.0 write
  920.0s        0            0.0          624.6      0.0      0.0      0.0      0.0 write

Test was happily chugging along before zero-ing out for good. Couldn't fetch logs/debug zip after.

 pghosts: GetInternalIP: failed to execute hostname on teamcity-2063962-1593929497-19-n5cpu4:1:: exit status 255

Is this an infra-flake? Saw it happen #50763 (comment).


First failure is different, and more interesting. I think this is something real we should look at.

Seeing a "clogging" of the txn wait queue:

W200701 07:17:56.790170 2385768 kv/kvserver/txnwait/queue.go:517  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher b3aa9fa7: have been waiting 60.00s for pushee 7cc13337
W200701 07:17:56.790232 2385770 kv/kvserver/txnwait/queue.go:517  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher 656c5154: have been waiting 60.00s for pushee 7cc13337
W200701 07:17:56.790283 2385769 kv/kvserver/txnwait/queue.go:517  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher ead50c0d: have been waiting 60.00s for pushee 7cc13337
W200701 07:19:13.401635 2385755 kv/kvserver/txnwait/queue.go:524  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher 1b1c7514: finished waiting after 136.62s for pushee 7cc13337
W200701 07:19:13.387942 2385769 kv/kvserver/txnwait/queue.go:524  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher ead50c0d: finished waiting after 136.60s for pushee 7cc13337
W200701 07:19:13.407373 2385735 kv/kvserver/txnwait/queue.go:524  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher 5c17a21c: finished waiting after 136.62s for pushee 691a2226
W200701 07:19:13.407652 2385748 kv/kvserver/txnwait/queue.go:524  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher 0c0bc0f9: finished waiting after 136.62s for pushee 691a2226
W200701 07:19:13.407819 2385756 kv/kvserver/txnwait/queue.go:524  [n3,s3,r53/3:/Table/53/1/-7{77988…-37869…}] pusher 76428fbb: finished waiting after 136.62s for pushee 691a2226

Elsewhere:

W200701 08:01:15.010964 3035403 kv/kvserver/txnwait/queue.go:524  [n4,s4,r35/2:/Table/53{-/1/-7378…}] pusher bb549a41: finished waiting after 2520.68s for pushee 4038db62
W200701 08:01:15.011053 3019350 kv/kvserver/txnwait/queue.go:524  [n4,s4,r35/2:/Table/53{-/1/-7378…}] pusher 78f8f6e4: finished waiting after 2625.09s for pushee 01191fd9
W200701 08:01:15.012172 3035396 kv/kvserver/txnwait/queue.go:524  [n4,s4,r35/2:/Table/53{-/1/-7378…}] pusher 468b03f7: finished waiting after 2520.68s for pushee 4038db62
W200701 08:01:15.013202 3035390 kv/kvserver/txnwait/queue.go:524  [n4,s4,r35/2:/Table/53{-/1/-7378…}] pusher 2bbdc0c7: finished waiting after 2520.68s for pushee 4038db62

Assigning myself to bring it up tomorrow.

@knz
Copy link
Contributor

knz commented Jul 22, 2020

most recent: #45698

@knz knz closed this as completed Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

No branches or pull requests

6 participants