Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: TestZip failed #110164

Closed
cockroach-teamcity opened this issue Sep 7, 2023 · 1 comment
Closed

cli: TestZip failed #110164

cockroach-teamcity opened this issue Sep 7, 2023 · 1 comment
Assignees
Labels
A-observability-inf branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Sep 7, 2023

cli.TestZip failed with artifacts on release-22.2 @ af6752b2faf39115fd322e9db76421e71b76507d:

I230907 09:54:38.039173 13895236 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r46/1:/Table/4{5-6}] 472  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.080911 13897202 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r11/1:/Table/{8-11}] 473  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.103720 13897886 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r20/1:/Table/{19-20}] 474  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.109921 13898622 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r26/1:/Table/2{5-6}] 475  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.121637 13898907 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r28/1:/Table/2{7-8}] 476  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.134691 13899617 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r35/1:/Table/3{4-5}] 477  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.142739 13900172 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r37/1:/Table/3{6-7}] 478  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.156549 13900541 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r41/1:/Table/4{0-1}] 479  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.169382 13900902 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r42/1:/Table/4{1-2}] 480  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.173351 13901354 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r43/1:/Table/4{2-3}] 481  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.181582 13901576 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r44/1:/Table/4{3-4}] 482  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.189638 13901914 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r45/1:/Table/4{4-5}] 483  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.199787 13902467 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r47/1:/Table/4{6-7}] 484  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:38.208794 13902758 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r49/1:/Table/{48-50}] 485  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:39.024863 13933313 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r25/1:/Table/2{4-5}] 486  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:40.025940 13953789 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r33/1:/Table/3{2-3}] 487  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:41.037113 13587318 2@util/log/event_log.go:32  [n1] 488 ={"Timestamp":1694080481037101215,"EventType":"runtime_stats","MemRSSBytes":515031040,"GoroutineCount":1521,"MemStackSysBytes":6717440,"GoAllocBytes":193288648,"GoTotalBytes":371677592,"GoStatsStaleness":0.002554685,"HeapFragmentBytes":91571768,"HeapReservedBytes":32735232,"HeapReleasedBytes":334192640,"CGoAllocBytes":22116000,"CGoTotalBytes":42631168,"CGoCallRate":0.00004978453,"CPUUserPercent":0.000018240575,"CPUSysPercent":0.000000964535,"GCPausePercent":0.000000034220356,"GCRunCount":255,"NetHostRecvBytes":4674205,"NetHostSendBytes":4674205}
I230907 09:54:41.037454 13587318 2@server/status/runtime_log.go:47  [n1] 489  runtime stats: 491 MiB RSS, 1521 goroutines (stacks: 6.4 MiB), 184 MiB/354 MiB Go alloc/total(stale) (heap fragmentation: 87 MiB, heap reserved: 31 MiB, heap released: 319 MiB), 21 MiB/41 MiB CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (255x), 4.5 MiB/4.5 MiB (r/w)net
I230907 09:54:41.056199 13974844 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r22/1:/Table/2{1-2}] 490  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:41.103800 13976690 13@kv/kvserver/replicate_queue.go:840  [n1,replicate,s1,r6/1:/Table/{0-3}] 491  error processing replica: 0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster
I230907 09:54:41.466220 13672762 2@rpc/context.go:1393  [-] 492  closing [{n0: 127.0.0.1:46811 (system)}]
I230907 09:54:41.466206 13672443 2@rpc/context.go:1393  [-] 493  closing [{n0: 127.0.0.1:46811 (system)}]
I230907 09:54:41.475409 13580698 cli/testutils.go:250  [-] 494  stopping server and cleaning up CLI test
I230907 09:54:41.475521 13580698 cli/testutils.go:207  [-] 495  stopping server at 127.0.0.1:46811 / 127.0.0.1:46559
I230907 09:54:41.476491 13653753 2@rpc/context.go:1393  [n1] 496  closing [{n1: 127.0.0.1:46811 (default)} {n0: 127.0.0.1:46811 (default)}]
I230907 09:54:41.477065 13590737 sql/stats/automatic_stats.go:509  [n1] 497  quiescing auto stats refresher
W230907 09:54:41.478606 13590978 jobs/registry.go:827  [n1] 498  canceling all adopted jobs due to stopper quiescing
    zip_test.go:151: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/3173/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/cli/cli_test_/cli_test.runfiles/com_github_cockroachdb_cockroach/pkg/cli/testdata/zip/testzip:1: 
        output didn't match expected:
        @@ -103,11 +103,13 @@
         [node 1] requesting stacks with labels... received response... writing binary output: debug/nodes/1/stacks_with_labels.txt... done
         [node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
         [node 1] requesting heap file list... received response... done
         [node ?] ? heap profiles found
         [node 1] requesting goroutine dump list... received response... done
        -[node 1] 0 goroutine dumps found
        +[node 1] 1 goroutine dumps found
        +[node 1] retrieving goroutine_dump.2023-09-07T09_54_41.037.double_since_last_dump.000001521.txt.gz... received response... done
        +[node 1] writing dump... writing binary output: debug/nodes/1/goroutines/goroutine_dump.2023-09-07T09_54_41.037.double_since_last_dump.000001521.txt.gz... done
         [node 1] requesting log file ...
         [node 1] 0 log file ...
         [node 1] requesting ranges... received response... done
         [node 1] 52 ranges found
         [node 1] writing range 1... converting to JSON... writing binary output: debug/nodes/1/ranges/1.json... done
I230907 09:54:41.479957 13616900 jobs/registry.go:1227  [n1] 499  AUTO SPAN CONFIG RECONCILIATION job 897876200959344641: stepping through state succeeded
W230907 09:54:41.481450 13590162 sql/sqlliveness/slinstance/slinstance.go:242  [n1] 500  exiting heartbeat loop
W230907 09:54:41.481771 13616900 kv/txn.go:705  [n1] 501  failure aborting transaction: node unavailable; try another peer; abort caused by: txn exec: context canceled
W230907 09:54:41.492584 13616900 jobs/adopt.go:454  [n1] 502  could not clear job claim: clear-job-claim: failed to send RPC: sending to all replicas failed; last error: ba: Scan [/Table/15/1/897876200959344641,/Table/15/1/897876200959344642), [txn: 64d9907e], [can-forward-ts] RPC error: node unavailable; try another peer
--- FAIL: TestZip (10.99s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/server

This test on roachdash | Improve this report!

Jira issue: CRDB-31292

@cockroach-teamcity cockroach-teamcity added branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 7, 2023
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Sep 7, 2023
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Sep 7, 2023
@knz knz added T-observability-inf and removed T-server-and-security DB Server & Security labels Sep 13, 2023
@dhartunian dhartunian self-assigned this Sep 13, 2023
@abarganier abarganier assigned abarganier and unassigned dhartunian Sep 20, 2023
@abarganier
Copy link
Contributor

abarganier commented Sep 20, 2023

On master, this has already been fixed.

Many debug zip tests make use of eraseNonDeterministicZipOutput. This function has been updated on multiple occasions throughout 2023 to de-flake various zip tests. A couple of those updates (2ee4e49, #98778) added logic to remove goroutine dump-related outputs such as this that are non-deterministic.

Seems like the solution is to just backport these to release-22.2. I'll get a PR up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

No branches or pull requests

4 participants