roachtest: acceptance/version-upgrade failed #54079

Closed
cockroach-teamcity opened this issue Sep 9, 2020 · 14 comments · Fixed by #54114 or #54194
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Member

(roachtest).acceptance/version-upgrade failed on master@a1f6efaf64f3c539187cf5f09c1ce2b4dd79e021:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:261,versionupgrade.go:344,versionupgrade.go:190,versionupgrade.go:178,acceptance.go:58,acceptance.go:95,test_runner.go:754: EOF

	cluster.go:1651,context.go:135,cluster.go:1640,test_runner.go:823: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 2: dead
		3: 24019
		1: 24027
		4: 24017
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1143
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:267
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1839
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (3) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 9, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Sep 9, 2020
@asubiotto
Contributor

job 588317108461633540: skipping: no liveness record for the job's node 4
(1) attached stack trace
  -- stack trace:
  | runtime.gopanic
  | 	/usr/local/go/src/runtime/panic.go:679
  | [...repeated from below...]
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/util/log.ReportOrPanic
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/crash_reporting.go:338
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).deprecatedMaybeAdoptJob
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/deprecated.go:202
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).Start.func7
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:654
  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).Start.func10
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:739
  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347
  | runtime.goexit
  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
Wraps: (3) job 588317108461633540: skipping: no liveness record for the job's node 4
Error types: (1) *withstack.withStack (2) *withstack.withStack (3) *errutil.leafError

@asubiotto
Contributor

Dup of #54082 but leaving the closing to the test owner.

@tbg
Member

tbg commented Sep 9, 2020

cc @ajwerner

I don't think this is related to multi-tenancy as version-upgrade does not exercise that at all. Whatever we are picking up in

for _, liveness := range nl.GetLivenesses() {
	nodeStatusMap[liveness.NodeID] = &nodeStatus{
		isLive: liveness.IsLive(now),
	}
}

is not complete. That makes some sense: this is gossiped information, so some nodeIDs can be missing, particularly early in the start sequence. Given that this is also on the deprecated path, my take would be to remove the assertion. The open question is why we're hitting this only now.

tbg added a commit to tbg/cockroach that referenced this issue Sep 9, 2020
cockroachdb#54079

Release justification: non-production changes
Release note: None
@tbg
Member

tbg commented Sep 9, 2020

Saw this again on a master merge attempt, so pulling the trigger to skip. We should get this fixed ASAP though.

@irfansharif
Contributor

ERROR: ERROR: cockroach server exited with error: failed to create engines: pebble: error when replaying WAL: pebble/record: zeroed chunk

The last failure is the same as #54164 (comment), which @jbowens is looking at elsewhere.

@thoszhang
Contributor

What's still left unresolved here? The 2020-09-10 failure is fixed by #54194, so it's just the node liveness assertion, right? I'm removing the automatic release blocker tag (unless @ajwerner you think it should be a release blocker).

@thoszhang thoszhang removed branch-release-20.2 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 21, 2020
@ajwerner
Contributor

I don't think there's anything left except to understand which change led to the problem. @irfansharif was eager to understand and I had indicated I'd help track it down. Thanks for removing the labels.

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@a51a8eb6d00304c6233e79c2448efd0bf5bc84c6:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:446,versionupgrade.go:418,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:58,acceptance.go:95,test_runner.go:755: pq: internal error: StartableJob 598788097801650180 cannot be started without sqlliveness session

Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@8504961f0a7156e2d346af25a6beab3bcc32f129:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:446,versionupgrade.go:418,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:58,acceptance.go:95,test_runner.go:755: pq: internal error: StartableJob 599984015336046593 cannot be started without sqlliveness session

Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@irfansharif
Contributor

pq: internal error: StartableJob 598788097801650180 cannot be started without sqlliveness session

@ajwerner, know anything about this?

@ajwerner
Contributor

Yes: #55524, will work to get that over the finish line later today.

@irfansharif
Contributor

Feel free to close out this issue after. I don't think we'll have much more to do about the original failures here anymore.

@ajwerner
Contributor

Closed by #55524.

7 participants