roachprod: run scheduled backup init without timeout #97495

Merged
merged 1 commit into from
Feb 22, 2023


msbutler
Collaborator

Previously, several roachtests failed during a cluster restart because a node serving the default scheduled backup command was not yet ready to serve requests. Currently, when roachprod start returns, not every node is guaranteed to be ready to serve requests.

To prevent this failure mode, this patch changes the scheduled backup command issued during roachprod.Start() to run with an infinite timeout and only on the first node in the cluster.

Fixes #97010, #97232

Release note: None

Epic: none

@msbutler msbutler added the T-testeng TestEng Team label Feb 22, 2023
@msbutler msbutler requested a review from renatolabs February 22, 2023 18:14
@msbutler msbutler requested a review from a team as a code owner February 22, 2023 18:14
@msbutler msbutler removed the request for review from a team February 22, 2023 18:14
@msbutler msbutler self-assigned this Feb 22, 2023
@cockroach-teamcity
Member

This change is Reviewable

@msbutler
Collaborator Author

@renatolabs this patch seems to work on a few roachtests that run restarts. Happy to run it on the whole nightly suite if you'd like.

Contributor

@renatolabs renatolabs left a comment

LGTM!

@msbutler msbutler force-pushed the butler-deflake-roachtest branch from abbeda4 to c1a3eed Compare February 22, 2023 19:00
@msbutler
Collaborator Author

TFTR!

bors r=renatolabs

@craig
Contributor

craig bot commented Feb 22, 2023

Build succeeded:

Development

Successfully merging this pull request may close these issues.

roachtest: tpccbench/nodes=12/cpu=16 failed