Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change SubmitBackup to only reboot in Attrition workload for UpgradeAndBackupRestore-1 [release-7.2] #9231

Merged
merged 4 commits into from
Feb 1, 2023

Conversation

jzhou77
Copy link
Contributor

@jzhou77 jzhou77 commented Jan 25, 2023

cherrypick #9240

Otherwise, the Attrition can RebootAndDelete tlogs in remote DC such that the remote is unusable and blocking recovery to fully_recovered state. In fact, the FirstCycleTest can only reach the accepting_commits state.

In the part 2 of the restarting test, the runTests() wait for quietDatabase() to reach fully fully_recovered state, but was stuck in the accepting_commits state.

Added a new option "disabledFailureInjectionWorkloads" to toml test spec so that we can disallow randomly injecting Attrition workload that may RebootAndDelete tlogs for this test.

An alternative fix is to allow consistency check in the SubmitBackup, which will call repairDeadDatacenter() to "fix" the cluster. But I feel explicitly only allow reboot is a better fix.

100k restarting tests 20230126-014856-jzhou-b54dd8d549048658

20230125-233600-jzhou-36c52798dec7f51d

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 0730700
  • Duration 1:01:28
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 0730700
  • Duration 1:02:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 0730700
  • Duration 1:27:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 0bd3f8a
  • Duration 0:10:40
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 0bd3f8a89dc0be2f72887d49b0d3e5a418eb6ad7
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 0bd3f8a
  • Duration 0:10:46
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 0bd3f8a89dc0be2f72887d49b0d3e5a418eb6ad7
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Monterey 12.x

  • Commit ID: 0bd3f8a
  • Duration 0:11:36
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 0bd3f8a89dc0be2f72887d49b0d3e5a418eb6ad7
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Monterey 12.x

  • Commit ID: 0bd3f8a
  • Duration 0:11:41
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 0bd3f8a89dc0be2f72887d49b0d3e5a418eb6ad7
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 0bd3f8a
  • Duration 0:12:14
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 0bd3f8a89dc0be2f72887d49b0d3e5a418eb6ad7
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: b5c6c6a
  • Duration 0:45:19
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: b5c6c6a
  • Duration 0:52:39
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 971d5d6
  • Duration 0:30:39
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Monterey 12.x

  • Commit ID: 971d5d6
  • Duration 0:40:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 971d5d6
  • Duration 0:46:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Monterey 12.x

  • Commit ID: 971d5d6
  • Duration 0:47:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: b5c6c6a
  • Duration 1:26:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 971d5d6
  • Duration 1:23:30
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@jzhou77 jzhou77 marked this pull request as ready for review January 26, 2023 02:07
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 20e9263
  • Duration 1:00:01
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 20e9263
  • Duration 1:08:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 20e9263
  • Duration 1:24:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@jzhou77 jzhou77 requested review from xis19 and halfprice February 1, 2023 01:41
Otherwise, the Attrition can RebootAndDelete tlogs in remote DC such that the
remote is unusable and blocking recovery to fully_recovered state. In fact,
the FirstCycleTest can only reach the accepting_commits state.

In the part 2 of the restarting test, the runTests() wait for quietDatabase()
to reach fully fully_recovered state, but was stuck in the accepting_commits
state.
This is needed for UpgradeAndBackupRestore-1 to make sure the DB is recoverable
so that the part 2 can start.
Because of the new option "disabledFailureInjectionWorkloads" is not available
until 7.2.4.
@fdb-windows-ci
Copy link
Collaborator

Doxense CI Report for Windows 10

@halfprice
Copy link
Contributor

May we lose some covered scenarios as before?

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: e1bf6d5
  • Duration 1:10:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: e1bf6d5
  • Duration 1:16:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: e1bf6d5
  • Duration 1:18:38
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jzhou77 jzhou77 merged commit ed482ed into apple:release-7.2 Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants