Fixing SkewedReadWrite to load its metadata in a transactionally consistent way #9274

Merged (1 commit) on Jan 31, 2023

@sfc-gh-jslocum (Collaborator) commented Jan 31, 2023

The server list and the key-to-server mapping weren't being loaded in a way that was guaranteed to be transactionally consistent.
This led to a concrete problem where "range" was loaded at version A, "serverList" was loaded at version C, and DD removed a server from the cluster at version B, where A < B < C.
As a result, serverShards had a mapping to a server that was not in serverInterfaces, which caused serverInterfaces.at in setHotServers to segfault.

This fixes the problem by loading both as part of the same transaction with the same read version, guaranteeing they are transactionally consistent.
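
The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `VersionedStore`, `build_example_store`, `load_separately`, `load_consistently`, and `dangling_servers` are hypothetical names made up for illustration, not FoundationDB APIs or the actual workload code, and the server names and addresses are invented. In this model the dangling lookup shows up as a missing dictionary key rather than a crash in `serverInterfaces.at`.

```python
class VersionedStore:
    """Toy multi-version store (hypothetical): each commit creates a new snapshot."""

    def __init__(self, initial):
        # The list index doubles as the version number.
        self._snapshots = [dict(initial)]

    def commit(self, updates):
        snapshot = dict(self._snapshots[-1])
        snapshot.update(updates)
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def latest_version(self):
        return len(self._snapshots) - 1

    def read(self, key, version):
        # A read at a given version only sees the snapshot for that version.
        return self._snapshots[version][key]


def build_example_store():
    # Version 0 plays the role of A: two servers, each owning one shard.
    store = VersionedStore({
        "range": {"shard1": "ss1", "shard2": "ss2"},
        "serverList": {"ss1": "10.0.0.1", "ss2": "10.0.0.2"},
        "unrelated": 0,
    })
    # Version 1 plays the role of B: ss2 is removed and its shard moves to ss1.
    store.commit({
        "range": {"shard1": "ss1", "shard2": "ss1"},
        "serverList": {"ss1": "10.0.0.1"},
    })
    # Version 2 plays the role of C: an unrelated commit advances the version.
    store.commit({"unrelated": 1})
    return store


def load_separately(store, range_version, server_version):
    # Models the bug: the two pieces of metadata are read at different versions.
    return store.read("range", range_version), store.read("serverList", server_version)


def load_consistently(store, read_version):
    # Models the fix: both reads share one read version.
    return store.read("range", read_version), store.read("serverList", read_version)


def dangling_servers(shard_map, servers):
    # Servers referenced by the shard mapping but absent from the server list.
    return sorted({s for s in shard_map.values() if s not in servers})
```

In this sketch, `load_separately(store, 0, 2)` yields a shard mapping that still references `ss2` while the server list no longer contains it, so `dangling_servers` reports it. `load_consistently` at any single version yields no dangling servers, which is the property the shared read version provides.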

Passes 1k runs of this test in correctness.

@fdb-windows-ci (Collaborator)

Doxense CI Report for Windows 10

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: 163d021
  • Duration 0:17:34
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos-m1 on macOS Monterey 12.x

  • Commit ID: 163d021
  • Duration 0:33:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 163d021
  • Duration 0:38:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 163d021
  • Duration 0:41:42
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci (Contributor)

Result of foundationdb-pr-macos on macOS Monterey 12.x

  • Commit ID: 163d021
  • Duration 0:44:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@sfc-gh-xwang (Collaborator) left a comment

Nice catch. Thanks

@foundationdb-ci (Contributor)

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 163d021
  • Duration 1:13:22
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@sfc-gh-jslocum merged commit 77875d7 into apple:main on Jan 31, 2023
sfc-gh-nwijetunga added a commit to sfc-gh-nwijetunga/foundationdb that referenced this pull request Feb 1, 2023
* main: (22 commits)
  move feed cleanup check to after data is guaranteed to be available for granule (apple#9283)
  remove test timeout
  Reduce logging level for verbose events
  Added documentation for consistencyscan CLI command.
  Fix audit_storage issues (apple#9265)
  Update bindings/bindingtester/spec/tenantTester.md
  Update bindings/bindingtester/spec/tenantTester.md
  update bindingtester spec
  Fixing SkewedReadWrite to load its metadata in a transactionally consistent way (apple#9274)
  push string onto stack when active tenant is set
  Add comments on why custom encoding is needed
  patch to fix some existing bindingtester issues
  add arg and return type to the c_api for impl.py
  Fix includes
  Add from_7.0.0_until_7.2.0 for UpgradeAndBackupRestore tests
  Change UpgradeAndBackupRestore to from_7.2.4
  Add a new toml option to disable failure injection workload
  Change SubmitBackup to only reboot in Attrition
  add method to return idfuture
  add to java and python stack tester
  ...
sfc-gh-nwijetunga added a commit to sfc-gh-nwijetunga/foundationdb that referenced this pull request Feb 2, 2023
* main: (23 commits)
  Handle EKP Tenant Not Found Errors (apple#9261)
  move feed cleanup check to after data is guaranteed to be available for granule (apple#9283)
  remove test timeout
  Reduce logging level for verbose events
  Added documentation for consistencyscan CLI command.
  Fix audit_storage issues (apple#9265)
  Update bindings/bindingtester/spec/tenantTester.md
  Update bindings/bindingtester/spec/tenantTester.md
  update bindingtester spec
  Fixing SkewedReadWrite to load its metadata in a transactionally consistent way (apple#9274)
  push string onto stack when active tenant is set
  Add comments on why custom encoding is needed
  patch to fix some existing bindingtester issues
  add arg and return type to the c_api for impl.py
  Fix includes
  Add from_7.0.0_until_7.2.0 for UpgradeAndBackupRestore tests
  Change UpgradeAndBackupRestore to from_7.2.4
  Add a new toml option to disable failure injection workload
  Change SubmitBackup to only reboot in Attrition
  add method to return idfuture
  ...
jzhou77 pushed a commit to jzhou77/foundationdb that referenced this pull request Feb 15, 2023