-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
external_storage: implement locking #56597
Conversation
Signed-off-by: hillium <yujuncen@pingcap.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
Hi @YuJuncen. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #56597 +/- ##
=================================================
- Coverage 73.3217% 60.0809% -13.2409%
=================================================
Files 1631 1878 +247
Lines 451083 713403 +262320
=================================================
+ Hits 330742 428619 +97877
- Misses 100025 259249 +159224
- Partials 20316 25535 +5219
Flags with carried forward coverage won't be shown. Click here to find out more.
|
| Verify() → **OK** | | | ||
| | Write("LOCK_Bob", "") → **OK** | | ||
| Write("LOCK_Alice", "") → **OK** | | | ||
| | Verify() → **Failed! "LOCK_Alice" exists** | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When contention is high no one can commit? It doesn't scale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a randomized back off algorithm can help. Anyway this is like some sort of optimistic lock, hence not pretty effective in the scenario that tons of clients racing for one lock...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK if there will not be too many clients in the use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a bunch of nits, thanks for the change!
|
||
- When compacting / restoring, we want to block migrating to a new version. | ||
- When migrating the backup storage to a new version, we want to forbid reading. | ||
- When truncating the storage, we don't want another trancating operation happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo trancating
- When truncating the storage, we don't want another trancating operation happen. | ||
- When backing up, we don't want another backup uses the same storage. | ||
|
||
But external storage locking isn't trivial. Simply putting a lock file isn't safe enough: because after checking there isn't such a lock file, another one may write it immediately. Object locks provide stronger consistency, but also require extra configuration and permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I thought the new feature conditional put
from S3 can solve this race condition, other vendors also have it. I remember you mentioned this feature can provide RW support, maybe that's where it differs from simply using conditional put
on a lock file? I feel like it could be better to mention the comparison between the two.
br/pkg/storage/locking.go
Outdated
// over the same file was success. | ||
// | ||
// For more details, check docs/design/2024-10-11-put-and-verify-transactions-for-external-storages.md. | ||
type ExclusiveWrite struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like CondtionalWrite
or ConditionalPut
is more descriptive. And VerifyWriteContext
could be ConditionCheck
or Precondtion
, something like that. What do you think?
br/pkg/storage/locking.go
Outdated
"go.uber.org/zap" | ||
) | ||
|
||
// ExclusiveWrite is a write that in a strong consistency storage. | ||
// | ||
// It is pretty like a "write if not exist", but it is atomic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure this description is accurate cuz the VerifyWriteContext
can be arbitrary
br/pkg/storage/locking.go
Outdated
lock.path = target | ||
lock.txnID, err = writer.CommitTo(ctx, storage) | ||
if err != nil { | ||
err = errors.Annotatef(err, "there is something about the lock: %s", tryFetchRemoteLock(ctx, storage, writeLock)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, maybe failed to commit the lock
? I feel like error message should describe what the action was for better debugging.
br/pkg/storage/locking.go
Outdated
func TryLockRemoteRead(ctx context.Context, storage ExternalStorage, path, hint string) (lock RemoteLock, err error) { | ||
readID := rand.Int63() | ||
target := fmt.Sprintf("%s.READ.%016x", path, readID) | ||
writeLock := fmt.Sprintf("%s.WRIT", target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo WRIT
? maybe put the READ, WRITE, INTENT
into a constant cuz multiple methods are using it, changing one place and forgot to change other will cause problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix WRIT
is just for aligning it with READ
.
br/pkg/storage/locking.go
Outdated
res, err := json.Marshal(meta) | ||
if err != nil { | ||
log.Panic( | ||
"Unreachable: a trivial object cannot be marshaled to JSON.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, what does trivial at here mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some how like Trivial types
in C++. Anyway it can be more clear.
@Tristan1900: adding LGTM is restricted to approvers and reviewers in OWNERS files. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: hillium <yujuncen@pingcap.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest LGTM
return err | ||
} | ||
return ErrLocked{Meta: meta} | ||
func TryLockRemote(ctx context.Context, storage ExternalStorage, path, hint string) (lock RemoteLock, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use TryLockRemoteWrite
instead of TryLockRemote
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for compatibility. Write locks requires an extra suffix.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: 3pointer, Leavrth, Tristan1900 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all |
1 similar comment
/test all |
@purelind: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest |
@purelind: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest |
@YuJuncen: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
/retest-required |
/test unit-test |
@YuJuncen: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@YuJuncen: The specified target(s) for
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@ti-chi-bot[bot]: The specified target(s) for
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest-required |
@YuJuncen: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/test mysql-test |
@YuJuncen: The specified target(s) for
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What problem does this PR solve?
Issue Number: close #56523
Problem Summary:
See the issue. We need a safer method to lock the external storage.
What changed and how does it work?
This PR implemented the "put and verify" txn, also a document was provided.
Check List
Tests
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.