Experiment: Improve concurrent merge performance by weakly owning branch updates #8268
Conversation
@arielshaqed looks like amazing work, thank you!
I should probably mention that if we go in this direction, we can deploy this and then improve it. The next step is to add fairness: earlier merge attempts should have a better chance of being next. Otherwise, when there are multiple concurrent merges, some of the earlier merges will repeatedly lose and time out. If everything is fair then we expect fewer timeouts, and in general the tail latency will shorten.
pkg/kv/util/weak_owner.go
Outdated
```go
		"now": now,
	}).Trace("Try to take ownership")
	if errors.Is(err, kv.ErrNotFound) {
		err = kv.SetMsg(ctx, w.Store, weakOwnerPartition, prefixedKey, &ownership)
```
This is wrong, no? multiple routines can succeed at the same time
Ouch, yeah, this was supposed to be a SetIf with the "not present" predicate - and I forgot to find out the correct value for that predicate.
Thanks for catching this!
Thanks, that one's embarrassing.
Weak ownership is a best-effort lock that occasionally fails. It can fail when a goroutine is delayed for a long interval. This is fine as long as the calling code relies on ownership for performance but not for correctness - e.g. merges and commits.
This includes merges: only one concurrent `BranchUpdate` operation can succeed, so unless we are willing to let many long-lived operations fail, there is little point in running multiple concurrent updates.
Performs multiple small merges in parallel.
This shows results, even on local! When I run lakefs (by default weak ownership is OFF) I get 6.6% errors with concurrency 50. Rate is <50/s. Also the long tail is _extremely_ long.

When I switch weak ownership ON, using the default parameters, I get **0** errors with concurrency 50. Rate is about the same, except that the tail (when load drops) is _short_. See the difference [here][merge-abuse-speed-chart]: it's faster _and_ returns 0 errors. The distribution of actual successful merge times is somewhat slower - possibly because of the time to lock, possibly because errors in the really slow cases cause those slow cases to be dropped.

Finally, note that because we do not queue, some merges take a *long* time under sustained load. We could improve weak ownership to hold an actual queue of work. This would make merges _fair_: merges will occur roughly in order of request arrival.

==== Weak ownership OFF ====

```sh
❯ go run ./cmd/lakectl abuse merge --amount 1000 --parallelism 50 lakefs://abuse/main
Source branch: lakefs://abuse/main
merge - completed: 34, errors: 0, current rate: 33.81 done/second
merge - completed: 80, errors: 0, current rate: 45.98 done/second
merge - completed: 128, errors: 0, current rate: 48.02 done/second
merge - completed: 177, errors: 0, current rate: 49.03 done/second
merge - completed: 222, errors: 0, current rate: 44.97 done/second
merge - completed: 265, errors: 3, current rate: 43.03 done/second
merge - completed: 308, errors: 9, current rate: 42.97 done/second
merge - completed: 357, errors: 15, current rate: 49.01 done/second
merge - completed: 406, errors: 21, current rate: 49.03 done/second
merge - completed: 451, errors: 22, current rate: 44.97 done/second
merge - completed: 499, errors: 29, current rate: 48.01 done/second
merge - completed: 545, errors: 30, current rate: 46.01 done/second
merge - completed: 585, errors: 31, current rate: 39.97 done/second
merge - completed: 632, errors: 33, current rate: 47.04 done/second
merge - completed: 679, errors: 37, current rate: 47.00 done/second
merge - completed: 728, errors: 46, current rate: 48.96 done/second
merge - completed: 768, errors: 49, current rate: 40.04 done/second
merge - completed: 808, errors: 53, current rate: 39.98 done/second
merge - completed: 854, errors: 57, current rate: 45.99 done/second
merge - completed: 891, errors: 58, current rate: 37.00 done/second
merge - completed: 935, errors: 64, current rate: 44.00 done/second
merge - completed: 972, errors: 66, current rate: 36.98 done/second
merge - completed: 990, errors: 66, current rate: 18.00 done/second
merge - completed: 995, errors: 66, current rate: 5.00 done/second
merge - completed: 996, errors: 66, current rate: 1.00 done/second
merge - completed: 998, errors: 66, current rate: 2.00 done/second
merge - completed: 999, errors: 66, current rate: 1.00 done/second
merge - completed: 999, errors: 66, current rate: 0.00 done/second
merge - completed: 999, errors: 66, current rate: 0.00 done/second
completed: 1000, errors: 66, current rate: 5.27 done/second
Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	0
75	601
100	671
250	672
350	672
500	696
750	740
1000	765
5000	896
min	54
max	12022
total	934
```

==== Weak ownership ON ====

```sh
❯ go run ./cmd/lakectl abuse merge --amount 1000 --parallelism 50 lakefs://abuse/main
Source branch: lakefs://abuse/main
merge - completed: 36, errors: 0, current rate: 35.23 done/second
merge - completed: 86, errors: 0, current rate: 49.98 done/second
merge - completed: 136, errors: 0, current rate: 50.03 done/second
merge - completed: 185, errors: 0, current rate: 48.99 done/second
merge - completed: 236, errors: 0, current rate: 51.02 done/second
merge - completed: 286, errors: 0, current rate: 49.99 done/second
merge - completed: 337, errors: 0, current rate: 50.97 done/second
merge - completed: 390, errors: 0, current rate: 53.03 done/second
merge - completed: 438, errors: 0, current rate: 48.01 done/second
merge - completed: 487, errors: 0, current rate: 49.00 done/second
merge - completed: 534, errors: 0, current rate: 46.98 done/second
merge - completed: 581, errors: 0, current rate: 46.99 done/second
merge - completed: 632, errors: 0, current rate: 51.00 done/second
merge - completed: 680, errors: 0, current rate: 48.04 done/second
merge - completed: 725, errors: 0, current rate: 44.98 done/second
merge - completed: 771, errors: 0, current rate: 45.99 done/second
merge - completed: 815, errors: 0, current rate: 44.02 done/second
merge - completed: 861, errors: 0, current rate: 46.01 done/second
merge - completed: 905, errors: 0, current rate: 43.98 done/second
merge - completed: 947, errors: 0, current rate: 42.00 done/second
merge - completed: 977, errors: 0, current rate: 30.01 done/second
merge - completed: 997, errors: 0, current rate: 19.99 done/second
completed: 1000, errors: 0, current rate: 4.92 done/second
Histogram (ms):
1	0
2	0
5	0
7	0
10	0
15	0
25	0
50	0
75	457
100	464
250	468
350	468
500	642
750	647
1000	729
5000	952
min	54
max	13744
total	1000
```
- Add some jitter when acquiring ownership on a branch
- Refresh _for_ refresh interval, twice _every_ refresh interval
- nolint unjustified warnings
Use SetIf with a nil predicate.
Force-pushed from ca436b4 to c6b412c
Thanks!
Yes. Firstly, this PR adds a
In the C++ and Rust standard libraries, "weak" in this case refers to the pointer. (In other cases in the C++ standard library, "weak" can refer to any variant which is logically weaker; for instance we also have

What we have in this PR indeed uses a lease-based implementation. But it's not ownership, it just behaves like ownership in almost all reasonable cases. A better term might be "mostly correct ownership"1. In fact, I believe that such an unwieldy name would be a good choice! If you agree I will change to use it.

Footnotes
Reviewed everything except for the tests. Some really good things here!
pkg/kv/util/weak_owner.go
Outdated
```go
// So it *cannot* guarantee correctness. However it usually works, and if
// it does work, the owning goroutine wins all races by default.
type WeakOwner struct {
```
It seems to me that this shouldn't be in the kv package. It deserves a standalone `ownership`/`owner`/anything-else package. The kv is an implementation detail of the ownership, so this struct should be called KVWeakOwner.
Sure. Let's do this after we agree on a name? @yonipeleg33 requested a name change, I'd rather do a single name change towards the end of the review.
docs/reference/cli.md
Outdated
```
@@ -3031,6 +3031,25 @@ lakectl abuse list <source ref URI> [flags]

### lakectl abuse merge

Merge nonconflicting objects to the source branch in parallel
```
```diff
-Merge nonconflicting objects to the source branch in parallel
+Merge non-conflicting objects to the source branch in parallel
```
Will auto-generate this :-)
pkg/graveler/ref/manager.go
Outdated
```go
		requestID = *requestIDPtr
	}
	if m.branchOwnership != nil {
		own, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
```
```diff
-own, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
+ownRelease, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
```

or

```diff
-own, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
+ownClose, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
```
Nice! Changed to `release()`, which I hope is even nicer.
pkg/kv/util/weak_owner.go
Outdated
```go
//nolint:stylecheck
var finished = errors.New("finished")
```
Fix linting
This was a sentinel, not an actual error. See for instance io.EOF. It is used only in sleepFor to indicate that the context expired with no error. But it turns out that there are special cases in context.WithCancelCause so we don't need it.
Removed, thanks!
pkg/kv/util/weak_owner.go
Outdated
```go
				// Do NOT attempt to delete, to avoid
				// knock-on effects destroying the new
				// owner.
				break
```
Shouldn't you check the error and exit the function?
I reported the error, and I do not think there is any useful action to take.
Aborting the action here, in particular, will probably livelock in some situations!
This goroutine has nothing waiting for it, and there really is nothing much to do with broken ownership. "All" that happens is that performance will suffer! This goroutine for the former owner thinks that another action stole its lease. So it makes sense to continue - it is likely to win the race to update, because it has already performed some work. Meanwhile the other action thinks that the former owner has stopped and will never finish - it should continue to avoid zombielock (a name I just invented for livelock against a stuck thread).
Which one is right? Well, if this goroutine is still running then the other action will most likely suffer. If it isn't running, it cannot do anything useful anyway.
pkg/kv/util/weak_owner.go
Outdated
```go
			log.WithFields(logging.Fields{
				"new_owner": ownership.Owner,
			}).Info("Lost ownership race")
			break
```
Add a comment explaining why the code continues to try and own the key. Assuming that it should, I worry that across multiple routines this may have a cascading effect of KV calls.
Adding a comment to explain. But note that "all" that happens is that we fall back to the previous behaviour. Frankly I believe that the common cause for A losing a race to B will be that the clock on B is a bit too fast; this does not cause chains of lost races. The other common cause for A losing a race to B will be that A was stuck; again, that does not give it or any other thread an advantage against B. The final case is when all are badly stuck - for instance if you configure very short acquire and refresh intervals. In that case you do get what you would expect: a stuck system.
cmd/lakectl/cmd/abuse_merge.go
Outdated
```go
var abuseMergeCmd = &cobra.Command{
	Use:   "merge <branch URI>",
	Short: "Merge nonconflicting objects to the source branch in parallel",
```
```diff
-	Short: "Merge nonconflicting objects to the source branch in parallel",
+	Short: "Merge non-conflicting objects to the source branch in parallel",
```
Done.
cmd/lakectl/cmd/abuse_merge.go
Outdated
```go
	amount := Must(cmd.Flags().GetInt("amount"))
	parallelism := Must(cmd.Flags().GetInt("parallelism"))

	fmt.Println("Source branch:", u)
```
```diff
-	fmt.Println("Source branch:", u)
+	fmt.Println("Source branch: ", u)
```
Thanks for the thorough review!
PTAL...
pkg/graveler/ref/manager.go
Outdated
```go
	// branch IF a previous worker crashed while owning that branch. It
	// has no effect when there are no crashes. Reducing it increases
	// write load on the branch ownership record when concurrent
	// operations occur. This.
```
Done.
pkg/graveler/ref/manager.go
Outdated
```go
	requestIDPtr := httputil.RequestIDFromContext(ctx)
	requestID := "<unknown>"
	if requestIDPtr != nil {
		requestID = *requestIDPtr
	}
```
Done. Either way will have strange performance characteristics, of course.
pkg/kv/util/weak_owner.go
Outdated
```go
// owning key.
func (w *WeakOwner) Own(ctx context.Context, owner, key string) (func(), error) {
	prefixedKey := []byte(fmt.Sprintf("%s/%s", w.Prefix, key))
	err := w.startOwningKey(context.Background(), owner, prefixedKey)
```
Confusion about how cancellation would work. Restored, thanks!
pkg/kv/util/weak_owner.go
Outdated
```go
	stopOwning := func() {
		defer refreshCancel()
		// Use the original context - in case cancelled twice.
		err := w.Store.Delete(ctx, []byte(weakOwnerPartition), prefixedKey)
```
Not sure why not: this key is on my partition, with my prefix, and I set it starting at l. 245. It serves to indicate ownership, and its value holds only ownership. I think I can delete it.
pkg/kv/util/weak_owner.proto
Outdated
```proto
message WeakOwnership {
  string owner = 1;
  google.protobuf.Timestamp expires = 2;
  string comment = 3;
```
Suppose a system is behaving really strangely. I can look at its ownership partition and determine which operation owns it -- that's what I put here when I took ownership. I call it "comment" rather than something else because I don't want to commit to any format of what exactly goes in there.
pkg/kv/util/weak_owner_test.go
Outdated
Added!
This does not depend on the KV implementation, so just add a matrix to one of the AWS/S3 flavours.
Otherwise there is no way to tell whether Esti _really_ ran it with ownership.
Force-pushed from 6ff978c to c7ecd5f
@arielshaqed could you open an issue and link to it? This is hardly a
Review wip
.github/workflows/esti.yaml
Outdated
👍
pkg/kv/util/weak_owner.go
Outdated
```go
	if errors.Is(err, kv.ErrNotFound) {
		predicate = nil
	}
```
I really hope kv contract tests cover this
pkg/kv/util/weak_owner.go
Outdated
```go
	stopOwning := func() {
		defer refreshCancel()
		// Use the original context - in case cancelled twice.
		err := w.Store.Delete(ctx, []byte(weakOwnerPartition), prefixedKey)
```
It holds ownership, but it's not guaranteed to be your routine ownership (unless I'm missing something). Some other thread might successfully set itself as owner and your cleanup code breaks it.
#8288 :-)
If ownership has already been lost to another thread, do NOT delete ownership when released.
- KV does not provide a DeleteIf operation. Instead, use SetIf with an always-expired timestamp.
- Along the way, ensure the "owner" string is truly unique by appending a nanoid to it. Currently the owner is the request ID, which should be unique - but adding randomness ensures it will always be unique regardless of the calling system.
Thanks!
PTAaaAAaaAAL...
pkg/kv/util/weak_owner.go
Outdated
```go
	stopOwning := func() {
		defer refreshCancel()
		// Use the original context - in case cancelled twice.
		err := w.Store.Delete(ctx, []byte(weakOwnerPartition), prefixedKey)
```
A DeleteIf operation sure would be nice here. Since we didn't spec one out, I'll replace it with a sentinel value instead.
Sorry for the late feedback, I am reviewing it very cautiously. I wonder if wrapping
I'm not sure this matters. There is no connection between the two update operations: my commit can succeed even if you were the last to seal staging tokens. And of course sealing is cheap. So I would not expect this to affect performance. Of course the easy way to be sure will be to perform the experiment.
Otherwise it doesn't even say how long it took - which is the most interesting part for `abuse merge`.
Testing on lakeFS Cloud (staging)

Setup
Results
Analysis

Without weak ownership, contention always caused failures. With concurrency 20, > 25% of merges failed. Each failure actually helps concurrent merges, so failures speed things up. With weak ownership, we saw no errors. The merge rate improved slightly from 3 to 6, and then stayed flat.
Only report errors. Obviously if not all branches were deleted then we left a mess, which is too bad. But the performance test itself succeeded, which is the (more) important thing.
LGTM with a few minor comments.
Thank you for your patience!
pkg/kv/util/weak_owner.go
Outdated
```go
		log.Info("Lost ownership race before trying to release")
		return nil
	}
	ownership.Owner = fmt.Sprintf("[released] %s", ownership.Owner)
```
I don't think we need to change this. Let's keep the `Owner` as is; you have the zero `Expires` field and `Comment` if you need to add info. It's best to keep the same format if we can.
Done.
pkg/kv/util/weak_owner.go
Outdated
```go
		"prefixed_key": string(prefixedKey),
		"owner":        owner,
		"new_owner":    ownership.Owner,
	}).Info("Lost ownership race while trying to release")
```
```diff
-	}).Info("Lost ownership race while trying to release")
+	}).Debug("Lost ownership race while trying to release")
```
Again, not done because I like knowing about these races.
pkg/kv/util/weak_owner.go
Outdated
```go
	log = log.WithField("new_owner", ownership.Owner)

	if ownership.Owner != owner {
		log.Info("Lost ownership race before trying to release")
```
```diff
-	log.Info("Lost ownership race before trying to release")
+	log.Debug("Lost ownership race before trying to release")
```
Again, as rare as before, and even more interesting. Keeping it at info.
pkg/config/defaults.go
Outdated
```go
	// write operation.
	//
	// - While ownership is held, add 2.5 read and 2.5 write operation
	//   per second, and an additional ~7 read and write operations
```
I think

```diff
-	//   per second, and an additional ~7 read and write operations
+	//   per second, and an additional ~7 read operations
```
Thanks! But also an additional write operation actually to acquire it. So made the explanation even longer.
```go
	// - While ownership is held, add 2.5 read and 2.5 write operation
	//   per second, and an additional ~7 read and write operations
	//   per second per branch operation waiting to acquire ownership.
```
I think we added some one-time costs with `releaseIf`
Yup, added those!
pkg/kv/util/weak_owner.go
Outdated
```go
	if ownership.Owner != owner {
		log.WithFields(logging.Fields{
			"new_owner": ownership.Owner,
		}).Info("Lost ownership race")
```
```diff
-		}).Info("Lost ownership race")
+		}).Trace("Lost ownership race")
```
Not done. Losing an ownership race should be rare. And when it does happen, it stops the loop.
pkg/kv/util/weak_owner.go
Outdated
Don't forget to move this to a proper location, this is not a kv util :)
Thanks for the reminder! Next commit :-)
And now... move and rename!
@yonipeleg33 & @itaiad200 : I found no recognized term for this kind of technique. I will go with "mostly-correct ownership", based on this paper by Dr. M Max.
1. It's not a KV util: move it out into `distributed`.
2. "Weak" is overloaded, so instead call it "mostly correct" ownership. It precisely describes what it is, so it must be a good term of art for what we do here.
3. Use the term "branch approximate ownership" at a higher level in the ref manager. This context doesn't particularly mind the specific properties of ownership, and "approximate" is a good fit there.
Thanks! Pulling... time flies by so quickly when you're having fun...
What
When enabled, this feature improves performance of concurrent merges.
No seriously, what?
lakeFS retains branch heads on KV. An operation that updates the branch head - typically a commit or a merge - performs a 3-phase commit protocol:
When the third phase fails, the operation fails. We do retry automatically on the server, but obviously under load it will still fail. Indeed we see this happen to some users who have heavy merge loads. In this case each retry is actually more expensive, because the gap between the source commit and the destination branch tends to grow with every successive commit to the destination. Failure is pretty much guaranteed once sustained load crosses some critical threshold.
How
This PR is an experiment to use "weak ownership" to avoid having to retry. Broadly speaking, every branch update operation takes ownership of its branch. This excludes all other operations from updating that branch. This exclusion does not reduce performance - only one concurrent branch update can ever succeed, so almost all concurrent branch operations would fail anyway. In fact, it increases performance because it prevents wasting resources on all the concurrent failing operations.
What about CAP?
Distributed locking is impossible by the CAP theorem. That's why we use weak ownership. This gives up on C (consistency) and some P (partition tolerance) in favour of keeping A (availability). Specifically, a thread takes ownership of a branch by asserting ownership as a record on KV. This ownership has an expiry time; the thread refreshes its ownership until it is done, at which point it deletes the record.
This ownership is weak: usually a single thread owns the branch, but sometimes 2 threads can end up owning the same branch! If the thread fails to refresh on time (due to a partition or slowness or poor clock sync, for instance) then another thread will also acquire ownership! When that happens we have concurrent branch updates. Because these are correct in lakeFS, we lose performance and may need to retry, but we never give incorrect answers.
Results
This PR also adds a `lakectl abuse merge` command to allow us to measure. b0cfca3 has some numbers when running locally: with sustained concurrency 50, we run faster and get 0 failures instead of 6.6% failures. More details there on why this is even better than it sounds. Graph available here; here's a spoiler: