
Experiment: Improve concurrent merge performance by weakly owning branch updates #8268

Merged
merged 18 commits into master from feature/single-hinted-merger
Oct 28, 2024

Conversation

arielshaqed
Contributor

What

When enabled, this feature improves performance of concurrent merges.

No seriously, what?

lakeFS retains branch heads on KV. An operation that updates the branch head - typically a commit or a merge - performs a 3-phase commit protocol:

  • Get the branch head from KV
  • Create the metarange on block storage based on that branch head
  • Create a new commit with that metarange whose parent is that branch head, IF the branch head has not moved

When the third phase fails, the operation fails. We do retry automatically on the server, but obviously under load it will still fail. Indeed we see this happen to some users who have heavy merge loads. In this case each retry is actually more expensive, because the gap between the source commit and the destination branch tends to grow with every successive commit to the destination. Failure is pretty much guaranteed once sustained load crosses some critical threshold.
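For illustration, here is a minimal, self-contained sketch of that optimistic update loop. The helper names (`getBranchHead`, `buildMetaRange`, `createCommit`, `casBranchHead`) are hypothetical stand-ins for the real graveler/KV calls, not the lakeFS API:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrBranchMoved reports that the branch head changed between read and write.
var ErrBranchMoved = errors.New("branch head moved")

// Hypothetical helpers standing in for the real graveler/KV calls.
func getBranchHead(ctx context.Context, branch string) (string, error)           { return "c1", nil }
func buildMetaRange(ctx context.Context, head string) (string, error)            { return "mr1", nil }
func createCommit(ctx context.Context, metaRange, parent string) (string, error) { return "c2", nil }
func casBranchHead(ctx context.Context, branch, expected, newHead string) error  { return nil }

// updateBranch runs the 3-phase protocol with a bounded number of retries.
// Under sustained concurrency phase 3 keeps failing, and every retry is more
// expensive because the destination branch keeps advancing in the meantime.
func updateBranch(ctx context.Context, branch string, maxRetries int) error {
	for i := 0; i < maxRetries; i++ {
		head, err := getBranchHead(ctx, branch) // phase 1: read the head from KV
		if err != nil {
			return err
		}
		metaRange, err := buildMetaRange(ctx, head) // phase 2: write the metarange to block storage
		if err != nil {
			return err
		}
		commit, err := createCommit(ctx, metaRange, head)
		if err != nil {
			return err
		}
		// phase 3: swing the head to the new commit only if it is still `head`
		err = casBranchHead(ctx, branch, head, commit)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrBranchMoved) {
			return err
		}
		// head moved: loop and redo the (increasingly expensive) work
	}
	return fmt.Errorf("update %s: %w after %d attempts", branch, ErrBranchMoved, maxRetries)
}

func main() {
	fmt.Println(updateBranch(context.Background(), "main", 3))
}
```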

How

This PR is an experiment that uses "weak ownership" to avoid having to retry. Broadly speaking, every branch update operation takes ownership of its branch, which excludes all other operations from updating that branch. This exclusion does not reduce throughput: without it only one of the concurrent branch operations could ever succeed anyway, and with it almost all branch operations succeed. In fact it increases performance, because it stops wasting resources on concurrent operations that are doomed to fail.

What about CAP?

Distributed locking is impossible by the CAP theorem. That's why we use weak ownership. This gives up on C (consistency) and some P (partition tolerance) in favour of keeping A (availability). Specifically, a thread takes ownership of a branch by asserting ownership as a record on KV. This ownership has an expiry time; the thread refreshes its ownership until it is done, at which point it deletes the record.

This ownership is weak: usually a single thread owns the branch, but sometimes 2 threads can end up owning the same branch! If the thread fails to refresh on time (due to a partition or slowness or poor clock sync, for instance) then another thread will also acquire ownership! When that happens we have concurrent branch updates. Because these are correct in lakeFS, we lose performance and may need to retry, but we never give incorrect answers.
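A self-contained sketch of that lease lifecycle, using a toy in-memory store rather than the real lakeFS KV API (all names here are hypothetical):

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// record is a simplified stand-in for the ownership record kept in KV.
type record struct {
	Owner   string
	Expires time.Time
}

// memStore is a toy in-memory store; the real implementation uses the lakeFS KV store.
type memStore struct {
	mu   sync.Mutex
	data map[string]record
}

// tryAcquire takes (or extends) ownership if the key is free, expired, or already ours.
func (s *memStore) tryAcquire(key, owner string, ttl time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.data[key]
	if ok && cur.Owner != owner && time.Now().Before(cur.Expires) {
		return false // someone else holds an unexpired lease
	}
	s.data[key] = record{Owner: owner, Expires: time.Now().Add(ttl)}
	return true
}

// release deletes the record, but only if we still own it.
func (s *memStore) release(key, owner string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.data[key]; ok && cur.Owner == owner {
		delete(s.data, key)
	}
}

// own acquires the lease and refreshes it in the background until released.
// If a refresh is late (partition, slow goroutine, clock skew), another caller
// may acquire the same key - that is the "weak" part.
func own(ctx context.Context, s *memStore, key, owner string, ttl time.Duration) (func(), bool) {
	if !s.tryAcquire(key, owner, ttl) {
		return nil, false
	}
	refreshCtx, cancel := context.WithCancel(ctx)
	go func() {
		ticker := time.NewTicker(ttl / 2) // refresh well before the lease expires
		defer ticker.Stop()
		for {
			select {
			case <-refreshCtx.Done():
				return
			case <-ticker.C:
				s.tryAcquire(key, owner, ttl) // extend our own lease
			}
		}
	}()
	return func() {
		cancel()
		s.release(key, owner)
	}, true
}

func main() {
	store := &memStore{data: map[string]record{}}
	release, ok := own(context.Background(), store, "branch/main", "req-1", 200*time.Millisecond)
	fmt.Println("acquired:", ok)
	// ... perform the branch update while holding the lease ...
	release()
}
```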

Results

This PR also adds a lakectl abuse merge command to allow us to measure. b0cfca3 has some numbers when running locally: with sustained concurrency 50, we run faster and get 0 failures instead of 6.6% failures. More details there on why this is even better than it sounds. Graph available here, here's a spoiler:
[Graph: speedup when enabling weak ownership]

@arielshaqed arielshaqed added the area/cataloger, performance, and include-changelog labels on Oct 7, 2024

@yonipeleg33
Contributor

@arielshaqed looks like amazing work, thank you!
I haven't reviewed it yet, but a few general notes:

  1. Do we have an experiment plan for this feature? Which customers/users, for how long, etc.
  2. The term "weak" is a bit inaccurate here IMO:
    The term "weak" as defined in places like C++'s std::weak_ptr or Rust's std::sync::Weak means a "non-owning reference", which is different from how this mechanism of branch locking works - each worker owns the branch exclusively, but only for a short period, and with a graceful mechanism for giving up exclusive ownership.
    The term "lease-based ownership" suits better here IMO.

@arielshaqed
Contributor Author

I should probably mention that if we go in this direction, we can deploy this and then improve it. The next step is to add fairness: earlier merge attempts should have a better chance of being next. Otherwise, when there are multiple concurrent merges, some of the earlier merges will repeatedly lose and time out. If everything is fair then we expect fewer timeouts, and in general the tail latency will shorten.
One way to make things fairer is to manage an actual queue of tasks waiting to own the branch. Then either their thread gets priority to pick up the task (it is allowed to pick up tasks sooner after their expiry; for instance, task n in the queue waits until $t_{expiry} + n\cdot dt_{check}$), or the thread that owns the branch just processes all merges in the queue.
Anyway, we don't need that to run this experiment.
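For illustration, one possible staggering rule matching the formula above - this is only a sketch of the fairness idea, not code from this PR:

```go
package main

import (
	"fmt"
	"time"
)

// nextAttemptAt staggers waiters by queue position: waiter n may try to take
// over the branch only n check-intervals after the current lease expires, so
// earlier arrivals get earlier retry slots.
func nextAttemptAt(expiry time.Time, queuePos int, checkInterval time.Duration) time.Time {
	return expiry.Add(time.Duration(queuePos) * checkInterval)
}

func main() {
	expiry := time.Now()
	for n := 0; n < 3; n++ {
		fmt.Printf("waiter %d may retry at expiry+%v\n", n, nextAttemptAt(expiry, n, 100*time.Millisecond).Sub(expiry))
	}
}
```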

"now": now,
}).Trace("Try to take ownership")
if errors.Is(err, kv.ErrNotFound) {
err = kv.SetMsg(ctx, w.Store, weakOwnerPartition, prefixedKey, &ownership)
Contributor

This is wrong, no? multiple routines can succeed at the same time

Contributor Author

Ouch, yeah, this was supposed to be a SetIf with the "not present" predicate - and I forgot to find out the correct value for that predicate.
Thanks for catching this!
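For context, a toy illustration of why the conditional write matters here: with an unconditional set, every racing goroutine "succeeds" and believes it owns the branch, while a set-if-absent lets exactly one win. This uses a hypothetical in-memory store, not the lakeFS kv API:

```go
package main

import (
	"fmt"
	"sync"
)

// toyKV mimics a KV store with a conditional put; it is NOT the lakeFS kv API.
type toyKV struct {
	mu   sync.Mutex
	data map[string]string
}

// setIfAbsent succeeds for exactly one of any set of racing goroutines,
// which is what taking ownership needs; a plain unconditional set would let
// every racer "succeed".
func (s *toyKV) setIfAbsent(key, val string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.data[key]; exists {
		return false
	}
	s.data[key] = val
	return true
}

func main() {
	store := &toyKV{data: map[string]string{}}
	var wg sync.WaitGroup
	winners := make(chan string, 2)
	for _, owner := range []string{"routine-A", "routine-B"} {
		wg.Add(1)
		go func(owner string) {
			defer wg.Done()
			if store.setIfAbsent("branch/main", owner) {
				winners <- owner
			}
		}(owner)
	}
	wg.Wait()
	close(winners)
	for w := range winners {
		fmt.Println("owner:", w) // exactly one line prints
	}
}
```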

@arielshaqed arielshaqed added the minor-change label on Oct 8, 2024
Contributor Author

@arielshaqed arielshaqed left a comment

Thanks, that one's embarrassing.

"now": now,
}).Trace("Try to take ownership")
if errors.Is(err, kv.ErrNotFound) {
err = kv.SetMsg(ctx, w.Store, weakOwnerPartition, prefixedKey, &ownership)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, yeah, this was supposed to be a SetIf with the "not present" predicate - and I forgot to find out the correct value for that predicate.
Thanks for catching this!

Weak ownership is a best-effort lock that occasionally fails.  This one
can fail when a goroutine is delayed for a long interval.

This is fine if the calling code relies on ownership for performance but not
for correctness.  E.g. merges and commits.
This includes merges.  Only one concurrent `BranchUpdate` operation can
succeed, so unless many long-lived such operations can fail, there is little
point in running multiple concurrent updates.
Performs multiple small merges in parallel.
This shows results, even on local!

When I run lakefs (by default weak ownership is OFF) I get 6.6% errors with
concurrency 50.  Rate is <50/s.  Also the long tail is _extremely_ long.

When I switch weak ownership ON, using the default parameters, I get **0**
errors with concurrency 50.  Rate is about the same, except that the
tail (when load drops) is _short_.

See the difference [here][merge-abuse-speed-chart]: it's faster _and_
returns 0 errors.  The distribution of actual successful merge times is
somewhat slower - possibly because of the time to lock, possibly because of
the fact that errors in the really slow cases cause those slow cases to be
dropped.

Finally, note that because we do not queue, some merges take a *long* time
under sustained load.  We could improve weak ownership to hold an actual
queue of work.  This would make merges _fair_: merges will occur roughly in
order of request arrival.

==== Weak ownership OFF ====

```sh
❯ go run ./cmd/lakectl abuse merge --amount 1000 --parallelism 50 lakefs://abuse/main
Source branch: lakefs://abuse/main
merge - completed: 34, errors: 0, current rate: 33.81 done/second
merge - completed: 80, errors: 0, current rate: 45.98 done/second
merge - completed: 128, errors: 0, current rate: 48.02 done/second
merge - completed: 177, errors: 0, current rate: 49.03 done/second
merge - completed: 222, errors: 0, current rate: 44.97 done/second
merge - completed: 265, errors: 3, current rate: 43.03 done/second
merge - completed: 308, errors: 9, current rate: 42.97 done/second
merge - completed: 357, errors: 15, current rate: 49.01 done/second
merge - completed: 406, errors: 21, current rate: 49.03 done/second
merge - completed: 451, errors: 22, current rate: 44.97 done/second
merge - completed: 499, errors: 29, current rate: 48.01 done/second
merge - completed: 545, errors: 30, current rate: 46.01 done/second
merge - completed: 585, errors: 31, current rate: 39.97 done/second
merge - completed: 632, errors: 33, current rate: 47.04 done/second
merge - completed: 679, errors: 37, current rate: 47.00 done/second
merge - completed: 728, errors: 46, current rate: 48.96 done/second
merge - completed: 768, errors: 49, current rate: 40.04 done/second
merge - completed: 808, errors: 53, current rate: 39.98 done/second
merge - completed: 854, errors: 57, current rate: 45.99 done/second
merge - completed: 891, errors: 58, current rate: 37.00 done/second
merge - completed: 935, errors: 64, current rate: 44.00 done/second
merge - completed: 972, errors: 66, current rate: 36.98 done/second
merge - completed: 990, errors: 66, current rate: 18.00 done/second
merge - completed: 995, errors: 66, current rate: 5.00 done/second
merge - completed: 996, errors: 66, current rate: 1.00 done/second
merge - completed: 998, errors: 66, current rate: 2.00 done/second
merge - completed: 999, errors: 66, current rate: 1.00 done/second
merge - completed: 999, errors: 66, current rate: 0.00 done/second
merge - completed: 999, errors: 66, current rate: 0.00 done/second
completed: 1000, errors: 66, current rate: 5.27 done/second

Histogram (ms):
1       0
2       0
5       0
7       0
10      0
15      0
25      0
50      0
75      601
100     671
250     672
350     672
500     696
750     740
1000    765
5000    896
min     54
max     12022
total   934
```

==== Weak ownership ON ====

```sh
❯ go run ./cmd/lakectl abuse merge --amount 1000 --parallelism 50 lakefs://abuse/main
Source branch: lakefs://abuse/main
merge - completed: 36, errors: 0, current rate: 35.23 done/second
merge - completed: 86, errors: 0, current rate: 49.98 done/second
merge - completed: 136, errors: 0, current rate: 50.03 done/second
merge - completed: 185, errors: 0, current rate: 48.99 done/second
merge - completed: 236, errors: 0, current rate: 51.02 done/second
merge - completed: 286, errors: 0, current rate: 49.99 done/second
merge - completed: 337, errors: 0, current rate: 50.97 done/second
merge - completed: 390, errors: 0, current rate: 53.03 done/second
merge - completed: 438, errors: 0, current rate: 48.01 done/second
merge - completed: 487, errors: 0, current rate: 49.00 done/second
merge - completed: 534, errors: 0, current rate: 46.98 done/second
merge - completed: 581, errors: 0, current rate: 46.99 done/second
merge - completed: 632, errors: 0, current rate: 51.00 done/second
merge - completed: 680, errors: 0, current rate: 48.04 done/second
merge - completed: 725, errors: 0, current rate: 44.98 done/second
merge - completed: 771, errors: 0, current rate: 45.99 done/second
merge - completed: 815, errors: 0, current rate: 44.02 done/second
merge - completed: 861, errors: 0, current rate: 46.01 done/second
merge - completed: 905, errors: 0, current rate: 43.98 done/second
merge - completed: 947, errors: 0, current rate: 42.00 done/second
merge - completed: 977, errors: 0, current rate: 30.01 done/second
merge - completed: 997, errors: 0, current rate: 19.99 done/second
completed: 1000, errors: 0, current rate: 4.92 done/second

Histogram (ms):
1       0
2       0
5       0
7       0
10      0
15      0
25      0
50      0
75      457
100     464
250     468
350     468
500     642
750     647
1000    729
5000    952
min     54
max     13744
total   1000
```
- Add some jitter when acquiring ownership on a branch
- Refresh _for_ refresh interval, twice _every_ refresh interval
- nolint unjustified warnings
@arielshaqed
Contributor Author

> @arielshaqed looks like amazing work, thank you! I haven't reviewed it yet, but a few general notes:

Thanks!

> 1. Do we have an experiment plan for this feature? Which customers/users, for how long, etc.

Yes. Firstly, this PR adds a lakectl abuse merge command to measure. After we pull I will deploy to Cloud staging, switch it on for a single private installation, and measure. Also as you probably know, we have a user for whom I plan this. I will communicate with them to measure on their system; obviously timing for that depends on the customer timing and how comfortable they feel switching it on.

> 2. The term "weak" is a bit inaccurate here IMO:
>     The term "weak" as defined in places like C++'s std::weak_ptr or Rust's std::sync::Weak means a "non-owning reference", which is different from how this mechanism of branch locking works - each worker owns the branch exclusively, but only for a short period, and with a graceful mechanism for giving up exclusive ownership.
>     The term "lease-based ownership" suits better here IMO.

In the C++ and Rust standard libraries, "weak" in this case refers to the pointer. (In other cases in the C++ standard library, "weak" can refer to any variant which is logically weaker; for instance we also have class weak_ordering.) In this PR "weak" is supposed to weaken "ownership".

What we have in this PR indeed uses a lease-based implementation. But it's not ownership, it just behaves like ownership in almost all reasonable cases.

A better term might be "mostly correct ownership"¹. In fact, I believe that such an unwieldy name would be a good choice! If you agree I will change to use it.

Footnotes

  1. The technical term "mostly" here is intended in the same sense that Miracle Max uses it.

Contributor

@itaiad200 itaiad200 left a comment

Reviewed everything except for the tests. Some really good things here!

//
// So it *cannot* guarantee correctness. However it usually works, and if
// it does work, the owning goroutine wins all races by default.
type WeakOwner struct {
Contributor

It seems to me that this shouldn't be in kv pkg. It deserves a standalone ownership/owner/ anything else package. The kv is an implementation detail of the ownership, so this struct should be called KVWeakOwner

Contributor Author

Sure. Let's do this after we agree on a name? @yonipeleg33 requested a name change, I'd rather do a single name change towards the end of the review.

@@ -3031,6 +3031,25 @@ lakectl abuse list <source ref URI> [flags]



### lakectl abuse merge

Merge nonconflicting objects to the source branch in parallel
Contributor

Suggested change
Merge nonconflicting objects to the source branch in parallel
Merge non-conflicting objects to the source branch in parallel

Contributor Author

Will auto-generate this :-)

pkg/graveler/ref/manager.go (resolved thread)
requestID = *requestIDPtr
}
if m.branchOwnership != nil {
own, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
Contributor

Suggested change
own, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
ownRelease, err := m.branchOwnership.Own(ctx, requestID, string(branchID))

or

Suggested change
own, err := m.branchOwnership.Own(ctx, requestID, string(branchID))
ownClose, err := m.branchOwnership.Own(ctx, requestID, string(branchID))

Contributor Author

Nice! Changed to release(), which I hope is even nicer.

Comment on lines 17 to 18
//nolint:stylecheck
var finished = errors.New("finished")
Contributor

Fix linting

Contributor Author

This was a sentinel, not an actual error. See for instance io.EOF. It is used only in sleepFor to indicate that the context expired with no error. But it turns out that there are special cases in context.WithCancelCause, so we don't need it.

Removed, thanks!

// Do NOT attempt to delete, to avoid
// knock-on effects destroying the new
// owner.
break
Contributor

Shouldn't you check the error and exit the function?

Contributor Author

I reported the error, and I do not think there is any useful action to take.

Aborting the action here, in particular, will probably livelock in some situations!

This goroutine has nothing waiting for it, and there really is nothing much to do with broken ownership. "All" that happens is that performance will suffer! This goroutine for the former owner thinks that another action stole its lease. So it makes sense to continue - it is likely to win the race to update, because it has already performed some work. Meanwhile the other action thinks that the former owner has stopped and will never finish - it should continue to avoid zombielock (a name I just invented for livelock against a stuck thread).

Which one is right? Well, if this goroutine is still running then the other action will most likely suffer. If it isn't running, it cannot do anything useful anyway.

log.WithFields(logging.Fields{
"new_owner": ownership.Owner,
}).Info("Lost ownership race")
break
Contributor

Add a comment explaining why the code continues to try and own the key. Assuming that it should, I worry that, across multiple routines, this may have a cascading effect of KV calls.

Contributor Author

Adding a comment to explain. But note that "all" that happens is that we fall back to the previous behaviour. Frankly I believe that the common cause for A losing a race to B will be that the clock on B is a bit too fast; this does not cause chains of lost races. The other common cause for A losing a race to B will be that A was stuck; again, that does not give it or any other thread an advantage against B. The final case is when all are badly stuck - for instance if you configure very short acquire and refresh intervals. In that case you do get what you would expect: a stuck system.


var abuseMergeCmd = &cobra.Command{
Use: "merge <branch URI>",
Short: "Merge nonconflicting objects to the source branch in parallel",
Contributor

Suggested change
Short: "Merge nonconflicting objects to the source branch in parallel",
Short: "Merge non-conflicting objects to the source branch in parallel",

Contributor Author

Done.

amount := Must(cmd.Flags().GetInt("amount"))
parallelism := Must(cmd.Flags().GetInt("parallelism"))

fmt.Println("Source branch:", u)
Contributor

Suggested change
fmt.Println("Source branch:", u)
fmt.Println("Source branch: ", u)

cmd/lakectl/cmd/abuse_merge.go (resolved thread)
Contributor Author

@arielshaqed arielshaqed left a comment

Thanks for the thorough review!

PTAL...



// branch IF a previous worker crashed while owning that branch. It
// has no effect when there are no crashes. Reducing it increases
// write load on the branch ownership record when concurrent
// operations occur. This.
Contributor Author

Done.

Comment on lines 440 to 444
requestIDPtr := httputil.RequestIDFromContext(ctx)
requestID := "<unknown>"
if requestIDPtr != nil {
requestID = *requestIDPtr
}
Contributor Author

Done. Either way will have strange performance characteristics, of course.

// owning key.
func (w *WeakOwner) Own(ctx context.Context, owner, key string) (func(), error) {
prefixedKey := []byte(fmt.Sprintf("%s/%s", w.Prefix, key))
err := w.startOwningKey(context.Background(), owner, prefixedKey)
Contributor Author

Confusion about how cancellation would work. Restored, thanks!

pkg/kv/util/weak_owner.go (outdated, resolved thread)
stopOwning := func() {
defer refreshCancel()
// Use the original context - in case cancelled twice.
err := w.Store.Delete(ctx, []byte(weakOwnerPartition), prefixedKey)
Contributor Author

Not sure why not: this key is on my partition, with my prefix, and I set it starting at l. 245. It serves to indicate ownership, and its value holds only ownership. I think I can delete it.

message WeakOwnership {
string owner = 1;
google.protobuf.Timestamp expires = 2;
string comment = 3;
Contributor Author

Suppose a system is behaving really strangely. I can look at its ownership partition and determine which operation owns it -- that's what I put here when I took ownership. I call it "comment" rather than something else because I don't want to commit to any format of what exactly goes in there.

Contributor Author

Added!

This does not depend on KV implementation, so just add a matrix to one of
the AWS/S3 flavours.
Otherwise no way to tell Esti _really_ ran it with ownership.
@itaiad200
Contributor

@arielshaqed could you open an issue and link to it? This is hardly a minor-change :)

Contributor

@itaiad200 itaiad200 left a comment

Review wip

Contributor

👍

Comment on lines 215 to 217
if errors.Is(err, kv.ErrNotFound) {
predicate = nil
}
Contributor

I really hope kv contract tests cover this

stopOwning := func() {
defer refreshCancel()
// Use the original context - in case cancelled twice.
err := w.Store.Delete(ctx, []byte(weakOwnerPartition), prefixedKey)
Contributor

It holds ownership, but it's not guaranteed to be your routine's ownership (unless I'm missing something). Some other thread might successfully set itself as owner, and your cleanup code breaks it.

@arielshaqed
Contributor Author

> @arielshaqed could you open an issue and link to it? This is hardly a minor-change :)

#8288 :-)

@arielshaqed
Contributor Author

> I really hope kv contract tests cover this

That line no longer occurs directly. We do still have it implied. I believe that it is tested by kvtest here and here.

If ownership has already been lost to another thread, do NOT delete
ownership when released.

- KV does not provide a DeleteIf operation.  Instead, use SetIf with an
  always-expired timestamp.
- Along the way, ensure "owner" string is truly unique by stringing a nanoid
  onto it.  Currently owner is the request ID, which should be unique - but
  adding randomness ensures it will always be unique regardless of the
  calling system.
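A toy illustration of the release path described in this commit message - overwrite with an already-expired lease instead of deleting, and only if we are still the recorded owner. The store and names here are hypothetical stand-ins, not the lakeFS KV API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type lease struct {
	Owner   string
	Expires time.Time
}

// toyStore mimics the conditional-write behaviour used for release; it is not
// the lakeFS KV API, just an illustration of "SetIf instead of DeleteIf".
type toyStore struct {
	mu   sync.Mutex
	data map[string]lease
}

// releaseIf expires the ownership record only if we still own it. If another
// routine already took over, we leave its record alone (no knock-on damage).
func (s *toyStore) releaseIf(key, owner string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.data[key]
	if !ok || cur.Owner != owner {
		return false // lost the ownership race; nothing to release
	}
	// The KV store has SetIf but no DeleteIf, so "delete" by writing an expired lease.
	s.data[key] = lease{Owner: owner, Expires: time.Unix(0, 0)}
	return true
}

func main() {
	s := &toyStore{data: map[string]lease{
		"branch/main": {Owner: "req-1", Expires: time.Now().Add(time.Second)},
	}}
	fmt.Println("released by req-2:", s.releaseIf("branch/main", "req-2")) // false
	fmt.Println("released by req-1:", s.releaseIf("branch/main", "req-1")) // true
}
```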
Contributor Author

@arielshaqed arielshaqed left a comment

Thanks!

PTAaaAAaaAAL...

stopOwning := func() {
defer refreshCancel()
// Use the original context - in case cancelled twice.
err := w.Store.Delete(ctx, []byte(weakOwnerPartition), prefixedKey)
Contributor Author

A DeleteIf operation sure would be nice here. Since we didn't spec one out, I'll replace it with a sentinel value instead.

@itaiad200
Contributor

Sorry for the late feedback, I am reviewing it very cautiously.

I wonder if wrapping BranchUpdate is the appropriate location to "Own" a branch. Every major branch-changing operation, like commit & merge, has 2 BranchUpdate calls: one for changing the staging token and one for the actual operation. These 2 operations don't necessarily happen one right after the other.
The current ownership model only wraps a single BranchUpdate call, thus releasing the lock prematurely.

@arielshaqed
Contributor Author

arielshaqed commented Oct 14, 2024

> Sorry for the late feedback, I am reviewing it very cautiously.
>
> I wonder if wrapping BranchUpdate is the appropriate location to "Own" a branch. Every major branch-changing operation, like commit & merge, has 2 BranchUpdate calls: one for changing the staging token and one for the actual operation. These 2 operations don't necessarily happen one right after the other. The current ownership model only wraps a single BranchUpdate call, thus releasing the lock prematurely.

I'm not sure this matters. There is no connection between the two update operations: my commit can succeed even if you were the last to seal staging tokens. And of course sealing is cheap. So I would not expect this to affect performance.

Of course the easy way to be sure will be to perform the experiment.

Otherwise it doesn't even say how long it took - which is the most
interesting part for `abuse merge`.
@arielshaqed
Contributor Author

Testing on lakeFS Cloud (staging)

Setup

  • Cloud installation on staging (codename popular-mastodon 🐘).
    • Boosted it to 500 RCUs + 50 WCUs.
  • Replaced image with an experimental image built from the branch, tagged v1.38.0-15-g0ec60--hinted-merger.1.
  • Ran the lakectl abuse merge command that this PR provides.
  • Ran from remote (home); this does not matter much because we perform 1.5-2 merges/s.
  • In one case (* below) I noticed a gateway error so I reran to get better numbers. This was a test with the feature turned off.

Results

| Weak ownership | Parallelism | Results |
|---|---|---|
| Off | 3* | First error: merge from 92: [502 Bad Gateway]: request failed. 200 total with 15 failures in 2m0.220660011s. 1.663608 successes/s, 0.124771 failures/s. Gateway failure and 11 branches actually not created; re-ran. |
| Off | 3 | First error: merge from 88: [423 Locked]: Too many attempts, try again later request failed. 200 total with 1 failure in 2m11.462751261s. 1.521343 successes/s, 0.007607 failures/s. |
| Off | 6 | First error: merge from 2: [423 Locked]: Too many attempts, try again later request failed. 200 total with 19 failures in 1m49.366288753s. 1.828717 successes/s, 0.173728 failures/s. |
| Off | 20 (fastest! worst!) | First error: merge from 22: [423 Locked]: Too many attempts, try again later request failed. 200 total with 52 failures in 1m36.082933636s. 2.081535 successes/s, 0.541199 failures/s. |
| On | 3 | 200 total with 0 failures in 2m1.140085198s. 1.650981 successes/s, 0.000000 failures/s. |
| On | 6 | 200 total with 0 failures in 1m46.569144906s. 1.876716 successes/s, 0.000000 failures/s. |
| On | 20 | 200 total with 0 failures in 1m49.595498987s. 1.824892 successes/s, 0.000000 failures/s. |

Analysis

Without weak ownership, contention always caused failures. With concurrency 20, > 25% of merges failed. Each failure actually helps concurrent merges, so failures speed things up.

With weak ownership, we saw no errors. The merge rate improved slightly from 3 to 6, and then stayed flat.

Only report errors.  Obviously if not all branches deleted then we left a
mess, which is too bad.  But the performance test itself succeeded, which is
the (more) important thing.
Contributor

@itaiad200 itaiad200 left a comment

LGTM with few minor comments.
Thank you for your patience!

log.Info("Lost ownership race before trying to release")
return nil
}
ownership.Owner = fmt.Sprintf("[released] %s", ownership.Owner)
Contributor

I don't think we need to change this. Let's keep the Owner as is, you have the 0 Expires field and Comment if you need to add info. It's best to keep the same format if we can.

Contributor Author

Done.

"prefixed_key": string(prefixedKey),
"owner": owner,
"new_owner": ownership.Owner,
}).Info("Lost ownership race while trying to release")
Contributor

Suggested change
}).Info("Lost ownership race while trying to release")
}).Debug("Lost ownership race while trying to release")

Contributor Author

Again, not done because I like knowing about these races.

log = log.WithField("new_owner", ownership.Owner)

if ownership.Owner != owner {
log.Info("Lost ownership race before trying to release")
Contributor

Suggested change
log.Info("Lost ownership race before trying to release")
log.Debug("Lost ownership race before trying to release")

Contributor Author

Again, as rare as before, and even more interesting. Keeping it at info.

// write operation.
//
// - While ownership is held, add 2.5 read and 2.5 write operation
// per second, and an additional ~7 read and write operations
Contributor

I think

Suggested change
// per second, and an additional ~7 read and write operations
// per second, and an additional ~7 read operations

Contributor Author

Thanks! But there is also an additional write operation to actually acquire it. So I made the explanation even longer.

// - While ownership is held, add 2.5 read and 2.5 write operation
// per second, and an additional ~7 read and write operations
// per second per branch operation waiting to acquire ownership.

Contributor

I think we added some one-time costs with releaseIf

Contributor Author

Yup, added those!

if ownership.Owner != owner {
log.WithFields(logging.Fields{
"new_owner": ownership.Owner,
}).Info("Lost ownership race")
Contributor

Suggested change
}).Info("Lost ownership race")
}).Trace("Lost ownership race")

Contributor Author

Not done. Losing an ownership race should be rare. And when it does happen, it stops the loop.

Contributor

Don't forget to move this to a proper location, this is not a kv util :)

Contributor Author

Thanks for the reminder! Next commit :-)

Contributor Author

@arielshaqed arielshaqed left a comment

And now... move and rename!

@yonipeleg33 & @itaiad200 : I found no recognized term for this kind of technique. I will go with "mostly-correct ownership", based on this paper by Dr. M Max.


1. It's not a KV util, move it out into distributed.
1. "Weak" is overloaded, so instead call it "mostly correct" ownership.  It
   precisely describes what it is so it must be a good term of art for what
   we do here.
1. Use the term "branch approximate ownership" at a higher-level in the ref
   manager.  This context doesn't particularly mind the specific properties
   of ownership, and "approximate" is a good fit there.
@arielshaqed
Contributor Author

Thanks! Pulling... time flies by so quickly when you're having fun...

@arielshaqed arielshaqed enabled auto-merge (squash) October 28, 2024 10:56
@arielshaqed
Contributor Author

" Run latest lakeFS app on AWS S3 Expected — Waiting for status to be reported "

  • this is literally finished, today I am not a GitHub fan.

@arielshaqed arielshaqed merged commit ae4bc01 into master Oct 28, 2024
39 checks passed
@arielshaqed arielshaqed deleted the feature/single-hinted-merger branch October 28, 2024 11:22
Labels
  • area/cataloger - Improvements or additions to the cataloger
  • include-changelog - PR description should be included in next release changelog
  • minor-change - Used for PRs that don't require issue attached
  • performance
Development

Successfully merging this pull request may close these issues.

Experiment: Reduce contention on branch HEADs to improve merge performance
3 participants