Fix merge large changes performance #1652

guy-har · 2021-03-18T12:48:48Z

No description provided.

arielshaqed

Reviewed apply and apply_test (only). Not sure how it passes tests; I think there's a bug, which would mean we need more test coverage. OR (even more likely) I don't understand the code. Either is a good reason to pause reviewing and give you a chance to respond!
Sorry...

pkg/graveler/committed/apply.go

arielshaqed · 2021-03-18T13:34:58Z

pkg/graveler/committed/apply.go

+		iterValue, iterRange := iter.Value()
+		if iterValue == nil {
+			if iterRange.IsTombstone() {
+				// internal error but no data lost: deletion requested of a


This is not counted, I assume on purpose because it's a probably bug? Maybe still count unjustified deletes just to be sure...

applyFromIter only returns addition because it runs at when there is no iterator to compare with, so it should have only additions. I could add "deletions" to it, but I don't think It should be counted. @arielshaqed what do you think ?

Dunno. I'd count it as a separate category maybe, that way if it starts showing up we have a chance of someone complaining.

pkg/graveler/committed/apply.go

arielshaqed · 2021-03-18T13:40:42Z

pkg/graveler/committed/apply.go

+			switch {
+			case bytes.Compare(diffRange.MaxKey, sourceRange.MinKey) < 0:
+				// insert diff
+				writer.WriteRange(*diffRange)


Need to check all Write* values :-/

Still need to check all Write* values.

arielshaqed · 2021-03-18T13:42:39Z

pkg/graveler/committed/apply.go

+				addIntoDiffSummary(&ret, graveler.DiffTypeRemoved, int(diffRange.Count))
+				haveSource = source.Next()
+				haveDiffs = diffs.Next()
+			default:


👉 I don't understand: are these all the options? What about when sourceRange and diffRange have overlapping keys?

source [min].....[max] ------------------------------------------------------------------------------------------------ diffs [min...........max]

In case of overlapping keys, we call next on both ranges In order to compare keys.

pkg/graveler/committed/apply.go

codecov-io · 2021-03-21T07:26:06Z

Codecov Report

Merging #1652 (9ba730e) into master (35e5261) will increase coverage by 0.48%.
The diff coverage is 73.63%.

❗ Current head 9ba730e differs from pull request most recent head dda29e5. Consider uploading reports for the commit dda29e5 to get more accurate results

@@            Coverage Diff             @@
##           master    #1652      +/-   ##
==========================================
+ Coverage   39.28%   39.76%   +0.48%     
==========================================
  Files         167      171       +4     
  Lines       13563    14000     +437     
==========================================
+ Hits         5328     5567     +239     
- Misses       7471     7652     +181     
- Partials      764      781      +17

Impacted Files	Coverage Δ
pkg/graveler/committed/manager.go	`24.13% <18.18%> (+24.13%)`	⬆️
pkg/graveler/committed/diff_values.go	`38.46% <38.46%> (ø)`
pkg/graveler/committed/wrappers.go	`42.85% <42.85%> (ø)`
pkg/graveler/committed/apply.go	`57.62% <56.39%> (+3.59%)`	⬆️
pkg/graveler/committed/merge_iterator.go	`90.14% <89.71%> (-1.11%)`	⬇️
pkg/graveler/committed/diff.go	`96.85% <95.09%> (-3.15%)`	⬇️
pkg/graveler/committed/merge_value_iterator.go	`100.00% <100.00%> (ø)`
pkg/graveler/committed/meta_range.go	`100.00% <100.00%> (ø)`
pkg/graveler/committed/range.go	`91.66% <100.00%> (+4.16%)`	⬆️
pkg/graveler/committed/value.go	`74.35% <0.00%> (-4.02%)`	⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35e5261...dda29e5. Read the comment docs.

guy-har · 2021-03-21T07:58:40Z

@arielshaqed Thanks,
I made some changes and added some tests.
I am planning on adding some more tests.

arielshaqed

Thanks! Nice work, but many details in a large 🐘 PR.

A bit hard to review because some comments appear to have no responses and also no changes. Specifically I am unsure why we poll on cancellation.

I have some (ok, many) places I did not understand, and would prefer to go over those before I read the tests more deeply.

pkg/graveler/committed/apply_test.go

pkg/graveler/committed/diff.go

arielshaqed · 2021-03-21T10:03:03Z

pkg/graveler/committed/merge_iterator.go

-				continue
+			switch typ {
+			case graveler.DiffTypeAdded:
+				// exists on source, but not on dest


Can you document DiffTypeAdded better? I would expect this to be "exists on dest, but not on source" -- here it's the reverse?! (Maybe add pretty pictures...?)

This also wasn't added as part of this PR, the reason is that we are holding a diff iterator that does a 2 way diff between destination and source. I could also change it if it makes more sense. prefer not as part of this PR. @arielshaqed what do you think?

pkg/graveler/committed/merge_iterator.go

pkg/graveler/committed/merge_iterator_test.go

arielshaqed · 2021-03-21T10:08:27Z

pkg/graveler/committed/meta_range.go

+	Next() bool
+	// NextRange skips the current range
+	// possible only if we are currently inside a range
+	// the next value could be Range or Value


We could only get a range when one of the ranges is completely before the other range,
In case we have overlapping ranges we could not return range.

pkg/graveler/committed/meta_range.go

pkg/graveler/committed/merge_iterator_flat.go

pkg/graveler/committed/diff.go

pkg/graveler/committed/manager.go

pkg/graveler/committed/range.go

pkg/graveler/committed/diff_values.go

pkg/graveler/committed/apply.go

pkg/graveler/committed/apply_test.go

johnnyaug · 2021-03-22T17:07:59Z

Great optimization!!

pkg/graveler/committed/apply_test.go

guy-har · 2021-03-29T05:35:37Z

I have made changes according to the comments and added tests.
@arielshaqed @johnnyaug PTAL.

johnnyaug · 2021-03-29T10:50:35Z

pkg/graveler/committed/diff_test.go

-				actualDiffKeys = append(actualDiffKeys, string(it.Value().Key))
-				diffs = append(diffs, it.Value())
+				diff, rng := it.Value()
+				actualDiffRanges = append(actualDiffRanges, newExpectedRange(rng))


Looks like you're adding an "expected range" into an array of "actual range" - which means maybe the object type should simply be "range" or "testRange"?

both are already in use, changing to diffTestRange

pkg/graveler/committed/diff_test.go

pkg/graveler/committed/merge_iterator.go

pkg/graveler/committed/merge_iterator_test.go

arielshaqed · 2021-03-30T08:05:50Z

@guy-har , it is very hard for me to review a monolithic PR such as this one. I wrote in the previous round that:

A bit hard to review because some comments appear to have no responses and also no changes. Specifically I am unsure why we poll on cancellation.

Please note that I am trying to be very explicit about both these requests. The second remains open -- we poll on cancellation throughout the code, I am not sure why.

About the second: You marked most changes as resolved, but not all -- some just vanish into an "outdated" section. This makes it harder for me to track what came of each one. Sometimes the comment is marked "resolved" and but I do not understand the resolution. For instance I did not understand how you resolved this comment.

A short comment might help, particularly when the fix is nontrivial.

In any case, apologies for an expected long round-trip time.

guy-har · 2021-03-30T08:51:25Z

@arielshaqed I marked the resolved after changing according to the comment, I will go over them again.
according to polling on cancelation. I answered in both comments and marked as resolved, which made it not visible (sorry about that). My answer to the polling was that it was added as part of #1603, I basically just moved it around. We could decide we wan't to remove it, but I don't think it should be part of this PR.

arielshaqed

Thanks!

Nothing major (I hope), but must do these before pulling:

Document merge iterator NextRange and explain diff iterator NextRange when there is no range.
AFAICT one test uses deep.Equal on a struct with lowercase field names, so it does very little.

arielshaqed · 2021-03-30T15:06:21Z

pkg/graveler/committed/apply.go


-func incrementDiffSummary(d *graveler.DiffSummary, typ graveler.DiffType) {
-	addIntoDiffSummary(d, typ, 1)
+type applier struct {


I don't understand why this is an object. The only state it holds is a diff summary, which is anyway implemented as a function, and the two have* members. It is created once, then its apply method is called -- with no parameters. AFAICT it is just a function that has been represented as an object. Having it as an object makes me have to worry about whether it does actually have important state. Calls to functions that modify have* could be modeled as re-assigning some function-local state variables. Alternatively, if there is a useful abstraction for "iterator-and-the-last-value-from-next", then let's implement that and put the X Iterator and haveX into a single class named something like IteratorWithEnd.

If the goal is to have a nice struct with all the used parameters in it then just have a struct, no need to attach methods to it. But I would really question why passing a single parameter named a to a function is clearer than passing the named parameters to that function. E.g. as it stands, the type signature for addIntoDiffSumary doesn't indicate that it actually changes only a.summary, but does not advance any iterators or write ranges. Sure, the call a.addIntoDiffSummary(typ, n) is shorter than addIntoDiffSummary(summary, type, n), but I claim that is just because it gives less information, and forces me to go read the code.

Also nit: applier sounds like Go naming for an interface.

arielshaqed · 2021-03-30T15:11:24Z

pkg/graveler/committed/apply.go

+		iterValue, iterRange := iter.Value()
+		if iterValue == nil {
+			if iterRange.IsTombstone() {
+				// internal error but no data lost: deletion requested of a


Dunno. I'd count it as a separate category maybe, that way if it starts showing up we have a chance of someone complaining.

arielshaqed · 2021-03-30T15:12:44Z

pkg/graveler/committed/apply.go

-// applyFromSource applies all changes from source to writer.
-func applyFromSource(ctx context.Context, logger logging.Logger, writer MetaRangeWriter, source Iterator) error {
+// applyAll applies all changes from Iterator to writer and returns the number of writes
+func (a *applier) applyAll(iter Iterator) (int, error) {


E.g. because of the iterator/have separation above, I think this function is unsafe to call on iterator X unless haveX.

arielshaqed · 2021-03-30T15:13:54Z

pkg/graveler/committed/apply.go

-		case <-ctx.Done():
-			return 0, ctx.Err()
-		default:
+func (a *applier) hasChanges(summary graveler.DiffSummary) bool {


This is a pure function not a method.

arielshaqed · 2021-03-30T15:15:55Z

pkg/graveler/committed/apply.go

 				break
 			}
 		}
 	}
-	return source.Err()
+	return count, iter.Err()


E.g. after calling this method the caller has to increment the correct statistics field -- even though it sort-of has access to the statistics (it just doesn't know what type to increment).

arielshaqed · 2021-03-30T15:48:08Z

pkg/graveler/committed/merge_iterator_test.go

+	return string(rng.ID)
+}
+
+func TestMergeRange(t *testing.T) {


Nice test! 😎

Thanks 😄

pkg/graveler/committed/merge_iterator_test.go

arielshaqed · 2021-03-30T15:51:53Z

pkg/graveler/committed/meta_range.go

+	// the next value could be Range or Value
+	NextRange() bool
+	// Value returns a nil ValueRecord and a Range before starting a Range, or a Value and that Range when inside a Range.
+	// In contrast to Iterator, the DiffIterator might not have a current range -  this could happen if the current value exists in two different ranges


Can we explain what NextRange does in this case?

@arielshaqed , I changed the documentation of the DiffIterator, PTAL

pkg/graveler/committed/range.go

arielshaqed

Great pic!

LEBTM (looks even better to me), thanks.

arielshaqed · 2021-03-31T08:00:34Z

pkg/graveler/committed/meta_range.go

+// DiffIterator might contain ranges without headers
+// for example:
+//
+// left        [min].R1.[max]                     [min].R3.[max]        [min]...............R5..............[max]
+//        ------------------------------------------------------------------------------------------------
+// right                        [min].R2.[max]     [min.....R4....max]  [min].R6.[max] [min].R7.[max]
+//
+// R1 -  will return as diff with header
+// R2 - will return as diff with header
+// R3 and R4 - could not return a header because we must enter the ranges in order to get some header values (such as count)
+// R5 and R6 - same as R3 and R4
+// R7 - in case R5 has no values in the R7 range, R7 would return as a diff with header


Neat explanation, great drawing! I would maybe rephrase the tense to make it a bit shorter.

Suggested change

// DiffIterator might contain ranges without headers

// for example:

//

// left [min].R1.[max] [min].R3.[max] [min]...............R5..............[max]

// ------------------------------------------------------------------------------------------------

// right [min].R2.[max] [min.....R4....max] [min].R6.[max] [min].R7.[max]

//

// R1 - will return as diff with header

// R2 - will return as diff with header

// R3 and R4 - could not return a header because we must enter the ranges in order to get some header values (such as count)

// R5 and R6 - same as R3 and R4

// R7 - in case R5 has no values in the R7 range, R7 would return as a diff with header

// DiffIterator may contain ranges without headers. For instance, consider the diff of

// these ranges:

//

// left [min].R1.[max] [min].R3.[max] [min]...............R5..............[max]

// ------------------------------------------------------------------------------------------------

// right [min].R2.[max] [min.....R4....max] [min].R6.[max] [min].R7.[max]

//

// R1 - returned as a diff inside range R1.

// R2 - returned as a diff inside range R2.

// R3 and R4 - not in any existing range (and creating a new range would require consuming them).

// R5 and R6 - as above, not in any existing range.

// R7 - returned as a diff inside range R7 if R5 has no keys inside the range of R7.

Thanks 🙏

Merge with diff

30b932e

guy-har requested review from johnnyaug and arielshaqed March 18, 2021 12:48

arielshaqed requested changes Mar 18, 2021

View reviewed changes

Fix Append and add tests

d4a2190

guy-har requested a review from arielshaqed March 21, 2021 07:58

arielshaqed requested changes Mar 21, 2021

View reviewed changes

Code review changes

10ca997

johnnyaug suggested changes Mar 22, 2021

View reviewed changes

johnnyaug reviewed Mar 22, 2021

View reviewed changes

pkg/graveler/committed/apply_test.go Outdated Show resolved Hide resolved

Add tests and code review fixes

3819fdd

guy-har requested review from johnnyaug and arielshaqed March 29, 2021 05:35

johnnyaug suggested changes Mar 29, 2021

View reviewed changes

Code review changes

137625c

guy-har requested a review from johnnyaug March 30, 2021 06:23

johnnyaug approved these changes Mar 30, 2021

View reviewed changes

Add more tests

904458a

arielshaqed mentioned this pull request Mar 30, 2021

Check context during apply, diff and merge loops #1603

Merged

Remove new lines

9ba730e

arielshaqed approved these changes Mar 30, 2021

View reviewed changes

Code review changes

dda29e5

arielshaqed approved these changes Mar 31, 2021

View reviewed changes

guy-har linked an issue Mar 31, 2021 that may be closed by this pull request

Merge large changes performance #1439

Closed

guy-har changed the title ~~Merge with diff~~ Fix merge large changes performance Mar 31, 2021

guy-har merged commit d5605ac into master Mar 31, 2021

guy-har deleted the fix/merge-with-range branch March 31, 2021 15:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix merge large changes performance #1652

Fix merge large changes performance #1652

guy-har commented Mar 18, 2021

arielshaqed left a comment

arielshaqed Mar 18, 2021

guy-har Mar 30, 2021

arielshaqed Mar 30, 2021

arielshaqed Mar 18, 2021

arielshaqed Mar 30, 2021

arielshaqed Mar 18, 2021

guy-har Mar 21, 2021

codecov-io commented Mar 21, 2021 •

edited

Loading

guy-har commented Mar 21, 2021

arielshaqed left a comment

arielshaqed Mar 21, 2021

guy-har Mar 30, 2021

arielshaqed Mar 21, 2021

guy-har Mar 21, 2021

johnnyaug commented Mar 22, 2021

guy-har commented Mar 29, 2021

johnnyaug Mar 29, 2021

guy-har Mar 30, 2021

arielshaqed commented Mar 30, 2021

guy-har commented Mar 30, 2021

arielshaqed left a comment

arielshaqed Mar 30, 2021

arielshaqed Mar 30, 2021

arielshaqed Mar 30, 2021

arielshaqed Mar 30, 2021

arielshaqed Mar 30, 2021

arielshaqed Mar 30, 2021

guy-har Mar 31, 2021

arielshaqed Mar 30, 2021

guy-har Mar 31, 2021

arielshaqed left a comment

arielshaqed Mar 31, 2021

guy-har Mar 31, 2021

Fix merge large changes performance #1652

Fix merge large changes performance #1652

Conversation

guy-har commented Mar 18, 2021

arielshaqed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Mar 21, 2021 • edited Loading

Codecov Report

guy-har commented Mar 21, 2021

arielshaqed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnnyaug commented Mar 22, 2021

guy-har commented Mar 29, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arielshaqed commented Mar 30, 2021

guy-har commented Mar 30, 2021

arielshaqed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arielshaqed left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Mar 21, 2021 •

edited

Loading