-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor part 2 - separate merge from diff iterator [change-log] optimize merge by 20 percent #2884
Conversation
pkg/graveler/committed/manager.go
Outdated
defer metaRangeIterator.Close() | ||
summary, err = Apply(ctx, mwWriter, metaRangeIterator, diffs, &ApplyOptions{}) | ||
|
||
summary, err = Merge(ctx, mwWriter, baseIt, sourceIt, destIt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove summary if it isn't supported anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
summary graveler.DiffSummary | ||
} | ||
|
||
// baseGE returns range from base iterator, which is greater or equal than the given key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name and description are quite confusing, it doesn't really do as described.
IIUC the function does seekGE but from the current location. This isn't intuitive when using iterators. please document well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
// baseGE returns value from base iterator, which is greater or equal than the given key | ||
func (m *merger) baseKeyGE(key graveler.Key) (*graveler.ValueRecord, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as in baseRangeGE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
if err := m.base.Err(); err != nil { | ||
return nil, err | ||
} | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := m.base.Err(); err != nil { | |
return nil, err | |
} | |
return nil, nil | |
return nil, m.base.Err() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if err := m.base.Err(); err != nil { | ||
return nil, err | ||
} | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := m.base.Err(); err != nil { | |
return nil, err | |
} | |
return nil, nil | |
return nil, m.base.Err() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
if baseValue != nil && bytes.Equal(sourceValue.Identity, baseValue.Identity) { // dest deleted this record | ||
m.haveSource = m.source.Next() | ||
} else { | ||
if baseValue != nil && bytes.Equal(sourceValue.Key, baseValue.Key) { // conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if baseValue != nil && bytes.Equal(sourceValue.Key, baseValue.Key) { // conflict | |
if baseValue != nil && bytes.Equal(sourceValue.Key, baseValue.Key) { // deleted by dest and changed by source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
if baseValue != nil && bytes.Equal(destValue.Identity, baseValue.Identity) { // source deleted this record | ||
m.haveDest = m.dest.Next() | ||
} else { | ||
if baseValue != nil && bytes.Equal(destValue.Key, baseValue.Key) { // conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if baseValue != nil && bytes.Equal(destValue.Key, baseValue.Key) { // conflict | |
if baseValue != nil && bytes.Equal(destValue.Key, baseValue.Key) { // deleted by source added by dest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
return graveler.ErrConflictFound | ||
} | ||
} | ||
// both added the same record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not precise, there is also the (more common) case of no change
sourceValue.Identity == destValue.Identity == baseValue.Identity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
m.haveDest = m.dest.NextRange() | ||
return nil | ||
} else { | ||
return graveler.ErrConflictFound // conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No conflicts on ranges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive work!
I didn't finish reviewing yet, but since there is already one critical comment I'm publishing my review. Will continue tomorrow
nessie/identity_test.go
Outdated
@@ -53,7 +53,7 @@ func TestIdentity(t *testing.T) { | |||
|
|||
resp, err := client.MergeIntoBranchWithResponse(ctx, repo, branch1, branch2, api.MergeIntoBranchJSONRequestBody{}) | |||
require.NoError(t, err, "error during merge") | |||
require.NotNil(t, resp.JSON400, "merge should fail since there are no changes between the branches") | |||
require.Nil(t, resp.JSON400, "allow merge with no changes between the branches") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a strange test to check specifically that the 400 error is empty - should probably check that the returned code is not empty (200?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/api/controller_test.go
Outdated
if err != nil { | ||
t.Fatal(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err != nil { | |
t.Fatal(err) | |
} | |
testutil.Must(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge_test.go
Outdated
"fmt" | ||
"testing" | ||
|
||
"github.com/go-test/deep" | ||
"github.com/stretchr/testify/assert" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
organize imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done - there's an empty line here
@@ -84,7 +86,6 @@ func Test_merge(t *testing.T) { | |||
destRange []testRange | |||
expectedActions []writeAction | |||
conflictExpectedIdx *int | |||
expectedSummary graveler.DiffSummary | |||
expectedErr error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always nil in this test: not needed anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -239,8 +235,14 @@ func Test_merge(t *testing.T) { | |||
conflictExpectedIdx: nil, | |||
expectedActions: []writeAction{ | |||
{ | |||
action: actionTypeWriteRange, | |||
rng: committed.Range{ID: "base:k1-k3", MinKey: committed.Key("k1"), MaxKey: committed.Key("k3"), Count: 2, EstimatedSize: 1024}, | |||
action: actionTypeWriteRecord, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change mean there was a regression in the optimization? (also applies to other changes in this test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not a common case so I think we can skip it
pkg/graveler/committed/merge.go
Outdated
// applyBothRanges applies merging when both source and dest on range header | ||
func (m *merger) applyBothRanges(sourceRange *Range, destRange *Range) error { | ||
switch { | ||
case sourceRange.ID == destRange.ID: // source and dest added the same range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// source and dest added the same range
Or, more likely, nobody added anything, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
pkg/graveler/committed/merge.go
Outdated
func (m *merger) baseRangeGE(key graveler.Key) (*Range, error) { | ||
for { | ||
_, baseRange := m.base.Value() | ||
if baseRange != nil && bytes.Compare(baseRange.MaxKey, key) >= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for iterating rather than doing SeekGE?
Is it a performance thing?
Naming suggestion: can be called something like "findRangeInBaseIterator" (not good enough, just suggesting a sentiment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, It's a performance improvement. Instead of doing seekGE from the beginning of base iterator each time, it continues from its current location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do a benchmark to validate that these two functions really improve something, and document it?
pkg/graveler/committed/merge.go
Outdated
if baseRange != nil && (sourceRange.ID == baseRange.ID || destRange.ID == baseRange.ID) { | ||
if sourceRange.ID == baseRange.ID { // dest added changes | ||
err = m.writeRange(destRange) | ||
} else { | ||
err = m.writeRange(sourceRange) // source added changes | ||
} | ||
if err != nil { | ||
return err | ||
} | ||
m.haveSource = m.source.NextRange() | ||
m.haveDest = m.dest.NextRange() | ||
} else { // enter both ranges | ||
m.haveSource = m.source.Next() | ||
m.haveDest = m.dest.Next() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, more readable like this, but it's matter of preference:
if baseRange != nil && (sourceRange.ID == baseRange.ID || destRange.ID == baseRange.ID) { | |
if sourceRange.ID == baseRange.ID { // dest added changes | |
err = m.writeRange(destRange) | |
} else { | |
err = m.writeRange(sourceRange) // source added changes | |
} | |
if err != nil { | |
return err | |
} | |
m.haveSource = m.source.NextRange() | |
m.haveDest = m.dest.NextRange() | |
} else { // enter both ranges | |
m.haveSource = m.source.Next() | |
m.haveDest = m.dest.Next() | |
} | |
if baseRange == nil { | |
fallthrough; | |
} | |
if sourceRange.ID == baseRange.ID { | |
err = m.writeRange(destRange) | |
} else if destRange.ID == baseRange.ID { | |
err = m.writeRange(sourceRange) | |
} else { | |
fallthrough; | |
} | |
if err != nil { | |
return err | |
} | |
m.haveSource = m.source.NextRange() | |
m.haveDest = m.dest.NextRange() |
pkg/graveler/committed/merge.go
Outdated
m.haveSource = m.source.NextRange() | ||
return nil | ||
} | ||
return graveler.ErrConflictFound // conflict - both changed this range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+add test case that covers this code
pkg/graveler/committed/merge.go
Outdated
return nil | ||
} | ||
if baseRange == nil || destRange.ID == baseRange.ID { // source added this range | ||
err = m.writeRange(sourceRange) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a bug here, adding a test case which I think will reflect it (add this to Test_merge
).
I hope this is a valid case.
@guy-har, would love your input here as well.
"is it a bug?": {
baseRange: []testRange{
{
rng: committed.Range{ID: "base:b-c", MinKey: committed.Key("b"), MaxKey: committed.Key("c"), Count: 2, EstimatedSize: 1024},
records: []testValueRecord{{key: "b", identity: "b"}, {key: "c", identity: "c"}},
},
{
rng: committed.Range{ID: "base:d-e", MinKey: committed.Key("d"), MaxKey: committed.Key("e"), Count: 2, EstimatedSize: 1024},
records: []testValueRecord{{key: "d", identity: "d"}, {key: "e", identity: "e"}},
},
},
sourceRange: []testRange{
{
rng: committed.Range{ID: "source:a-d", MinKey: committed.Key("a"), MaxKey: committed.Key("d"), Count: 2, EstimatedSize: 1024},
records: []testValueRecord{{key: "a", identity: "a"}, {key: "b", identity: "b"}, {key: "d", identity: "d"}},
},
{
rng: committed.Range{ID: "source:e", MinKey: committed.Key("e"), MaxKey: committed.Key("e"), Count: 2, EstimatedSize: 1024},
records: []testValueRecord{{key: "e", identity: "e1"}},
},
},
destRange: []testRange{
{
rng: committed.Range{ID: "base:b-c", MinKey: committed.Key("b"), MaxKey: committed.Key("c"), Count: 2, EstimatedSize: 1024},
records: []testValueRecord{{key: "b", identity: "b"}, {key: "c", identity: "c"}},
},
{
rng: committed.Range{ID: "dest:d-e", MinKey: committed.Key("d"), MaxKey: committed.Key("e"), Count: 2, EstimatedSize: 1024},
records: []testValueRecord{{key: "d", identity: "d1"}, {key: "e", identity: "e"}},
},
},
conflictExpectedIdx: nil,
expectedActions: []writeAction{
{
action: actionTypeWriteRecord,
key: "a",
identity: "a",
},
{
action: actionTypeWriteRecord,
key: "b",
identity: "b",
},
{
action: actionTypeWriteRecord,
key: "d",
identity: "d1",
},
{
action: actionTypeWriteRecord,
key: "e",
identity: "e1",
},
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the bug and added this test, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting changes because the fast forward optimization for source == base
was removed
pkg/graveler/committed/manager.go
Outdated
if source == base { | ||
// no changes on source | ||
return "", summary, graveler.ErrNoChanges | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed? even if it's not an error anymore we could still return destination
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it wasn't supposed to be removed. I added it back without error.
pkg/graveler/graveler.go
Outdated
@@ -640,7 +640,7 @@ type CommittedManager interface { | |||
// Merge applies changes from 'source' to 'destination', relative to a merge base 'base' and | |||
// returns the ID of the new metarange and a summary of diffs. This is similar to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// returns the ID of the new metarange and a summary of diffs. This is similar to a | |
// returns the ID of the new metarange. This is similar to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sourceRange: []testRange{ | ||
{ | ||
rng: committed.Range{ID: "source:a-d", MinKey: committed.Key("a"), MaxKey: committed.Key("d"), Count: 2, EstimatedSize: 1024}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, the test is called dest change key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed its name
pkg/graveler/committed/merge_test.go
Outdated
expectedErr: graveler.ErrNoChanges, | ||
expectedErr: nil, | ||
}, | ||
"source and dest change same range": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"source and dest change same range": { | |
"dest removed range and added range after source removed range edges": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is the best name, but it's a bit of a weird test, it's not that both changed the same range. It's that source changes are included in dest ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge_test.go
Outdated
}, | ||
}, | ||
}, | ||
"dest over source left": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"dest over source left": { | |
"dest removed all source added": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"same identity different key": { | ||
baseRange: []testRange{ | ||
{ | ||
rng: committed.Range{ID: "base:a-d", MinKey: committed.Key("a"), MaxKey: committed.Key("d"), Count: 2, EstimatedSize: 1024}, | ||
records: []testValueRecord{{key: "a", identity: "a"}, {key: "d", identity: "d"}}, | ||
}, | ||
}, | ||
sourceRange: []testRange{ | ||
{ | ||
rng: committed.Range{ID: "source:a-d", MinKey: committed.Key("a"), MaxKey: committed.Key("d"), Count: 4, EstimatedSize: 1024}, | ||
records: []testValueRecord{{key: "a", identity: "a"}, {key: "a1", identity: "a"}, {key: "c", identity: "c"}}, | ||
}, | ||
}, | ||
destRange: []testRange{ | ||
{ | ||
rng: committed.Range{ID: "dest:a-d", MinKey: committed.Key("a"), MaxKey: committed.Key("d"), Count: 2, EstimatedSize: 1024}, | ||
records: []testValueRecord{{key: "a", identity: "a"}, {key: "d", identity: "d"}}, | ||
}, | ||
}, | ||
conflictExpectedIdx: nil, | ||
expectedActions: []writeAction{ | ||
{ | ||
action: actionTypeWriteRecord, | ||
key: "a", | ||
identity: "a", | ||
}, | ||
{ | ||
action: actionTypeWriteRecord, | ||
key: "a1", | ||
identity: "a", | ||
}, | ||
{ | ||
action: actionTypeWriteRecord, | ||
key: "c", | ||
identity: "c", | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
pkg/catalog/model.go
Outdated
@@ -46,7 +46,6 @@ type CommitLog struct { | |||
} | |||
|
|||
type MergeResult struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct is not really needed anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -18,27 +18,8 @@ type compareIterator struct { | |||
err error | |||
} | |||
|
|||
type mergeIterator struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting!
pkg/graveler/committed/merge.go
Outdated
func (m *merger) baseRangeGE(key graveler.Key) (*Range, error) { | ||
for { | ||
_, baseRange := m.base.Value() | ||
if baseRange != nil && bytes.Compare(baseRange.MaxKey, key) >= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do a benchmark to validate that these two functions really improve something, and document it?
pkg/graveler/committed/merge_test.go
Outdated
"fmt" | ||
"testing" | ||
|
||
"github.com/go-test/deep" | ||
"github.com/stretchr/testify/assert" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done - there's an empty line here
pkg/graveler/committed/merge.go
Outdated
} | ||
|
||
// moveBaseToGERange moves base iterator (from current point) to range which is greater or equal than the given key | ||
func (m *merger) moveBaseToGERange(key graveler.Key) (*Range, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Range which is greater or equal than the given key" is not too well-defined: does it mean that this range starts after this key?
I think this method can be changed so that the code calling it will become simpler (see other comments)
pkg/graveler/committed/merge.go
Outdated
m.haveSource = m.source.NextRange() | ||
return nil | ||
} | ||
if baseRange == nil || destRange.ID == baseRange.ID { // source added this range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check destRange.ID == baseRange.ID
is irrelevant, and causes an optimization bug. At this point, destRange
starts after sourceRange
ends. Whether or not baseRange
is the same range is not interesting. Only thing that matters is if baseRange intersects with sourceRange or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the comments. Looks very good!
Minor comments + would still love to see a benchmark justifying the existence of getNextGEKey
pkg/api/controller.go
Outdated
} | ||
} | ||
return result | ||
writeResponse(w, http.StatusOK, MergeResult{Reference: res}) // optimize returning unknown summary = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not clear - can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/graveler/committed/merge.go
Outdated
return nil, nil | ||
} | ||
|
||
// getNextOverlappingFromBase moves base iterator (from current point) to the next overlap range with rangeToOverlap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// getNextOverlappingFromBase moves base iterator (from current point) to the next overlap range with rangeToOverlap | |
// getNextOverlappingFromBase moves base iterator from its current position to the next range overlapping with rangeToOverlap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thanks for the review @guy-har @johnnyaug ! |
GREAT JOB 👍We really needed this refactor, Thanks!! |
closes #2784
Done - Refactor part 1: separate commit and merge
Refactor part 2:
Next: