Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FindMergeBase - Removing redundant commit node visits #2968

Merged
merged 6 commits into from
Feb 27, 2022
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 42 additions & 13 deletions pkg/graveler/ref/merge_base_finder.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,28 +26,57 @@ func FindMergeBase(ctx context.Context, getter CommitGetter, repositoryID gravel
reached := make(map[graveler.CommitID]reachedFlags)
reached[rightID] |= fromRight
reached[leftID] |= fromLeft
// create an hypothetical commit with given nodes as parents, and insert it to the queue
heap.Push(&queue, &graveler.CommitRecord{
Commit: &graveler.Commit{Parents: []graveler.CommitID{leftID, rightID}},
})
commit, err := getCommitAndEnqueue(ctx, getter, &queue, repositoryID, leftID)
if err != nil {
return nil, err
}
if leftID == rightID {
return commit, nil
}

_, err = getCommitAndEnqueue(ctx, getter, &queue, repositoryID, rightID)
if err != nil {
return nil, err
}
Comment on lines +29 to +40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this part?
Wouldn't it work with the hypothetical commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnnyaug - The fix is relying on the reached map to identify commit nodes that were already visited. Since the initial state marks the 2 target commit as reached, they will not be queued for further traversal when "reached" from the hypothetical node, so I had to enqueue both of the nodes, as part of the initial state

for {
if queue.Len() == 0 {
return nil, nil
}
commitRecord = heap.Pop(&queue).(*graveler.CommitRecord)
commitFlags := reached[commitRecord.CommitID]
if commitFlags&fromLeft != 0 && commitFlags&fromRight != 0 {
// commit was reached from both left and right nodes
return commitRecord.Commit, nil
}
for _, parent := range commitRecord.Parents {
parentCommit, err := getter.GetCommit(ctx, repositoryID, parent)
if err != nil {
return nil, err
if _, exist := reached[parent]; !exist {
// parent commit is queued only if it was not handled before. Otherwise it, and
// all its ancestors were already queued and so, will have entries in 'reached' map
_, err := getCommitAndEnqueue(ctx, getter, &queue, repositoryID, parent)
if err != nil {
return nil, err
}
}
heap.Push(&queue, &graveler.CommitRecord{CommitID: parent, Commit: parentCommit})
// mark the parent with the flag values from its descendents:
// mark the parent with the flag values from its descendents. This is done regardless
// of whether this parent commit is being queued in the current iteration or not. In
// both cases, if the 'reached' update signifies it was reached from both left and
// right nodes - it is the requested parent node
reached[parent] |= commitFlags
if reached[parent]&fromLeft != 0 && reached[parent]&fromRight != 0 {
// commit was reached from both left and right nodes
// Since it is not certain that the commit exist in the commits queue, a call
Copy link
Contributor

@talSofer talSofer Feb 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itai-david Can you give an example when the parent commit is not in the queue at this point?
at the first time the parent is reached it will be enqueued in line 51.

@johnnyaug @arielshaqed you may know the answer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment is misleading - it can be removed.
We simply don't have the commit in memory at this point, so GetCommit is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we cache commits in memory, it will be. Otherwise we might have read it or not, depending on the structure of the graph. Agree wtih @johnnyaug the comment can be removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any comment is needed - it confuses more than it clears anything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comment, thanks

// to GetCommit is issued
commit, err := getter.GetCommit(ctx, repositoryID, parent)
if err != nil {
return nil, err
}
return commit, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
commit, err := getter.GetCommit(ctx, repositoryID, parent)
if err != nil {
return nil, err
}
return commit, nil
return getter.GetCommit(ctx, repositoryID, parent)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}
}
}

func getCommitAndEnqueue(ctx context.Context, getter CommitGetter, queue *CommitsGenerationPriorityQueue, repositoryID graveler.RepositoryID, commitID graveler.CommitID) (*graveler.Commit, error) {
commit, err := getter.GetCommit(ctx, repositoryID, commitID)
if err != nil {
return nil, err
}
heap.Push(queue, &graveler.CommitRecord{CommitID: commitID, Commit: commit})
return commit, nil
}