But it still performs many GetCommit calls; these are unnecessarily slow, and since their results could be cached, most of them are probably unnecessary.
GetCommit is slow
This is intentional: the call is batched. Batching makes sense for other operations (see @tzahij's blog post) because they involve heavy concurrency under load. For instance, when a large-scale Spark job starts reading from lakeFS, it performs many concurrent reads, each fetching the same HEAD commit, so one database round trip can serve them all. The trade-off is that batched responses arrive later than unbatched ones would.
But batching does not make sense in FindMergeBase: this function fetches commits sequentially, each fetch depending on the previous result, so batching only slows each iteration down. One speedup would be to pass this function an unbatched CommitGetter.
But we can do even better...
GetCommit should be cached
There can be many thousands of commits. However, a quick inspection of graveler.go suggests that the relation graveler_commits is never updated other than by FillGenerations / LoadCommits. Commits are immutable.
So we can cache commits: every call to GetCommit should go through a cache. The cache can easily be of size 10_000, which will hold all relevant commits in memory. FillGenerations can simply invalidate the cache :-) ; it is only used by restore-refs anyway. And if we use our cache.Cache, then we don't even need to batch: concurrent GetCommit operations call out to the database only once, and the result is cached for later operations.
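A minimal sketch of such a cache, resting on commit immutability. The names commitCache and Invalidate are hypothetical; a real implementation would use lakeFS's cache.Cache with proper LRU eviction rather than the crude eviction shown here.

```go
package main

import (
	"fmt"
	"sync"
)

type Commit struct{ Message string }

type commitCache struct {
	mu      sync.Mutex
	entries map[string]Commit
	max     int // e.g. 10_000 holds all relevant commits in memory
	fetch   func(id string) Commit
	dbCalls int
}

func newCommitCache(max int, fetch func(string) Commit) *commitCache {
	return &commitCache{entries: map[string]Commit{}, max: max, fetch: fetch}
}

func (c *commitCache) GetCommit(id string) Commit {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cm, ok := c.entries[id]; ok {
		return cm // commits are immutable, so a cached copy is always valid
	}
	cm := c.fetch(id)
	c.dbCalls++
	if len(c.entries) >= c.max { // crude eviction; a real cache would use LRU
		for k := range c.entries {
			delete(c.entries, k)
			break
		}
	}
	c.entries[id] = cm
	return cm
}

// Invalidate drops everything; FillGenerations would call this.
func (c *commitCache) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = map[string]Commit{}
}

func main() {
	cache := newCommitCache(10_000, func(id string) Commit {
		return Commit{Message: "message of " + id}
	})
	for i := 0; i < 5; i++ {
		cache.GetCommit("abc123") // only the first call hits the database
	}
	fmt.Println("db calls:", cache.dbCalls)
}
```

Holding the mutex across the fetch also gives the coalescing behavior for free: a second concurrent GetCommit for the same ID blocks until the first has populated the cache, so the database is hit once.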
(This issue is a copy of @arielshaqed's comment on #2968.)