gix upgrade and optimizations #1081

Byron · 2023-06-09T19:44:20Z

The latest improvements make more data available during commit iteration,
which makes some improvements possible.

Please note that CI won't be green until gix v0.46 has actually been released.

Tasks

upgrade to gix v0.46
optimize single-threaded case (without real-life improvements :/)
optimize case where repo has a commit-graph for massive performance gains

Byron · 2023-06-10T08:56:07Z

It was fun implementing it, and this kind of optimization isn't even present in `ein t h`, so I think it's safe to say that onefetch will be the fastest in obtaining commits-by-author if a commit-graph cache exists.

Without a commit-graph, it takes ~13s to run onefetch on the linux kernel, and with commitgraph it now takes ~3.8s (at the same memory consumption). That's not quite linear scaling, but scaling worth having. The effective parallelism seems to be 5 cores of the 8.5 it could have, but it's easy to forget that there is also other work that might not parallelize this well. Separate measurements show that just obtaining commit metrics takes 10.3s without commitgraph and 1.43s with commitgraph, a 7.2x speedup :).

❯ hyperfine --warmup 1 /Users/byron/dev/github.com/o2sh/onefetch/onefetch-pre-commitgraph /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: /Users/byron/dev/github.com/o2sh/onefetch/onefetch-pre-commitgraph
  Time (mean ± σ):     12.896 s ±  0.186 s    [User: 24.748 s, System: 2.674 s]
  Range (min … max):   12.685 s … 13.354 s    10 runs

Benchmark 2: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):      3.945 s ±  0.151 s    [User: 16.775 s, System: 2.845 s]
  Range (min … max):    3.775 s …  4.263 s    10 runs

Summary
  /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch ran
    3.27 ± 0.13 times faster than /Users/byron/dev/github.com/o2sh/onefetch/onefetch-pre-commitgraph
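
The figures above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming that "effective parallelism" is estimated as CPU time (user + system) divided by wall-clock time; the comment does not say how the 5-core figure was obtained, so that formula and the helper names `speedup` and `busy_cores` are assumptions made for illustration.

```rust
/// Wall-clock speedup of the new run relative to the old one.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

/// Average number of busy cores during a run, estimated as
/// CPU time (user + system) divided by wall-clock time.
/// This is an assumed estimator, not something hyperfine reports itself.
fn busy_cores(user_s: f64, system_s: f64, wall_s: f64) -> f64 {
    (user_s + system_s) / wall_s
}

/// Formats a short summary using the mean values from the hyperfine output above.
fn summary() -> String {
    let total = speedup(12.896, 3.945); // whole onefetch run
    let metrics = speedup(10.3, 1.43); // commit metrics only
    let before = busy_cores(24.748, 2.674, 12.896);
    let after = busy_cores(16.775, 2.845, 3.945);
    format!(
        "total: {total:.2}x, metrics only: {metrics:.1}x, busy cores: {before:.1} -> {after:.1}"
    )
}
```

Under that assumption, the commit-graph build keeps roughly five cores busy on average, which is in line with the estimate given in the comment.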

…t commit information. In practice, this change is unlikely to be noticeable, especially since `--no-bots` is not enabled by default. However, if it is enabled, bot commits will now contribute to the most recent and the first commit times, which helps if the only commit is a bot commit. It also paves the way for an upcoming multi-threaded implementation that leverages the commit-graph, which in turn will greatly speed up information retrieval in large repositories.

Byron · 2023-06-10T14:29:55Z

Please note that in a small repo with a commitgraph, there might not be much time at all to compute churn. In the case of gitoxide with ~11k commits, it can compute only three deltas 😁. To my mind that's fine, as small repos are unlikely to have a commitgraph anyway since it usually isn't worth it there.
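
To illustrate why churn can come up short, here is a minimal sketch using only the standard library. It assumes a layout like the one described elsewhere in this PR: several author worker threads, one best-effort churn thread, and the calling thread acting as aggregator. `Commit`, `collect_metrics` and the counting "diff" are hypothetical stand-ins, not onefetch's or gix's actual code.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a commit; a real one would carry ids and trees.
#[derive(Clone)]
struct Commit {
    author: String,
}

/// Returns commits-per-author and the number of churn deltas that were
/// computed before the author work finished.
fn collect_metrics(commits: Vec<Commit>, author_threads: usize) -> (HashMap<String, usize>, usize) {
    // Best-effort churn thread: it is stopped as soon as the author work is done.
    let stop_churn = Arc::new(AtomicBool::new(false));
    let (churn_tx, churn_rx) = mpsc::channel::<Commit>();
    let churn = {
        let stop = Arc::clone(&stop_churn);
        thread::spawn(move || {
            let mut deltas = 0usize;
            for _commit in churn_rx {
                if stop.load(Ordering::Relaxed) {
                    break;
                }
                deltas += 1; // stand-in for an expensive tree diff
            }
            deltas
        })
    };

    // N author workers pull commits from a shared queue.
    let (work_tx, work_rx) = mpsc::channel::<Commit>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (name_tx, name_rx) = mpsc::channel::<String>();
    let workers: Vec<_> = (0..author_threads.max(1))
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let name_tx = name_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement.
                let commit = match work_rx.lock().unwrap().recv() {
                    Ok(commit) => commit,
                    Err(_) => break, // queue closed: no more commits
                };
                if name_tx.send(commit.author).is_err() {
                    break;
                }
            })
        })
        .collect();
    drop(name_tx);

    // The calling thread iterates commits and aggregates (the "aggregator").
    for commit in commits {
        let _ = churn_tx.send(commit.clone());
        work_tx.send(commit).expect("author workers are alive");
    }
    drop(work_tx);

    let mut by_author = HashMap::new();
    for name in name_rx {
        *by_author.entry(name).or_insert(0usize) += 1;
    }
    for worker in workers {
        worker.join().expect("author worker panicked");
    }

    // Author statistics are complete: tell churn to stop wherever it is.
    stop_churn.store(true, Ordering::Relaxed);
    drop(churn_tx);
    let deltas = churn.join().expect("churn thread panicked");
    (by_author, deltas)
}
```

The author counts are always complete, while the number of deltas depends on how much time the churn thread got before being stopped, which is why a fast walk over a small repository can leave it with only a handful.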

o2sh

Just learned about Git Commit Graph through this PR. Powerful stuff! Glad gitoxide utilizes it.

I tried to refactor the code a bit, as the CommitMetrics constructor was becoming quite overwhelming. Could you please take a look to make sure I haven't broken anything 😅?

I left a few questions for you in the comments.

Without a commit-graph, it takes ~13s to run onefetch on the linux kernel, and with commitgraph it now takes ~3.8s (at the same memory consumption).

That's very impressive!

src/info/utils/git.rs

src/info/git/mod.rs

* unify logic to update number of commits by author.

More notes related to the `use_commit_graph()` call:

I was thinking about this API for a while, and it's the easiest way to configure this even though it seems redundant. Let me explain, maybe you can find a better solution.

In an even more recent version, `use_commit_graph()` doesn't default to `true`, but to the value of `core.commitGraph`, which in turn defaults to true. We, however, always want to use the commit-graph, under the condition that we can use enough threads *and* actually have a commit-graph.

Now, one would think that `with_commit_graph` is enough to get what we want, but since `use_commit_graph` generally defaults to `true` under the hood, it would make another attempt to open the commit-graph, even though there might not be one. This leads to the commit-graph detection being run twice in the case where there is none. Thus we explicitly set our verdict in `use_commit_graph` to avoid that kind of work.

Does it matter, you might ask? Probably not: on today's SSDs a few `stat` calls don't matter, so if you want, you can remove the `use_commit_graph` call. By default, I want an API that makes it possible to avoid any waste, which is why it's implemented that way.

----

More notes related to the overcommit of threads:

There isn't really such a thing as exhausting threads, but indeed this implementation will overcommit a little bit for N threads:

* N threads for author computation (author)
* 1 thread for commit-iteration and author aggregation (aggregator)
* 1 thread for churn computation (churn)

The aggregator thread has to wait for the (author) threads and won't be busy doing its own work - thanks to the commit-graph these walks are very fast. It won't actually consume much CPU.

The churn thread is the one that runs just as hot as the author threads, if not hotter, and I intend to let these contend a little, as typically a little bit of over-commit ekes out a little more performance in practice. This is due to IO still playing a role here, and once a thread is blocked, it's good to have a choice of more threads to run. Further, the `churn` thread is operating on a best-effort basis, so to me it's fine to drown it out a little if the OS decides to do that, but in turn it gets a little additional time while the author threads are joined, so it should even out. Fearless concurrency is fun :).
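
The `use_commit_graph()` reasoning in the first half of these notes can be modeled with a toy example. This is only a sketch under stated assumptions: `Repo`, `Graph` and `Walk` are hypothetical stand-ins written for illustration and do not reproduce gix's actual types or signatures, and "enough threads" is assumed to mean more than one. The point it shows is that handing over an empty graph while leaving the toggle at its default triggers a second detection attempt.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a loaded commit-graph; not a gix type.
struct Graph;

/// Hypothetical repository that counts how often detection is attempted.
struct Repo {
    has_graph: bool,
    detections: Cell<u32>,
}

impl Repo {
    /// Pretends to probe the object database for a commit-graph (a few `stat` calls).
    fn detect_commit_graph(&self) -> Option<Graph> {
        self.detections.set(self.detections.get() + 1);
        if self.has_graph { Some(Graph) } else { None }
    }
}

/// Hypothetical walk builder, loosely modeled on the two calls discussed above.
struct Walk<'a> {
    repo: &'a Repo,
    use_commit_graph: bool, // assumed default: on, as with `core.commitGraph`
    graph: Option<Graph>,
}

impl<'a> Walk<'a> {
    fn new(repo: &'a Repo) -> Self {
        Walk { repo, use_commit_graph: true, graph: None }
    }
    fn use_commit_graph(mut self, toggle: bool) -> Self {
        self.use_commit_graph = toggle;
        self
    }
    fn with_commit_graph(mut self, graph: Option<Graph>) -> Self {
        self.graph = graph;
        self
    }
    /// Starts the walk and reports whether a commit-graph ends up being used.
    fn start(mut self) -> bool {
        if self.use_commit_graph && self.graph.is_none() {
            // No graph handed in, but the toggle is on: try to open one ourselves.
            self.graph = self.repo.detect_commit_graph();
        }
        self.use_commit_graph && self.graph.is_some()
    }
}

/// Detect once, then pass both the graph and the verdict explicitly.
fn walk_with_explicit_verdict(repo: &Repo, num_threads: usize) -> bool {
    let graph = if num_threads > 1 { repo.detect_commit_graph() } else { None };
    let verdict = graph.is_some();
    Walk::new(repo).use_commit_graph(verdict).with_commit_graph(graph).start()
}

/// Detect once, but leave the toggle at its default.
fn walk_with_default_toggle(repo: &Repo, num_threads: usize) -> bool {
    let graph = if num_threads > 1 { repo.detect_commit_graph() } else { None };
    Walk::new(repo).with_commit_graph(graph).start()
}
```

In this model, a repository without a commit-graph is probed once by `walk_with_explicit_verdict` and twice by `walk_with_default_toggle`; when a graph exists, both variants probe once.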

src/info/git/mod.rs

vercel bot deployed to Preview June 9, 2023 19:44

Byron force-pushed the optimizations branch from 16dbf52 to 634004c June 10, 2023 09:10

vercel bot deployed to Preview June 10, 2023 09:10

Byron added 3 commits June 10, 2023 11:12

upgrade to latest gix v0.46 and make use of free traversal state.

7fca1da

The latest improvements make more data available during commit iteration, which makes some improvements possible. Doing so potentially saves some time, even though it's not visible in practice.

refactor retrieval of commit graph information to make output clear

81b41ae

This is in preparation for creating a multi-threaded version of it.

Byron force-pushed the optimizations branch from 634004c to 81b41ae June 10, 2023 09:14

vercel bot deployed to Preview June 10, 2023 09:14

Byron marked this pull request as ready for review June 10, 2023 09:29

Byron requested review from spenserblack and o2sh as code owners June 10, 2023 09:29

Byron force-pushed the optimizations branch from 81b41ae to e9c14a8 June 10, 2023 13:51

vercel bot deployed to Preview June 10, 2023 13:51

refactor retrieval of commit graph information to make output clear

058adee

This is in preparation for creating a multi-threaded version of it.

Byron force-pushed the optimizations branch from e9c14a8 to 058adee June 10, 2023 14:28

vercel bot deployed to Preview June 10, 2023 14:28

refactoring

03d9095

- Extract commit graph traversal logic out of CommitMetrics constructor
- Some renaming and simplified logic

vercel bot deployed to Preview June 12, 2023 15:29

refactoring

61d4982

- Extract commit graph traversal logic out of CommitMetrics's constructor
- renaming and simplified logic

vercel bot deployed to Preview June 12, 2023 15:31

o2sh approved these changes Jun 12, 2023

minor

fdd4f05

vercel bot deployed to Preview June 12, 2023 16:07

Merge branch 'main' into optimizations

444c25c

vercel bot deployed to Preview June 12, 2023 16:27

cargo fmt

0558680

vercel bot deployed to Preview June 12, 2023 16:28

remove is_shallow property from GitMetrics

4747e2b

vercel bot deployed to Preview June 12, 2023 16:32

Single thread for churn and author summary on main thread

3eac83d

vercel bot deployed to Preview June 12, 2023 22:08

o2sh reviewed Jun 12, 2023

src/info/utils/git.rs (outdated, resolved)

process author sig in separate thread

af5433f

vercel bot deployed to Preview June 12, 2023 22:44

revert

feecc23

vercel bot deployed to Preview June 12, 2023 22:57

o2sh reviewed Jun 12, 2023

src/info/git/mod.rs (outdated, resolved)

vercel bot deployed to Preview June 13, 2023 08:51

spenserblack reviewed Jun 13, 2023

src/info/git/mod.rs (resolved)

explicitly call use_commit_graph and with_commit_graph

876e6a5

vercel bot deployed to Preview June 13, 2023 21:27

o2sh merged commit 09c4dc9 into o2sh:main Jun 13, 2023

o2sh mentioned this pull request Jun 13, 2023

improve bot regex #1086

Merged

Byron deleted the optimizations branch June 14, 2023 06:25

spenserblack mentioned this pull request Jun 22, 2023

Errs when invoked on partial clones #1092

Closed

1 task
