[March]: status()
iterator and onefetch
without git2
#1351
Byron
announced in
Progress Update
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
This month felt way more productive than the last which was dominated by sickness unfortunately. However, there was a lot going on so I certainly spent less time on
gitoxide
because of it. Part of that is quite relevant though, and if all goes well I will be able to share more next month.Status - 66% done!
It's nothing new that
gitoxide
recently finished thegix-dir
crate to traverse a directory and find untracked files, and that it could find tracked files that have been modified as well. That's two thirds ofgit status
, with the difference between a tree and the index notably missing.But what didn't exist is that both can work together, in parallel, while also optionally allowing for rename tracking.
This capability has now been implemented in the
gix-status
crate, and in such a way that all data can still be referenced. The tradeoff here is that the API is clearly plumbing level, and users must implement aDelegate
trait in order to receive the data.To make that a bit easier to use, especially with a pre-implemented delegate which just collects all data by copying it, the
gix::Repository
type now provides a method to drive that functionality as well. On top of that, because everything is always more complicated, thegix
abstraction level also provides status information on Submodules while respectingsubmodule.<name>.ignore
to determine how much effort is spent there.submodule.name.active
is of course respected as well (which turned out to be a bug as Git doesn't do that). Something that's very notable and noticeable is the performance difference of the implementation - in big repositories like the one of Rustc with plenty of submodules,gix status
(66% of it) can easily be 40% faster than Git. It's probably attributable to aggressive parallelisation as a parallel index check may see submodules which then run a parallel status check to see what changed.Repository::status()
with index to worktree iteratorAs mentioned before, using a
Delegate
that needs implementation is always a bit cumbersome, and not quite worthy of the highest abstraction level that isgix
. It does, however, have the advantage that data can be referenced despite a fair share of parallelism. Thus, such an API must remain for those who want to eek out every last bit of performance or reduce load on the memory allocator.In many cases though, these optimisations aren't needed, and readable code is of greater value. For that reason, there is now an iterator that provides copies of the data obtained by a delegate. It's made so it doesn't only support progress reporting, but can also integrate with external interrupt flags or when omitted, wires up its own interrupt flag that triggers as the iterator drops. That way, the execution that happens in its own thread to enable the lazy evaluation of the iterator will be interrupted automatically, while also decoupling the production of items with their consumption from a performance perspective, possibly yielding better performance than a typical
Delegate
implementation would.Dirwalk iterator
Now that the architecture of such iterators was explored, it seemed only natural to leverage it once more and apply it to the directory walk. Previously, it was only available as
Repository::dirwalk()
which needs aDelegate
, but now it's also available asRepository::dirwalk_iter()
with the same features as described above.It's used to great effect in the Cargo PR, discussed further down, which otherwise would have been far more complicated to implement.
is-dirty check
With two thirds of a
status
implementation, I thought it's worth providing a near-complete implementation ofis_dirty()
to enable typical usages like ingit describe --dirty
, orgix commit describe --dirty
in my case. Thanks to the iterator API, it was very easy to implement akin to 'status().next().is_some()`, fully leveraging that the operation will be aborted once the iterator drops after the first change was found.Community
Thanks to the improved status, and
gix-dir
as it turned out, I was able to reach out again and bring the existinggitoxide
integrations much further.Memory maps - the right way!
When reading the corresponding issue I was amazed of the environments that
gitoxide
is currently conquering - it's so complex and 'virtual' that I wouldn't dare to try explain it here, knowing I wouldn't even get close to what it really is.Long story short, it turns out that
gitoxide
was creating memory maps in a way that used the wrong creation flags, which differed from what Git was doing as well. Truth to be told, I never noticed this even was the case, it seemed all the same to me.But semantics are different in certain virtualized environments, and fortunately the venerable
mmap2
crate thatgitoxide
uses to abstract such details has just the right function to fix this issue for good.Onefetch, now without
git2
!After nearly 2 years (!!) the integration of
gitoxide
is now finally done with the last PR which manages to removegit2
entirely. The last bastion ofstatus
, and fortunatelyonefetch
was never using the 'head-index' diff whichgitoxide
is still lacking. The performance difference is quite stark ranging from 3% faster when runningonefetch
in the Linux repository, 11% faster in the Git repository, and finally, 23% faster in the WebKit repository.This is probably possible as
gitoxide
overall is more parallel, running the index modification check while the directory walk is active.In any case, this is one of the few, maybe the first, project that achieves this, and I hope there will be many more to come.
Gix in Cargo
Despite still being on the way to have a fully functional
status
implementation which could power the next PR, coincidentally I found a niche application forgix-dir
where usinggitoxide
will solve a real problem. The PR also helped to greatly improve the usability of the provided APIs ingitoxide
, which I am always very thankful for.It's definitely exciting to finally be able to integrate new features again, and with
status
and laterreset
I hope to be able to make big strides.Cheers
Sebastian
PS: The latest timesheets can be found here (2024).
Beta Was this translation helpful? Give feedback.
All reactions