New version of ocaml-git #395
It seems that the current state of this PR works perfectly with
And I think it's done, but this PR seems to be a good foundation for the release of
- ENCODER / DECODER: these interfaces describe a non-blocking encoder/decoder. From a new perspective, they are useless, since `angstrom` and `encore` are sufficient to obtain a non-blocking encoder/decoder (see [Angstrom.state] and [Encore.Lavoisier.state]).
- META: a special interface (with an argument) to expose the format of a Git object from a /meta/ syntax (the module argument). Since `encore.0.6`, this approach is no longer used (a GADT is used instead).
- DESC: a description of the produced format from a /meta/ decoder language such as `angstrom` or a /meta/ encoder language such as `encore.lavoisier`. Since `encore.0.6`, this interface is no longer used. Instead, each Git object should provide a [val format : t Encore.t] from which an `angstrom` parser or a `lavoisier` encoder can be derived.
- INFLATE / DEFLATE: since Zlib is used by a middle layer of Git (e.g. the PACK file), it is unnecessary to /functorize/ `ocaml-git` over these interfaces.
- FILE / DIR / MAPPER / FS: the first entry point of this PR is to remove the need for a file-system implementation. Therefore, these POSIX-like interfaces must be removed!
- null: [= digest_string ""]. This value is used to initialise some values, such as a [Hash.t array], with an impossible value (Git should __never__ create a Git object with the hash [digest_string ""]).
- length: a better name for [digest_size].
- feed: be able to feed a [Bigstringaf.t] value.
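A minimal sketch of the `null` idea, using the stdlib `Digest` module (MD5) as a stand-in for the real hash implementation; this substitution is an assumption, since the library uses a pluggable digest. Because every Git object is hashed together with a `kind length\000` header, in practice no object ends up with the digest of the empty string, so it can serve as a placeholder. `make_table` and `is_null` are hypothetical helpers.

```ocaml
(* Stand-in hash (assumption): MD5 from the stdlib [Digest] module. *)
let digest_string (s : string) : Digest.t = Digest.string s

(* [null] is the digest of the empty string. Git always hashes objects with
   a "kind length\000" header, so in practice no object has this hash. *)
let null : Digest.t = digest_string ""

(* Hypothetical helper: a table of hashes initialised with [null]. *)
let make_table (n : int) : Digest.t array = Array.make n null

(* Hypothetical helper: is this slot still uninitialised? *)
let is_null (h : Digest.t) : bool = Digest.equal h null
```

For instance, even the empty blob is hashed as `digest_string "blob 0\000"`, which differs from `null`.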
After 204 commits, I think it's done.
The interface will include only:
- S.DIGEST
- S.BASE
The interface defines only one new type, [hash], and [Make] constrains it. A type [t] exists outside the /functor/ and is reused by the /functor/.
We define a type [t] which represents the Git Blob object. The object exists independently of the hash implementation used. Some functions are exposed (and documented) to manipulate it.
The interface will include only:
- S.DIGEST
- S.BASE
- some functions to manipulate a tree
- a [format] value which describes the format of a Git Tree object
The interface defines only one new type, [hash], and [Make] constrains it. A type [t] exists outside the /functor/ and is reused by the /functor/.
We define a type [t] which represents the Git Tree object. The object exists independently of the hash implementation used (it is parameterized by ['hash]). Some functions are exposed to manipulate it.
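A minimal sketch of the pattern described above: the tree type is defined outside the functor and parameterized by `'hash`, and `Make` only specialises it. The names `DIGEST`, `entry`, `tree`, `Make`, `v`, `pp_hashes` and the `Md5` argument are illustrative assumptions, not `ocaml-git`'s actual signatures.

```ocaml
(* Hypothetical hash signature: only what the tree needs. *)
module type DIGEST = sig
  type t
  val compare : t -> t -> int
  val to_hex : t -> string
end

(* The tree type lives outside the functor: it exists whatever hash is used. *)
type 'hash entry = { perm : [ `Normal | `Dir ]; name : string; node : 'hash }
type 'hash tree = 'hash entry list

(* Functions that need no knowledge of the hash also live outside. *)
let names (t : 'hash tree) : string list = List.map (fun e -> e.name) t

module Make (D : DIGEST) : sig
  type hash = D.t
  type t = hash tree (* re-exposes the external type, specialised *)
  val v : hash entry list -> t
  val pp_hashes : t -> string list
end = struct
  type hash = D.t
  type t = hash tree

  (* Git's real ordering treats directories as if suffixed by '/';
     a plain string comparison is a simplification here. *)
  let v entries = List.sort (fun a b -> String.compare a.name b.name) entries
  let pp_hashes t = List.map (fun e -> D.to_hex e.node) t
end

(* Instantiation with the stdlib MD5 digest as a stand-in. *)
module Md5 = struct
  type t = Digest.t
  let compare = Digest.compare
  let to_hex = Digest.to_hex
end

module Tree_md5 = Make (Md5)
```

Because `Tree_md5.t` is just `Digest.t tree`, a value built by `Tree_md5.v` can be passed to `names` without any functor-specific conversion.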
The description of the Tree format is represented by an [encore] value, [format] (exposed by the interface). This patch is a translation from the /meta-syntax/ to [encore]'s combinators.
This module replaces the old [Helper] module. It provides a way to calculate the hash of a Git object from its OCaml representation. The hash is calculated as follows:
1) serialise the Git object
2) start with a /header/ ([kind length\000])
3) feed the context with the serialised Git object
A Git object can be big. Instead of serialising it entirely, we /stream/ it to limit the memory footprint. NOTE: the memory footprint is __not__ really bounded (see [test/tree/test] to understand why) but, at least, the serialised value is cut into many /small/ [string]s.
NOTE: we no longer need temporary buffers to calculate the hash of a Git object.
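The three steps above can be sketched as follows. The OCaml standard library has no incremental hash context, so this sketch assumes a hypothetical `CTX` signature (`empty`/`feed`/`get`) and implements it with a `Buffer.t` plus `Digest` (MD5) as a stand-in for a real streaming digest. Only the `kind length\000` header format comes from Git itself.

```ocaml
(* Hypothetical streaming hash context. *)
module type CTX = sig
  type t
  val empty : unit -> t
  val feed : t -> string -> unit
  val get : t -> Digest.t
end

(* Stand-in context: it buffers the chunks and hashes them with MD5 at the
   end. A real context would update its digest state chunk by chunk. *)
module Md5_ctx : CTX = struct
  type t = Buffer.t
  let empty () = Buffer.create 64
  let feed = Buffer.add_string
  let get b = Digest.string (Buffer.contents b)
end

(* Git object header: "<kind> <length>\000". *)
let header ~kind ~length = Printf.sprintf "%s %d\000" kind length

(* [chunks] is the serialised object, already cut into small strings. *)
let digest (module C : CTX) ~kind (chunks : string list) : Digest.t =
  let length = List.fold_left (fun acc s -> acc + String.length s) 0 chunks in
  let ctx = C.empty () in
  C.feed ctx (header ~kind ~length);
  List.iter (C.feed ctx) chunks;
  C.get ctx
```

With a SHA-1 context in place of `Md5_ctx`, this is the same construction Git uses for object identifiers (header, then contents).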
… git-unix (3.0.0) CHANGES:
- Rewrite of `ocaml-git` (@dinosaure, mirage/ocaml-git#395)
- Delete useless constraints on digestif's signature (@dinosaure, mirage/ocaml-git#399)
- Add support of CoHTTP with UNIX and MirageOS (@ulugbekna, mirage/ocaml-git#400)
- Add progress reporting on fetch command (@ulugbekna, mirage/ocaml-git#405)
- Lint dependencies on packages (`git-cohttp-unix` and `git-cohttp-mirage`) and update to the last version of CoHTTP (@hannesm, mirage/ocaml-git#407)
- Fix internal `Cstruct_append` implementation (@dinosaure, mirage/ocaml-git#401)
- Implement shallow commit (@dinosaure, mirage/ocaml-git#402)
- Update to `conduit.3.0.0` (@dinosaure, mirage/ocaml-git#408) (deleted by the integration of `mimic`)
- Delete use of `ocurl` (@dinosaure, mirage/ocaml-git#410)
- Delete the useless **old** `git-mirage` package (@hannesm, mirage/ocaml-git#411)
- Fix about unresolved endpoint with `conduit.3.0.0` (@dinosaure, mirage/ocaml-git#412)
- Refactors fetch command (@ulugbekna, mirage/ocaml-git#404)
- Fix ephemerons about temporary devices (@dinosaure, mirage/ocaml-git#413)
- Implementation of `ogit-fetch` as an example (@ulugbekna, mirage/ocaml-git#406)
- Rename `nss` to `git-nss` (@dinosaure, mirage/ocaml-git#415)
- Refactors `git-nss` (@ulugbekna, mirage/ocaml-git#416)
- Update README.md (@ulugbekna, mirage/ocaml-git#417)
- Replace deprecated `Fmt` functions (@ulugbekna, mirage/ocaml-git#421)
- Delete physical equality (@ulugbekna, mirage/ocaml-git#422)
- Rename `prelude` argument by `uses_git_transport` (@ulugbekna, mirage/ocaml-git#423)
- Refactors Smart decoder (@ulugbekna, mirage/ocaml-git#424)
- Constraint to use `fmt.0.8.7` (@dinosaure, mirage/ocaml-git#425)
- Small refactors in `git-nss` (@dinosaure, mirage/ocaml-git#427)
- Delete `conduit.3.0.0` and replace it by `mimic` (@dinosaure, mirage/ocaml-git#428)
- Delete the useless `verify` function on `fetch` and `push` (@dinosaure, mirage/ocaml-git#429)
- Delete `pin-depends` on `awa` (@dinosaure, mirage/ocaml-git#431)
The new version of `ocaml-git`
The initial goal of this PR is to MirageOS-ize `ocaml-git`. Indeed, if you look into the details, the current implementation of `ocaml-git` used by MirageOS is the `Mem` implementation, which is a simple `Hashtbl.t` (and only needs a hash algorithm and the OCaml runtime).
This PR aims to provide another way to use `ocaml-git` with MirageOS. The main problem is the implementation needed to `Make` a Git store, which currently depends too much on POSIX.
Side-effect
However, the PR takes the opportunity to make some updates and to fix some intrinsic bugs:
Underlying needed layout
Maybe 2 years ago, I started to think of the Git store as 2 spaces where:
- the first one should contain recent objects (possibly volatile).
  Also known as loose objects: these objects rely on the underlying file-system to store/search Git objects. The layout is close to a simple radix tree over the hex representation of the hash (given by the hash algorithm in use), where:
- the second one should contain long-lived objects.
  Also known as PACK files, each of which contains several objects.
Let's call them the minor (loose) and major (pack) heaps. From that, what do these spaces need in the `Unix` world and/or the MirageOS world? For the minor heap, it should be simple, as it needs:
Where `append`/`appendv` create and fill (atomically or not) the object `uid` into `t` (which is a representation of the minor heap given by the user). For the UNIX world, a sequence of several syscalls is needed (`stat`, `create`, `write` and `close`), and for MirageOS, we are still able to use a simple `Hashtbl.t` or something better (in terms of memory consumption/performance). But the real constraint to fit into both worlds is:
As we said, in the Unix world, Git treats the file-system as a radix tree where paths (keys) are the hashes of Git objects.
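To make the minor heap concrete, here is a hedged sketch of an in-memory implementation (the MirageOS-style `Hashtbl.t`) together with a function mapping a hex hash to its path in the Unix radix-tree layout, where the first two hex characters name a directory (e.g. `.git/objects/ab/12cd`). The names `Minor`, `append`, `appendv`, `find` and `path_of_hex` are illustrative, not the interface from this PR.

```ocaml
(* Hypothetical in-memory minor heap: hex hash -> serialised object. *)
module Minor = struct
  type uid = string (* hex representation of the hash *)
  type t = (uid, string) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* Create and fill the object [uid] in one step. Under a cooperative
     scheduler (Lwt-style), nothing can interleave with this write. *)
  let append (t : t) (uid : uid) (contents : string) : unit =
    Hashtbl.replace t uid contents

  (* [appendv] receives the object as several chunks. *)
  let appendv (t : t) (uid : uid) (chunks : string list) : unit =
    Hashtbl.replace t uid (String.concat "" chunks)

  let mem (t : t) (uid : uid) : bool = Hashtbl.mem t uid
  let find (t : t) (uid : uid) : string option = Hashtbl.find_opt t uid
end

(* Unix layout: the hash is split after 2 hex characters, which gives a
   radix tree of depth 1 under [root] (e.g. ".git/objects"). *)
let path_of_hex ~root (hex : string) : string =
  if String.length hex < 3 then invalid_arg "path_of_hex";
  Printf.sprintf "%s/%s/%s" root (String.sub hex 0 2)
    (String.sub hex 2 (String.length hex - 2))
```

The same `Minor` signature could be implemented on Unix by writing to `path_of_hex ~root uid` instead of a hashtable, which is the point of keeping the interface this small.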
For the major heap, it is a bit more complex, since we can have several PACK files to store several objects. The indexing of these objects is done by an `*.idx` file. So we can represent this space with:
With this interface, we assume that the creation of a PACK file (which contains several Git objects) and the way to fill it do not need to be atomic (unlike the minor heap).
This interface is close to POSIX (but less close than what we currently have). However, we can consider this interface as an append-only interface. Again, this interface can easily be implemented with a simple `Hashtbl.t` or something better. For the Unix world, we can take the opportunity to use `Unix.O_APPEND`. With this new design, the `Store` implementation of Git can easily fit into MirageOS without heavy requirements (unlike before, when a real file-system was needed).
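The append-only view of the major heap can be sketched in the same spirit. In this hypothetical in-memory version, a PACK file is an append-only `Buffer.t`, and reading is a `map` of a slice at a given offset, mirroring the `map`/`append` operations that `carton` relies on. The names are assumptions; on Unix, the same `append` could write to a file opened with `Unix.O_APPEND`.

```ocaml
(* Hypothetical in-memory major heap: PACK name -> append-only contents. *)
module Major = struct
  type t = (string, Buffer.t) Hashtbl.t

  let create () : t = Hashtbl.create 4

  (* Creating and filling a PACK file need not be atomic: chunks are simply
     appended, and the PACK is created on the first append. *)
  let append (t : t) (pack : string) (chunk : string) : unit =
    let buf =
      match Hashtbl.find_opt t pack with
      | Some buf -> buf
      | None ->
          let buf = Buffer.create 4096 in
          Hashtbl.add t pack buf;
          buf
    in
    Buffer.add_string buf chunk

  (* Read [len] bytes at offset [pos], like mapping a slice of the file.
     The slice is truncated at the end of the PACK; [pos] must not exceed
     the current length (otherwise [Buffer.sub] raises). *)
  let map (t : t) (pack : string) ~(pos : int) ~(len : int) : string =
    match Hashtbl.find_opt t pack with
    | None -> raise Not_found
    | Some buf ->
        let len = min len (Buffer.length buf - pos) in
        Buffer.sub buf pos len
end
```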
However, another space with some specific requirements exists: the one used to store references in Git. More precisely, this area is mutable (unlike `Minor` and `Major`) and should ensure {i atomicity} when we want to test-and-set a reference, similar to the [CAS][cas] atomic operation.
With all of these spaces, I think it's easier to localise an error and to trace what a simple `Git.read`/`Git.write` really does over them. `Git_unix` provides these spaces according to the layout of a Git repository. And even if, in reality, these spaces work on one large common space (the file-system), we can isolate them from each other if we want.
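To illustrate how separate spaces make a `Git.read` traceable, here is a hedged sketch in which a hypothetical `read` looks up the minor heap first, then each PACK of the major heap, and reports which space answered (or that none did). The stores are plain hashtables, and the PACK index is reduced to a map from hash to `(offset, length)`; all names are illustrative.

```ocaml
(* Where an object was found: useful to trace what a read really did. *)
type source = Loose | Pack of string

type store = {
  minor : (string, string) Hashtbl.t; (* hex hash -> object *)
  (* PACK name -> (index : hex hash -> (offset, length), PACK contents) *)
  major : (string, (string, int * int) Hashtbl.t * string) Hashtbl.t;
}

let read (s : store) (hash : string) :
    (string * source, [ `Not_found of string ]) result =
  match Hashtbl.find_opt s.minor hash with
  | Some obj -> Ok (obj, Loose)
  | None -> (
      (* [Hashtbl.fold] visits the PACKs in an unspecified order. *)
      let found =
        Hashtbl.fold
          (fun name (idx, contents) acc ->
            match acc, Hashtbl.find_opt idx hash with
            | Some _, _ | None, None -> acc
            | None, Some (off, len) ->
                Some (String.sub contents off len, Pack name))
          s.major None
      in
      match found with
      | Some r -> Ok r
      | None -> Error (`Not_found hash))
```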
Newcomers
Carton
Maybe 2 or 3 years ago, the idea of extracting the design of the PACK file so that it can be used by something other than Git led to the `carton` project. After several iterations, the API was settled one year ago and its integration into `ocaml-git` was planned. The main goal of this sub-project is:
- the possibility to load a PACK file into a unikernel with caravan and to have another implementation of a read-only KV-store for MirageOS.
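Because `carton` handles the PACK format independently of Git, even the first bytes of a PACK file can be decoded without any Git notion. As a hedged illustration (not `carton`'s API), this sketch parses the 12-byte PACK header: the `PACK` signature, a big-endian 32-bit version (2 or 3), and a big-endian 32-bit object count.

```ocaml
type header = { version : int; objects : int }

(* Parse the 12-byte header at the start of a PACK file. *)
let decode_header (s : string) : (header, string) result =
  if String.length s < 12 then Error "truncated header"
  else if String.sub s 0 4 <> "PACK" then Error "bad signature"
  else
    (* Both fields are 32-bit big-endian. A count of 2^31 or more would
       need an unsigned conversion, which this sketch ignores. *)
    let version = Int32.to_int (String.get_int32_be s 4) in
    let objects = Int32.to_int (String.get_int32_be s 8) in
    if version <> 2 && version <> 3 then Error "unsupported version"
    else Ok { version; objects }
```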
By design, `carton` needs only a `map` syscall to read an object and an `append` syscall to generate a PACK file. It took the opportunity to test the `type ('a, 's) io` design more deeply (see the limitations of such a design, etc.), and it seems clear that the result is good enough to:
ocaml-git
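The `type ('a, 's) io` mentioned above refers to a lightweight higher-kinded polymorphism encoding: code is written once against an abstract scheduler and then instantiated for Lwt, Async or direct style. The sketch below reproduces the idea with hypothetical names (`io`, `scheduler`, `Make`, `inj`, `prj`, `total_length`) and an identity scheduler, so it runs with the stdlib only. It is close in spirit to `carton`, but it is not its exact API.

```ocaml
(* [('a, 's) io] stands for "a computation returning 'a in scheduler 's". *)
type ('a, 's) io

type 's scheduler = {
  bind : 'a 'b. ('a, 's) io -> ('a -> ('b, 's) io) -> ('b, 's) io;
  return : 'a. 'a -> ('a, 's) io;
}

(* Injection/projection between a concrete monad ['a T.t] and [io]. *)
module Make (T : sig type 'a t end) : sig
  type s
  val inj : 'a T.t -> ('a, s) io
  val prj : ('a, s) io -> 'a T.t
end = struct
  type s
  external inj : 'a T.t -> ('a, s) io = "%identity"
  external prj : ('a, s) io -> 'a T.t = "%identity"
end

(* Direct style: the identity "monad". An Lwt instance would use
   [type 'a t = 'a Lwt.t] and [Lwt.bind] instead. *)
module Identity = struct type 'a t = 'a end
module Id = Make (Identity)

let id : Id.s scheduler =
  { bind = (fun x f -> f (Id.prj x)); return = (fun x -> Id.inj x) }

(* Written once, independently of the scheduler: reads two slices through
   a user-provided [map]-like function and adds their lengths. *)
let total_length (sched : 's scheduler) (map : int -> (string, 's) io) :
    (int, 's) io =
  sched.bind (map 0) (fun a ->
      sched.bind (map 1) (fun b ->
          sched.return (String.length a + String.length b)))
```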
(not so) Minor updates
`carton` led to the update of `decompress.1.0.0` and `duff.0.3`, where:
- `decompress` fixed many bugs in inflation/deflation, and the process is faster than before. See these articles about `decompress`
- `duff` fixed its 32-bit support, so this library (and, transitively, `ocaml-git`) can be used on some exotic architectures
Tests over the PACK file
Of course, thanks to the separation between Git's logic and the PACK file, we are able to focus our tests on the format of the PACK file independently of Git assumptions (format of Git objects, hash algorithm, layout of the Git repository). Some fuzzers from the official Git project were added to keep the same assumptions, and the update took the opportunity to fix some bugs in the /PACK engine/. All tests are available in the `test/carton/` directory.
The intrinsic possibility about `ocaml-git`
Because `carton` is able to decode/encode a PACK file, the new design on top of `carton` makes it possible to reduce the definition of the Major heap to the signature given above.
Loose object
Because the question of the PACK file is now resolved by `carton`, we can easily /formalise/ the way to extract a /loose/ object. Internally, `ocaml-git` comes with a new sub-library, `git.loose`, which has 3 derivations:
- `git.loose-lwt`
- `git.loose-git`
- `git-unix.loose-unix`
This sub-library (like `carton`) makes it possible to shape this layout into the Minor interface given above. Of course, it also makes it possible, again, to test this part of `ocaml-git` without Git assumptions, where the layout is only a radix tree of deflated objects.
Encore
A new release of [`encore`][encore] is available, with a better API than before. The question `encore` answers is: how to produce an encoder and a decoder from a common description.
The new API takes advantage of GADTs to propose a DSL to describe a format. From it, we are able to derive an `angstrom` parser or a `lavoisier` encoder.
With this update, I did not get any regressions in the tests, and the encoder was simplified to focus on the initial goal of `encore`: ensuring the /isomorphism/ between the encoder and the decoder.
This update also fixes a bug in `ocaml-git` when we need to extract a large object. A test was added to ensure that the problem is properly fixed.
Finally, the update of `encore` makes it possible to compile `ocaml-git` with `js_of_ocaml` and fixes the related issue.
Conduit
See conduit about that.
Not So Smart (nss)
Since version 2 of `ocaml-git`, I have discovered several bugs in the way we `push` or `pull` a Git repository. Even if `ocaml-git` works in most cases, it appears that the negotiation engine does something wrong.
I decided to rewrite it and fix the problems in the negotiation engine.
Then, building on the work of @hannesm, I decided to properly integrate a way to use SSH (with `awa-ssh`). Of course, the new version of `conduit` helps me to do what I want.
But the biggest change is the removal of the duplication between the TCP, HTTP and SSH implementations of the Smart protocol. Indeed, even if Git does the same thing whenever it wants to `push`/`fetch`, some details differ, and the current version of `ocaml-git` already contains some (incorrect) divergences between the TCP and HTTP implementations.
The goal was to restart from scratch and focus on what the negotiation engine really does, so that it can be used over any layered protocol. Thanks to `colombe` for giving me the key to the right abstraction.
Transparent integration with `ocaml-git`
The Smart protocol wants to do only 2 things:
Given these 2 tasks, the Git format, the layout of the store or, more generally, the idea of a Git repository are outside the scope of the protocol. `nss` only wants to provide a way to get or send a PACK file from a context. In this way, the requirements for the negotiation are limited to a few operations:
Again, the notion of a Git object is outside the implementation of the PACK file (`carton`), so `nss` does not need to know the format of a Git commit, only the way to get the parents of a commit. Then, from a set of commits, we should be able to create a PACK file (`push`). The `fetch` operation is a bit more complex, since we must analyse the PACK file to produce an index of it. But, again, all of these operations are available outside Git's notions and, of course, outside the Git scope.
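The only commit knowledge the negotiation needs, namely the parents of a commit, is enough to compute which commits must go into the PACK file for a `push`: those reachable from our heads but not from what the peer already has. A hedged sketch, where `parents` is a hypothetical user-provided function and hashes are plain strings. A real PACK would also carry the trees and blobs of these commits.

```ocaml
module S = Set.Make (String)

(* All commits reachable from [roots] by following [parents]. *)
let reachable ~(parents : string -> string list) (roots : string list) : S.t =
  let rec go seen = function
    | [] -> seen
    | c :: rest when S.mem c seen -> go seen rest
    | c :: rest -> go (S.add c seen) (parents c @ rest)
  in
  go S.empty roots

(* Commits to send: reachable from [ours] but not from [theirs]. This
   assumes every commit in [theirs] is known locally. *)
let to_send ~parents ~ours ~theirs : S.t =
  S.diff (reachable ~parents ours) (reachable ~parents theirs)
```

For a linear history `a <- b <- c` where the peer already has `a`, only `b` and `c` need to be sent.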
Regression
Of course, the first goal of `nss` was to fix negotiation bugs and remove the duplication between the TCP, SSH and HTTP protocols. All previous regression tests were added and pass, and all buggy situations, such as this trouble, were added for all protocols (mostly to ensure a good behaviour of our negotiation engine).
However, the negotiation engine of Git and `ocaml-git` is not well defined/formalised. We can imagine another perspective, such as a version 3 of the Smart protocol, to `fetch`/`push`. This is not the goal of this PR, but it is definitely a cool and near-term goal for `ocaml-git`.
Performance
I did not run thorough benchmarks, but the update to `decompress.1.0.0` alone helps with performance, of course. Then, the scheduling between the protocol process, the reception of the PACK and its analysis (what happens when you `git clone`) seems better. A macro benchmark tells us that this new implementation is faster than before.
However, I did not have the time to benchmark all of that, and I mostly rely on the work done on `decompress` to say that it's faster than before.
Functor or not functor?
`carton`, `nss` and `loose` were built with the same design, without the logic of I/O scheduling. With this new view, functors are used sparingly and mostly at the end of the development process, to provide an easy-to-use API over LWT or ASYNC.
More concretely, all types defined in these sub-libraries are outside the scope of functors, and their definitions don't depend on the I/O scheduling.
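The I/O-free design can be illustrated with a small decoder that never reads by itself: it returns `Await` when it needs more input, and the caller, whatever its scheduler (Lwt, Async or plain blocking calls), feeds it. The decoder below splits a stream into `\n`-terminated lines, in the spirit of non-blocking states such as `Angstrom.state`. The names `decoder`, `src`, `decode` and `run` are hypothetical.

```ocaml
type decoder = { mutable input : string } (* unconsumed bytes *)

type event = Await | Line of string

let decoder () = { input = "" }

(* The caller provides more bytes; no I/O happens here. Repeated string
   concatenation is quadratic, which is acceptable for a sketch. *)
let src (d : decoder) (chunk : string) : unit = d.input <- d.input ^ chunk

(* Return the next complete line, or [Await] if more input is needed. *)
let decode (d : decoder) : event =
  match String.index_opt d.input '\n' with
  | None -> Await
  | Some i ->
      let line = String.sub d.input 0 i in
      d.input <- String.sub d.input (i + 1) (String.length d.input - i - 1);
      Line line

(* A direct-style driver. An Lwt or Async driver would only differ in how
   [read] obtains the next chunk. Trailing bytes without '\n' are dropped. *)
let run (read : unit -> string option) : string list =
  let d = decoder () in
  let rec loop acc =
    match decode d with
    | Line l -> loop (l :: acc)
    | Await -> (
        match read () with
        | Some chunk -> src d chunk; loop acc
        | None -> List.rev acc)
  in
  loop []
```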
The Git core library follows this new design, where the existence of a commit, for example, does not depend on the application of a functor. The functor only specialises the definition with the given new type.
At another layer, such as the `Value` module, we have fewer constraints, and it is easier for the compiler to infer a type equality even if we forget to add a constraint.
In this way, every type provided by `git`, and the functions to manipulate them (without knowledge of the hash algorithm used), are defined outside the scope of functors.
Conclusion
I think this PR adds a lot of possibilities for MirageOS and is a real step forward in terms of performance and compatibility with Git and its behaviour. It paves the way for a better integration with MirageOS, of course, and opens up some possibilities such as:
The split clears the way to add other logic that is closer to Git than the format of the PACK file, the loose file or the way to synchronise a Git repository with a peer.
Finally, the Git core library is only about Git:
These pieces co-exist but can be used separately.