Deduplicate project remote dependencies #2761
Merged
Goal
The goal of this PR is to deduplicate all dependencies in a Juvix project.
Two dependencies are identical when:
For example, consider a dependency tree in which every occurrence of a named dependency refers to the same git dependency. The project MyPkg should then contain just the following dependencies: Dep1-hash1, Dep2-hash2, Stdlib-hash3.
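The deduplication rule can be illustrated with a minimal Python sketch. It is not the Juvix implementation: the `GitDep` type, the `deduplicate` function, the example URLs and the shape of the tree are all hypothetical, and identity is assumed to be the pair of git URL and git revision.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GitDep:
    """A git dependency (hypothetical type); identity is the (url, rev) pair."""
    name: str
    url: str
    rev: str
    deps: tuple[GitDep, ...] = ()


def deduplicate(roots):
    """Flatten a dependency tree, keeping one entry per (url, rev)."""
    unique = {}
    stack = list(roots)
    while stack:
        dep = stack.pop()
        key = (dep.url, dep.rev)
        if key in unique:
            continue  # already recorded, so its subtree was visited too
        unique[key] = dep
        stack.extend(dep.deps)
    return unique


# Assumed example tree: MyPkg depends on Dep1 and Dep2, and Stdlib is
# reachable through several paths. The URLs are placeholders.
stdlib = GitDep("Stdlib", "https://example.com/stdlib.git", "hash3")
dep2 = GitDep("Dep2", "https://example.com/dep2.git", "hash2", (stdlib,))
dep1 = GitDep("Dep1", "https://example.com/dep1.git", "hash1", (dep2, stdlib))

names = sorted(f"{d.name}-{d.rev}" for d in deduplicate([dep1, dep2]).values())
print(names)  # ['Dep1-hash1', 'Dep2-hash2', 'Stdlib-hash3']
```

Stdlib is reachable three ways in this tree but is recorded once, because every occurrence has the same URL and revision.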
Design
Storage of transitive dependencies
Currently the transitive dependencies of a project are fetched and stored in the .juvix-build directories of the corresponding dependencies. After this PR all dependencies, including transitive ones, are stored in the .juvix-build directory of the root project.
Again assuming that all the transitive dependencies have the same git hash, we label the dependencies in the file system with their git hash.
Say we have two versions of Dep2 in the transitive dependency graph. We would then have two copies of Dep2 in the .juvix-build directory, with different hashes.
Storage of git clones
As a consequence of this design we can no longer store the git clones for each dependency in the .juvix-build directory, as we do currently.
Git clones are now stored in a global directory, ~/.config/juvix/0.6.1/git-cache.
When a dependency at a particular revision is required, the global git clone is fetched and checked out at the required revision, then copied into the .juvix-build directory of the relevant project.
Naming of git clones
The requirement for the naming of the global git clones is that they can be identified by URL.
In this PR the name of a clone is formed by taking the SHA256 hash of the dependency git URL. This is to avoid issues with file-system safe escaping of characters.
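As a rough illustration of the global clone cache, the following Python sketch derives a clone directory name from the URL and then fetches, checks out and copies a revision. It is a sketch under stated assumptions, not the Juvix code: the function names are hypothetical, the URL is assumed to be hashed as UTF-8 and written as a hex digest, `rev` is assumed to be a commit hash or tag, and excluding the `.git` directory from the copy is also an assumption.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path

# Cache location as given in the PR text; the version segment varies by release.
GIT_CACHE = Path.home() / ".config" / "juvix" / "0.6.1" / "git-cache"


def clone_name(url: str) -> str:
    """Name of the global clone: SHA256 of the git URL (hex digest assumed)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def checkout_into(url: str, rev: str, target: Path, cache: Path = GIT_CACHE) -> Path:
    """Fetch the cached clone, check out `rev`, and copy the tree to `target`."""
    clone = cache / clone_name(url)
    if clone.exists():
        # Reuse the shared clone and only download new objects.
        subprocess.run(["git", "-C", str(clone), "fetch", "--all", "--tags"], check=True)
    else:
        cache.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", url, str(clone)], check=True)
    # `rev` is assumed to be a commit hash or tag.
    subprocess.run(["git", "-C", str(clone), "checkout", rev], check=True)
    # Copy the working tree without git metadata (an assumption of this sketch).
    shutil.copytree(clone, target, ignore=shutil.ignore_patterns(".git"), dirs_exist_ok=True)
    return target
```

Because the name depends only on the URL, every project that requests the same repository resolves to the same cached clone, and the digest contains only hexadecimal characters, so no escaping is needed.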
Naming of dependency directories
The requirement for the naming of the dependency directories is that they can be identified by URL and revision, in accordance with our definition of identical dependencies.
In this PR the name of a dependency directory is formed by taking the SHA256 hash of the concatenation of the dependency git URL and git revision. This is to avoid issues with file-system safe escaping of characters.
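The directory naming scheme described above can be sketched in Python as follows. The function name and URLs are hypothetical, and the details (UTF-8 encoding, URL followed directly by revision with no separator, hex digest) are assumptions rather than confirmed implementation details.

```python
import hashlib


def dependency_dir_name(url: str, rev: str) -> str:
    """SHA256 of the git URL concatenated with the git revision (hex digest assumed)."""
    return hashlib.sha256((url + rev).encode("utf-8")).hexdigest()


# Two revisions of the same repository map to two distinct directories,
# while the same (url, rev) pair always maps to the same one.
dep2_a = dependency_dir_name("https://example.com/dep2.git", "hash2")
dep2_b = dependency_dir_name("https://example.com/dep2.git", "hash4")
print(dep2_a != dep2_b)  # True
```

This matches the case of two versions of Dep2 in the dependency graph: each version gets its own directory, and identical dependencies share one.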
The downside of this approach is that it's hard to see which directories correspond to which dependencies when navigating the filesystem. However, navigating using the Juvix tooling by using go-to-definition etc. will continue to work as before.
Benchmarks
I tested using juvix-containers testMain.juvix.
The following benchmarks show timings excluding the initial clone of dependencies (which happens in the warmup run).
Before:
After:
The time is saved because the project previously depended on two copies of the stdlib and now depends on one.
Time is also saved in the initial run because the stdlib is cloned only once instead of twice. The cached stdlib clone is also shared between all projects, which will improve the performance of every project that uses the stdlib.
Closes