Deduplicate project remote dependencies #2761

Merged May 3, 2024 (7 commits)

@paulcadman (Collaborator) commented Apr 30, 2024

Goal

The goal of this PR is to deduplicate all dependencies in a Juvix project.

Two dependencies are identical when:

  • For path dependencies, their paths are equal
  • For git dependencies, their URL and their resolved revision (i.e. the git revision hash after resolving a tag) are equal

For example, consider the following dependency tree, in which nodes with the same name represent identical git dependencies:

MyPkg
|
|-- Dep1-hash1
|   |
|   |-- Dep2-hash2
|   |   |
|   |   `-- Stdlib-hash3
|   |
|   `-- Stdlib-hash3
|
|-- Dep2-hash2
|   |
|   `-- Stdlib-hash3
|
`-- Stdlib-hash3

The project MyPkg should just contain the following dependencies: Dep1-hash1, Dep2-hash2, Stdlib-hash3.
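The deduplication rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this PR: the `PathDep`, `GitDep`, `Node` and `unique_dependencies` names and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class PathDep:
    path: str  # path dependencies are identical when their paths are equal


@dataclass(frozen=True)
class GitDep:
    url: str
    revision: str  # resolved revision hash, i.e. after resolving a tag


DepKey = Union[PathDep, GitDep]


@dataclass
class Node:
    key: DepKey
    children: List["Node"] = field(default_factory=list)


def unique_dependencies(root: Node) -> List[DepKey]:
    """Collect each distinct dependency below root once, in first-seen order."""
    seen = set()
    ordered = []

    def visit(node: Node) -> None:
        for child in node.children:
            if child.key not in seen:
                seen.add(child.key)
                ordered.append(child.key)
                # An identical dependency has an identical subtree,
                # so each key only needs to be expanded once.
                visit(child)

    visit(root)
    return ordered


# The example tree from the description, with made-up URLs.
stdlib = Node(GitDep("https://example.com/stdlib.git", "hash3"))
dep2 = Node(GitDep("https://example.com/dep2.git", "hash2"), [stdlib])
dep1 = Node(GitDep("https://example.com/dep1.git", "hash1"), [dep2, stdlib])
my_pkg = Node(PathDep("."), [dep1, dep2, stdlib])

result = unique_dependencies(my_pkg)
print([dep.revision for dep in result])  # ['hash1', 'hash2', 'hash3']
```

Because the keys are frozen dataclasses, equality and hashing follow the definition of identical dependencies directly: two `GitDep` values are equal exactly when both URL and resolved revision match.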

Design

Storage of transitive dependencies

Currently, the transitive dependencies of a project are fetched and stored in the .juvix-build directories of the corresponding dependencies. After this PR, all dependencies, including transitive ones, are stored in the .juvix-build directory of the root project.

Again assuming that each transitive dependency resolves to the same git hash everywhere it appears, the file system looks as follows (dependencies are labelled with their git hash):

MyPkg
|
`- .juvix-build
          |- Dep1-hash1
          |- Dep2-hash2
          `- Stdlib-hash3

Say we have two versions of Dep2 in the transitive dependency graph:

MyPkg
|
|-- Dep1-hash1
|   |
|   |-- Dep2-hash2
|   |   |
|   |   `-- Stdlib-hash3
|   |
|   `-- Stdlib-hash3
|
|-- Dep2-hash4
|   |
|   `-- Stdlib-hash3
|
`-- Stdlib-hash3

we would have two copies of Dep2 in the .juvix-build directory with different hashes:

MyPkg
|
`- .juvix-build
          |- Dep1-hash1
          |- Dep2-hash2
          |- Dep2-hash4
          `- Stdlib-hash3

Storage of git clones

As a consequence of this design, we can no longer store the git clone for each dependency in the .juvix-build directory, as we do now.

We now store the git clones in a global directory ~/.config/juvix/0.6.1/git-cache.

When a dependency at a particular revision is required, the global git clone is fetched/checked out at the required revision and copied into the .juvix-build directory of the relevant project.
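The fetch / checkout / copy flow described above could look roughly like the following Python sketch. It is not the implementation in this PR: the `materialise` and `run_git` names are hypothetical, and skipping the `.git` directory when copying is an assumption made for the example.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def run_git(*args: str) -> None:
    """Run a git command, raising if it fails."""
    subprocess.run(["git", *args], check=True)


def materialise(
    url: str,
    revision: str,
    clone_dir: Path,   # location of the global clone for this URL
    dest_dir: Path,    # directory inside the root project's .juvix-build
    git: Callable[..., None] = run_git,
) -> None:
    """Copy the dependency at `revision` into `dest_dir` via the global clone."""
    if dest_dir.exists():
        return  # this (URL, revision) pair is already present in the project

    if clone_dir.exists():
        # Reuse the shared clone; only fetch what is new.
        git("-C", str(clone_dir), "fetch", "--all", "--tags")
    else:
        clone_dir.parent.mkdir(parents=True, exist_ok=True)
        git("clone", url, str(clone_dir))

    git("-C", str(clone_dir), "checkout", "--detach", revision)

    # Assumption: the project copy does not need the git metadata.
    shutil.copytree(clone_dir, dest_dir, ignore=shutil.ignore_patterns(".git"))
```

Passing the git runner as a parameter keeps the sketch easy to exercise without network access; a real implementation would also need to handle concurrent access to the shared clone.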

Naming of git clones

The requirement for the naming of the global git clones is that they can be identified by URL.

In this PR the name of a clone is formed by taking the SHA256 hash of the dependency git URL. This is to avoid issues with file-system safe escaping of characters.
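As an illustration of this naming scheme, here is a short Python sketch. The exact encoding is an assumption (UTF-8 input, lowercase hex digest); the `clone_name` and `clone_path` helpers and the example URL are hypothetical.

```python
import hashlib
from pathlib import Path


def clone_name(url: str) -> str:
    # Assumption: hash the UTF-8 bytes of the URL and use the hex digest.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def clone_path(git_cache: Path, url: str) -> Path:
    """Directory of the global clone for `url` inside the git cache."""
    return git_cache / clone_name(url)


git_cache = Path.home() / ".config" / "juvix" / "0.6.1" / "git-cache"
print(clone_path(git_cache, "https://example.com/stdlib.git"))
```

The digest contains only the characters 0-9 and a-f, so it is always a valid directory name regardless of what the URL contains.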

Naming of dependency directories

The requirement for the naming of the dependency directories is that they can be identified by URL / revision, in accordance with our definition of identical dependencies.

In this PR the name of a dependency directory is formed by taking the SHA256 hash of the concatenation of the dependency git URL and git revision. This is to avoid issues with file-system safe escaping of characters.
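A Python sketch of the directory naming follows. The concatenation order (URL first, then revision, no separator), the UTF-8 encoding and the hex digest are assumptions made for illustration; `dependency_dir_name` and the URLs are hypothetical.

```python
import hashlib


def dependency_dir_name(url: str, revision: str) -> str:
    # Assumption: URL followed directly by the resolved revision hash.
    return hashlib.sha256((url + revision).encode("utf-8")).hexdigest()


dep2_url = "https://example.com/dep2.git"

# Two revisions of the same repository get two distinct directories ...
name_a = dependency_dir_name(dep2_url, "hash2")
name_b = dependency_dir_name(dep2_url, "hash4")

# ... while the same (URL, revision) pair always maps to the same directory.
name_a_again = dependency_dir_name(dep2_url, "hash2")

print(name_a != name_b, name_a == name_a_again)  # True True
```

This is what lets Dep2-hash2 and Dep2-hash4 coexist in one .juvix-build directory while every occurrence of Stdlib-hash3 collapses into a single entry.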

The downside of this approach is that it's hard to see which directories correspond to which dependencies when navigating the filesystem. However, navigating using the Juvix tooling by using go-to-definition etc. will continue to work as before.

Benchmarks

I tested using the juvix-containers test Main.juvix.

The following benchmarks show timings excluding the initial clone of dependencies (which happens in the warmup run).

Before:

$ juvix clean && juvix clean -g
$ hyperfine -w 1 'juvix compile native Main.juvix'
Benchmark 1: juvix compile native Main.juvix
  Time (mean ± σ):      5.598 s ±  0.410 s    [User: 5.020 s, System: 0.586 s]
  Range (min … max):    5.106 s …  6.382 s    10 runs

After:

$ juvix clean && juvix clean -g
$ hyperfine -w 1 'juvix compile native Main.juvix'
Benchmark 1: juvix compile native Main.juvix
  Time (mean ± σ):      4.418 s ±  0.241 s    [User: 4.083 s, System: 0.343 s]
  Range (min … max):    4.237 s …  4.927 s    10 runs

The time is saved because, before this PR, the project depended on two copies of the stdlib, whereas after this PR it depends on only one.

Time is also saved in the initial run because the stdlib is only cloned once instead of twice. The cached stdlib clone is also shared between all projects, which will improve the performance of every project that uses the stdlib.
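For a sense of scale, the mean timings reported above can be compared directly. The numbers in this small Python calculation are taken from the two hyperfine runs; the variable names are only illustrative.

```python
before_mean_s = 5.598  # mean of the "Before" run
after_mean_s = 4.418   # mean of the "After" run

saved_s = before_mean_s - after_mean_s
relative_saving = saved_s / before_mean_s
speedup = before_mean_s / after_mean_s

print(f"saved {saved_s:.3f} s ({relative_saving:.1%}), speedup {speedup:.2f}x")
# saved 1.180 s (21.1%), speedup 1.27x
```

The difference is well outside the reported standard deviations (0.410 s and 0.241 s), although each figure comes from only 10 runs.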

Closes

@paulcadman paulcadman added this to the 0.6.2 milestone Apr 30, 2024
@paulcadman paulcadman self-assigned this Apr 30, 2024
@paulcadman paulcadman force-pushed the git-cache-dependencies branch 2 times, most recently from 96ec581 to e328caa Compare May 1, 2024 09:16
@paulcadman paulcadman changed the title Use a global cache of git dependency clones Deduplicate project remote dependencies May 1, 2024
@paulcadman paulcadman marked this pull request as ready for review May 1, 2024 18:22
@paulcadman paulcadman force-pushed the git-cache-dependencies branch from 43c5944 to bf2dd26 Compare May 1, 2024 18:57
paulcadman added 6 commits May 3, 2024 10:05
Use the hash of the URL and revision of remote dependencies as the
dependency directory so that there will only be one directory for
each (URL, revision) pair in the project (including transitive dependencies).
The git clones are now stored in the global configuration
@paulcadman paulcadman force-pushed the git-cache-dependencies branch from bf2dd26 to a779c30 Compare May 3, 2024 09:05
@paulcadman paulcadman requested a review from janmasrovira May 3, 2024 09:05
@paulcadman paulcadman merged commit 640d96e into main May 3, 2024
4 checks passed
@paulcadman paulcadman deleted the git-cache-dependencies branch May 3, 2024 18:28