[Feature Request] Specify exact git commit of a dependency #2762

Open
RalfJung opened this issue Nov 26, 2016 · 11 comments

Comments

@RalfJung

We are developing some rather fast-moving (as in, constantly breaking backwards compatibility) research projects and would like to use opam to express and manage dependencies. The trouble is that generally, project B will need a specific git commit of project A. I hoped I could express that by writing in B's opam file:

depends: [
  "https://.../projectA.git#TheRightCommit"
]

However, that doesn't seem to work; the dependency is just ignored. Of course, for a local package I can simulate that by first pinning manually, but I'd like to have something that can automatically install the right versions of everything, for example for a CI build. This also has to work transitively: project A may have a similarly volatile dependency on project X, so in A's opam file I would like to give the right commit of X.

It would be great if opam could support pin-like notation in the dependency field, so that one can express a dependency on a particular git commit. (As another benefit, this would make it easy to depend on projects that are not available in any repository; one could just give their git URL. There is precedent for doing this, e.g. in Rust's cargo.)

opam config report:

# OPAM config report
# opam-version    1.2.2 
# self-upgrade    no
# os              linux
# external-solver aspcud $in $out $criteria
# criteria        -count(removed),-notuptodate(request),-sum(request,version-lag),-count(down),-notuptodate(changed),-count(changed),-notuptodate(solution),-sum(solution,version-lag)
# jobs            4
# repositories    4* (http), 1 (version-controlled)
# pinned          1 (version), 2 (version control)
# current-switch  coq-8.5*
# last-update     2016-11-25 15:57
@AltGr
Member

AltGr commented Nov 29, 2016

Thanks for the suggestion; indeed, managing development dependencies, and more generally providing a way to share and reproduce consistent dev environments, is the last major concern that we are trying to solve before releasing opam 2.0. There are still open questions on how exactly to manage it, though, and input is welcome.

One issue with the suggested behaviour is that it presupposes an open universe of packages, while opam relies on solving installation problems within a closed universe. What I mean by this is that opam assumes prior knowledge of the metadata of all packages (or at least their relationships, i.e. conflicts and dependencies) before attempting to solve the installation problem and returning the best solution. The installability problem is NP-complete, and, AFAIK, package managers with a different policy don't handle conflicts or expressive dependencies.

Since dependencies, in this case, point to a single version, it could be possible to recursively fetch the metadata for all dependencies before the solver is called. This assumes, though, that we know for sure the exact initial package whose dependencies we want to fetch and install: for example, if there are several versions of projectB and you don't specify the one you want, opam may not pick the latest one if that's incompatible with your system, or requires downgrades, etc. In that case, we would need to choose the version explicitly before the solver can be called.
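To make the closed-universe idea concrete, here is a minimal Python sketch (not opam code) of fetching all exact-commit metadata before any solver runs. `fetch_deps` is a hypothetical callback standing in for "download the opam file at this URL#commit and read its dependencies"; because each dependency names a single target, two different targets for the same package are reported as a conflict instead of being handed to a solver.

```python
from collections import deque


def collect_universe(root, fetch_deps):
    """Transitively gather metadata for exact-commit dependencies.

    root: a (name, target) pair, e.g. ("projectB", "https://.../projectB.git#abc").
    fetch_deps: hypothetical callback mapping a (name, target) pair to the
    list of (name, target) pairs it depends on.
    Returns {(name, target): deps}, a closed universe a solver could then use.
    """
    chosen = {}    # package name -> the single target selected for it
    universe = {}
    queue = deque([root])
    while queue:
        name, target = queue.popleft()
        if name in chosen:
            # Exact-commit dependencies leave no choice: disagreement is fatal.
            if chosen[name] != target:
                raise ValueError(
                    f"{name}: conflicting targets {chosen[name]} and {target}")
            continue
        chosen[name] = target
        deps = fetch_deps((name, target))
        universe[(name, target)] = deps
        queue.extend(deps)
    return universe
```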

On the bright side, this is in part what opam pin already does: select packages and their metadata in advance of calling the solver. Pinning is also the current way of setting up development environments, so an extension to pinning seems the best approach to me at the moment. This also seems sound, as the kind of dependencies suggested are not supposed to appear in a repository.

There are generally two ways to handle development environment reproducibility: specific dependencies on URLs, as you suggest, and what npm or Cargo call "lock files".

For lock files, we have "export files" (opam switch export|import <file>), that can reproduce the state of a switch, including pinned package definitions. There is still progress to be made on how we handle them, though, for example, they were initially not designed for project-specific switches, and it would be nice to have a single command to initialise a project-specific switch, taking care of the export file if any.

As for the feature you are wishing for, I think it could fit into pinning with something like a "recursive mode", that would pin a given package, and, recursively, its dependencies. opam files found in package sources could allow a specific syntax for expressing that, that is either not allowed or ignored when in the repository, e.g.

depends: [ "projectA" { = "0.2.2@https://.../projectA.git#TheRightCommit" } ]

(see the current syntax)
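The proposed constraint string bundles a version and a pin target. Assuming the shape `VERSION@URL#COMMIT` from the example above (this was only a draft syntax, not something opam accepts), splitting it could look like this Python sketch; `PinTarget` and `parse_pin_constraint` are illustrative names.

```python
from typing import NamedTuple, Optional


class PinTarget(NamedTuple):
    version: str           # version the pinned package will claim, e.g. "0.2.2"
    url: str               # source repository
    commit: Optional[str]  # exact commit, or None to follow the default branch


def parse_pin_constraint(spec: str) -> PinTarget:
    """Split a draft "VERSION@URL#COMMIT" constraint into its parts."""
    # Split on the first "@": versions contain no "@", but URLs such as
    # git@host:repo may, so everything after the first "@" is the URL part.
    version, at, rest = spec.partition("@")
    if not at or not version or not rest:
        raise ValueError(f"expected VERSION@URL[#COMMIT], got {spec!r}")
    url, hash_sign, commit = rest.partition("#")
    return PinTarget(version, url, commit if hash_sign else None)
```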

When opam pinning project B, project A (and, recursively, its dependencies) would get pinned to the given target as well, reproducing the exact build environment that you expect. Of course, updating the remotes of pinned packages would get a bit more tricky: we would need to build their (current) dependency graph, and proceed in order, updating metadata and related pinning targets as we go.
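Updating pinned remotes "in order" amounts to a topological sort of the pinned packages by their dependencies. A minimal Python sketch using the standard-library `graphlib` (Python 3.9+), assuming the dependency graph is already known; as noted above, a real implementation would have to re-read each package's metadata after updating it, since the graph itself may change along the way.

```python
from graphlib import TopologicalSorter


def update_order(pin_graph):
    """Return pinned packages with dependencies before their dependents.

    pin_graph maps each pinned package to the set of pinned packages it
    depends on. Raises graphlib.CycleError on cyclic pins.
    """
    return list(TopologicalSorter(pin_graph).static_order())
```

For a chain where projectB depends on projectA, which depends on projectX, this yields projectX, then projectA, then projectB.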

Do you think such a feature would, in your case, fit the bill ?

Note re. « the dependency is just ignored »: opam 1.2.2 was lacking in error reporting for packages if you don't specify --verbose; that has been improved since, but beware that, in your case, opam found an invalid package name and thus ignored the whole depends: field.

@RalfJung
Author

RalfJung commented Nov 29, 2016

Great to hear that this is a concern of the opam developers! I have to say that my first impression of opam was really great: it seemed like someone had finally designed a package manager "by developers, for developers" that is still so easy to use that one can tell end-users to install stuff with it. I had never written a single line of OCaml, so I hadn't heard of it before, but now the Coq community is jumping on this train.
However, when it came to finding a way to pin down the very volatile dependencies that we have in our in-development libraries (basically, more fine-grained than versions), I got a little frustrated because this case doesn't seem well-supported. So, I'm more than happy to do my part in improving this :)
I am not surprised, though, that there are quite a few subtleties involved that make my naive suggestions hard or infeasible.

For lock files, we have "export files" (opam switch export|import <file>), that can reproduce the state of a switch, including pinned package definitions. There is still progress to be made on how we handle them, though, for example, they were initially not designed for project-specific switches, and it would be nice to have a single command to initialise a project-specific switch, taking care of the export file if any.

One concern I have here is that I don't want to fix all dependencies. To extend my example above, both projectA and projectB are Coq projects, and both work with versions 8.5 and 8.6 of Coq. So I would like to have CI jobs that test this. The dependency of projectB on projectA is on a particular commit, but it should still be possible to have variations in other dependencies, like Coq.
(Cargo lock files in Rust are usually only put into the git repositories for applications that want to ensure that all developers work on something comparable; for libraries, you really want to be compatible with anything satisfying your dependencies, so the general advice is not to version control the lock file.)

As for the feature you are wishing for, I think it could fit into pinning with something like a "recursive mode", that would pin a given package, and, recursively, its dependencies. opam files found in package sources could allow a specific syntax for expressing that,

That sounds great! Adding pins in the opam files is exactly what I wish for.

In case that is interesting for you, I hacked something together that approaches this behavior in a very crude way (and relying on assumptions I can make because we control all the projects involved in the "tight" part of the dependency graph). The shell script https://gitlab.mpi-sws.org/FP/LambdaRust-coq/blob/master/build/opam-pins.sh expects on stdin an opam.pins file, which can look e.g. like this:

coq-iris https://gitlab.mpi-sws.org/FP/iris-coq#90f773c0eb319320932edfd4a5fbe878673bb3de

It will then apply these pins, but also recursively look at the given commits of the given repositories and process their opam.pins file. (The way it fetches that file relies on us using GitLab for our repositories so that we can fetch individual files via HTTP.)
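For readers who want the gist without opening the script, the recursive resolution of opam.pins files can be sketched in Python as below. This is a reimplementation of the idea, not a port of opam-pins.sh: `fetch_pins_file` is a hypothetical callback returning the opam.pins contents at a given repository#commit (the script does this with GitLab's per-file HTTP access), and raising on conflicting pins is a choice made for this sketch.

```python
def parse_pins(text):
    """Parse opam.pins lines of the form "<package> <url>#<commit>"."""
    pins = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, target = line.split(maxsplit=1)
        pins[name] = target.strip()
    return pins


def resolve_pins(root_pins_text, fetch_pins_file):
    """Follow opam.pins files transitively; return {package: target}."""
    resolved = {}
    pending = list(parse_pins(root_pins_text).items())
    while pending:
        name, target = pending.pop()
        if name in resolved:
            if resolved[name] != target:
                raise ValueError(
                    f"{name} pinned to both {resolved[name]} and {target}")
            continue
        resolved[name] = target
        # The pinned commit may itself pin further packages.
        pending.extend(parse_pins(fetch_pins_file(target)).items())
    return resolved
```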
Unfortunately, that was not enough because if you pin something with -n, and if the package was already installed at the same version (but different pin, e.g., a different git commit), opam doesn't actually remember that anything changed and forgets to re-compile. (Btw, is this a bug?) On the other hand, the -n is necessary because the pins need to all be applied in one transaction; compiling new versions of some pins with old versions of others won't work. So to fix this, I did some very crude hacking (I am kind of ashamed of that code, but then, it works...): I record the "opam pin list" before and after all this, diff it, and reinstall packages whose pin changed.
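The before/after comparison described here boils down to a dictionary diff. A minimal Python sketch, assuming the two pin lists have already been parsed into `{package: target}` maps (parsing the actual `opam pin list` output is left out, since its layout differs between versions):

```python
def changed_pins(before, after):
    """Packages whose pin target is new or different after re-pinning."""
    return sorted(name for name, target in after.items()
                  if before.get(name) != target)


def reinstall_command(before, after):
    """argv for `opam reinstall`, or None when nothing changed."""
    pkgs = changed_pins(before, after)
    return ["opam", "reinstall", *pkgs] if pkgs else None
```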
I am thinking maybe I could get rid of that hack if I find a way to automatically edit the opam file of the pins and overwrite the version number from dev to dev.<commit>. Then opam would at least notice that the version changed and a simple "opam install" or "opam upgrade" or so should do it...

EDIT: I now found something that works in way less hacky ways: When I set the pins, the version number is changed to include the commit hash. Then opam remembers it has to recompile. The link above has been updated.
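The version trick from the EDIT can be sketched as a text rewrite of the pinned package's opam file. This Python sketch assumes the file contains a single-line `version: "dev"` field and uses the first 8 characters of the commit hash; both are assumptions for illustration, not necessarily what the updated script does.

```python
import re

# Matches a top-level `version: "dev"` line (single-line form only).
_DEV_VERSION = re.compile(r'^(version:\s*")dev(")', re.MULTILINE)


def stamp_version(opam_text, commit, short=8):
    """Rewrite version "dev" to "dev.<hash>" so opam sees a version change."""
    new_text, count = _DEV_VERSION.subn(
        lambda m: f"{m.group(1)}dev.{commit[:short]}{m.group(2)}",
        opam_text, count=1)
    if count == 0:
        raise ValueError('no `version: "dev"` field found')
    return new_text
```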

It seems to me that what you suggest with the advanced dependency expressions would cover exactly what I try to cover with my hack, just obviously in a more integrated fashion and hopefully without problems related to not working "within a single opam transaction". That's a great sign :)

@AltGr
Member

AltGr commented Nov 30, 2016

One concern I have here is that I don't want to fix all dependencies.

It's far from the best interface, but note that while there is no way to automatically generate partial "export" files, the current behaviour is to apply them on top of your existing switch state, changing as little as possible. You might be able to achieve what you are currently trying to do with opam switch import and an export file containing just your pins. (The format changed between 1.2 and 2.0, and is now much easier to edit by hand. The old format is one package per line, pkgname version install-status [pin-kind pin-target]; the new format is similar to the switch state, but can also directly include pinned package definitions.)

I don't really consider that a solution, at least not without tools to generate the file, but in the current state of affairs, it might help.
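Since the old (1.2) export format is just one package per line, a file "containing just your pins" can be derived from a full export with a few lines of code. A hedged Python sketch based only on the field order quoted above; the install-status strings in the test data are placeholders, and the field is passed through unchanged:

```python
from typing import NamedTuple, Optional


class ExportEntry(NamedTuple):
    name: str
    version: str
    status: str                       # install-status, passed through as-is
    pin_kind: Optional[str] = None    # e.g. "git", only present for pins
    pin_target: Optional[str] = None


def parse_export_v1(text):
    """Parse `pkgname version install-status [pin-kind pin-target]` lines."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) not in (3, 5):
            raise ValueError(f"line {lineno}: expected 3 or 5 fields")
        entries.append(ExportEntry(*fields))
    return entries


def pins_only(entries):
    """Keep only pinned packages, serialized back to the same format."""
    lines = [" ".join(e) for e in entries if e.pin_kind]
    return "".join(line + "\n" for line in lines)
```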

Unfortunately, that was not enough because if you pin something with -n, and if the package was already installed at the same version (but different pin, e.g., a different git commit), opam doesn't actually remember that anything changed and forgets to re-compile. (Btw, is this a bug?)

This is actually related to #2763: opam 2 will check for differences in the metadata, and determine if it has changed in a meaningful way (a different upstream URL or hash being one). If so, it will prompt to reinstall at the next opportunity, so I believe the next version will have the desired behaviour. (pinned packages may require reinstall without changes to their metadata, so the "reinstall" file is still there but only used to handle this case).
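Deciding whether a metadata change is "meaningful" can be pictured as comparing selected fields of the old and new package definitions. In this Python sketch the field list is an assumption made for illustration; opam's actual criteria are internal to opam and not spelled out here.

```python
# Assumed set of fields whose change should trigger a rebuild.
MEANINGFUL_FIELDS = ("url", "checksum", "depends", "build", "install")


def needs_reinstall(installed, updated, fields=MEANINGFUL_FIELDS):
    """True if any meaningful field differs between two metadata dicts."""
    return any(installed.get(f) != updated.get(f) for f in fields)
```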

After some thinking, I don't think that using the same field for repository and dev opam files, but with a slightly different format, is a good idea. I still think this would be much nicer to specify without having to add an extra file, so the best option might be a new field, that is only meaningful when pinning the package (ignored — or better, disallowed — on the repository).

It could be something like:

pinning-dependencies: [
  [ "projectA.0.2.0" "https://gitlab.mpi-sws.org/FP/iris-coq#90f773c0eb319320932edfd4a5fbe878673bb3de" ]
]

This would be read at pinning time, and whenever projectB is updated, causing recursive pins or pin updates. Note that we probably wouldn't unpin if the constraint disappears, though.
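The intended semantics (pin or re-pin as needed, but never unpin) can be sketched in Python as a small planning function. The `[ "name.version" "target" ]` entry shape follows the example above; splitting at the first dot assumes package names contain no dots, and the action tuples are made up for this sketch.

```python
def plan_pin_actions(current, declared):
    """Compute pin actions for declared pinning dependencies.

    current: {name: target} for packages already pinned.
    declared: list of ("name.version", target) entries.
    Packages pinned in `current` but no longer declared are left alone.
    """
    actions = []
    for pkg, target in declared:
        name, _, version = pkg.partition(".")  # assumes no dots in names
        if name not in current:
            actions.append(("pin", name, version, target))
        elif current[name] != target:
            actions.append(("update", name, version, target))
    return actions
```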

I am glad that seems like an adequate solution to your problem.

@AltGr
Member

AltGr commented Nov 30, 2016

Since you're using opam for Coq, support should be much better in several areas with opam 2 as well. Mainly, all OCaml-specific code has been removed, which means you can define a Coq installation as a first-class compiler and switch between them easily.

@RalfJung
Author

After some thinking, I don't think that using the same field for repository and dev opam files, but with a slightly different format, is a good idea. I still think this would be much nicer to specify without having to add an extra file, so the best option might be a new field

I don't have a strong opinion on whether it's a new field or a part of the dependency. Making it a new file is obviously a band-aid. ;-) (I tried re-using the opam file by adding a comment in the opam file and then asking opam to show the opam file of a project; unfortunately, it seems to normalize the content of those files and throw away comments. So I pretty much have to go with a separate file for now.)

that is only meaningful when pinning the package (ignored — or better, disallowed — on the repository).

What is the reason this cannot work in a repository? I am thinking of "dev" versions in a repository here, of course, not of stable releases. (Such versions seem to be common at least in the Coq community, see https://github.com/coq/opam-coq-archive/tree/master/extra-dev/packages).

Since you're using opam for Coq, support should be much better in several areas with opam 2 as well. Mainly, all OCaml-specific code has been removed, which means you can define a Coq installation as a first-class compiler and switch between them easily.

I was already wondering whether the special treatment of OCaml in opam is more than just a historical accident. Glad to hear the design scales.
I already have two switches for different Coq versions, and I can "opam switch coq-8.5" and "opam switch coq-8.6". What else could opam 2 give me?

@AltGr
Member

AltGr commented Dec 1, 2016

(I tried re-using the opam file by adding a comment in the opam file and then asking opam to show the opam file of a project; unfortunately, it seems to normalize the content of those files and throw away comments. So I pretty much have to go with a separate file for now.)

For that, we have added "extension fields": any field in the opam file starting with x- will be kept and can be queried, but won't be used by opam. It's yet again a 2.0 feature though ;)
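With extension fields, a pin list could live inside the opam file itself instead of a separate opam.pins file. As a rough Python illustration (a real tool should use a proper opam-file parser; this regex only handles single-line top-level fields, and the field name `x-pins` is hypothetical):

```python
import re

# A top-level `x-<name>: <value>` field written on a single line.
_X_FIELD = re.compile(r"^(x-[A-Za-z0-9_-]+):[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extension_fields(opam_text):
    """Return {field: raw value} for single-line x- fields."""
    return dict(_X_FIELD.findall(opam_text))
```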

What is the reason this cannot work in a repository? I am thinking of "dev" versions in a repository here, of course, not of stable releases. (Such versions seem to be common at least in the Coq community, see https://github.com/coq/opam-coq-archive/tree/master/extra-dev/packages).

Two reasons:

  1. when in the repository, consistency of package sets should be defined by the relationships expressed in the repository terms (package+version): package B.1 dependency on package A.2 should match A.2 that exists in the repository.
  2. the feature we are drafting relies on recursive pinning, which makes sense if you opam pin B, pinning A as well. It would be very weird for opam install B to pin A (and that would get us back to the original problem of dependencies not known in advance to the solver, since the correct metadata of A wouldn't have to be in the repository)
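Reason 1 can be phrased as a repository-wide check: every dependency must name a package version that exists in the same repository. A simplified Python sketch, in which dependencies are exact (name, version) pairs rather than opam's full constraint formulas:

```python
def dangling_deps(repo):
    """List (package, dependency) pairs whose dependency is not in `repo`.

    repo: {(name, version): [(dep_name, dep_version), ...]}
    """
    missing = []
    for pkg in sorted(repo):
        for dep in repo[pkg]:
            if dep not in repo:
                missing.append((pkg, dep))
    return missing
```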

So if we want to be able to handle this with repositories, we need a different solution. There is, for example, an idea floating around of one-command generation of ad-hoc repositories to share this kind of setup.

Note that my wording wasn't clear: the opam tool should ignore the field except when pinning, while the repository rules and checking tools should forbid it from appearing; those are two different scopes.

I was already wondering whether the special treatment of OCaml in opam is more than just a historical accident. Glad to hear the design scales.
I already have two switches for different Coq versions, and I can "opam switch coq-8.5" and "opam switch coq-8.6". What else could opam 2 give me?

The possibility to create the Coq switches in a single command, by defining "switchable" coq packages, and to have coq and all its dependencies considered "base" packages for the switch, so that they won't be changed, removed or affected by upgrades by default. It would even be possible to define a switch relying on a system installation of Coq, as is done for opam's system compiler switch. If you need to export Coq-specific environment variables, that can now be done from the coq package as well.

@RalfJung
Author

RalfJung commented Dec 1, 2016

My impression of dev versions in the Coq world is that the dependencies are not entirely precise; e.g., something may depend on Coq 8.6.dev (the v8.6 branch in the upstream git repo), but it will not actually work with all commits this branch ever pointed to. That's why I was asking. But I agree pinning is the wrong mechanism here; it would have to be something deeper, where opam understands git commits as version constraints. There are probably many commits of Coq that would work, as long as they are "not too old" and "not too new". I guess it would have to be more like "this needs Coq 8.6.dev, recent enough to contain commit X", but then there's still a risk of getting a too-new version... I guess this is just a different (and harder ;) problem, then. And we may just not have a repo with dev versions of our packages, at least not until things have settled down enough that most upgrades don't break older stuff (which I kind of doubt will ever happen, this being a research project and all^^).
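The "recent enough to contain commit X, but not too new" constraint is easy to state for a linear branch history. A Python sketch under that simplifying assumption (real git history is a DAG, where `git merge-base --is-ancestor` answers the "contains commit X" part); the commit names in the test are placeholders:

```python
def within_range(history, candidate, at_least, before=None):
    """Check `candidate` against a commit range on a linear branch.

    history: commit ids on the branch, oldest first.
    at_least: candidate must contain this commit (be at or after it).
    before: optional first commit known to be too new.
    """
    position = {commit: i for i, commit in enumerate(history)}
    i = position[candidate]
    if i < position[at_least]:
        return False
    return before is None or i < position[before]
```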

to have coq and all dependencies considered "base" packages for the switch, so that they won't be changed, removed or affected by upgrades by default

That does indeed sound useful, though pins seem to serve the same purpose really well.

@AltGr
Member

AltGr commented Jan 16, 2017

While this is probably too much for the current in-progress release, I think it could fit quite well as a plugin for now, letting us experiment further with it:

  • Add a field x-pinning-dependencies: which binds package names to pinning targets
  • A plugin opam recursive-pin that:
    • pins the considered package (without taking action)
    • reads that field for the considered package
    • recurses on the bindings found (i.e. does all the pinnings recursively)
    • installs the original package, using what was pinned for the dependencies
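Once the bindings have been collected recursively, the plugin's main job is ordering: pin everything without acting, then install once, so that all pins take effect in a single transaction. A hedged Python sketch that only builds the command lines (it does not run them); `opam pin add -n` and `opam install` are existing opam commands, while the function itself and its input shape are hypothetical:

```python
def recursive_pin_commands(root, bindings):
    """Build argv lists for the recursive-pin steps.

    root: the package being pinned, e.g. "projectB".
    bindings: {package: pin target}, including root, as gathered by
    following x-pinning-dependencies recursively.
    """
    commands = [["opam", "pin", "add", "-n", name, target]  # -n: no action yet
                for name, target in sorted(bindings.items())]
    commands.append(["opam", "install", root])  # one install for everything
    return commands
```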

The limitation of using a plugin is that you would have to run opam recursive-pin again whenever the x-pinning-dependencies changed. This shouldn't be too difficult to write.

EDIT: note, naming the field and the plugin the same would probably be better.

@AltGr
Copy link
Member

AltGr commented Sep 25, 2017

There is now a pin-depends: field (being merged) that addresses this partially.
It differs in significant ways though:

  • it is only active on packages you pin directly (has no effects for packages in repositories)
  • it is not recursive. A plugin or option could be added to make it so, but this is safer for the general case (and consistent with the first point above)
  • see also opam-lock, that generates the pin-depends: for a whole dependency tree (also consistent with above) for reproducibility.
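For reference, a pin-depends: field is a list of `"name.version"`/URL pairs, much like the pinning-dependencies example above. A small Python sketch that renders such a field from an already-locked dependency list; the URLs in the test are placeholders, and opam-lock's actual output may be laid out differently:

```python
def render_pin_depends(locked):
    """Render a pin-depends: field from (name, version, url) triples.

    Each url is expected to carry its commit, e.g. "git+https://...#<hash>".
    """
    lines = ["pin-depends: ["]
    for name, version, url in sorted(locked):
        lines.append(f'  ["{name}.{version}" "{url}"]')
    lines.append("]")
    return "\n".join(lines)
```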

I am curious how well this addresses your use-cases, and if it could be adapted to fit them better, or if recursive handling of a precise dependency tree with commit-hashes really seems to be the best (or only ?) solution for you.
Thanks for the feedback!

@RalfJung
Copy link
Author

Not having recursive pinning would certainly be a problem. However, opam-lock looks like it could provide the right tooling to overcome that.

If I compare this with Rust, there Cargo does not just bring lock files (emulated by opam-lock), but also direct dependencies on git commits:

[dependencies.foo]
git = "https://github.com/foo/foo.git"
rev = "a123456"

Of course, these also work recursively and in repositories (if the repository permits that; the "main" repository at https://crates.io insists on all dependencies being version numbers of packages in the same repository). I think that's what I originally was asking for here. And indeed we now also want to support people installing the latest version of our project from a repository, so pin-depends not working from repositories makes that a non-solution.


Anyway, we have since then changed our approach to handling dependencies: We now have tooling that automatically gives each commit we push a version number and puts it into an opam repository. That lets us use normal opam dependencies throughout. The one thing we still have to hack around is the lack of a command saying "please install the build-dependencies of this package (given by a path to the opam file) and satisfy all pins and upgrade everything"; from my understanding based on #2764, opam 2.0 will bring such a feature. Right now, we have some hilarious hacks to make things work.
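One way such tooling could derive a version number per commit is sketched below in Python. The `dev.<date>.<n>.<hash>` scheme and all names are hypothetical, not the scheme actually used; putting the date and a per-day counter first is meant to keep newer commits ordering after older ones, with the hash only making the version traceable to its commit.

```python
from datetime import date


def commit_version(commit_date: date, seq: int, commit_hash: str) -> str:
    """Hypothetical per-commit version: dev.<YYYY-MM-DD>.<seq>.<short hash>.

    seq is assumed to count the commits pushed earlier on the same day.
    """
    return f"dev.{commit_date:%Y-%m-%d}.{seq}.{commit_hash[:8]}"
```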

@AltGr
Copy link
Member

AltGr commented Sep 26, 2017

Ok, I see indeed that this is a different use-case, although there is some overlap. Thanks for the feedback!
