Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variant selection mechanism for virtual libraries #1900

Merged
merged 1 commit into from
Mar 27, 2019

Conversation

TheLortex
Copy link
Collaborator

The issue that was tracking this is #1859. I've been able to implement the variant selection mechanism along with default implementations in lib.ml, within closure construction.

It adds some syntax in dune files: for libraries, an implementation can specify its variant and a virtual library can specify its default_implementation. For executables, it's possible to give a set of variants to match.

What it does is the following: when the closure is computed, a map from virtual libraries to implementations is also built. Then variants are resolved and after that there's only virtual libraries with default implementations remaining unsolved: here we need to check for potential ambiguity while finding out which default implementation we can safely choose to add in the closure (and it loops until everything is resolved.).

This is not ready for a merge but we can start reviewing code and thinking about nice error messages !

@rgrinberg
Copy link
Member

Will try to give this a review today.

src/dune_file.ml Outdated Show resolved Hide resolved
@rgrinberg
Copy link
Member

Looks good so far. Great work. I think that we'll be able to get this in for 1.8. From my brief look so far:

The test suite needs a bit more work. We need to add a test that witnesses the changes to the dune-package format. We also need some tests for the case where variants are loaded from an external package. Also, I regret the pattern of stuffing all the variant tests into a single run.t. It really slows down the test suite, so we should not do it. I propose that we split the test suite into:

  • virtual libraries (basically my old tests)
  • default implementations
  • variants

I will do a more thorough review tomorrow

src/toplevel.ml Outdated Show resolved Hide resolved
@rgrinberg
Copy link
Member

Also: please rebase the PR when you have the time. The documentation and change log need to be updated as well

@rgrinberg
Copy link
Member

Okay, I refactored the tests myself. It's a mess that I caused anyway.

@rgrinberg
Copy link
Member

I've added some formatting tweaks:

  • Reformatted everything with ocp-indent (please use it as well)
  • Wrapped to 80 chars
  • Remove most spurious uses of let .. and. If there is no recursion, let's just stick to normal let bindings.

src/dune_file.mli Outdated Show resolved Hide resolved
@TheLortex TheLortex requested a review from a user March 7, 2019 09:15
@TheLortex TheLortex requested a review from emillon as a code owner March 7, 2019 09:15
@rgrinberg rgrinberg force-pushed the impl-variants branch 4 times, most recently from 3b54daa to 163ea3c Compare March 7, 2019 09:55
src/lib.ml Show resolved Hide resolved

and closure_with_overlap_checks db ts ~stack:orig_stack ~linking ~variants =
let name_to_lib name loc =
resolve_dep (Option.value_exn db) name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be right I think. db is None in some cases. Why can't it be None here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is weird. I assumed the database was accessible when virtual libraries resolution was needed. We just need to be sure that variants is empty when db is None. It's correct for every case, except in lib.ml: resolve_user_written_deps_for_exe, where db is (Option.some_if (not allow_overlaps) t). I don't really know about that allow_overlap feature: when is it set to true ?

@TheLortex TheLortex force-pushed the impl-variants branch 2 times, most recently from b089e6b to 7ac752c Compare March 7, 2019 14:42
@rgrinberg rgrinberg force-pushed the impl-variants branch 2 times, most recently from bd2425a to 56bdee9 Compare March 11, 2019 09:37
@@ -227,6 +237,7 @@ module DB : sig
val create
: ?parent:t
-> resolve:(Lib_name.t -> Resolve_result.t)
-> find_implementations:(Lib_name.t -> Lib_info.t list Variant.Map.t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment on the necessity of this new field? Why can't we just use all and iterate over all the public libraries this way for example? Is it because we want to avoid resolving them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal was to avoid resolving them. Before coming to this solution I wanted to put possible implementations in the Lib.t type but as you mentioned after this introduces a dependency cycle on instantiation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed that your first attempt would have been ideal. Unfortunately, we will need forward references before it's possible.

Why is resolving the implementations a problem however? Are there situations where we just need the Lib_info.t? We can't really do much with this value apart from resolving it into a proper library.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really a problem, I just tried to do a minimal amount of computation when evaluating variants: we only need to resolve the implementations that are actually selected.

@rgrinberg
Copy link
Member

The issue with these Option.value_exn should really be solved by having a field such as:

; default_implementation : Lib.t Or_exn.t option

in the Lib.t type.

However, this isn't really possible to add in instantiate as it would introduce a dependency cycle when resolving the libraries.

@diml Would it be possible to create some sort of a forward reference in Lib.t to address this issue?

@ghost
Copy link

ghost commented Mar 12, 2019

Ah, that's annoying. A forward reference is ok if it is filled immediately after the Lib.t value is created, otherwise it is going to be hard to reason about when it is safe to reuse these values between runs.

BTW, I've been wondering if instantiate wasn't too complicated. Instead of storing direct Lib.t values in Lib.t, we would just store a new Resolved_lib_name.t type. And the idea would be that going from Resolved_lib_name.t to Lib.t would be very fast so it would be almost like having a Lib.t directly, but on the other hand it would simplify this code.

@rgrinberg
Copy link
Member

BTW, I've been wondering if instantiate wasn't too complicated. Instead of storing direct Lib.t values in Lib.t, we would just store a new Resolved_lib_name.t type. And the idea would be that going from Resolved_lib_name.t to Lib.t would be very fast so it would be almost like having a Lib.t directly, but on the other hand it would simplify this code.

That might work, but then we have to allow for cycles when resolving the Resolved_lib_name.t, right? Seems like it's the same problem. What's the semantic difference between Resolved_lib_name.t and Lib.t?

@ghost
Copy link

ghost commented Mar 12, 2019

Nope, we wouldn't need to recurse for getting a Resolved_lib_name.t. The only information attached to a Resolved_lib_name.t would be that it exist. We wouldn't know whether it is hidden or not for instance. It's pretty much the same as storing Lib_name.t values in Lib.t, and doing a resolution everytime we want to access one of those, except that it is faster because we already did the job of resolving the name.

@ghost
Copy link

ghost commented Mar 12, 2019

To be clear, the distinction is as follow:

  • Lib_name.t: just a string
  • Resolved_lib_name.t: a unique identification of a library, i.e. a lib name + a path
  • Lib.t: information about a library

@TheLortex
Copy link
Collaborator Author

So where are we now ? Shoud I try to implement that Resolved_lib_name module and refactor the code ?

By the way regarding libraries forcing implementations, I was suggested that it might not be a sane behavior, so we could simply forbid that. What do you think ?

@rgrinberg
Copy link
Member

So where are we now ? Shoud I try to implement that Resolved_lib_name module and refactor the code ?

I need to think about this change a bit more. @diml would this require storing the Lib.DB.t in the Lib.t? Otherwise, I'm not really sure what a resolved name means if we don't know which database it came from.

By the way regarding libraries forcing implementations, I was suggested that it might not be a sane behavior, so we could simply forbid that. What do you think ?

I think it's fine to forbid it.

@TheLortex Let me have a look at this patch again today. Sorry about the delay, but we really need some simplifications here. The code was already too complex even before this PR.

@ghost
Copy link

ghost commented Mar 20, 2019

Well, resolved library names would be globally unique, so we would have a single db going from resolved library names to Lib.t values. In fact, it could be just a memoized function. It is basically the function that takes a unique way to identify a library and returns its definition by reading dune, dune-package or META files. Breaking things down this way would also help having more fine grained memoization: the library databases would just become string -> resolved named maps, which are much smaller than the current string -> lib maps. And individual library definition would have less dependency in memory.

@rgrinberg rgrinberg force-pushed the impl-variants branch 2 times, most recently from 9375820 to bd08856 Compare March 24, 2019 11:02
src/lib.ml Outdated
type nonrec t =
| No_implementation
| Implemented_by of t
| Too_many_impl of t list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the arguments to the Implemented_by and Too_many_impl constructors? I don't see them being used anywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed they are not useful. They were built for potential error messages but we can remove that actually.

@rgrinberg
Copy link
Member

I've squashed the commits into one as it makes rebasing simpler. @TheLortex I've removed a few cases where libraries were compared by name. This isn't correct in dune and names aren't unique. In fact, I agree with Jeremie's current idea of adding an intermediate step for name resolution. So even though there are still some problematic parts in the code that rely on concrete library names too much, I think this is good enough for now. I will look into implementing Jeremie's proposal in a subsequent PR.

@rgrinberg
Copy link
Member

@TheLortex Have another look at my code. I've tried to simplify some things.

For your Vlib_status type:

module Vlib_status = struct
  type nonrec t =
    | No_implementation
    | Implemented_by of t
    | Too_many_impl of t list
end

I'm wondering if it overlaps, a little too much with our:

      type vlib_status =
        | No_impl of Dep_stack.t
        | Impl of lib * Dep_stack.t
      type t = vlib_status Map.t

So instead of a Vlib_status.t Map.t, we could use vlib_status Map.t Or_exn.t.

Yes, our error handling is a little less precise, but at least it will be consistent everywhere. Moreover, cases like this are confusing to me:

        Map.foldi !virtual_status ~init:(Ok ([], []))
          ~f:(fun lib status acc ->
            match status with
            | Implemented_by _
            | Too_many_impl _ -> acc

Shouldn't we be treating Too_many_impl _ as an error there? It probably doesn't matter, but using the type that I've proposed should simplify some things I think.

Using Lib_info.t for the dependency is something that I wish we'd avoid. We already have a Dep.Stack.t type. This is just a minor point and doesn't prevent the PR from being merged, but is there a reason why you did it this way?

@rgrinberg
Copy link
Member

@TheLortex Could you add a comment documenting vlib_default_parent by the way? I think we should have a comment for every single one of these refs that we use while traversing the closure.

@TheLortex
Copy link
Collaborator Author

TheLortex commented Mar 26, 2019

Thank you for your work on this ! I took a look, it's indeed more readable than the first version.
Moreover, if we forbid libraries to depend (and thus force) implementations, we can actually remove the default implementation dependency cycle check.

Shouldn't we be treating Too_many_impl _ as an error there? It probably doesn't matter, but using the type that I've proposed should simplify some things I think.

The error is handled later, in the Virtual_libs module it throws Double_implementation so I didn't want to make a duplicate. I guess we could use vlib_status instead.

Using Lib_info.t for the dependency is something that I wish we'd avoid.

Could you add a comment documenting vlib_default_parent by the way?

If we forbid libraries to depend on implementations we can actually remove the whole resolve_default_libraries function.

@rgrinberg I'll update the PR accordingly

@rgrinberg
Copy link
Member

I'm not sure we should forbid to depend on implementations. The reason being is that it makes it hard to turn existing libs into implementations.

Also non dune users can only consume implementations so I think we have to live with the check for now.

@rgrinberg
Copy link
Member

@TheLortex The last commit replaces the custom map with Vlib.Table.Partial. Things almost seem to work except for the resolution_priority issue. Could you explain that test a little more?

@TheLortex
Copy link
Collaborator Author

Yes you're right. Let's keep the check then.

resolution_priority was a case where the three kinds of implementation selection were used:

  • for direct it's an user defined dependency to direct.ocaml
  • for variant it's selected by using variant c
  • for test it uses a default implementation

It also checks that an user defined dependency overrides variant selection. Thus even if the c variant is given, direct.c should not be selected because direct is already implemented by direct.ocaml.

This allows to override variant selection in case of ambiguity.

@rgrinberg
Copy link
Member

I've removed the Vlib_status stuff in favor of just keeping unimplemented libraries in an abstract type. I think this is good enough to merge now.

Variants are a mechanism for selecting virtual library implementations en masse.
This PR includes the following changes:

* Add a [variant] field to implementations. This tags a virtual library
  implementation with a variant.

* Add [variants] field to executables and toplevels. This allows us to select
  implementations en masse when linking.

Default implementation is a feature to select a library to be a default
implementation when no implementation is selected by the user. This feature
allows to smoothly port existing libraries to be virtual.

Signed-off-by: Lucas Pluvinage <lucas.pluvinage@gmail.com>
Signed-off-by: Rudi Grinberg <rudi.grinberg@gmail.com>
@rgrinberg rgrinberg merged commit 23f6db7 into ocaml:master Mar 27, 2019
rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Apr 9, 2019
CHANGES:

- Warn when generated `.merlin` does not reflect the preprocessing
  specification. This occurs when multiple stanzas in the same directory use
  different preprocessing specifications. This warning can now be disabled with
  `allow_approx_merlin` (ocaml/dune#1947, fix ocaml/dune#1946, @rgrinberg)

- Watch mode: display "Success" in green and "Had errors" in red (ocaml/dune#1956,
  @emillon)

- Fix glob dependencies on installed directories (ocaml/dune#1965, @rgrinberg)

- Add support for library variants and default implementations. (ocaml/dune#1900,
  @TheLortex)

- Add experimental `$ dune init` command. This command is used to create or
  update project boilerplate. (ocaml/dune#1448, fixes ocaml/dune#159, @shonfeder)

- Experimental Coq support (fix ocaml/dune#1466, @ejgallego)

- Install .cmi files of private modules in a `.private` directory (ocaml/dune#1983, fix
  ocaml/dune#1973 @rgrinberg)

- Fix `dune subst` attempting to substitute on directories. (ocaml/dune#2000, fix ocaml/dune#1997,
  @rgrinberg)

- Do not list private modules in the generated index. (ocaml/dune#2009, fix ocaml/dune#2008,
  @rgrinberg)

- Warn instead of failing if an opam file fails to parse. This opam file can
  still be used to define scope. (ocaml/dune#2023, @rgrinberg)

- Do not crash if unable to read a directory when traversing to find root
  (ocaml/dune#2024, @rgrinberg)

- Do not exit dune if some source directories are unreadable. Instead, warn the
  user that such directories need to be ignored (ocaml/dune#2004, fix ocaml/dune#310, @rgrinberg)

- Fix nested `(binaries ..)` fields in the `env` stanza. Previously, parent
  `binaries` fields would be ignored, but instead they should be combined.
  (ocaml/dune#2029, @rgrinberg)

- Allow "." in `c_names` and `cxx_names` (ocaml/dune#2036, fix ocaml/dune#2033, @rgrinberg)
rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Apr 9, 2019
CHANGES:

- Warn when generated `.merlin` does not reflect the preprocessing
  specification. This occurs when multiple stanzas in the same directory use
  different preprocessing specifications. This warning can now be disabled with
  `allow_approx_merlin` (ocaml/dune#1947, fix ocaml/dune#1946, @rgrinberg)

- Watch mode: display "Success" in green and "Had errors" in red (ocaml/dune#1956,
  @emillon)

- Fix glob dependencies on installed directories (ocaml/dune#1965, @rgrinberg)

- Add support for library variants and default implementations. (ocaml/dune#1900,
  @TheLortex)

- Add experimental `$ dune init` command. This command is used to create or
  update project boilerplate. (ocaml/dune#1448, fixes ocaml/dune#159, @shonfeder)

- Experimental Coq support (fix ocaml/dune#1466, @ejgallego)

- Install .cmi files of private modules in a `.private` directory (ocaml/dune#1983, fix
  ocaml/dune#1973 @rgrinberg)

- Fix `dune subst` attempting to substitute on directories. (ocaml/dune#2000, fix ocaml/dune#1997,
  @rgrinberg)

- Do not list private modules in the generated index. (ocaml/dune#2009, fix ocaml/dune#2008,
  @rgrinberg)

- Warn instead of failing if an opam file fails to parse. This opam file can
  still be used to define scope. (ocaml/dune#2023, @rgrinberg)

- Do not crash if unable to read a directory when traversing to find root
  (ocaml/dune#2024, @rgrinberg)

- Do not exit dune if some source directories are unreadable. Instead, warn the
  user that such directories need to be ignored (ocaml/dune#2004, fix ocaml/dune#310, @rgrinberg)

- Fix nested `(binaries ..)` fields in the `env` stanza. Previously, parent
  `binaries` fields would be ignored, but instead they should be combined.
  (ocaml/dune#2029, @rgrinberg)

- Allow "." in `c_names` and `cxx_names` (ocaml/dune#2036, fix ocaml/dune#2033, @rgrinberg)

- Format rules: if a dune file uses OCaml syntax, do not format it.
  (ocaml/dune#2014, fix ocaml/dune#2012, @emillon)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants