Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute transitive modules depencies #666

Merged
5 commits merged into from
Apr 17, 2018
Merged

Conversation

emillon
Copy link
Collaborator

@emillon emillon commented Mar 30, 2018

This work in progress PR is an attempt to fix #660.

@diml let me know how we should do that. From what I understood, the plan is to compute transitive dependencies of modules, so in the linked bug, add a dependency from Main to Lib_sub?

@ghost
Copy link

ghost commented Mar 30, 2018

Indeed, that's the idea.

Overview

The Ocamldep (src/ocamldep.mli) module abstracts the computation of cross-module dependencies using ocamldep. While setting up the compilation rules for libraries and executables, we have a value of type Ocamldep.Dep_graphs.t at hand which gives us both the dependency graphs for implementations and interfaces. Then we use Ocamldep.Dep_graph.deps_of graph m to find out what modules m depends on.

So the goal of this PR is to make Ocamldep.Dep_graph.deps_of return the transitive dependencies instead of just the first step ones, i.e. the ones returned by ocamldep -modules. The compiler will only recurse through interfaces, i.e. it might load transitive dependencies of cmi files, but not of cmx files. If we note deps_of(M, Kind) the direct dependencies of module M with Kind being either Impl or Intf, what we want to compute is the following function:

transitive_deps_of(M, Kind) =
  let deps = deps_of(M, Kind) in
  deps U transitive_deps_of(X, Intf) for X in deps

Implementation details

Ocamldep.Dep_graph.deps_of returns a value of type Build.t. Build.t is an arrow that we use to represent computations that might read generated files. In Jbuilder, a rule is simply an arrow of type (unit, Action.t) Build.t. deps_of is basically implemented like this:

Build.contents "file.ml.d"
>>>
Build.arr parse

where file.ml.d contains the output of ocamldep -modules file.ml and Build.contents is an arrow that evaluates to the contents of file.ml.d. Build.contents takes a static filename, i.e. the filename cannot be computed using the contents of generated files. What that means is that we cannot represent the union of transitive_deps_of(X, Intf) for X in deps directly in the build arrow, since computing deps requires reading generated files. We could add a dynamic version of Build.contents, although this would require diving deeper inside Jbuilder and there might also be performance implications, so it's a more complex change. Alternatively we can go through intermediate files. For instance if we say that the M.ml.all-d contains transitive_deps(M, Impl), then the rule to produce M.ml.all-deps can read M.ml.d and produce an action that will read all the X.mli.all-d files and merge them. The action that merge the files can be implemented using a bunch of cat <file>, or by adding a specific action. To add a specific action, simply add a new constructor to Action_intf.Ast.t and then follow the compilation errors to find the code to upgrade.

Testing

We have a blackbox tests framework which allows to run the final jbuilder executables on real examples and capture the output. Usually what we do is start by adding a test that demonstrates the failure so that we can see when it is fixed.

To add a blackbox test, you can create a directory in test/blackbox-tests/test-cases, for instance test/blackbox-tests/test-cases/github660. Simply put the files in #660 in there except for run.sh which should become run.t with this contents (including the two spaces at the beginning of each line):

  $ $JBUILDER runtest --root . --display quiet -j1
  $ echo 'let x = 1' >> lib_sub.ml
  $ $JBUILDER runtest --root . --display quiet -j1

Then edit test/blackbox-tests/jbuild and copy&past the last stanza, replacing test-cases/XXX by test-cases/github660. If you then run make test followed by make promote, the file will be updated with the output of each command.

BTW, I was initially thinking that we would just remove a few -no-alias-deps, which was a much simpler change. This one is much bigger, I hope you are still happy to work on it, but if not just let me know.

@emillon emillon force-pushed the rebuild-module-aliases branch from f9ad029 to f37e2dd Compare April 3, 2018 13:07
@emillon
Copy link
Collaborator Author

emillon commented Apr 3, 2018

Thanks for the detailed instructions. I have some questions already:

The build arrow

I'm having trouble understanding the relation between rules and the build arrow. If I understand it correctly, a rule is added for each module, and there's no "pattern rule":

dune/src/ocamldep.ml

Lines 129 to 133 in 126c91f

if not (Module.Name.Set.mem already_used unit.name) then
SC.add_rule sctx
(Build.run ~context (Ok context.ocamldep)
[A "-modules"; Ml_kind.flag ml_kind; Dep file]
~stdout_to:ocamldep_output);

Is there an implicit dependency through the file system, or do I have to handle that by hand? In other words, if my Build.t value reads a file, can I expect it to be created "automatically"?

Tests

About tests, other blackbox tests use a plain jbuilder and not $JBUILDER, so I used that. Is $JBUILDER preferable? (+ I guess adding a (setenv JBUILDER ...) since otherwise the variable is unset)

BTW, I was initially thinking that we would just remove a few -no-alias-deps, which was a much simpler change. This one is much bigger, I hope you are still happy to work on it, but if not just let me know.

No problem, I'll just take baby steps because that's indeed quite involved already!

@rgrinberg
Copy link
Member

@emillon Jeremie is MIA this week, so I'll try and answer your questions:

If I understand it correctly, a rule is added for each module

Yes. Dependency analysis is done on every module separately to produce a corresponding .d file. As you might notice, this is different from other build tools that usually run ocamldep in bulk on the entire project.

and there's no "pattern rule":

That's correct. You must set up rules explicitly by looping.

Is there an implicit dependency through the file system, or do I have to handle that by hand? In other words, if my Build.t value reads a file, can I expect it to be created "automatically"?

The build arrow will handle dependencies for you as long as you use the dependency aware primitives. What this means is that you should certainly avoid the Unix.* variety of functions to interact with the filsystem. In the snippet that you've linked we can see a few instances of the automatic dependency management:

  • Dep file will register the path of file as a dependency for the rule being added with SC.add_rule

  • stdout_to:ocamldep_output adds ocamldep_output as a target of the current rule we're adding. So once we depend on the path of ocamldep_output elsewhere, jbuilder will know which rule to fire to produce an up to date version of it.

There are many examples of this, so I can't list them all, but certainly ask if you're unclear about how to model your dependencies.

About tests, other blackbox tests use a plain jbuilder and not $JBUILDER, so I used that. Is $JBUILDER preferable? (+ I guess adding a (setenv JBUILDER ...) since otherwise the variable is unset)

$JBUILDER was a hack that existed in an older version of our blackbox tests. It's no longer relevant so don't worry about it. Use other blackbox tests as examples as you're doing now and you'll be on the right path.

@emillon
Copy link
Collaborator Author

emillon commented Apr 3, 2018

Thanks, that's helpful.

I think that I'm getting somewhere:

  • I extended the cat machinery so that it can deal with multiple files.
  • I managed to split the current rule into one that deals with ocamldep output (.d files) and one that deals with transitive output (.all-deps files).
  • I tricked cat into writing to a new file by using with_outputs_to and action_dyn. That does not look very good, so I might be missing something (using Build.action maybe?).

Anyway, the problem I'm hitting is how to handle missing files. It seems that I'm not handling the case where the .mli.d file does not exist. It seems to me that action_dyn should be adding a dependency, so the problem must be elsewhere. I think that I'd like my mli_d_path function to return None when the corresponding file does not exist, but it's not running at the correct level (Build vs Action it seems). Any pointers appreciated even if I realize that my question is not very clear! 😅

Extra question: what is Build.memoize used for? So that the dependency graph is not recomputed, or so that the action is only done once?

Thanks a lot!

@rgrinberg
Copy link
Member

I extended the cat machinery so that it can deal with multiple files.

That seems like it could be a useful change, but it's actually not necessary for this work.

I tricked cat into writing to a new file by using with_outputs_to and action_dyn. That does not look very good, so I might be missing something (using Build.action maybe?).

Why do you need action_dyn here? I think this is what you mean:

diff --git a/src/ocamldep.ml b/src/ocamldep.ml
index f1c5942f..d6609482 100644
--- a/src/ocamldep.ml
+++ b/src/ocamldep.ml
@@ -159,10 +159,8 @@ let rules ~(ml_kind:Ml_kind.t) ~dir ~modules
               Action.with_outputs_to all_deps_file @@ Action.cat paths
             in
             SC.add_rule sctx
-              ( Build.lines_of ocamldep_output
-                >>^ build_paths
-                >>> Build.action_dyn ~targets:[all_deps_file] ()
-              )
+              (Build.lines_of ocamldep_output
+               >>^ build_paths)
           end;
         Build.memoize (Path.to_string all_deps_file)
           (Build.lines_of all_deps_file

There's no need to add a target here because it will be inferred from your action. You can see this from how infer works on the Redirect constructor.

Anyway, what I think will be a lot more convenient for you is to use the functions in the Build module rather than doing everything through Actions. Actually, there's nothing wrong with the way you're doing it, but it can be awkward. For example, to create a target with the concatenated of many files, I would simply use Build.all, Build.contents, and Build.write_file_dyn. But your change to cat is could be good anyway btw - cat in Unix also accepts multiple arguments.

It seems that I'm not handling the case where the .mli.d file does not exist

How should this case be handled? Sorry, I haven't taken the time to understand the issue and solution yet. At first glance, the way you check for .mli absence is correct.

I think that I'd like my mli_d_path function to return None when the corresponding file does not exist, but it's not running at the correct level (Build vs Action it seems).

As I said, the way you're handling missing interfaces seem correct. A couple of points that might clarify things for you:

  • Note that jbuilder knows all the .ml/.mli sources for a library upfront. That is, before we've generated the rules for a library. This means that when our build rules execute, they shouldn't care whether .ml or .mli files are absent because we were supposed to generate rules specific to our set of .ml and .mli's.

  • To clarify the Action vs. Build distinction let'd review what an Action. is. Action.t is pretty much the value a user can construct using the (action ..) constructor in jbuild files. So basically, it's a shell-like command with a few bells and whistles to make it easy to keep track of dependencies and targets. In other words it's just a large variant (I'm simplifying some things here).

  • A Build.t represents an arbitrary computation during the execution of a build. As you say, in dune we have 2 levels for the build rules. I'll call them stages. The first stage is when we generate the build rules (Build.t values) and register them (SC.add_rule), and a 2nd stage where the build rules actually execute to produce targets. Anything that is done in the 2nd stage must be done through a Build.t value. One of the advantages of using an arrow is that it prevents us from mixing up these stages. So if you're trying to check for an existence of a file inside the execution of a build rule, you would find this indeed difficult. Because it really shouldn't be done often (though we do have a workaround with Build.if_file_exists)

@emillon emillon force-pushed the rebuild-module-aliases branch 2 times, most recently from 08f57cd to e249f90 Compare April 4, 2018 16:40
@emillon
Copy link
Collaborator Author

emillon commented Apr 4, 2018

This is starting to work! The problem is now fixed in the case where there are explicit interface files (I added a separate test case for that).

I'd like to use Build functions as well, but it seems that there is no way to get the path from inside of the arrow (i.e., contents_dyn : (Path.t, string) Build.t), so I had to defer that to Action time. Otherwise, I agree that manipulating with Build.all, Build.contents, and Build.write_file_dyn would be easier. If there's a way to use this, I'm interested.

If the action_dyn part is removed, this errors with "Build_interpret.Rule.make: rule has no targets". Indeed, the Arr case cannot determine where the action will write.

There is a similar problem with what the action will read, so I added the dependencies with Build.dyn_paths and extracted will_read_this and will_write_to to simplify the resulting arrow expression.

There are still some problems:

  • the contents of *.all-deps are not deduplicated, so the performance might not be great
  • this only works when there is an explicit interface file
  • it'd be great to remove the manual dependency & target tracking although I'm not sure how to do that. As @diml suggested, adding a custom action that reads a file list from a file and writes it to a target would probably work, but this seems super specific.

@rgrinberg
Copy link
Member

OK, I re-read the issue and you're doing things the right way.

this only works when there is an explicit interface file

What should happen when there is no interface file? Can we add a test case for this situation perhaps? If you'd like to your test cases run in isolation, take a look at a test like private-public-overlap to see how this is done (every folder corresponds to a test case).

it'd be great to remove the manual dependency & target tracking although I'm not sure how to do that

The way you're doing it is fine. I'd get rid of all the 1 off functions as they don't really have good names. Something like this:

diff --git a/src/ocamldep.ml b/src/ocamldep.ml
index fdf3ea61..4172e4d1 100644
--- a/src/ocamldep.ml
+++ b/src/ocamldep.ml
@@ -120,12 +120,6 @@ let parse_deps ~dir ~file ~(unit : Module.t)
   in
   List.concat_map lines ~f:parse_line
 
-let will_read_this =
-  Build.dyn_paths (Build.arr (fun x -> x))
-
-let will_write_to target =
-  Build.action_dyn ~targets:[target] ()
-
 let rules ~(ml_kind:Ml_kind.t) ~dir ~modules
       ?(already_used=Module.Name.Set.empty)
       ~alias_module ~lib_interface_module sctx =
@@ -164,15 +158,13 @@ let rules ~(ml_kind:Ml_kind.t) ~dir ~modules
               in
               paths
             in
-            let write paths =
-              Action.with_stdout_to all_deps_file @@ Action.cat paths
-            in
             SC.add_rule sctx
               ( Build.lines_of ocamldep_output
                 >>^ build_paths
-                >>> will_read_this
-                >>^ write
-                >>> will_write_to all_deps_file
+                >>> Build.dyn_paths (Build.arr (fun x -> x))
+                >>^ (fun paths ->
+                  Action.with_stdout_to all_deps_file (Action.cat paths))
+                >>> Build.action_dyn ~targets:[all_deps_file] ()
               )
           end;
         Build.memoize (Path.to_string all_deps_file)

@emillon emillon force-pushed the rebuild-module-aliases branch from e249f90 to c3d7dfb Compare April 5, 2018 12:00
@rgrinberg
Copy link
Member

rgrinberg commented Apr 6, 2018

@emillon btw, a signed CLA will be necessary before we can accept your code. If you don't mind, the details of the procedure are available here: https://janestreet.github.io/contributing.html

Sorry for the hassle, but if we get this done in advance, then your valueable contribution will not be delayed.

@emillon emillon force-pushed the rebuild-module-aliases branch from c3d7dfb to b5b1393 Compare April 6, 2018 14:50
@emillon
Copy link
Collaborator Author

emillon commented Apr 6, 2018

What should happen when there is no interface file? Can we add a test case for this situation perhaps?

Just re-wrote the tests with subdirectories as you suggested. I don't know yet how to handle this subcase, unfortunately.

I'd get rid of all the 1 off functions as they don't really have good names

Done.

a signed CLA will be necessary before we can accept your code

I am very surprised by this. I assumed that since this project moved under ocaml/, it was now a community project. In addition, this does not seem to be mentioned in the source tree. I'll check to see what's possible on my side, but a DCO would be way easier of course. I'll open an issue to discuss this aspect.

@yminsky
Copy link
Contributor

yminsky commented Apr 6, 2018

Changing the org in Github doesn't really change the legal status. That legal status, though, doesn't I hope cut against this being a community-oriented project. Note that contributing to OCaml itself also requires a CLA (though to a different organization), and until fairly recently, it was considered part of best practices for responsibly managing the licensing of an open-source project.

And to be clear, the source code itself is licensed liberally, and there are contributors for a number of organizations.

FWIW, we do hope to move to a new more lenient license and to a DCO, not just for Dune, but also for all of Jane Street's open source code. But until then, the CLA is still required.

@emillon emillon mentioned this pull request Apr 6, 2018
@emillon emillon force-pushed the rebuild-module-aliases branch from b5b1393 to 83ce859 Compare April 10, 2018 08:36
@ghost
Copy link

ghost commented Apr 10, 2018

When there is no mli, we should read the .ml.X file instead, it will always contain a super-set of what we are trying to compute.

Adding a custom action seems fine to me, we already have specific actions that are not exposed to the user such as Remove_tree and Digest_files. I'm fine either way, but we should at least do the de-duplication once at the end, that will save some computations in module_compilation.ml.

Extra question: what is Build.memoize used for? So that the dependency graph is not recomputed, or so that the action is only done once?

The values computed by arrows are not memoized by default, so if a Build.t value is used in several places, it will be computed everytime. For instance, if you remove the Build.memoize in rules, parse_deps will end up being called once for the rule for building the .cmo, once for the rule for building the .cmx and once for computing the toplogical sort.

@emillon emillon force-pushed the rebuild-module-aliases branch 2 times, most recently from 8c1f10f to 63e7a83 Compare April 12, 2018 16:39
@emillon
Copy link
Collaborator Author

emillon commented Apr 12, 2018

I signed and sent the CLA.

When there is no mli, we should read the .ml.X file instead, it will always contain a super-set of what we are trying to compute.

Great, I did this and this made the second test pass! I had to ignore the case where the dependency is an alias. I'm not sure I understand why this is necessary, but removing this case makes bootstrap fail.

we should at least do the de-duplication once at the end

I implemented a first version of that. Next I'll try to see if I can move that into a custom action, which might remove the need for extending cat.

@emillon emillon force-pushed the rebuild-module-aliases branch from 63e7a83 to 88cb057 Compare April 13, 2018 12:22
@emillon emillon changed the title WIP: Rebuild module aliases Compute transitive modules depencies Apr 13, 2018
@emillon
Copy link
Collaborator Author

emillon commented Apr 13, 2018

I replaced the extended Cat by a custom action that merges files together. This is not as good as I expected, because the arrow cannot use the infer result, so there's still some manual dependency tracking with dyn_paths and action_dyn. I moved that part to a helper in Build, though.

Let me know what you think.

@emillon emillon force-pushed the rebuild-module-aliases branch from 88cb057 to 51a98c3 Compare April 13, 2018 12:27
src/ocamldep.ml Outdated
in
let paths =
[ocamldep_output]
@ List.filter_map dependencies ~f:mli_d_path
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must read the .mli.all-deps files here

src/ocamldep.ml Outdated
| None when Option.is_some alias_module -> None
| None -> Module.file ~dir m Ml_kind.Impl
in
Option.map path ~f:ocamldep_output_path
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should read the .all-deps files, otherwise we are not computing the transitive dependencies. Doing this change should simplify the code a bit, as the .all-deps no longer need to use the ocamldep syntax and can use whatever syntax we want

@ghost
Copy link

ghost commented Apr 16, 2018

Thanks, I think if the complicated code is in build.ml it's not a problem. We could use infer to make sure we have registered all the dynamic dependencies, however there are some internal actions such as remove_tree that are currently not supported by infer.

@emillon emillon force-pushed the rebuild-module-aliases branch from 51a98c3 to 7b43460 Compare April 16, 2018 13:00
@emillon
Copy link
Collaborator Author

emillon commented Apr 16, 2018

I changed so that it reads the corresponding .all-deps file instead of plain ocamldep output. I replaced the format by a merge-friendly format (one module name per line), but unfortunately this means adding an extra temporary file for the raw ocamldep output.

@ghost
Copy link

ghost commented Apr 16, 2018

What about changing Merge_into to take an additional string list argument? This does look awfully specific, but I believe it is the primitive we would use to compute any transitive closure through the file system.

@emillon emillon force-pushed the rebuild-module-aliases branch from 7b43460 to 2aac534 Compare April 16, 2018 13:39
@emillon
Copy link
Collaborator Author

emillon commented Apr 16, 2018

Sure, I just changed it so that it's the initial value for the fold.

It'd be great to be able to pass ocamldep output here directly, but since it has to wait until Action time I don't think that's possible here.

@ghost
Copy link

ghost commented Apr 16, 2018

I guess it would be good to have a way to register custom actions, so that we could define this action in ocamldep.ml directly, but that's too big of a change for this PR I think.

Could you update the code in ocamldep.ml to use this new argument rather than go through a temporary file?

src/ocamldep.ml Outdated
else
Module.Name.Map.find modules m)
in
if check then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of this function (the check and addition the alias_module) should go in parse_deps. We only need to do these operations once when reading the output of ocamldep.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I wasn't sure if the alias_module part was relevant.

src/ocamldep.ml Outdated
let path =
match Module.file ~dir m Ml_kind.Intf with
| Some x -> Some x
| None when Option.is_some alias_module -> None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look and I understand why we need this: alias_modules is in modules so we setup the rule to call ocamldep on it and given it's generated contents, it always reports dependencies on every other modules. Additionally, we manually add a dependency from every other module to alias_module, so we end up with a circular dependency.

We didn't notice the issue before this PR as we manually override the binding for alias_module just after (line 193). What we should do instead is add a test at the beginning of this function (line 140) to make sure we don't generate rules for alias_module and just return Build.return [] instead. dependency_file_path should return None when m is alias_module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved that part to the main Module.Name.Map.map modules call as you suggested. There's some duplication between the two Module.file calls since they're both coupled to a match on alias_module, but that might be early to extract an helper function.

@emillon emillon force-pushed the rebuild-module-aliases branch from 2aac534 to db61586 Compare April 17, 2018 09:53
@emillon
Copy link
Collaborator Author

emillon commented Apr 17, 2018

I believe that I took care of your remarks. Good call about the intermediate file, it's indeed not necessary if this data goes through merge_files_dyn.

The symptoms are a bit different depending on the presence of an
interface file.
@emillon emillon force-pushed the rebuild-module-aliases branch from db61586 to 9191606 Compare April 17, 2018 11:56
@ghost
Copy link

ghost commented Apr 17, 2018

Ok, I think we should indeed add an helper function is_alias_module, this would express the intent a bit more clearly. read_deps_files can be inlined now that it is used only once. Apart from that this PR looks ready to merge.

emillon added 3 commits April 17, 2018 15:33
This uses two different extensions:

- `.d` corresponds to the raw `ocamldep` output.
- `.all-deps` corresponds to this output, merged with the dependencies
of all the interfaces mentioned in the earlier.

This also means that `.all-deps` files will contain output from multiple
files.
@emillon emillon force-pushed the rebuild-module-aliases branch from 9191606 to d282564 Compare April 17, 2018 13:52
@emillon
Copy link
Collaborator Author

emillon commented Apr 17, 2018

Done. Thanks for your comments!

@ghost ghost merged commit 5228b43 into ocaml:master Apr 17, 2018
@ghost
Copy link

ghost commented Apr 17, 2018

No problem, thanks for your contribution!

@ghost ghost mentioned this pull request May 21, 2018
@emillon emillon deleted the rebuild-module-aliases branch July 25, 2023 09:22
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inconsistent assumptions when using module aliases
3 participants