Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple phases of PPX translation #1135

Closed
zepalmer opened this issue Aug 14, 2018 · 6 comments
Closed

Support multiple phases of PPX translation #1135

zepalmer opened this issue Aug 14, 2018 · 6 comments

Comments

@zepalmer
Copy link

zepalmer commented Aug 14, 2018

Motivation

I'm working on a PPX extension that performs a pretty broad code transformation on a module. The preprocessor needs to know things like which variables are bound at certain points of the code; it traverses the AST to determine this. One precondition, however, is that any other extensions which may modify the bound variable set have already been processed.

I've been using dune for this project and have found it to be a hefty improvement over my old hackish Oasis build environment. However, it doesn't seem to be possible for me to enforce any kind of ordering on the PPX extensions. As the Dune documentation clearly states, Dune takes a set of PPX rewriter libraries, compiles them together into a single executable, runs that, and then compiles the result. This is sensible for most cases, but some projects (and rewriters) have somewhat more delicate requirements.

Dune already includes a staged_pps option, but this limits preprocessing to two particular phases and appears to be intended specifically for preprocessors like ppx_import (#193).

Proposal

I'd like to suggest an alternative which is simply an iterative version of the existing process. The syntax would go something like this:

(preprocess
  (phased_pps
    (ppx1 ppx2)
    (ppx3 ppx4)
    (ppx1)
  )
)

Here, each group of PPX libraries is compiled into a single executable. That executable is the run against the code; then, the next executable is constructed. In this example, we would build three executables: one which combines PPX libraries ppx1 and ppx2, one which combines PPX libraries ppx3 and ppx4, and one which just uses PPX library ppx1. The code would then be passed through each executable in turn to produce the final result. (In this case, one of the libraries -- such as ppx4 -- could even generate code which is expected to be processed using another library -- such as ppx1 -- without doing anything special.)

In my scenario, for instance, my PPX rewriter could be specified as a final phase, after common tools like ppx_deriving have generated bindings, so I can correctly ascertain which variables are bound in a given AST node and which are free.

Advantages

The phased_pps mode still uses the PPX library API that Dune currently uses. This preprocessing mode would allow for more sophisticated PPX rewriter configurations and would also support PPX rewriters with more delicate invariants. From an outsider's perspective, it seems like it wouldn't be too difficult to add to Dune -- it would just require iterating over the routine that handles the pps mode -- though I'm not sure what assumptions the existing code makes or how easily abstracted it is.

Disadvantages

Implementation is always work. Projects using phased_pps would compile slower due to the multiple PPX passes (though this is an inevitable cost of multi-pass preprocessing over the AST).

Summary

I'm proposing an alternative preprocessing mode which would do what pps currently does, but multiple times with different configurations. This would deliver a lot of the power of the traditional -ppx flag (as it would allow multiple, ordered transformation passes) but retain the advantages of Dune's current approach (as configuration can specify simultaneously executing some transformations, reducing the number of necessary tree walks).

Thank you!

@ghost
Copy link

ghost commented Aug 15, 2018

We have been using ppx rewriters at Jane Street for several years, and one thing we found quite early is that composing ppx rewriters at the level of processes doesn't work well in practice. We developed ppxlib which implements a clean composition semantic for ppx rewriters. It seems that ppxlib is a better place to do what you are trying to do.

@zepalmer
Copy link
Author

Thanks for the thoughts! I read over the documentation for ppxlib and, if I understand correctly, it provides a sort of featureful replacement for the PPX driver from Migrate_parsetree.Driver. I'm afraid I'm a bit ignorant of how all of these tools work together, but I looked over the interface for ppxlib and do not see anything that would allow me to enforce an ordering on transformations. The primary entry point appears to be Driver.register_transformation which, similar Migrate_parsetree.Driver.register, allows me to provide some transformation code which will be run on my behalf at some point in the future. Is there some mechanism that I'm missing that allows me to ensure that one transformation is run before another?

As far as I can tell, both drivers want to assume that all of my PPX transformations are commutative; is that correct? The trouble I'm having is that my transformation does not commute with the others in use in this project, so transformation ordering becomes important. If my read of the drivers is correct, it seems the only ways to ensure an order of transformation would be to (1) use multiple drivers, each of which performs a collection of commutative transformations, or (2) write my own driver which composes and executes the PPX transformations in the way that I see fit. I proposed the extension in this issue to take the first approach as I suspect that other projects might benefit from the ability to do so. If there's an option 3, I'm eager to hear about it. :)

Thanks again for the feedback. I learn a lot from searching the issues list of the OCaml GitHub repositories!

@ghost
Copy link

ghost commented Aug 15, 2018

In ppxlib the idea is the following: each individual rewriter registers a function to expand a given extension point, and the rewriting is then done in one pass by the Context_free module which operates in a top-down manner: when an extension point is found, it is expanded with the corresponding expander, then the result is rewritten recursively. In particular this allows a ppx rewriter to produce extension points that will be rewritten by another rewriter, without the user having to specify these rewriters in a particular order. I wrote a blog post about this. I imagine that whatever you are doing could be done in this pass as well. However, this will of course not work in combination with ppx rewriters that don't use ppxlib, which might be an issue.

One thing you could do is modify either ppxlib or ocaml-migrate-parsetree to allow registering passes that are guaranteed to be executed before or after all other ppx rewriters. If you want to do it in a way that will allow your extension to be used in conjunction with as many other ppx rewriters as possible, you should modify ocaml-migrate-parsetree.

@zepalmer
Copy link
Author

Thanks for the clarification! Top-down rewriting and ppxlib make sense for commutative extensions. Configurations which require explicit ordering of mostly commutative items are problematic in that they are misleading and brittle and, for most basic uses of PPX extensions, I'm happy to see things working the way they are.

It sounds to me that you are suggesting what I call option 2 above: writing my own driver which explicitly performs the PPX transformations in the fashion that I need. There are some downsides to this approach, though.

  • Invoking the PPX extensions is somewhat harder than listing them in a configuration file.
  • Modifying the applied extensions requires modifying code, so my build configuration lives in multiple places (in the Dune file as well as the custom PPX driver).
  • This approach doesn't easily abstract to future non-commutative PPX projects.

This is why I was hoping I could make use of the code that Dune already has for building and executing PPX extensions. If Dune included the above stanza, it would keep my configuration in one place, make it easy to add or remove extensions, and would work for complex preprocessing tools other than my own. That said, I understand if this isn't a development priority. :)

@ghost
Copy link

ghost commented Aug 16, 2018

I'm suggesting to submit a PR to ocaml-migrate-parsetree to allow registering an extension that will be guaranteed to be executed before all other ones.

It is strictly better to do this change in the ocaml-migrate-parsetree driver rather than in dune, for the following reasons:

  • it will yield faster build times
  • the fact that your ppx needs to be applied before the other ones is an implementation detail that shouldn't leak to the user. By doing this in the driver you are preventing this low-level detail to be exposed to the user

Additionally, ocaml-migrate-parsetree is now the common denominator of all modern ppx rewriters and the ones that don't use it yet cannot be used with dune. So there really is nothing to gain by doing this in dune rather than in ocaml-migrate-parsetree.

@rgrinberg
Copy link
Member

Jeremie's argument was convincing here.

@rgrinberg rgrinberg closed this as not planned Won't fix, can't repro, duplicate, stale Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants