
Bisect [WIP] #616

Closed
wants to merge 2 commits into from

Conversation

rgrinberg
Copy link
Member

This is my bisect WIP branch. It's not yet in working order, but it does preprocess things with bisect. I made the PR to get the ball rolling on this feature.

There are a couple of questions to answer here:

  • What's the correct way to add preprocessors to the instrumented code? The way I have it now will not work with action preprocessors, though I'm not even sure how that would work at all.

  • How to guarantee that the bisect preprocessor runs first?

  • How to clean up the context definitions? The way I have it now is a bit awkward because I can't figure out how to parse the derived contexts without two passes.
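The "runs first" question can be sketched as a normalization pass over each stanza's pps list. Everything below (the plain-string representation of a preprocessor, the with_bisect_first name) is an illustrative assumption, not dune's actual API:

```ocaml
(* Sketch only: normalize a [pps] list so that bisect_ppx always runs
   first, regardless of where the user listed it (or whether they
   listed it at all). *)

type pp = string  (* a preprocessor name, e.g. "ppx_sexp_conv" *)

let bisect : pp = "bisect_ppx"

(* Put bisect_ppx at the head of the stack, dropping any later mention,
   so it instruments the source before the other rewriters run. *)
let with_bisect_first (pps : pp list) : pp list =
  bisect :: List.filter (fun p -> p <> bisect) pps

let () =
  assert (with_bisect_first ["ppx_deriving"; "bisect_ppx"]
          = ["bisect_ppx"; "ppx_deriving"]);
  assert (with_bisect_first [] = ["bisect_ppx"])
```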

cc @aantron

@rgrinberg rgrinberg requested a review from a user March 15, 2018 04:54
@rgrinberg
Copy link
Member Author

Btw, I think we'll need to remove this check as well:

  begin match ectx.context with
  | None
  | Some { Context.for_host = None; _ } -> ()
  | Some ({ Context.for_host = Some host; _ } as target) ->
    let invalid_prefix prefix =
      match Path.descendant prog ~of_:(Path.of_string prefix) with
      | None -> ()
      | Some _ ->
        die "Context %s has a host %s.@.It's not possible to execute binary %a \
             in it.@.@.This is a bug and should be reported upstream."
          target.name host.name Path.pp prog
    in
    invalid_prefix ("_build/" ^ target.name);
    invalid_prefix ("_build/install/" ^ target.name)
  end;

Since we are going to allow the instrumented context to run the tests in this case.

@ghost
Copy link

ghost commented Mar 20, 2018

Following a discussion on slack about this: it seems to me that the best way to make sure bisect_ppx runs first and works with action preprocessors is to apply bisect_ppx as a separate pass, just as we do for reason. Especially given that the internal details of ppx drivers are a bit complicated, not merging bisect_ppx with the other rewriters will make the overall feature easier to understand and debug.

Regarding the use of derived contexts: I'm a bit worried that this is going to be too complicated in the end. Currently, for cross-compilation we resolve all the binaries in the host context, but in the case of coverage we would need to resolve only a subset of the binaries in the host context, and it's not clear which ones. For instance, in general you might want to use non-instrumented ppx rewriters for speed, while in some other cases you might want to use instrumented ppx rewriters to measure the coverage of the ppx rewriter itself. In the end it just seems simpler to use the same build context; at least it's an easy-to-understand model. We can add derived build contexts later, but we need to clearly define their semantics first.

Regarding the user interface, as discussed it seems better to enable it from the command line. I suggest adding a variable bisect.dirs in jbuilder, whose default value is () and which can take the special value :all to represent all the directories. So to enable bisect:

  • only on src: jbuilder build --set bisect.dirs '(src)'
  • on everything but bin: jbuilder build --set bisect.dirs '(:all \ bin)'
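The proposed bisect.dirs value is a small set language. A minimal sketch of evaluating such a set against a project's directory list, where the constructors All and Diff stand in for :all and the \ operator (the representation is an assumption, purely to illustrate the intended semantics):

```ocaml
(* Sketch only: evaluate a bisect.dirs set expression against the
   directories that actually exist in the workspace. *)

type dirs =
  | Dirs of string list  (* e.g. (src)          *)
  | All                  (* :all                 *)
  | Diff of dirs * dirs  (* e.g. (:all \ bin)    *)

let rec eval (universe : string list) = function
  | Dirs ds -> List.filter (fun d -> List.mem d ds) universe
  | All -> universe
  | Diff (a, b) ->
    let excluded = eval universe b in
    List.filter (fun d -> not (List.mem d excluded)) (eval universe a)

let () =
  let dirs = ["src"; "bin"; "test"] in
  assert (eval dirs (Dirs ["src"]) = ["src"]);
  assert (eval dirs (Diff (All, Dirs ["bin"])) = ["src"; "test"])
```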

@rgrinberg
Copy link
Member Author

For instance in general you might want to use non-instrumented ppx rewriters for speed while in some other cases you might want to use instrumented ppx rewriters to measure the coverage of the ppx rewriter itself.

It should be easy to detect which ppx's we want to instrument - it's just the ones that are defined within the workspace.

We can add derived build context later, but we need to clearly define their semantic first.

Okay, sure. But can we still set up a parallel workspace for the bisecting? Having a bisect run blow out your entire build cache for a normal build is really pesky.

@ghost
Copy link

ghost commented Mar 22, 2018

Okay, sure. But can we still set up a parallel workspace for the bisecting? Having a bisect run blow out your entire build cache for a normal build is really pesky.

I suppose, but what would be the behavior? For instance would this command build and run both normal and instrumented tests?

$ jbuilder runtest --set bisect.dirs :all

Having a bisect run blow out your entire build cache for a normal build is really pesky.

At some point we should add a global artifact cache to Dune; at that point this won't be too much of a problem, since deleting _build and rebuilding everything will be fast.

@rgrinberg
Copy link
Member Author

rgrinberg commented Mar 22, 2018 via email

@ghost
Copy link

ghost commented Mar 22, 2018

Maybe have a --context argument to disambiguate this? I think it's already a bit of an issue that tests run against every context without a way to control this, btw.

Well, you can do jbuilder runtest _build/x. But yeah, independently of this PR, a --context argument makes sense.

Basically there are two reasons why I'm unsure about using an extra context for bisect by default:

  • I'm not sure how intuitive this is in general, i.e. I expect that many more users will know about bisect_ppx than about build contexts
  • every build context duplicates the world, and at some point this might have a non-negligible impact on memory usage. So I'm slightly worried about creating more contexts implicitly

@rgrinberg
Copy link
Member Author

I'm not sure how intuitive this is in general. i.e. I expect that among users of bisect_ppx, many more users will know about bisect_ppx rather than build contexts

This isn't a serious issue, as the --bisect argument will take care of all these details (just like -x does for cross compilation). Nevertheless, I think that having the global artifact cache addresses my number one concern, hence we can drop the multiple contexts.

However, we still need a way to retain some workspace configuration, so we'll keep options such as covered_libs, etc. Do you agree with that?

@rgrinberg
Copy link
Member Author

@diml reflecting on your comment that bisect_ppx should really act like our reason support makes me question our initial implementation of the reason syntax. refmt seems to act just like an action preprocessor, with the exception that it can also emit the marshalled parsetree as output. If we just had a way to properly compose (here this would mean sequence) action and pps preprocessing stacks, then we wouldn't really need to hardcode the Reason syntax everywhere. As you said about bisect, the only way the reason pp is special is that it must come before the other ppx's. I was also never sure why we needed those *.ml*.re* temporary files in the first place, as I thought it should be possible to run the entire preprocessing pipeline through stdin/stdout without these temporary files. This seems cleaner to me.

The reason support also includes an additional component: allowing for .re and .rei extensions. We'd still need to hardcode accepting those as source files and set up preprocessing rules for them automatically, but this is a far smaller "hack".

@ghost
Copy link

ghost commented Apr 10, 2018

However, we still need a way to retain some workspace configuration. So we'll retain options such as covered_libs, etc. Do you agree with that?

Yes. However, shouldn't it be covered_dirs? Selecting directories to cover seems more general than selecting libraries.

Regarding reason, having the build system set up real pipes is a bit complicated. I'm not sure how to make it work. For instance, let's say you want to execute a | b | c | d. Technically this requires 4 jobs, so what do you do if you are running with -j 2? Though, we could implement pipes using temporary files (#428). At least we would have fewer rules and fewer persistent files in _build. We could also simply pass refmt as a -pp to the ppx driver. FTR, the OCaml compiler creates temporary files for both -pp and -ppx arguments.

Note that technically, for reason we shouldn't even need a separate process. We could just plug the reason parser into the driver directly; that would make things even simpler. Though that's more projects to modify.

@rgrinberg
Copy link
Member Author

Yes. However, shouldn't it be covered_dirs? Selecting directories to cover seems more general than selecting libraries.

Yeah, just typed the wrong thing. I agree that selecting things by directory is better.

Regarding reason, having the build system set up real pipes is a bit complicated. I'm not sure how to make it work. For instance, let's say you want to execute a | b | c | d. Technically this requires 4 jobs, so what do you do if you are running with -j 2?

Hmm, the pp stack will be run serially so I don't see why the -j flag makes a difference. I don't see any opportunities for parallelization so it would be equivalent to -j1.

Though, we could implement pipes using temporary files (#428). At least we would have fewer rules and fewer persistent files in _build. We could also simply pass refmt as a -pp to the ppx driver. FTR, the OCaml compiler creates temporary files for both -pp and -ppx arguments.

Yeah, seems like that would be necessary for preprocessors that accept input or output through filenames. But I was thinking, for preprocessors that work via stdin -> stdout, we could avoid the temporary files altogether. refmt works that way for example, so all we'd need is for omp drivers to accept input on stdin and we'd have an in-memory preprocessing pipeline.
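If every stage reads on stdin and writes on stdout, the whole stack reduces to function composition over the in-memory representation. A toy sketch (the refmt/bisect stand-ins below are illustrative closures, not the real tools):

```ocaml
(* Sketch only: an in-memory preprocessing pipeline where each stage
   maps input text (or a marshalled parsetree) to output text, so the
   stack is a left fold with no temporary files. *)

type stage = string -> string  (* input text -> output text *)

let run_stack (stages : stage list) (source : string) : string =
  List.fold_left (fun src stage -> stage src) source stages

let () =
  let refmt src = "(* reason->ocaml *) " ^ src in   (* stand-in *)
  let bisect src = "(* instrumented *) " ^ src in   (* stand-in *)
  assert (run_stack [refmt; bisect] "let x = 1"
          = "(* instrumented *) (* reason->ocaml *) let x = 1")
```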

Note that technically, for reason we shouldn't even need a separate process. We could just plug the reason parser into the driver directly; that would make things even simpler. Though that's more projects to modify.

Yeah, we'll need to modify both reason and omp. This would impose a lot of constraints on people.

@ghost
Copy link

ghost commented Apr 16, 2018

Hmm, the pp stack will be run serially so I don't see why the -j flag makes a difference. I don't see any opportunities for parallelization so it would be equivalent to -j1.

If you use real pipes, then to execute a | b you need to execute a and b in parallel, there is no choice.

Yeah, seems like that would be necessary for preprocessors that accept input or output through filenames.

To be clear, I was thinking about implementing a | b as follows:

a > /tmp/XXX
b < /tmp/XXX
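That temp-file scheme can be sketched as below. Sys.command with shell redirection is used purely for brevity (a real build system would use Unix.create_process and fd redirection); the function name is made up:

```ocaml
(* Sketch only: run [a | b] as two sequential commands through a
   temporary file, so a single build job suffices and no real pipe
   (hence no forced parallelism) is needed. *)

let pipe_via_tmp (a : string) (b : string) : int =
  let tmp = Filename.temp_file "pipe" ".out" in
  (* Stage 1: a's stdout goes to the temp file. *)
  let rc_a = Sys.command (a ^ " > " ^ Filename.quote tmp) in
  (* Stage 2: b reads the temp file on stdin, only if a succeeded. *)
  let rc =
    if rc_a = 0 then Sys.command (b ^ " < " ^ Filename.quote tmp)
    else rc_a
  in
  Sys.remove tmp;
  rc

let () = assert (pipe_via_tmp "echo hello" "grep -q hello" = 0)
```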

@rgrinberg
Copy link
Member Author

If you use real pipes, then to execute a | b you need to execute a and b in parallel, there is no choice.

Ok, I see what you mean. My mental model of -j was a bit wrong; I thought that -j referred to build jobs and that such a job could have more than one process.

To be clear, I was thinking about implementing a | b as follows:

That would be fine for a first pass. The only issues are performance related and we can also optimize those.

Ok, so how about implementing bisect but limiting it to code that uses ppx, until we can define more complex preprocessor stacks? This should be backwards compatible, and it's actually a restriction that works just fine in practice, as it's currently impossible to use bisect with action preprocessors anyhow. The changes we're talking about are desirable but also fairly large; there's no need to delay this extremely useful feature to users in the meantime.

@ghost
Copy link

ghost commented Apr 18, 2018

That seems fine for now, indeed.

@rgrinberg
Copy link
Member Author

There's still a detail that remains a little awkward to handle if we treat bisect_ppx as a special preprocessor. It's that we don't have such a convenient way of handling its runtime dependencies. By modifying the executable or library stanzas, we get this functionality for free. I suppose runtime deps is the key distinction between the reason preprocessor and bisect.

So I guess we'll still need a bit of a combined approach for a v1.

@rgrinberg
Copy link
Member Author

The proposed support is going to be far simpler than this original PR. Therefore, I'm closing this issue for now.
