
Bisect [WIP] #616

Closed
wants to merge 2 commits into from

Conversation

rgrinberg
Copy link
Member

This is my bisect WIP branch. It's not yet in working order, but it does preprocess things with bisect. I made the PR to get the ball rolling on this feature.

There are a couple of questions to answer here:

  • What's the correct way to add preprocessors to the instrumented code? The way I have it now will not work with action preprocessors, though I'm not even sure how that would work at all.

  • How to guarantee that the bisect preprocessor runs first?

  • How to clean up the context definitions? The way I have it now is a bit awkward because I can't figure out how to parse the derived contexts without two passes.
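The "runs first" question can be sketched as a normalization pass over each stanza's pps list. Everything below (the plain-string representation of a preprocessor, the with_bisect_first name) is an illustrative assumption, not dune's actual API:

```ocaml
(* Sketch only: normalize a [pps] list so that bisect_ppx always runs
   first, regardless of where the user listed it (or whether they
   listed it at all). *)

type pp = string  (* a preprocessor name, e.g. "ppx_sexp_conv" *)

let bisect : pp = "bisect_ppx"

(* Put bisect_ppx at the head of the stack, dropping any later mention,
   so it instruments the source before the other rewriters run. *)
let with_bisect_first (pps : pp list) : pp list =
  bisect :: List.filter (fun p -> p <> bisect) pps

let () =
  assert (with_bisect_first ["ppx_deriving"; "bisect_ppx"]
          = ["bisect_ppx"; "ppx_deriving"]);
  assert (with_bisect_first [] = ["bisect_ppx"])
```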

cc @aantron

@rgrinberg rgrinberg requested a review from a user March 15, 2018 04:54
@rgrinberg
Copy link
Member Author

Btw, I think we'll need to remove this check as well:

  begin match ectx.context with
  | None
  | Some { Context.for_host = None; _ } -> ()
  | Some ({ Context.for_host = Some host; _ } as target) ->
    let invalid_prefix prefix =
      match Path.descendant prog ~of_:(Path.of_string prefix) with
      | None -> ()
      | Some _ ->
        die "Context %s has a host %s.@.It's not possible to execute binary %a \
             in it.@.@.This is a bug and should be reported upstream."
          target.name host.name Path.pp prog
    in
    invalid_prefix ("_build/" ^ target.name);
    invalid_prefix ("_build/install/" ^ target.name)
  end;

Since we are going to allow the instrumented context to run the tests in this case.

@ghost
Copy link

ghost commented Mar 20, 2018

Following a discussion on slack about this: it seems to me that the best way to make sure bisect_ppx runs first and works with action preprocessors is to apply bisect_ppx as a separate pass, just as we do for reason. Especially given that the internal details of ppx drivers are a bit complicated, not merging bisect_ppx with the other rewriters will make the overall feature easier to understand and debug.

Regarding the use of derived contexts: I'm a bit worried that this is going to be too complicated in the end. Currently, for cross-compilation we resolve all the binaries in the host context, but in the case of coverage we would need to resolve only a subset of the binaries in the host context, and it's not clear which ones. For instance, in general you might want to use non-instrumented ppx rewriters for speed, while in some other cases you might want to use instrumented ppx rewriters to measure the coverage of the ppx rewriter itself. In the end it just seems simpler to use the same build context; at least it's an easy-to-understand model. We can add derived build contexts later, but we need to clearly define their semantics first.

Regarding the user interface, as discussed it seems better to enable it from the command line. I suggest adding a variable bisect.dirs in jbuilder, whose default value is () and which can take the special value :all to represent all the directories. So to enable bisect:

  • only on src: jbuilder build --set bisect.dirs '(src)'
  • on everything but bin: jbuilder build --set bisect.dirs '(:all \ bin)'
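The proposed bisect.dirs value is a small set language. A minimal sketch of evaluating such a set against a project's directory list, where the constructors All and Diff stand in for :all and the \ operator (the representation is an assumption, purely to illustrate the intended semantics):

```ocaml
(* Sketch only: evaluate a bisect.dirs set expression against the
   directories that actually exist in the workspace. *)

type dirs =
  | Dirs of string list  (* e.g. (src)          *)
  | All                  (* :all                 *)
  | Diff of dirs * dirs  (* e.g. (:all \ bin)    *)

let rec eval (universe : string list) = function
  | Dirs ds -> List.filter (fun d -> List.mem d ds) universe
  | All -> universe
  | Diff (a, b) ->
    let excluded = eval universe b in
    List.filter (fun d -> not (List.mem d excluded)) (eval universe a)

let () =
  let dirs = ["src"; "bin"; "test"] in
  assert (eval dirs (Dirs ["src"]) = ["src"]);
  assert (eval dirs (Diff (All, Dirs ["bin"])) = ["src"; "test"])
```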

@rgrinberg
Copy link
Member Author

For instance in general you might want to use non-instrumented ppx rewriters for speed while in some other cases you might want to use instrumented ppx rewriters to measure the coverage of the ppx rewriter itself.

It should be easy to detect which ppx's we want to instrument - it's just the ones that are defined within the workspace.

We can add derived build context later, but we need to clearly define their semantic first.

Okay, sure. But can we still set up a parallel workspace for the bisecting? Having a bisect run blow out your entire build cache for a normal build is really pesky.

@ghost
Copy link

ghost commented Mar 22, 2018

Okay, sure. But can we still set up a parallel workspace for the bisecting? Having a bisect run blow out your entire build cache for a normal build is really pesky.

I suppose, but what would be the behavior? For instance would this command build and run both normal and instrumented tests?

$ jbuilder runtest --set bisect.dirs :all

Having a bisect run blow out your entire build cache for a normal build is really pesky.

At some point we should add a global artifact cache to Dune; at that point this won't be too much of a problem, since deleting _build and rebuilding everything will be fast.

@rgrinberg
Copy link
Member Author

rgrinberg commented Mar 22, 2018 via email

@ghost
Copy link

ghost commented Mar 22, 2018

Maybe have a --context argument to disambiguate this? I think it's already a bit of an issue that tests run against every context without a way to control this, btw.

Well, you can do jbuilder runtest _build/x. But yeah, independently of this PR, a --context argument makes sense.

Basically there are two reasons why I'm unsure about using an extra context for bisect by default:

  • I'm not sure how intuitive this is in general, i.e. I expect that many more users will know about bisect_ppx than about build contexts
  • every build context duplicates the world, and at some point this might have a non-negligible impact on memory usage. So I'm slightly worried about creating more contexts implicitly

@rgrinberg
Copy link
Member Author

I'm not sure how intuitive this is in general. i.e. I expect that among users of bisect_ppx, many more users will know about bisect_ppx rather than build contexts

This isn't a serious issue, as the --bisect argument will take care of all these details (just like -x does for cross compilation). Nevertheless, I think that having the global artifact cache addresses my number one concern, hence we can drop the multiple contexts.

However, we still need a way to retain some workspace configuration, so we'll keep options such as covered_libs, etc. Do you agree with that?

@rgrinberg
Copy link
Member Author

@diml reflecting on your comment that bisect_ppx should really act like our reason support makes me question our initial implementation of the reason syntax. refmt seems to act just like an action preprocessor, with the exception that it can also emit the marshalled parsetree as output. If we just had a way to properly compose (here this would mean sequence) action and pps preprocessing stacks, then we wouldn't really need to hardcode the Reason syntax everywhere. As you said about bisect, the only way the reason pp is special is that it must come before the other ppx's. I was also never sure why we needed those *.ml*.re* temporary files in the first place, as I thought it should be possible to run the entire preprocessing pipeline through stdin/stdout without these temporary files. This seems cleaner to me.

The reason support also includes an additional component: allowing for .re and .rei extensions. We'd still need to hardcode accepting those as source files and set up preprocessing rules for them automatically, but this is a far smaller "hack".

@ghost
Copy link

ghost commented Apr 10, 2018

However, we still need a way to retain some workspace configuration. So we'll retain options such as covered_libs, etc. Do you agree with that?

Yes. However, shouldn't it be covered_dirs? Selecting directories to cover seems more general than selecting libraries.

Regarding reason, having the build system set up real pipes is a bit complicated. I'm not sure how to make it work. For instance, let's say you want to execute a | b | c | d. Technically this requires 4 jobs, so what do you do if you are running with -j 2? Though, we could implement pipes using temporary files (#428). At least we would have fewer rules and fewer persistent files in _build. We could also simply pass refmt as a -pp to the ppx driver. FTR, the OCaml compiler creates temporary files for both -pp and -ppx arguments.

Note that technically, for reason we shouldn't even need a separate process. We could just plug the reason parser into the driver directly; that would make things even simpler. Though that's more projects to modify.

@rgrinberg
Copy link
Member Author

Yes. However, shouldn't it be covered_dirs? Selecting directories to cover seems more general than selecting libraries.

Yeah, just typed the wrong thing. I agree that selecting things by directory is better.

Regarding reason, having the build system set up real pipes is a bit complicated. I'm not sure how to make it work. For instance, let's say you want to execute a | b | c | d. Technically this requires 4 jobs, so what do you do if you are running with -j 2?

Hmm, the pp stack will be run serially so I don't see why the -j flag makes a difference. I don't see any opportunities for parallelization so it would be equivalent to -j1.

Though, we could implement pipes using temporary files (#428). At least we would have fewer rules and fewer persistent files in _build. We could also simply pass refmt as a -pp to the ppx driver. FTR, the OCaml compiler creates temporary files for both -pp and -ppx arguments.

Yeah, seems like that would be necessary for preprocessors that accept input or output through filenames. But I was thinking, for preprocessors that work via stdin -> stdout, we could avoid the temporary files altogether. refmt works that way for example, so all we'd need is for omp drivers to accept input on stdin and we'd have an in-memory preprocessing pipeline.
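If every stage reads on stdin and writes on stdout, the whole stack reduces to function composition over the in-memory representation. A toy sketch (the refmt/bisect stand-ins below are illustrative closures, not the real tools):

```ocaml
(* Sketch only: an in-memory preprocessing pipeline where each stage
   maps input text (or a marshalled parsetree) to output text, so the
   stack is a left fold with no temporary files. *)

type stage = string -> string  (* input text -> output text *)

let run_stack (stages : stage list) (source : string) : string =
  List.fold_left (fun src stage -> stage src) source stages

let () =
  let refmt src = "(* reason->ocaml *) " ^ src in   (* stand-in *)
  let bisect src = "(* instrumented *) " ^ src in   (* stand-in *)
  assert (run_stack [refmt; bisect] "let x = 1"
          = "(* instrumented *) (* reason->ocaml *) let x = 1")
```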

Note that technically, for reason we shouldn't even need a separate process. We could just plug the reason parser into the driver directly; that would make things even simpler. Though that's more projects to modify.

Yeah, we'll need to modify both reason and omp. This would impose a lot of constraints on people.

@ghost
Copy link

ghost commented Apr 16, 2018

Hmm, the pp stack will be run serially so I don't see why the -j flag makes a difference. I don't see any opportunities for parallelization so it would be equivalent to -j1.

If you use real pipes, then to execute a | b you need to execute a and b in parallel, there is no choice.

Yeah, seems like that would be necessary for preprocessors that accept input or output through filenames.

To be clear, I was thinking about implementing a | b as follows:

a > /tmp/XXX
b < /tmp/XXX
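That temp-file scheme can be sketched as below. Sys.command with shell redirection is used purely for brevity (a real build system would use Unix.create_process and fd redirection); the function name is made up:

```ocaml
(* Sketch only: run [a | b] as two sequential commands through a
   temporary file, so a single build job suffices and no real pipe
   (hence no forced parallelism) is needed. *)

let pipe_via_tmp (a : string) (b : string) : int =
  let tmp = Filename.temp_file "pipe" ".out" in
  (* Stage 1: a's stdout goes to the temp file. *)
  let rc_a = Sys.command (a ^ " > " ^ Filename.quote tmp) in
  (* Stage 2: b reads the temp file on stdin, only if a succeeded. *)
  let rc =
    if rc_a = 0 then Sys.command (b ^ " < " ^ Filename.quote tmp)
    else rc_a
  in
  Sys.remove tmp;
  rc

let () = assert (pipe_via_tmp "echo hello" "grep -q hello" = 0)
```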

@rgrinberg
Copy link
Member Author

If you use real pipes, then to execute a | b you need to execute a and b in parallel, there is no choice.

Ok, I see what you mean. My mental model of -j was a bit wrong; I thought that -j referred to build jobs and that such a job could have more than one process.

To be clear, I was thinking about implementing a | b as follows:

That would be fine for a first pass. The only issues are performance related and we can also optimize those.

Ok, so how about implementing bisect but limiting it to code that uses ppx, until we can define more complex preprocessor stacks? This should be backwards compatible, and it's actually a restriction that works just fine in practice, as it's currently impossible to use bisect with action preprocessors anyhow. The changes we're talking about are desirable but also fairly large; there's no need to delay this extremely useful feature to users in the meantime.

@ghost
Copy link

ghost commented Apr 18, 2018

That seems fine for now, indeed.

@rgrinberg
Copy link
Member Author

There's still a detail that remains a little awkward to handle if we treat bisect_ppx as a special preprocessor. It's that we don't have such a convenient way of handling its runtime dependencies. By modifying the executable or library stanzas, we get this functionality for free. I suppose runtime deps is the key distinction between the reason preprocessor and bisect.

So I guess we'll still need a bit of a combined approach for a v1.

@rgrinberg
Copy link
Member Author

The proposed support is going to be far simpler than this original PR. Therefore, I'm closing this issue for now.
