
download operation creates unnecessary noise in benchmarking results #1

Closed
ejgallego opened this issue Mar 24, 2017 · 21 comments

@ejgallego
Member

Unless I am missing something, using opam install is not suitable for benchmarking. Among other things, it will call the constraint solver and try to download packages, and could thus introduce significant differences in timing.

@maximedenes, what do you think?

@ghost

ghost commented Mar 24, 2017

That and other things:

  • at the moment, there is no way for me to exclude the time it takes to download the package from the measurements

Even in this state, the benefits (handling of dependencies) still seem to outweigh the drawbacks.

At the moment, we are evaluating the level of noise. By that I mean that, for the chosen branch, I check the results we get if we compile all packages once, twice, three times, or four times. That will give us an idea of how many iterations are enough (wrt. the desired precision).

Anyway, the time to resolve the dependencies and to download the packages is, fortunately for the macro-benchmarks, negligible wrt. the whole compilation time.

Also, the plan is to look at the implementation of the opam command and add the subcommands that we need. What I need (and do not yet have) is the ability to tell opam just to download the stuff and do nothing else.
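
A minimal sketch of what I have in mind, assuming opam's source subcommand fetches and unpacks a package's sources without building it (the package name and directory below are only placeholders):

```sh
# Pre-fetch the sources so that the timed run never touches the network;
# `opam source` downloads and extracts the package's source tree without
# building anything.
opam source coq-mathcomp-ssreflect --dir=prefetch/mathcomp
```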

@ghost ghost closed this as completed Mar 24, 2017
@ghost ghost reopened this Mar 24, 2017
@ghost

ghost commented Mar 24, 2017

opam already provides one thing that is essential: before we start benchmarking the chosen package, we are able to install all its dependencies and thus exclude their compilation from the benchmarking.
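
As a hedged sketch (the package name is only illustrative), that step boils down to something like:

```sh
# Install everything the package depends on, but not the package itself,
# so that only the package's own build ends up in the measured window.
opam install --deps-only coq-mathcomp-ssreflect
```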

@ejgallego ejgallego changed the title Opam install not suitable for installs Opam install not fully suitable for benchamarking. Mar 24, 2017
@ejgallego ejgallego changed the title Opam install not fully suitable for benchamarking. Opam install not fully suitable for benchmarking. Mar 24, 2017
@ghost

ghost commented Mar 24, 2017

The better counterargument against relying on opam is, I think, the fact that:

  • I must create a fake OPAM repository,
  • into which I need to generate Coq OPAM packages that represent exactly those commits we want to measure.

The implementation, in that respect, isn't particularly appealing. I am not very happy about it.
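
For concreteness, a rough sketch of what that fake repository would involve, assuming the usual opam repository layout (packages/<name>/<name>.<version>/opam); the version string encoding a commit hash is purely illustrative:

```sh
# One generated package per Coq commit we want to measure.
mkdir -p fake-repo/packages/coq/coq.dev+abc1234
# ... generate an opam file here that fetches and builds exactly that commit ...

# Make the fake repository visible to opam, then install the chosen version.
opam repository add coq-bench ./fake-repo
opam update
opam install coq.dev+abc1234
```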

@ejgallego
Member Author

Anyway, the time to resolve the dependencies and to download the packages is, fortunately for the macro-benchmarks, negligible wrt. the whole compilation time.

That is not true in my experience. Also, in your iteration scheme, opam caching will kick in, distorting the results based on the size of the download.

@maximedenes
Member

maximedenes commented Mar 24, 2017

I agree it is not so clear whether the gains outweigh the drawbacks. Maybe we could try a hybrid approach, like using opam to install dependencies, but doing the build to be benchmarked manually. This way, we would not need to create a fake package and we would have more control over what we measure, but we could still rely on OPAM to solve dependencies.
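
A rough sketch of that hybrid workflow, assuming a project that uses coq_makefile (the package name, paths, and the exact coq_makefile invocation are only illustrative):

```sh
# Let opam handle dependency solving and installation...
opam install --deps-only coq-mathcomp-ssreflect

# ...but build the package to be benchmarked by hand, so that the measured
# window contains only the compilation itself (GNU time writes its report
# to a file we can collect afterwards).
cd mathcomp
coq_makefile -f _CoqProject -o Makefile
/usr/bin/time -o ../mathcomp.time make -j1
```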

@ejgallego
Member Author

We could certainly share the build recipes in dev/ci; I agree that using opam for dependencies is OK.

@ejgallego
Member Author

Furthermore, we should ideally produce time reports by file. Most projects use coq_makefile, so that should be doable.
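
A sketch of how that could look with a coq_makefile-generated Makefile, assuming the TIMED flag mentioned later in this thread makes the build print the time spent on each .v file:

```sh
# TIMED=1 asks the generated Makefile to report how long each file took;
# -j1 keeps the per-file numbers from being distorted by parallelism.
make TIMED=1 -j1 2>&1 | tee per-file-times.log
```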

@ejgallego
Member Author

Changes in time by .v file.

@ghost

ghost commented Mar 24, 2017

That is a good idea.

@ejgallego
Member Author

In the future we will even be able to track changes by section and by sentence (sercomp will support that).

@ghost

ghost commented Mar 25, 2017

I've added your ideas to the TODO list.

@ghost ghost closed this as completed Mar 25, 2017
@ejgallego
Member Author

I would rather have the issue closed when it is actually solved (hint: other interested people can search, collaborate, and comment), but I guess it is your call.

@ejgallego
Member Author

It seems to me indeed that a feasible approach would be to teach coq_makefile to instrument developments for detailed benchmarking; cf. coq/coq#406

In the end, we may want to use Python, R, or some other higher-level language though, if we are going to produce so much data. Bash is just not going to cut it, IMHO.

@ghost

ghost commented Mar 28, 2017

@ejgallego To decrease the confusion a little bit, do you think you could reiterate the points that led you to conclude:

Opam install not fully suitable for benchmarking.

Several things (related and unrelated) were mentioned, and even the original focus shifted a little bit. So a restatement might help us, I think.

@ghost

ghost commented Mar 28, 2017

The way I originally planned to address this problem is to fix the issue at the root, so that we can keep the benefits opam provides.

@ejgallego
Member Author

That looks good; given the scope of the benchmarking, it may make sense for now.

However, if/when we want to do fancier tricks, such as calling coq_makefile --benchmark, we will either have to modify opam or come up with some kind of trick (maybe an env variable?).

So let's keep an eye on it.

we can keep the benefits opam provides.

I am not so convinced yet that opam provides many benefits for testing bleeding-edge code. A good example is Iris: they had to roll out their own package system, as opam did not have the proper granularity; cf. https://gitlab.mpi-sws.org/FP/iris-coq/issues/83

@ghost ghost added the kind: bug label Apr 4, 2017
@ghost ghost changed the title Opam install not fully suitable for benchmarking. download operation creates unnecessary noise in benchmarking results Apr 6, 2017
@ghost

ghost commented Apr 6, 2017

We should make sure that the "download time" does not influence the benchmarking of the "build time".

@ejgallego
Member Author

ejgallego commented Apr 6, 2017

Be aware that, when using opam, there are other aspects beyond download time that may influence the timing; hence the original title was more accurate.

@ejgallego
Member Author

This is still an issue, and a serious one, it seems.

@ejgallego ejgallego reopened this Dec 13, 2018
@ejgallego ejgallego added this to the 8.9+shell milestone Dec 13, 2018
@JasonGross
Member

Note that if you call make TIMED=1 --output-sync on a coq_makefile-made Makefile, the Python scripts I have can process the logs of the resulting build and display per-file timing diffs as well as whole-project timing diffs. (We unfortunately don't get CPU cycles or memory faults, etc.)
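
For reference, a sketch of how the two logs could be captured (the file names are illustrative; the actual per-file diff is produced by the scripts mentioned above):

```sh
# Build once against the baseline Coq and once against the commit under
# test, keeping both logs around for the per-file timing diff.
make TIMED=1 --output-sync 2>&1 | tee build-before.log
# ... switch to the Coq version under test and rebuild dependencies ...
make clean
make TIMED=1 --output-sync 2>&1 | tee build-after.log
```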

@ppedrot
Member

ppedrot commented Dec 7, 2019

FWIW, I'm also using a script to filter the HTML output of the bench, which sets the TIMED variable and allows retrieving a per-line diff. It's essentially the same as the one provided in the repo, except that it's quasi-linear in the number of lines. (The one from the bench is cubic, IIRC.)
