download operation creates unnecessary noise in benchmarking results #1
That and other things:
The benefits (handling of dependencies) still, even in this state, seem to outweigh the drawbacks. At the moment, we are evaluating the level of noise. By that I mean that, for the chosen branch, I check the results we get if we compile all packages once, twice, three times, or four times. That will give us an idea of how many iterations are enough with respect to the desired precision. Anyway, the time to resolve the dependencies and to download the packages is, fortunately for the macro-benchmarks, negligible compared to the whole compilation time. Also, the plan is to look at the implementation of the
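To make that iteration scheme concrete, here is a minimal sketch (not the actual bench scripts) of repeating a build and looking at the spread; the package name and the use of opam reinstall are illustrative assumptions:

```python
# Sketch: estimate how many iterations are needed for a given precision.
# The package name and command are hypothetical placeholders.
import statistics
import subprocess
import time

BUILD_CMD = ["opam", "reinstall", "-y", "coq-some-package"]  # hypothetical package
N = 4  # compare the spread when using 1, 2, 3, or 4 iterations

samples = []
for i in range(N):
    start = time.monotonic()
    subprocess.run(BUILD_CMD, check=True, capture_output=True)
    samples.append(time.monotonic() - start)
    print(f"run {i + 1}: {samples[-1]:.1f}s")

if len(samples) > 1:
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    print(f"mean {mean:.1f}s, stdev {stdev:.1f}s ({100 * stdev / mean:.1f}% noise)")
```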
The better counterargument against relying on
The implementation, in that respect, isn't particularly appealing. I am not very happy about it.
That is not true in my experience. Also, in your iteration scheme, opam caching will kick in, distorting the results based on the size of the download.
I agree it is not so clear whether the gains outweigh the drawbacks. Maybe we could try a hybrid approach: use opam to install dependencies, but do the build to be benchmarked manually. This way, we would not need to create a fake package and we would have more control over what we measure, but could still rely on OPAM to solve dependencies.
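A minimal sketch of that hybrid idea, assuming a local checkout with an opam file and a coq_makefile-generated Makefile; the path is illustrative and the availability of --deps-only in the opam version used is an assumption:

```python
# Sketch of the hybrid approach: let opam resolve and install dependencies
# (untimed), then time only the manual build of the package under test.
import subprocess
import time

PROJECT_DIR = "path/to/project"  # hypothetical checkout of the package to benchmark

# Untimed: dependency solving, download and installation handled by opam.
subprocess.run(["opam", "install", "-y", "--deps-only", "."],
               cwd=PROJECT_DIR, check=True)

# Timed: only the build we actually want to benchmark.
start = time.monotonic()
subprocess.run(["make"], cwd=PROJECT_DIR, check=True)
print(f"build time: {time.monotonic() - start:.1f}s")
```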
We could certainly share the build recipes in
Furthermore, we should ideally produce time reports by file. Most projects use coq_makefile, so that should be doable.
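As a hedged sketch of what gathering such a report could look like: coq_makefile-generated Makefiles can print per-file timings (e.g. with TIMED=1), but the exact output format varies between Coq versions, so the regular expression below is an assumption:

```python
# Sketch: collect per-file build times from a coq_makefile-generated Makefile.
# Assumes `make TIMED=1` prints lines roughly like
#   theories/Foo.vo (real: 1.23, user: 1.20, sys: 0.02, mem: 123456 ko)
# which may differ between Coq versions.
import re
import subprocess

TIMING_RE = re.compile(r"^(?P<file>\S+\.vo)\s+\(real:\s*(?P<real>[0-9.]+)")

def per_file_times(project_dir):
    proc = subprocess.run(["make", "TIMED=1"], cwd=project_dir,
                          check=True, capture_output=True, text=True)
    times = {}
    for line in proc.stdout.splitlines():
        match = TIMING_RE.match(line.strip())
        if match:
            times[match.group("file")] = float(match.group("real"))
    return times
```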
Changes in time by
That is a good idea.
In the future we will be able to track changes even by section and sentence (sercomp will support that).
I've added your ideas to the TODO list.
I would rather have the issue closed when it is actually solved (hint: other interested people can search, collaborate and comment), but I guess it is your call.
It seems to me indeed that a feasible approach would be to teach
In the end, we may want to use Python, R, or some other higher-level language though, if we are going to produce so much data; bash is just not gonna cut it IMHO.
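For instance, a small Python helper along these lines could already report per-file changes between two runs; the input format (two mappings from file name to seconds) and the threshold are assumptions:

```python
# Sketch: report per-file time changes between two benchmark runs, given
# two mappings from .vo file name to build time in seconds.
def report_changes(before, after, threshold=0.05):
    for name in sorted(set(before) & set(after)):
        delta = after[name] - before[name]
        if abs(delta) / max(before[name], 1e-9) >= threshold:
            print(f"{name}: {before[name]:.2f}s -> {after[name]:.2f}s ({delta:+.2f}s)")

# Tiny usage example with made-up numbers.
report_changes({"theories/Foo.vo": 10.0, "theories/Bar.vo": 2.0},
               {"theories/Foo.vo": 12.5, "theories/Bar.vo": 2.0})
```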
@ejgallego To decrease the confusion a little bit, do you think that you could reiterate the points that led you to that conclusion?
Several things (related and unrelated) were mentioned, and even the original focus shifted a little bit, so a restatement might help us, I think.
The way I originally planned to address this problem is to fix this issue at the root, so that we can keep the benefits
That looks good; given the scope of the benchmarking, it may make sense for now. However, if/when we want to do fancier tricks such as calling
So let's keep an eye on it.
I am not so convinced yet that opam provides many benefits for testing bleeding-edge code. A good example is Iris: they had to roll out their own package system, as opam did not have the proper granularity; cf. https://gitlab.mpi-sws.org/FP/iris-coq/issues/83
We should make sure that the "download time" does not influence benchmarking of "build time".
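One way to approximate that separation is to warm opam's download cache with an untimed install first and then time a reinstall; this is only a sketch, relying on opam's caching behaviour and a hypothetical package name:

```python
# Sketch: warm opam's download cache with an untimed install, then time a
# reinstall, so the measured run should hit the local cache rather than the
# network. Relying on opam's caching here is an assumption.
import subprocess
import time

PKG = "coq-some-package"  # hypothetical package name

# Untimed: solver, download and first build.
subprocess.run(["opam", "install", "-y", PKG], check=True)

# Timed: downloads should now come from the cache.
start = time.monotonic()
subprocess.run(["opam", "reinstall", "-y", PKG], check=True)
print(f"reinstall time: {time.monotonic() - start:.1f}s")
```

Note that the timed reinstall still runs the solver, so this only isolates the download component.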
Be aware that when using opam there are other aspects beyond download time that may influence the timing; hence the original title was more accurate.
This is still an issue, and a serious one it seems.
Note that if you call
FWIW I'm also using a script to filter the HTML output of the bench, which sets the
Unless I am missing something, using opam install is not suitable for benchmarking. Among other things, it will call constraint solvers and try to download packages, and thus could introduce significant differences in timing. @maximedenes, what do you think?
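As a rough, hedged way to see part of that overhead in isolation (assuming an opam version where --dry-run simulates the action without installing, and a hypothetical package name):

```python
# Sketch: time opam's dependency solving alone via a dry run, to get a rough
# idea of the overhead `opam install` adds on top of the build itself.
import subprocess
import time

PKG = "coq-some-package"  # hypothetical package name

start = time.monotonic()
subprocess.run(["opam", "install", "-y", "--dry-run", PKG],
               check=True, capture_output=True)
print(f"solver/dry-run overhead: {time.monotonic() - start:.1f}s")
```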