download operation creates unnecessary noise in benchmarking results #1
That and other things:
The benefits (handling of dependencies) still, even in this state, seem to outweigh the drawbacks. At the moment, we are evaluating the level of noise. By that I mean that, for the chosen branch, I check the results we get if we compile all packages once, twice, three times, or four times. That will give us an idea of how many iterations are enough with respect to the desired precision. Anyway, the time to resolve the dependencies and to download the packages is, fortunately for the macro-benchmarks, negligible compared to the whole compilation time. Also, the plan is to look at the implementation of the
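To make that iteration scheme concrete, here is a minimal sketch (not the actual bench scripts) of repeating a build and looking at the spread; the package name and the use of opam reinstall are illustrative assumptions:

```python
# Sketch: estimate how many iterations are needed for a given precision.
# The package name and command are hypothetical placeholders.
import statistics
import subprocess
import time

BUILD_CMD = ["opam", "reinstall", "-y", "coq-some-package"]  # hypothetical package
N = 4  # compare the spread when using 1, 2, 3, or 4 iterations

samples = []
for i in range(N):
    start = time.monotonic()
    subprocess.run(BUILD_CMD, check=True, capture_output=True)
    samples.append(time.monotonic() - start)
    print(f"run {i + 1}: {samples[-1]:.1f}s")

if len(samples) > 1:
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    print(f"mean {mean:.1f}s, stdev {stdev:.1f}s ({100 * stdev / mean:.1f}% noise)")
```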
The better counterargument against relying on
The implementation, in that respect, isn't particularly appealing. I am not very happy about it.
That is not true in my experience. Also, in your iteration scheme, opam caching will kick in, distorting the results based on the size of the download.
I agree it is not so clear whether the gains outweigh the drawbacks. Maybe we could try a hybrid approach: use opam to install dependencies, but do the build to be benchmarked manually. This way, we would not need to create a fake package and we would have more control over what we measure, but could still rely on OPAM to solve dependencies.
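A minimal sketch of that hybrid idea, assuming a local checkout with an opam file and a coq_makefile-generated Makefile; the path is illustrative and the availability of --deps-only in the opam version used is an assumption:

```python
# Sketch of the hybrid approach: let opam resolve and install dependencies
# (untimed), then time only the manual build of the package under test.
import subprocess
import time

PROJECT_DIR = "path/to/project"  # hypothetical checkout of the package to benchmark

# Untimed: dependency solving, download and installation handled by opam.
subprocess.run(["opam", "install", "-y", "--deps-only", "."],
               cwd=PROJECT_DIR, check=True)

# Timed: only the build we actually want to benchmark.
start = time.monotonic()
subprocess.run(["make"], cwd=PROJECT_DIR, check=True)
print(f"build time: {time.monotonic() - start:.1f}s")
```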
We could certainly share the build recipes in
Furthermore, we should ideally produce time reports by file. Most projects use coq_makefile, so that should be doable.
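As a hedged sketch of what gathering such a report could look like: coq_makefile-generated Makefiles can print per-file timings (e.g. with TIMED=1), but the exact output format varies between Coq versions, so the regular expression below is an assumption:

```python
# Sketch: collect per-file build times from a coq_makefile-generated Makefile.
# Assumes `make TIMED=1` prints lines roughly like
#   theories/Foo.vo (real: 1.23, user: 1.20, sys: 0.02, mem: 123456 ko)
# which may differ between Coq versions.
import re
import subprocess

TIMING_RE = re.compile(r"^(?P<file>\S+\.vo)\s+\(real:\s*(?P<real>[0-9.]+)")

def per_file_times(project_dir):
    proc = subprocess.run(["make", "TIMED=1"], cwd=project_dir,
                          check=True, capture_output=True, text=True)
    times = {}
    for line in proc.stdout.splitlines():
        match = TIMING_RE.match(line.strip())
        if match:
            times[match.group("file")] = float(match.group("real"))
    return times
```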
Changes in time by
That is a good idea.
In the future we will be able to track changes even by section and sentence (sercomp will support that).
I've added your ideas to the TODO list.
I would rather have the issue closed when it is actually solved (hint: other interested people can search, collaborate and comment), but I guess it is your call.
It seems to me indeed that a feasible approach would be to teach
In the end, we may want to use Python, R, or some other higher-level language though, if we are going to produce so much data; bash is just not gonna cut it IMHO.
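For instance, a small Python helper along these lines could already report per-file changes between two runs; the input format (two mappings from file name to seconds) and the threshold are assumptions:

```python
# Sketch: report per-file time changes between two benchmark runs, given
# two mappings from .vo file name to build time in seconds.
def report_changes(before, after, threshold=0.05):
    for name in sorted(set(before) & set(after)):
        delta = after[name] - before[name]
        if abs(delta) / max(before[name], 1e-9) >= threshold:
            print(f"{name}: {before[name]:.2f}s -> {after[name]:.2f}s ({delta:+.2f}s)")

# Tiny usage example with made-up numbers.
report_changes({"theories/Foo.vo": 10.0, "theories/Bar.vo": 2.0},
               {"theories/Foo.vo": 12.5, "theories/Bar.vo": 2.0})
```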
@ejgallego To decrease the confusion a little bit, do you think that you could reiterate the points that led you to that conclusion?
Several things (related and unrelated) were mentioned, and even the original focus shifted a little bit, so a restatement might help us, I think.
The way I originally planned to address this problem is to fix this issue at the root, so that we can keep the benefits
That looks good; given the scope of the benchmarking, it may make sense for now. However, if/when we want to do fancier tricks such as calling
So let's keep an eye on it.
I am not so convinced yet that opam provides many benefits for testing bleeding-edge code. A good example is Iris: they had to roll out their own package system, as opam did not have the proper granularity; cf. https://gitlab.mpi-sws.org/FP/iris-coq/issues/83
We should make sure that the "download time" does not influence benchmarking of "build time".
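One way to approximate that separation is to warm opam's download cache with an untimed install first and then time a reinstall; this is only a sketch, relying on opam's caching behaviour and a hypothetical package name:

```python
# Sketch: warm opam's download cache with an untimed install, then time a
# reinstall, so the measured run should hit the local cache rather than the
# network. Relying on opam's caching here is an assumption.
import subprocess
import time

PKG = "coq-some-package"  # hypothetical package name

# Untimed: solver, download and first build.
subprocess.run(["opam", "install", "-y", PKG], check=True)

# Timed: downloads should now come from the cache.
start = time.monotonic()
subprocess.run(["opam", "reinstall", "-y", PKG], check=True)
print(f"reinstall time: {time.monotonic() - start:.1f}s")
```

Note that the timed reinstall still runs the solver, so this only isolates the download component.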
Be aware that when using opam there are other aspects beyond download time that may influence the timing; hence the original title was more accurate.
This is still an issue, and a serious one it seems.
Note that if you call
FWIW I'm also using a script to filter the HTML output of the bench, which sets the
Unless I am missing something, using opam install is not suitable for benchmarking. Among other things, it will call constraint solvers and try to download packages, and thus could introduce significant differences in timing. @maximedenes, what do you think?
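As a rough, hedged way to see part of that overhead in isolation (assuming an opam version where --dry-run simulates the action without installing, and a hypothetical package name):

```python
# Sketch: time opam's dependency solving alone via a dry run, to get a rough
# idea of the overhead `opam install` adds on top of the build itself.
import subprocess
import time

PKG = "coq-some-package"  # hypothetical package name

start = time.monotonic()
subprocess.run(["opam", "install", "-y", "--dry-run", PKG],
               check=True, capture_output=True)
print(f"solver/dry-run overhead: {time.monotonic() - start:.1f}s")
```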