Change to narrow test integration interface #1160
Conversation
* This reduces the payload that has to be communicated from the killforks to the main process.
* Nice increase in synthetic benchmarks.
@dgollahon, @pawelpacana On rspec payloads this relatively simple change has a dramatic effect. On mbj/auom (rspec; the minitest run had a narrower selection where it did not matter that much):

Previous:

Now:

Do you mind trying master on your projects, just to verify I'm not lying to myself?
include Result, Anima.new(
  :passed,
  :runtime,
  :tests
)
That one was entirely unused. A historic artifact, but it transitively referenced lots of objects that were needlessly sent over the wire.
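A rough, self-contained illustration of the effect (my own sketch, using plain Struct instead of Anima and hypothetical field names, not mutant's actual classes): an attribute that transitively holds test framework objects dominates the size of the marshalled result that travels from the killfork back to the main process.

```ruby
# Hypothetical stand-ins for the objects a test framework attaches to results.
Example = Struct.new(:description, :source_location, :metadata)

# Old, wide result: carries every test object back over the pipe.
WideResult   = Struct.new(:passed, :runtime, :tests)
# New, narrow result: only what the main process actually consumes.
NarrowResult = Struct.new(:passed, :runtime)

tests  = Array.new(1_000) { |i| Example.new("example #{i}", ["spec/foo_spec.rb", i], { group: 'Foo' }) }
wide   = WideResult.new(true, 0.42, tests)
narrow = NarrowResult.new(true, 0.42)

# Killfork results are serialized and written back to the parent process,
# so the marshalled size is a rough proxy for the IPC cost.
puts Marshal.dump(wide).bytesize   # large: every example object goes over the wire
puts Marshal.dump(narrow).bytesize # small: a boolean and a float
```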
I picked 5 different runs over various subjects in my main application codebase (rspec):
I ran all the 10.19 cases and then the 10.20 cases so I didn't have to keep messing with bundler. I then thought my laptop might just have overheated, and that the second set was slower because of throttling or some other variable, so I re-ran C on 10.20 first and got:
It is much faster on

I suspect we need better profiling tools to say much about what is going on here.
Actually no, we do not for this case; it's easy to explain. We removed redundant work, and the examples you ran (A/B/C/D/E) are all within standard variance and bound by computation inside the killfork. Measuring across cold and hot starts (literally, as CPU throttling easily kicks in earlier for later runs) also distorts later runs. Given that we clearly removed "static" work from the system, I suspect that as soon as you hit a computation-intensive subject, multiple re-runs on .19 and .20 would show a small bias towards .20 on your examples.

This, plus the fact that we clearly removed static overhead, means .20 is not a regression, but a step in the right direction.
Since there is some variability I wanted to add a couple more data points to confirm some of the rows were actually slower. Pretty surprised that subject A dropped so much in the re-test, but I double-checked the original output. 🤷♂️

(Note: I ran 10.20 first on most of these.)
@dgollahon Given your examples have high variability (likely induced by GC), you need to run them many times until they are statistically relevant. What matters to me is the
@dgollahon Side note: There is a change in the pipeline that avoids inheriting the result objects from the main process into the killforks. This reduces the chance of what I call GC amplification, which induces lots of variance on bigger projects. GC amplification works like this:

On bigger projects than

Now, as we clearly generate fewer objects for the parent to inherit (the result read back from the killforks is smaller), I guarantee that with a large enough sample of heavier code, .20 provides a speed-up, though likely not as massive a one as on an intentionally small project like

The real fix is to fork off a process that does not accumulate state. And I'd still bet that .20 is a noticeable performance increase if you had high test iteration counts on your subjects.
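A self-contained sketch of the amplification effect being described (my own illustration, not mutant's code; it assumes MRI on a Unix-like system since it relies on fork): the more live objects the parent holds at fork time, the more work a single GC cycle costs inside every forked child.

```ruby
# Measure how long one full GC takes inside a freshly forked child.
def child_gc_time
  read, write = IO.pipe
  pid = fork do
    read.close
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    GC.start # the child has to mark everything it inherited from the parent
    write.puts(Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
    write.close
    exit!(0)
  end
  write.close
  elapsed = read.read.to_f
  Process.wait(pid)
  elapsed
end

puts "child GC with a small parent heap:  #{child_gc_time}s"

# Parent now accumulates state (think: result objects from earlier killforks).
retained = Array.new(2_000_000) { |i| "result #{i}" }

puts "child GC with a loaded parent heap: #{child_gc_time}s"
puts "retained objects: #{retained.size}"
```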
5 back-to-back runs of subject C on 10.20:
Switched back to 10.19 (computer still very hot from the 10.20 runs, so I suspect this is unfavorable to 10.19, but I am not sure):
Decided to do a 6th run on 10.19:

I am not saying this data is conclusive, but there are several more runs across all subjects where 10.19 outperforms 10.20 than the other way around, and by larger margins. That said, I am also getting occasional segfaults, so I'm not sure how much that taints the numbers. Either way, I'm not that convinced on:
It may be better, but if so, not noticeably so under normal conditions for me. I think this data is mixed, and sometimes negative enough, that we should at least admit the possibility that it is actually slower on my project overall for some non-obvious reason. I can't collect any additional data right now because it is too slow and I have other things to do. I am not sure how many runs, or under what conditions, you would find data from my project convincing.

I still think it would be helpful to have better profiling tools so we can actually check hypotheses like GC thresholds, GC runs, memory usage, time spent on a given mutation, etc. instead of speculating. The mutations/s figure can't tell you why a given run was slower or faster.

Re: the GC commentary: should mutant try to do some GC tuning on your behalf? Or have an option in addition to the ruby options? Or maybe a section in the documentation on adjusting it? If GC has so much of an effect, I would guess that you'd get better runs by turning off GC, GC'ing before fork, having huge GC sizes, infrequent collections, etc.

Did you get a difference in speed when running mutant on itself? That might also be an interesting test.
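One cheap way to start checking the GC hypotheses directly, instead of inferring from mutations/s alone, is to snapshot GC.stat around the code under test and diff the counters. This is just a sketch; `run_workload` is a placeholder for whatever you are benchmarking.

```ruby
def run_workload
  # placeholder for the code under measurement
  100_000.times.map { |i| "object #{i}" }
end

before = GC.stat
run_workload
after = GC.stat

# Diff the counters that matter for the "GC runs / allocation pressure" question.
%i[count minor_gc_count major_gc_count total_allocated_objects heap_live_slots].each do |key|
  puts format('%-25s %d', key, after[key] - before[key])
end
```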
Results from a single run, highly dependent on shared machine load:
I guess I just don't have whatever preconditions are required to make this faster on my project 🤷♂️ It still seems slower in several cases for me, but maybe that's just noise or some kind of particularly unfavorable situation.
@dgollahon Wait, I've got more in the pipeline for you. @pawelpacana You are affected by huge amounts of degeneration for the large pass. Mutant was never optimized for these; working on it.
@dgollahon What about turning off the GC before running the benchmarks?
@dkubb I have not tried that yet, but that is a good idea. As a note, I am currently inclined to agree with @mbj's analysis, and I am probably seeing slower runs because of some minor threshold effect in the GC, or it is pure noise. I looked closer at the changes from 10.19 to 10.20 and it's hard to see why they would make things slower in general. I may retry on another day with different GC settings and see if I can get a stable result, but I do not have time to look into it further today.
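For reference, a rough sketch of the kind of GC tuning discussed above; none of this is a mutant feature, just standard MRI knobs to try around a benchmark run, and whether any of it helps is entirely workload-dependent.

```ruby
# Collect (and, on MRI >= 2.7, compact) once up front so forked children
# start from a tidy, mostly-shared heap.
GC.start
GC.compact if GC.respond_to?(:compact)

# Or disable GC entirely for a short benchmark run (watch memory growth).
GC.disable
# ... run the benchmark ...
GC.enable

# Environment-level tuning is also possible, e.g. a larger initial heap:
#   RUBY_GC_HEAP_INIT_SLOTS=4000000 bundle exec mutant ...
```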
Just to sum this up again: the root cause is mutant's process model. Fixing the process model removes a lot of behavior known to be very suboptimal. The key is to keep the process the killforks get forked off of minimal and stable in size, i.e. not having the killforks accumulate any state they are not interested in. I hope I can get the process model change in before having to switch to the new main process written in Haskell. We'll see.
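A minimal sketch of that process-model idea (my own assumptions, not mutant's actual design; MRI on a Unix-like system): killforks are forked from a small, size-stable server process, while the growing result set lives only in the main process on the other side of a pipe, so it is never inherited by any killfork.

```ruby
jobs_r, jobs_w       = IO.pipe
results_r, results_w = IO.pipe

fork_server = fork do
  jobs_w.close
  results_r.close
  # This process stays small and size-stable: it only reads job ids and forks.
  while (job = jobs_r.gets&.chomp)
    pid = fork do
      jobs_r.close
      # killfork: run the real work here, then report one small line back
      results_w.puts("#{job}:passed")
      exit!(0)
    end
    Process.wait(pid)
  end
  exit!(0)
end

jobs_r.close
results_w.close

# The main process is free to accumulate results, because that state is never
# inherited by the killforks: they are forked from the lean server above.
results = []
5.times { |i| jobs_w.puts("job-#{i}") }
jobs_w.close
5.times { results << results_r.gets.chomp }
Process.wait(fork_server)
p results
```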
@pawelpacana Your

With #1096 this case is now up to:

It's likely that your test suite execution is now the bottleneck.