Change to narrow test integration interface #1160
Conversation
* This reduces the payload that has to be communicated from the killforks to the main process.
* Nice increase in synthetic benchmarks.
@dgollahon, @pawelpacana On rspec payloads this relatively simple change has a dramatic effect. On mbj/auom (rspec; the minitest run had a narrower selection where it did not matter that much):

Previous:

Now:

Do you mind trying master on your projects, just to verify I'm not lying to myself?
include Result, Anima.new(
  :passed,
  :runtime,
  :tests
)
That one was entirely unused. A historic artifact, but it transitively referenced lots of objects that were needlessly sent over the wire.
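A rough, self-contained illustration of the effect (my own sketch, using plain Struct instead of Anima and hypothetical field names, not mutant's actual classes): an attribute that transitively holds test framework objects dominates the size of the marshalled result that travels from the killfork back to the main process.

```ruby
# Hypothetical stand-ins for the objects a test framework attaches to results.
Example = Struct.new(:description, :source_location, :metadata)

# Old, wide result: carries every test object back over the pipe.
WideResult   = Struct.new(:passed, :runtime, :tests)
# New, narrow result: only what the main process actually consumes.
NarrowResult = Struct.new(:passed, :runtime)

tests  = Array.new(1_000) { |i| Example.new("example #{i}", ["spec/foo_spec.rb", i], { group: 'Foo' }) }
wide   = WideResult.new(true, 0.42, tests)
narrow = NarrowResult.new(true, 0.42)

# Killfork results are serialized and written back to the parent process,
# so the marshalled size is a rough proxy for the IPC cost.
puts Marshal.dump(wide).bytesize   # large: every example object goes over the wire
puts Marshal.dump(narrow).bytesize # small: a boolean and a float
```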
I picked 5 different runs over various subjects in my main application codebase (rspec):
I ran all the 10.19 cases and then the 10.20 cases so I didn't have to keep messing with bundler. I then thought my laptop might just have overheated, and that the second set was slower because of throttling or some other variable, so I re-ran C on 10.20 first and got:
It is much faster on

I suspect we need better profiling tools to say much about what is going on here.
Actually no, we do not for this case; it's easy to explain. We removed redundant work, and the examples you ran (A/B/C/D/E) are all within standard variance and bound by computation inside the killfork. Measuring across cold and hot starts (literally, as CPU throttling easily kicks in earlier for later runs) also distorts later runs. Given that we clearly removed "static" work from the system, I suspect that as soon as you hit a computation-intensive subject, multiple re-runs on .19 and .20 would show a small bias towards .20 on your examples.

This, plus the fact that we clearly removed static overhead, means .20 is not a regression, but a step in the right direction.
Since there is some variability I wanted to add a couple more data points to confirm some of the rows were actually slower. Pretty surprised that subject A dropped so much in the re-test, but I double-checked the original output. 🤷♂️

(Note: I ran 10.20 first on most of these.)
@dgollahon Given your examples have high variability (likely induced by GC), you need to run them many times until they are statistically relevant. What matters to me is the
@dgollahon Side note: There is a change in the pipeline that avoids inheriting the result objects from the main process into the killforks. This reduces the chance of what I call GC amplification, which induces lots of variance on bigger projects. GC amplification works like this:

On bigger projects than

Now, as we clearly generate fewer objects for the parent to inherit (the result read back from the killforks is smaller), I guarantee that with a large enough sample of heavier code, .20 provides a speed-up, though likely not as massive a one as on an intentionally small project like

The real fix is to fork off a process that does not accumulate state. And I'd still bet that .20 is a noticeable performance increase if you had high test iteration counts on your subjects.
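A self-contained sketch of the amplification effect being described (my own illustration, not mutant's code; it assumes MRI on a Unix-like system since it relies on fork): the more live objects the parent holds at fork time, the more work a single GC cycle costs inside every forked child.

```ruby
# Measure how long one full GC takes inside a freshly forked child.
def child_gc_time
  read, write = IO.pipe
  pid = fork do
    read.close
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    GC.start # the child has to mark everything it inherited from the parent
    write.puts(Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
    write.close
    exit!(0)
  end
  write.close
  elapsed = read.read.to_f
  Process.wait(pid)
  elapsed
end

puts "child GC with a small parent heap:  #{child_gc_time}s"

# Parent now accumulates state (think: result objects from earlier killforks).
retained = Array.new(2_000_000) { |i| "result #{i}" }

puts "child GC with a loaded parent heap: #{child_gc_time}s"
puts "retained objects: #{retained.size}"
```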
5 back-to-back runs of subject C on 10.20:
Switched back to 10.19 (computer still very hot from the 10.20 runs, so I suspect this is unfavorable to 10.19, but I am not sure):
Decided to do a 6th run on 10.19:

I am not saying this data is conclusive, but there are several more runs across all subjects where 10.19 outperforms 10.20 than the other way around, and by larger margins. That said, I am also getting occasional segfaults, so I'm not sure how much that taints the numbers. Either way, I'm not that convinced on:
It may be better, but if so, not noticeably so under normal conditions for me. I think this data is mixed, and sometimes negative enough, that we should at least admit the possibility that it is actually slower on my project overall for some non-obvious reason. I can't collect any additional data right now because it is too slow and I have other things to do. I am not sure how many runs, or under what conditions, you would find data from my project convincing.

I still think it would be helpful to have better profiling tools so we can actually check hypotheses like GC thresholds, GC runs, memory usage, time spent on a given mutation, etc. instead of speculating. The mutations/s figure can't tell you why a given run was slower or faster.

Re: the GC commentary: should mutant try to do some GC tuning on your behalf? Or have an option in addition to the ruby options? Or maybe a section in the documentation on adjusting it? If GC has so much of an effect, I would guess that you'd get better runs by turning off GC, GC'ing before fork, having huge GC sizes, infrequent collections, etc.

Did you get a difference in speed when running mutant on itself? That might also be an interesting test.
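One cheap way to start checking the GC hypotheses directly, instead of inferring from mutations/s alone, is to snapshot GC.stat around the code under test and diff the counters. This is just a sketch; `run_workload` is a placeholder for whatever you are benchmarking.

```ruby
def run_workload
  # placeholder for the code under measurement
  100_000.times.map { |i| "object #{i}" }
end

before = GC.stat
run_workload
after = GC.stat

# Diff the counters that matter for the "GC runs / allocation pressure" question.
%i[count minor_gc_count major_gc_count total_allocated_objects heap_live_slots].each do |key|
  puts format('%-25s %d', key, after[key] - before[key])
end
```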
Results from a single run, highly dependent on shared machine load:
I guess I just don't have whatever preconditions are required to make this faster on my project 🤷♂️ It still seems slower in several cases for me, but maybe that's just noise or some kind of particularly unfavorable situation.
@dgollahon Wait, I've got more in the pipeline for you. @pawelpacana You are affected by huge amounts of degeneration for the large pass. Mutant was never optimized for these; working on it.
@dgollahon What about turning off the GC before running the benchmarks?
@dkubb I have not tried that yet, but that is a good idea. As a note, I am currently inclined to agree with @mbj's analysis, and I am probably seeing slower runs because of some minor threshold effect in the GC, or it is pure noise. I looked closer at the changes from 10.19 to 10.20 and it's hard to see why they would make things slower in general. I may retry on another day with different GC settings and see if I can get a stable result, but I do not have time to look into it further today.
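For reference, a rough sketch of the kind of GC tuning discussed above; none of this is a mutant feature, just standard MRI knobs to try around a benchmark run, and whether any of it helps is entirely workload-dependent.

```ruby
# Collect (and, on MRI >= 2.7, compact) once up front so forked children
# start from a tidy, mostly-shared heap.
GC.start
GC.compact if GC.respond_to?(:compact)

# Or disable GC entirely for a short benchmark run (watch memory growth).
GC.disable
# ... run the benchmark ...
GC.enable

# Environment-level tuning is also possible, e.g. a larger initial heap:
#   RUBY_GC_HEAP_INIT_SLOTS=4000000 bundle exec mutant ...
```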
Just to sum this up again: the root cause is mutant's process model. Fixing the process model removes a lot of behavior known to be very suboptimal. The key is to keep the process the killforks get forked off of minimal and stable in size, i.e. not having the killforks accumulate any state they are not interested in. I hope I can get the process model change in before having to switch to the new main process written in Haskell. We'll see.
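A minimal sketch of that process-model idea (my own assumptions, not mutant's actual design; MRI on a Unix-like system): killforks are forked from a small, size-stable server process, while the growing result set lives only in the main process on the other side of a pipe, so it is never inherited by any killfork.

```ruby
jobs_r, jobs_w       = IO.pipe
results_r, results_w = IO.pipe

fork_server = fork do
  jobs_w.close
  results_r.close
  # This process stays small and size-stable: it only reads job ids and forks.
  while (job = jobs_r.gets&.chomp)
    pid = fork do
      jobs_r.close
      # killfork: run the real work here, then report one small line back
      results_w.puts("#{job}:passed")
      exit!(0)
    end
    Process.wait(pid)
  end
  exit!(0)
end

jobs_r.close
results_w.close

# The main process is free to accumulate results, because that state is never
# inherited by the killforks: they are forked from the lean server above.
results = []
5.times { |i| jobs_w.puts("job-#{i}") }
jobs_w.close
5.times { results << results_r.gets.chomp }
Process.wait(fork_server)
p results
```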
@pawelpacana Your

With #1096 this case is now up to:

It's likely that your test suite execution is now the bottleneck.