
Stryker freezes with too many concurrent test runners #1665

Closed
psfinaki opened this issue Aug 18, 2021 · 22 comments · Fixed by #1977
Assignees
Labels
Area: Mutation Test 🐛 Bug Something isn't working Workaround A workaround is available but a better solution would be nice

Comments

@psfinaki
Contributor

Describe the bug

We are now executing Stryker nightly against a big codebase. Recently we switched to very powerful machines and Stryker suddenly started to "freeze" during executions. After a few days of frustration I figured out that concurrency is the issue. The machines we use now have 16 logical processors, so Stryker spins up 8 concurrent test runners, and that occasionally (actually more often than not) just leads to freezes in random places.

I don't know exactly where the problem is, and I can't pinpoint the exact number of test runners at which things start going bad. For now I only know that 5 is OK and 8 is not. For us the execution time is the same with 4 or 5 concurrent test runners, so we stopped there.

Expected behavior

Well, no freezes.

Workarounds

Thank goodness Stryker has an option to cap this number in the config. This saved my ass :D
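For reference, capping the runner count in Stryker.NET's `stryker-config.json` looks roughly like this — a sketch only, since the option name depends on the Stryker version (`concurrency` in current releases; older versions used a differently named option, so check the docs for your version):

```json
{
  "stryker-config": {
    "concurrency": 4
  }
}
```

The same cap can also be passed on the command line, e.g. `dotnet stryker --concurrency 4`.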

@rouke-broersma
Member

rouke-broersma commented Aug 18, 2021

We have actually experienced this ourselves before, but only when using the solution run or the multiple test projects feature! As far as I know, we have never had a user report of it before.

We have a bug report open for this with the vstest team 😉😉😉

See: microsoft/vstest#2686

Would be awesome if you could confirm that you see the same errors in the vstest logs

@psfinaki
Contributor Author

Well, we don't use the solution run; rather, I wrote a script back in the day to glue project runs together, heh :D

I will try to get the VsTest logs and verify then, so we can push together :)

@rouke-broersma
Member

rouke-broersma commented Aug 18, 2021

Well good because solution runs are not recommended (because of this problem but also because of inaccuracies) 😅.

@richardwerkman
Member

This does sound different from issue #1136. There we have crashing test runners, so lowering the number of test runners should increase the chance of Stryker hanging. But you could check: if you find this in Stryker's logs (not the VsTest logging), it's the same issue: [Error] Cancelling the operation as requested.

@psfinaki
Contributor Author

psfinaki commented Sep 8, 2021

@rouke-broersma @richardwerkman okay, sorry it took so much time, I was implementing mutation score lock-in solution-wide and it took a while :D

Now I've got back to this, and it seems like VsTest is indeed the issue. I cannot collect VsTest logs from the build machine (I guess for that I would need to execute a custom Stryker version there?), but I collected Stryker logs for hung executions and I do see [Error] Cancelling the operation as requested. there.

Should we try pinging the guys internally?

@rouke-broersma
Member

rouke-broersma commented Sep 8, 2021

If you could do that, that would be amazing :)

The logs should be available when you execute Stryker with log-to-file enabled and the log level set to trace; no custom Stryker build necessary.
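Assuming a `stryker-config.json`-based setup, the logging setup described above would look something like the following sketch (key names taken from the Stryker.NET docs, but double-check them against your version):

```json
{
  "stryker-config": {
    "verbosity": "trace",
    "log-to-file": true
  }
}
```

Stryker then writes a trace-level log file into its output folder alongside the reports.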

@psfinaki
Contributor Author

psfinaki commented Sep 8, 2021

Oh, alright. Thanks! I'll see what I can do. This may even be causing us problems when just running unit tests; we're investigating that now. I'll keep you posted.

@rouke-broersma rouke-broersma added the 🐛 Bug Something isn't working label Sep 17, 2021
@richardwerkman
Member

@psfinaki Could you try pushing the vstest team again? I've been trying for over a year now but the issue is still open... Maybe with some internal force we can get this issue out of our way 👼

@psfinaki
Contributor Author

psfinaki commented Nov 3, 2021

Yyyyep, taking action right now :)

@psfinaki
Contributor Author

psfinaki commented Nov 3, 2021

Okay, so they are working on that and should update you in that issue - I've subscribed to the PR mentioned there, so I will follow along as well.

@richardwerkman
Member

Thanks! Let's hope we can go forward quickly!

@psfinaki
Contributor Author

The sound of merging reached me today :)
We are getting closer.

@richardwerkman
Member

Oh my! This is excellent news! Do you also hear the sound of a release soon? 👼

@psfinaki
Contributor Author

Well, VS 17.1 has been out for a few days now, and they wanted to ship this along with VS 17.2. That's as much as I know 🤷

@richardwerkman
Member

Great! Thanks for the info

@psfinaki
Contributor Author

So! How about releasing this? :)

@rouke-broersma
Member

rouke-broersma commented Apr 12, 2022

Yep, just waiting to see if 0xced replies today on their PR. Note that this is not actually fixed until it's fixed in vstest; for now we just have a workaround that might help us when this occurs.

@rouke-broersma rouke-broersma added the Workaround A workaround is available but a better solution would be nice label Apr 12, 2022
@psfinaki
Contributor Author

Sure!

@rouke-broersma
Member

1.5.1 has been released with the workaround for the test runner freezing; please give it a try.

@psfinaki
Contributor Author

Will do soon, things are a bit slow now because of Easter :)

@dupdob
Member

dupdob commented Apr 21, 2022

I am pretty confident the 'freeze' problem will be fixed, at least partially. But you may encounter another problem (see #2002) that is the probable cause of freeze-like situations. Note that 1.5.1 improved the handling of VsTest freezes, but you may still run into #2002. The workaround then is to increase the additional timeout value.
That is, until the next release, which includes the fix.
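A sketch of that timeout workaround in `stryker-config.json`, assuming the `additional-timeout` option (a value in milliseconds added on top of the duration recorded during the initial test run; older versions named this option differently, so check the docs for your version):

```json
{
  "stryker-config": {
    "additional-timeout": 30000
  }
}
```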

@dupdob dupdob self-assigned this Apr 21, 2022
@psfinaki
Contributor Author

So I've run our nightly mutation testing three times without concurrency limitations, and yes, it looks like the freezes are gone. Our build machines are kind of random now, so the times vary wildly and we aren't yet benefiting from this freedom, but I still wanted to confirm this.
