
spec tests get stopped (flakiness) #330

Open
andychu opened this issue Jun 8, 2019 · 17 comments

andychu commented Jun 8, 2019

This seems to have started happening in the last few days. A signal gone wild?

test/spec-runner.sh run-cases prompt 
test/spec-runner.sh run-cases quote 

[1]+  Stopped                 test/spec.sh all
andy@lisa:~/git/oilshell/oil$ flogout
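For context: a "[1]+ Stopped" line means the job received one of the stop signals (SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU). SIGTTIN and SIGTTOU are delivered automatically when a background process reads from (or, with stty tostop, writes to) the controlling terminal, so nothing in the tests has to call kill for this to happen. A minimal sketch, not from the repo, of how a harness could detect a stopped child instead of hanging:

```python
import os
import signal

# Fork a child that waits for signals; then stop it the way a stray
# stop signal (SIGSTOP/SIGTSTP/SIGTTIN/SIGTTOU) would.
pid = os.fork()
if pid == 0:
    signal.pause()   # child: sleep until a signal arrives
    os._exit(0)

os.kill(pid, signal.SIGSTOP)

# WUNTRACED makes waitpid() report stopped children rather than
# blocking forever; a runner could notice this and SIGCONT the job.
_, status = os.waitpid(pid, os.WUNTRACED)
assert os.WIFSTOPPED(status)
print('stopped by signal', os.WSTOPSIG(status))  # 19 == SIGSTOP on Linux

os.kill(pid, signal.SIGCONT)   # resume, then clean up
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
```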

andychu commented Jul 25, 2020

I've still been seeing this occasionally :-( It doesn't happen very often, though


andychu commented Sep 3, 2020

Just ran into this twice on 0.8.pre11.

I ran "grep kill spec/*.test.sh" to see if any stray signals were being sent, but I don't see anything. Gah.


andychu commented Sep 3, 2020

It's definitely some race with xargs -P. I guess we could turn it on in the continuous build to find the flakiness.
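The parallelism here comes from xargs -P; a minimal reproduction of that fan-out pattern (assuming an xargs with the common -P extension, as in GNU findutils and BSD) looks like:

```shell
# Run one echo job per input line, up to 4 at a time.  All jobs share
# the terminal; with -P, a job that reads the tty from the background
# can be stopped with SIGTTIN, which is one plausible source of the race.
seq 1 8 | xargs -P 4 -n 1 echo job | sort -k2 -n
```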


andychu commented Sep 3, 2020

Different flakiness: one test case in builtin-io.test.sh fails with xxxx instead of xxx


andychu commented Nov 18, 2020

0.8.5 release, got flakiness in spec/redirect

24 | pass | pass | FAIL | pass | : 3>&3 (OSH regression)

oh actually this was mksh, not osh!


andychu commented Nov 18, 2020

Changed it to 4>&4 instead of 3>&3, still got flakiness... but then I did fg and it kept going.

gah!

andychu pushed a commit that referenced this issue Nov 18, 2020

    This is issue #330.  For some reason mksh started being flaky in
    parallel ('osh-all'), but not when run serially on its own.

andychu pushed a commit that referenced this issue Nov 22, 2020

    [build] Get rid of /../ in path.

    This caused the ovm-build benchmark to fail for some unknown reason.

andychu commented Jan 23, 2021

case 47 in spec/builtin-io -- got xxxxx instead of xxx

andychu pushed a commit that referenced this issue Jan 23, 2021

    I don't really know what it was trying to do anyway.

    Addresses issue #330.

andychu commented Jul 6, 2021

Happened while releasing 0.8.12, and then I just did fg manually :-(


andychu commented Sep 11, 2021

Hm, this seems to happen much more often on machines with 24 cores (broome, spring)


andychu commented Sep 16, 2021

Hit this twice in a row on 0.9.2 :-( Need to fix.

For some reason it always causes the sh-usage test to fail too.


andychu commented Sep 16, 2021

Hm, are we starting too many processes? Then I'd expect the OOM killer to be the culprit, but the OOM killer kills the process (SIGKILL); it doesn't stop it (SIGSTOP).

https://stackoverflow.com/questions/726690/what-killed-my-process-and-why
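The key point from that thread: a process killed by signal N reports exit status 128+N to the shell, while a stopped process has no exit status at all; it just sits there until fg or SIGCONT. So an OOM kill would show up like this, not as a stoppage:

```shell
# The OOM killer sends SIGKILL (signal 9), so the victim's exit status
# is 128 + 9 = 137.  A stopped job, by contrast, never reports a status.
sh -c 'kill -KILL $$'
echo "exit status: $?"   # prints: exit status: 137
```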

I also searched for SIGSTOP in the spec tests ...


andychu commented Nov 11, 2021

Got the failing sh-usage.test.sh behavior again


andychu commented Nov 17, 2021

Hitting this very consistently on a 24-core machine for the 0.9.4 release ... gah.

I think the real solution is containers, so there's no interference?

andychu pushed a commit that referenced this issue Nov 17, 2021

andychu commented Apr 1, 2023

Started hitting this again today, for some reason :-(

https://oilshell.zulipchat.com/#narrow/stream/121539-oil-dev/topic/Random.20spec.20test.20stoppages


andychu commented Apr 28, 2023

It's much more prevalent now; I'm running into it for the 0.15.0 release.

I took spec/interactive out, and there were fewer random stoppages, but some tests still failed.

sh-options still seems to fail, probably because it uses -i.

andychu pushed a commit that referenced this issue Apr 28, 2023

    These files use $SH -i and seem to tickle a race condition resulting in
    stopped jobs.

    This has been happening for 4 years -- see bug #330.

    But it's more prevalent now that we have job control.

andychu commented May 11, 2023

Update:

  • I fixed a genuine OSH bug that made this more prevalent
  • The stoppages happen with just bash, and bash + OSH
  • Right now I'm NOT getting them with just OSH, which is good
  • But I am getting a different issue where the CI hangs with OSH in parallel, but not OSH run serially

Details:

https://oilshell.zulipchat.com/#narrow/stream/121539-oil-dev/topic/Random.20spec.20test.20stoppages


andychu commented Jun 30, 2023

Hm, I hit this while refactoring the spec tests.

I believe it doesn't hold up the release because test/spec-py.sh all-and-smoosh sets MAX_PROCS=1.

It doesn't happen in CI because we don't have a terminal; it happens in a parallel run WITH a terminal.
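One possible workaround (an assumption on my part, not something the repo does): replicate the CI environment by running the parallel suite in a new session with no controlling terminal, e.g. with setsid(1) from util-linux. With no controlling tty, there is nothing to generate SIGTTIN/SIGTTOU, so jobs can't be stopped that way:

```shell
# Hypothetical: run a command in a new session with no controlling
# terminal, like CI.  "tty" reports "not a tty", and tty access can no
# longer generate job-control stop signals.
setsid sh -c 'tty; echo still running' < /dev/null
```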
