test_controller still hangs sometimes #192
Sahit seemed to have a similar issue; it looks like things hang when you connect to an already-publishing node. No idea why, unfortunately.
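One generic way to avoid hanging on a node that never answers is to poll for readiness with a deadline instead of blocking indefinitely. A minimal pure-Python sketch of that pattern (the names here are hypothetical, not pyrostest's actual API):

```python
import time

def wait_for_condition(check, timeout=5.0, interval=0.1):
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returning False on timeout lets the caller fail fast with a clear
    error instead of hanging forever waiting on a publisher.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage: `publisher_visible` would stand in for whatever
# registration check the test harness has (e.g. asking the ROS master).
publisher_visible = lambda: True  # trivially-true stand-in for the demo
ready = wait_for_condition(publisher_visible, timeout=2.0)
```

The key point is that every blocking step in the test setup gets a deadline, so a misbehaving node turns into a test failure rather than a hang.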
Is this still the case with test_controller? I can't test it.
@chsahit ^
@chsahit bumperoni. I don't believe it is having issues on CI, but I'd still like to know.
Ok, so this seems to fail almost every time on CI, but works 100% of the time locally. Because we're on Docker, sshing into the CI environment is practically useless, since everything happens inside a Docker container created inside the CI environment. I'm trying to run locally to see if I can repro, but I run into the following: as part of our install script, I delete a symlink (https://github.com/gtagency/buzzmobile/blob/master/install#L44) created as part of the virtualenv creation process. This works as part of the install-at-startup, and works when run on Circle, but when I run locally, runtests.sh fails because the symlink doesn't exist. If I skip that step, it fails moments later on something else. Any ideas?
Hmm, I can try to take a look at this on Tuesday at the earliest. You can definitely still ssh via the CircleCI ssh feature, it's just a bit more complicated: ssh in, then run a

I'm a little bit confused as to why the symlink would exist on the remote but not the local containers. Maybe there's something slightly different about the environment variables being passed in? I definitely noticed this problem earlier, but I didn't try to debug it. I have a feeling it has to do with a clean install of buzzmobile (if you get past one run, it suddenly becomes fine). I'll get back to you soon and let you know what I find out.
I can't; it fails with an error. Apparently Circle doesn't use normal Docker.
Alright, now to bamboozle everyone even more: I can't repro this in a local Docker container. I ran, on my host machine,
Then, in the docker image
I'm fully bamboozled here, because it is quite clearly hanging on CI, but it's not hanging here. As an attempted fix, I guess we could have the install step run git clone instead of anything else, but that doesn't seem any better.
Yup, I can't seem to reproduce it locally on a single machine (after one run). Since it's flaky (it doesn't happen every time) and it happens on CI but not locally, this smells like a deadlock/race condition, which is somehow made worse with less parallelization (CI only has 2 threads). It's possible some subtle change to CircleCI caused this to start triggering now. I have no idea how ROS works, though, so it might not be related. I think this is a semi-recent change, since d5aef13 builds fine (at least for one try; you could try rebuilding it to see if it happens consistently). It seems to fail once I merge master in, though. I actually noticed this a long time ago, but I thought it was something up with your testing suite, so I didn't comment. I would try disabling one of the two tests you're running (one at a time) to help narrow down the problem. I would start with test_controller, since the test I ran just now seems to be hanging on that (but I don't know how pytest output works, so I might be wrong).
I'm still leaning towards it being an issue in pyrostest, if only because https://circleci.com/gh/gtagency/buzzmobile/596 passes. My guess would be that something can spin if the context managers aren't being used correctly, but I'm not sure why exactly that would be the case.
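For the "context managers used incorrectly" theory: the usual guarantee a context manager gives is that teardown runs even when the body raises, and a spinning process is exactly what you get when teardown is skipped. A minimal sketch of that shape (pure Python; `start`/`stop` are hypothetical stand-ins for launching and killing a test node, not pyrostest's API):

```python
from contextlib import contextmanager

@contextmanager
def managed_node(start, stop):
    """Start a resource and guarantee `stop` runs, even if the body raises.

    If a test enters this without a `with` block (or never exits it),
    `stop` never runs and the spawned node can spin forever -- the
    failure mode suspected in this thread.
    """
    handle = start()
    try:
        yield handle
    finally:
        stop(handle)

# Usage sketch: teardown still fires when the test body raises.
events = []
try:
    with managed_node(lambda: events.append("start") or "node",
                      lambda h: events.append("stop")):
        raise RuntimeError("test body failed")
except RuntimeError:
    pass
```

If a caller holds the started handle outside any `with` block, nothing ever calls `stop`, which would look exactly like a test that hangs only under certain orderings.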
Ok, nevermind. I added an additional test in gtagency/pyrostest#26, and it works just fine. So yeah, this appears to be a weird CI issue.
Added some new things to pyrostest that should mitigate this. In testing, it appeared to make the tests fail early instead of hanging, so that's nice.
In a fresh install, it hangs on the first test run, then passes on all subsequent runs.
I'm marking this low priority because Josh now has Google to worry about, and because it does work most of the time. But I'd like to at least know why this happens.