Travis is having a bad day (with openblas 0.2.7) [PILEDRIVER codes broken] #3834
Comments
I've gotten SSH access to a Travis worker. Let's see if we can't nail down what exactly is going on here, and also debug some of the cause of any other strange Travis errors we encounter. I have access for 48 hours, so let's make the most of it! My first question is for @aviks and perhaps @JeffBezanson, as I'm seeing some really strange behavior regarding multiprocessing. When I start up a worker, (even if it's just Running I should also mention that when I start up, say, 5 processes via I can confirm that the error only occurs when OpenBLAS 0.2.7 is installed, but it's strange that it's only happening on Travis machines. @dmbates, @andreasnoackjensen, @ViralBShah, you guys are my go-to people for
Oh, and if any of you guys want to SSH into this thing, email me and I'll give you access to poke around. I can replicate the memory behavior on other machines, which makes me think it might not be a bug (although at least one of my machines doesn't exhibit the symptoms, and I can't quite figure out why not), but the linalg error I've only been able to reproduce on Travis machines.
You probably meant to ask @amitmurthy about the multiprocessing issues.
Is it possible that this isn't working with OpenBLAS 0.2.7 specifically on the Travis machines?
I have reverted 0.2.7 until the LAPACK symbols in OpenBLAS are fixed.
Issue filed OpenMathLib/OpenBLAS#263, identifying the bad guy as
I plan to roll back the Bulldozer and Piledriver kernels to the Barcelona kernel.
I've patched OpenBLAS to disable AVX kernels for now, as I wasn't able to quickly figure out how to remove the Piledriver codes from the dynamic arch binary. Once xianyi rolls them back, I'll disable the patch, and we should be good to go.
It looks like they are eventually going to release a 0.2.8 fix to OpenBLAS to address some of these issues. @xianyi, if you like, just let me know when 0.2.8 is ready to test, and I'll make sure it works on all of my systems (Linux with AMD, Linux with Intel, OS X, etc.).
@staticfloat Can you test 0.2.8-rc1 on AMD?
Yes, I am doing that right now. Unfortunately, I had to wait for the space requirements for the PPA to be increased before I could get OpenBLAS 0.2.8-rc1 uploaded. I should know whether it works or not before I go to bed tonight.
All tests pass now on 0.2.8-rc1. Once 0.2.8 proper is released, I will update the PPA, and we can move to 0.2.8 in
I've run the test suite on every Linux box I have, and I can't seem to reproduce the error that Travis is hitting on all the current builds. I have a sneaking suspicion it has to do with the new OpenBLAS 0.2.7 deb I pushed to my PPA, but all my local tests pass with it, so I'm having a hard time tracking down exactly where the failure is.
Can anyone else reproduce these errors?