Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Test failures on M1 with BLAS from conda-forge #10759

Closed
larsoner opened this issue Jun 14, 2022 · 32 comments
Closed

BUG: Test failures on M1 with BLAS from conda-forge #10759

larsoner opened this issue Jun 14, 2022 · 32 comments
Labels
Milestone

Comments

@larsoner
Copy link
Member

In working on #10756 on my M1 I had some test failures related to beamforming. Before we release 1.1 we should fix these. I hopefully will be able to look+fix today.

I'm a bit worried it might be an upstream OpenBLAS bug, which means ideally we'd wait for them to fix something, then NumPy/SciPy to release, then we use the latest version only on M1 :( But if that turns out to be the case, it's possible we can find some workaround, like using a different/slower lapack driver on M1 than the default or something. We'll see!

This is the sad downside of all of these repos not having proper arm64 testing on CIs :(

@larsoner larsoner added the BUG label Jun 14, 2022
@larsoner larsoner added this to the 1.1 milestone Jun 14, 2022
@hoechenberger
Copy link
Member

hoechenberger commented Jun 14, 2022

Argh, what a bummer!!

But also extremely good to know

Maybe we should try to set up a self-hosted GHA runner on a spare ARM64 machine. To my knowledge, this is actually free (??) We just need a computer we can spare.

@larsoner
Copy link
Member Author

Yes I think within the last month they added support for self-hosted runners. I don't have a spare M1 machine though. Maybe @agramfort has one sitting around :)

@hoechenberger
Copy link
Member

@larsoner I could offer to set up my regular M1 as a self-hosted runner and only allow jobs to be run at night, when I don't actively use the machine. But I'm currently busy with other things. But just saying that I'm generally willing to do this.

@larsoner
Copy link
Member Author

Still plenty of linalg-related failures even with accelerate. @hoechenberger can you confirm that pytest mne/utils/tests/test_linalg.py has a bunch of failures for you, too?

@hoechenberger
Copy link
Member

Will check!

The last alternative would be netlib

@hoechenberger
Copy link
Member

hoechenberger commented Jun 16, 2022

I did:

❯ mamba create -n mne-dev -c conda-forge/label/mne_dev -c conda-forge mne pytest pytest-cov
❯ mamba run -n mne-dev pip install ~/Development/mne-python
❯ mamba run -n mne-dev pytest ~/Development/mne-python/mne/utils/tests/test_linalg.py

The test result is really bad:

======================== 54 failed, 90 passed in 1.41s =========================

So then with accelerate:

❯ mamba install -n mne-dev "libblas=*=*accelerate"
❯ mamba run -n mne-dev pytest ~/Development/mne-python/mne/utils/tests/test_linalg.py

Same results:

======================== 54 failed, 90 passed in 0.85s =========================

Then with netlib:

❯ mamba install -n mne-dev "libblas=*=*netlib"
❯ mamba run -n mne-dev pytest ~/Development/mne-python/mne/utils/tests/test_linalg.py

Same:

======================== 54 failed, 90 passed in 0.86s =========================

😭

@hoechenberger
Copy link
Member

Everything passes when using Intel binaries, running through Rosetta

❯ CONDA_SUBDIR=osx-64 mamba create -n mne-dev-intel -c conda-forge/label/mne_dev -c conda-forge mne pytest pytest-cov
❯ mamba run -n mne-dev-intel pip install ~/Development/mne-python
❯ mamba run -n mne-dev-intel pytest ~/Development/mne-python/mne/utils/tests/test_linalg.py
============================= 144 passed in 8.88s ==============================

@hoechenberger
Copy link
Member

@larsoner
Since we're seeing these issues with all supported BLAS implementations, I'd bet it's an issue created by conda-forge's cross-compilation scheme.

@larsoner
Copy link
Member Author

Or it could be f2py or some other generic BLAS/LAPACK bug. So I think there are actually two problems: the SVD one specific to OpenBLAS, and these linear algebra failures across all implementations.

To see if it's a cross compilation bug, I could try building with OpenBLAS from brew locally. I already have it from messing with OpenMEEG anyway

@larsoner
Copy link
Member Author

test_linalg.py works for on my natively compiled OpenBLAS + NumPy. I'll have to look at the feedstock and try again after recompiling with as many arguments of theirs that I can match

@larsoner
Copy link
Member Author

... but taking your point about it happening with all implementations, I agree it's probably a cross-compilation problem. I'll open an issue

@hoechenberger
Copy link
Member

hoechenberger commented Jun 16, 2022

... but taking your point about it happening with all implementations, I agree it's probably a cross-compilation problem. I'll open an issue

If we're really crazy we could compile OpenBLAS natively on our M1 machines and publish the packages on anaconda.org under an mne channel, and enforce their usage via the installer. But urgs this is all so ugly.

@danieltomasz
Copy link

With mne installed in new virtual-env via pyenv (and without specifying library) the results about test are 144 passed in 7.01s

@hoechenberger
Copy link
Member

hoechenberger commented Jun 17, 2022

With mne installed in new virtual-env via pyenv (and without specifying library) the results about test are 144 passed in 7.01s

This implies you installed all dependencies via pip, i.e. from PyPI, right?

We already verified that this works.

This issue here is about conda-forge-based installations. 🙂

@hoechenberger hoechenberger changed the title BUG: Test failures on M1 BUG: Test failures on M1 with BLAS from conda-forge Jun 17, 2022
@larsoner
Copy link
Member Author

larsoner commented Jul 5, 2022

FYI this has been fixed by OpenMathLib/OpenBLAS#3675 and there is only one remaining issue in OpenBLAS's 0.3.21 milestone, so it might be worth waiting for this to be fixed.

@larsoner
Copy link
Member Author

@hoechenberger I'm not sure we should wait for this one, it's not clear when they'll actually release. Thoughts on how to move forward?

@hoechenberger
Copy link
Member

We could build and package OpenBLAS ourselves and distribute it via an mne channel in anaconda.org, which would be used by the installer

WDYT?

@larsoner
Copy link
Member Author

Sure, feel free to give it a shot. A commit including OpenMathLib/OpenBLAS#3675 should work!

@hoechenberger
Copy link
Member

hoechenberger commented Jul 27, 2022

@larsoner Do you know if this patch can be applied on the latest stable release?

Because then we could try to convince the folks over at https://github.com/conda-forge/openblas-feedstock to include the patch and bump the build number; it would be a fix for M1 and not affect other platforms – we could argue that, effectively, their previous M1 builds are broken, and that this is the correct fix.

This approach

  • is super simple
  • ensures that all users would get access to the fix ASAP, even if they don't use our installers, but manually install from conda-forge directly

WDYT?

@hoechenberger
Copy link
Member

@larsoner I will try to make a PR.

@hoechenberger
Copy link
Member

PR opened at conda-forge/openblas-feedstock#141

@larsoner
Copy link
Member Author

Yeah that seems better than making our own channel!

@hoechenberger
Copy link
Member

@larsoner Could you take a look at my PR please? Two tests on Travis are timing out and I am as clueless as ever when it comes to compiling stuff 🙃

@larsoner
Copy link
Member Author

Maybe try closing and reopening the PR to restart CIs

@hoechenberger
Copy link
Member

I did it the fancy way and asked the bot to do it for me 😎

@hoechenberger
Copy link
Member

x-ref

OpenMathLib/OpenBLAS#3701

@larsoner
Copy link
Member Author

I'll close this since it should land on CDNs soon.

Let's get the installers going and cut 1.1!

@hoechenberger
Copy link
Member

hoechenberger commented Jul 27, 2022

Please ensure to enforce use of opnenblas build >=1 by explicitly putting it in construct.yaml

@larsoner
Copy link
Member Author

You want to make a PR for this? If not I can do it later today

@larsoner
Copy link
Member Author

... also we need #10945, doc build is broken currently

@hoechenberger
Copy link
Member

I won't have time until tonight (it's currently afternoon here)

@hoechenberger
Copy link
Member

hoechenberger commented Jul 27, 2022

@larsoner you could create a new dev build on mne-feedstock, make a PR to the dev branch there
Then we can build the installers with latest main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants