PowerPC builds #8

Open
4 of 6 tasks
jaimergp opened this issue Oct 18, 2019 · 31 comments

Comments

@jaimergp
Member

jaimergp commented Oct 18, 2019

ppc64le builds are technically possible, but in practice there are some barriers. I will collect relevant issues and PRs here.

All

  • doxygen has no ppc64le build yet. I am submitting a PR here.
  • We will also need to ask for a build of version 1.8.14, since 1.8.16 fails with the current openmm.

CUDA

OpenCL

  • ocl-icd could be used to trigger the compilation of the OpenCL parts.

We could get an OpenCL + CPU build with relatively low effort if we fix doxygen and ocl-icd. Would this be enough?

@jakirkham
Member

cc @jayfurmanek

@jayfurmanek

I saw the doxygen build going. Looks like it timed out (10mins no output). A couple things we could try there:

  • set idle_timeout to 60 or something in the conda-forge.yml
  • set it to use patchelf specifically for the path patcher; that might help speed up the final phase.

Also:
There was no GPU support for ppc64le on CENTOS6. In fact, CENTOS6 predates ppc64le as an arch. The anvil images and conda toolchain use CENTOS7 (cos7) on ppc64le and aarch64.

Anaconda doesn't provide newer cudatoolkit versions for ppc64le, unfortunately, although IBM does.

I don't know if anyone has tried ocl-icd on ppc64le. I know NVIDIA doesn't provide OpenCL for ppc64le so it may not be worth doing much with ocl-icd unfortunately.

@jaimergp
Member Author

jaimergp commented Oct 22, 2019

Thanks for the valuable feedback @jayfurmanek!

I saw the doxygen build going. Looks like it timed out (10mins no output).

We changed the provider to azure for ppc64le and, although it takes a couple of hours, it worked! Doxygen is not frequently updated, so I'd say it's ok to leave as is.

There was no GPU support for ppc64le on CENTOS6. In fact, CENTOS6 predates ppc64le as an arch. The anvil images and conda toolchain use CENTOS7 (cos7) on ppc64le and aarch64.

Didn't know that, nice! One less thing to worry about.

Anaconda doesn't provide newer cudatoolkit versions for ppc64le, unfortunately, although IBM does.

Is there any official way to use the IBM channels with conda-forge?

I know NVIDIA doesn't provide OpenCL for ppc64le so it may not be worth doing much with ocl-icd unfortunately.

If that's the case (I didn't know that either), then you are right: there is probably no point in trying until we have official CUDA builds for ppc64le.

Thanks again!

@jayfurmanek

jayfurmanek commented Nov 5, 2019

The IBM channel is here:
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

There is a license that needs to be accepted at package install time by setting an environment variable: IBM_POWERAI_LICENSE_ACCEPT=yes

It currently has various levels of CUDA 10.1 for ppc64le and x86-64.
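
As a rough sketch of the install step described above, the snippet below builds a `conda install` command that points at the IBM channel and sets the license variable in the environment passed to it. The helper name `build_ibm_install` and the package name `cudatoolkit` are illustrative assumptions, not something the channel documentation prescribes; the snippet only prints the command and does not run conda.

```python
import os
import shlex

# Channel URL and variable name are taken from the comment above.
IBM_CHANNEL = "https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/"
LICENSE_VAR = "IBM_POWERAI_LICENSE_ACCEPT"


def build_ibm_install(packages, base_env=None):
    """Return (command, environment) for installing from the IBM channel.

    Hypothetical helper: nothing is executed here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[LICENSE_VAR] = "yes"  # accept the license at install time
    cmd = ["conda", "install", "--yes", "--channel", IBM_CHANNEL, *packages]
    return cmd, env


# The package name below is an assumption for illustration.
cmd, env = build_ibm_install(["cudatoolkit"])
print(f"{LICENSE_VAR}={env[LICENSE_VAR]}", shlex.join(cmd))
```

On a machine where conda is available, the returned pair could be handed to `subprocess.run(cmd, env=env, check=True)`.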

@giadefa

giadefa commented Feb 25, 2021

Hi,
What is the status of the openmm release for ppc64le? According to #36 (comment), there still seem to be shortcomings.

@jchodera
Contributor

In particular, @giadefa pointed out that there are now new Power9 supercomputers with powerful GPUs:
https://www.hpc.cineca.it/hardware/marconi100

@jaimergp
Member Author

I recall master is ready for PPC, but we need to cut a new release for that. See openmm/openmm#2993

@mrshirts

We would love to start running on ORNL GPUs soon, so this would be great to get finalized!

@jayfurmanek

Also, conda-forge now has up-to-date cudatoolkit and ocl-icd packages for ppc64le too, so I don't see any other blockers.

@jaimergp
Member Author

Once openmm/openmm#2993 is accepted for release, I'll work on the CF machinery to put the PPC builds out there!

@jchodera
Contributor

@peastman: Can we prioritize a 7.5.1 bugfix release to enable the ppc64le openmm toolchain to start building?

@peastman
Contributor

The thing blocking 7.5.1 is finding someone with an ARM Mac who can test that. If we either drop the ARM Mac support, or clearly mark it as untested, we can move ahead with releasing 7.5.1.

@jaimergp
Member Author

We can leave the existing warnings for 7.5.1 on arm64 and remove them when we have tested it thoroughly (either in a new build or in a new version).

@jchodera
Contributor

+1 for just keeping the warnings. We've had the minimal tests run, and you didn't want us to send you an ARM machine, while I'm still months away from being allowed to use one by MSK. Let's get it out there so people can give us feedback.

@peastman
Contributor

Ok!

@raimis

raimis commented Apr 16, 2021

OpenMM 7.5.1rc1 is out (https://anaconda.org/conda-forge/openmm/files?version=7.5.1rc2), but I don't see the packages for PowerPC. Are we still on track to support PowerPC in OpenMM 7.5.1?

@jchodera
Contributor

@peastman @jaimergp: Wasn't 7.5.1 supposed to have everything we need for ppc64le support?

@peastman
Contributor

Yes, I thought it was building for it. @jaimergp do you know why it didn't?

@jaimergp
Member Author

Because we (I) haven't rolled out support for CUDA on PPC yet. I was half hoping somebody else would do it while we fixed its support in OpenMM, but that didn't happen, so I'll get to it.

It shouldn't delay the release of the other builds though; I can work on it in the meantime.

@raimis

raimis commented Apr 19, 2021

@jaimergp thanks for the update. Do you have an estimate when the PowerPC packages will be available?

@jaimergp
Member Author

We need three (cascading) pieces of infrastructure:

So I can't give an estimate, but at least you can see the progress here.

@peastman
Contributor

Thanks! No need to hold up anything else while we wait for it.

@raimis

raimis commented May 14, 2021

@jaimergp

I see that conda-forge/docker-images#178 and conda-forge/nvcc-feedstock#66 have been merged. What is the situation with the last step?

@jaimergp
Member Author

I am working on it. I'll submit a PR later!

@jaimergp
Member Author

@raimis see #55

@tonigi

tonigi commented Jun 10, 2022

PPC builds used to be made on CI and uploaded to conda-forge until 7.6.0 (and they worked great btw). This does not seem to be the case for 7.7.0 any more. Any chance to resume them?

@peastman
Contributor

PPC builds no longer work when built with the compilers used by conda-forge. A lot of the test cases fail or segfault. They work fine when built using the standard system compilers. I've tried to track down the problem but without success. I believe it's caused by a compiler bug. Unfortunately, this means distributing PPC builds through conda-forge is now impossible.

@tonigi

tonigi commented Jun 10, 2022

Oh no. Is there a "single place" for the local build instructions? (I used to have an attempt at https://github.com/giorginolab/miniomm/wiki/%5BOBSOLETE%5D-Compiling-OpenMM-on-M100 , but not sure how much they can be trusted).

@peastman
Contributor

Instructions on building from source are at http://docs.openmm.org/latest/userguide/library/02_compiling.html. We haven't done a survey of compilers to figure out which specific ones work and which fail. My general impression has been that gcc is buggier than clang, but that's based on only a few incidents. Once you build, be sure to do a make test. Using the conda-forge compilers with PPC, we get a bunch of test failures like these:

  1/9 Test #45: TestCpuCheckpoints ...............***Failed    0.24 sec
  exception: Particle coordinate is NaN.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan
  
      Start 48: TestCpuCustomManyParticleForce
  2/9 Test #47: TestCpuCustomGBForce .............***Exception: SegFault  2.25 sec
  
      Start 49: TestCpuCustomNonbondedForce
  3/9 Test #49: TestCpuCustomNonbondedForce ......***Failed    0.20 sec
  exception: Assertion failure at TestCustomNonbondedForce.h:103.   Expected [4500, 0, 0], found [0, 0, 0]
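
To compare a local build against failures like the ones above, a minimal sketch such as the following can pull the failing tests out of a saved `make test` log. The regular expression is an assumption based only on the line format shown in this comment, and the function name `failed_tests` is hypothetical; other CTest versions or output modes may format these lines differently.

```python
import re

# Matches summary lines such as
#   "1/9 Test #45: TestCpuCheckpoints .....***Failed    0.24 sec"
# The "***" marker only appears on tests that did not pass.
FAILED_LINE = re.compile(
    r"Test\s+#(\d+):\s+(\S+)\s+\.*\*\*\*(.+?)\s+([\d.]+)\s+sec"
)


def failed_tests(log_text):
    """Return (test number, test name, status) for each failing test line."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_LINE.search(line)
        if match:
            number, name, status, _seconds = match.groups()
            results.append((int(number), name, status))
    return results


# Sample copied from the output quoted above.
SAMPLE = """\
  1/9 Test #45: TestCpuCheckpoints ...............***Failed    0.24 sec
  exception: Particle coordinate is NaN.
      Start 48: TestCpuCustomManyParticleForce
  2/9 Test #47: TestCpuCustomGBForce .............***Exception: SegFault  2.25 sec
      Start 49: TestCpuCustomNonbondedForce
  3/9 Test #49: TestCpuCustomNonbondedForce ......***Failed    0.20 sec
"""

for number, name, status in failed_tests(SAMPLE):
    print(f"#{number} {name}: {status}")
```

Running the same function over logs from a conda-forge compiler build and a system compiler build gives two lists that can be compared directly.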

@tonigi

tonigi commented Jun 11, 2022

By chance, is this a problem that only appears in CI? From what I understand, conda-forge runs ppc64le builds through emulation by default, which in my impression is buggy, especially for numerics. A native (local) conda-build with conda-forge gcc 12.1.0-16 seems to work. (But there are other quirks, like CMake not finding CUDA.)

@peastman
Contributor

I don't know. I don't have access to an actual PPC Linux system, so the only way I'm able to test it is through emulation. I can say, though, that it has all the hallmarks of a compiler bug. For example, I store some values into memory, load that memory into a SIMD register, and the register ends up with the wrong values. But if I print out the memory locations I just stored to before loading them into the register, then it ends up with the right values. That's the sort of behavior you tend to see if there's a bug in the compiler's optimization stage. This also isn't the first time I've run into a bug in gcc on PPC.


9 participants