Remove FFTW from the Cray Toolchain Definition #1585

Merged (4 commits) on Feb 3, 2016


pforai
Contributor

@pforai pforai commented Jan 29, 2016

This removes FFTW from the Cray toolchain definition. It was originally included for symmetry with the other toolchains. Also, at that time we didn't have external modules support, so crayfftw.py did things at runtime that the external modules machinery now does in a generic way.

We'll just need to update the HPL configs: they are the only ones we supplied that use FFTW and assume it is loaded through toolchain instantiation.

@boegel boegel mentioned this pull request Jan 29, 2016
17 tasks
@hpcugentbot
Contributor

EasyBuild framework unit test suite PASSed (see https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/2611/console for more details).

This pull request is now ready for review/testing.

Please try and find someone who can tackle this; contact @boegel if you're not sure what to do.

@gppezzi
Contributor

gppezzi commented Feb 2, 2016

👍 This fixes (at least) one of the Python bugs we have been seeing. More details coming soon.

@gppezzi
Contributor

gppezzi commented Feb 3, 2016

👍 I'm confirming that this PR fixes the problems we had with h5py and with performing os calls on compute nodes.

It seems that loading the fftw modules affects Python's behaviour.

Since the wrapper links many libraries against fftw, it would be a lot of work to narrow down the list and find the culprit, so it is better to remove it from the toolchain and just load it when really needed.
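As an illustration of "load it when really needed", here is a minimal sketch of an easyconfig that requests FFTW explicitly as an external module instead of getting it through the toolchain. The software name, its version, and the `fftw/3.3.4.5` module name are hypothetical and not taken from this PR. In a real easyconfig the `EXTERNAL_MODULE` constant is provided by EasyBuild; it is defined here only so that the fragment runs standalone.

```python
# Hypothetical easyconfig fragment, for illustration only.
# EasyBuild normally provides EXTERNAL_MODULE; this stand-in definition
# exists only so the fragment can be executed on its own.
EXTERNAL_MODULE = 'EXTERNAL_MODULE'

name = 'ExampleApp'   # hypothetical software name
version = '1.0'       # hypothetical version

# Cray toolchain without FFTW (toolchain version taken from the paths in this thread)
toolchain = {'name': 'CrayGNU', 'version': '2015.11'}

# FFTW is listed explicitly, so only software that needs it gets it loaded.
# The module name/version below is an assumption about the local Cray system.
dependencies = [
    ('fftw/3.3.4.5', EXTERNAL_MODULE),
]
```

With this approach, software that does not list the dependency (Python, for example) is built without the FFTW module loaded.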

@boegel
Member

boegel commented Feb 3, 2016

The actual problem to fix is hpcugent/vsc-base#216, right?

I'm OK with not including FFTW in the Cray toolchains, but that makes them different from the other 'full' toolchains in EB, and will make it harder for some stuff to switch back & forth (although --try-toolchain is already pretty much futile in that context).

@gppezzi
Contributor

gppezzi commented Feb 3, 2016

@boegel the fftw issue and hpcugent/vsc-base#216 are unrelated problems.

We didn't open an issue because it was not clear where this was coming from. We had a few complaints about the Python build with EB, and removing fftw fixes most of them.

Now only mpi4py remains to be fixed 😄

@boegel
Member

boegel commented Feb 3, 2016

@gppezzi: ah, ok, then please clarify what problems you were running into that are fixed with removing FFTW from the toolchains, for future reference (a 🍪 for including error messages)

@gppezzi
Contributor

gppezzi commented Feb 3, 2016

We could reproduce the error with a simple test (doing os calls, see below), and a similar problem was reported by a couple of users.

Here's one of the error messages we got.

aprun -n 1 python -c 'import os; os.system("echo test")'
Tue Feb  2 19:04:23 2016: [unset]:_pmi_alps_sync:alps response not OKAY
Tue Feb  2 19:04:23 2016: [unset]:_pmiu_daemon:_pmi_alps_sync failed
Tue Feb  2 19:04:23 2016: [PE_0]:_pmi_daemon_barrier:PE pipe read failed from daemon errno = Success
Tue Feb  2 19:04:23 2016: [PE_0]:_pmi_init:_pmi_daemon_barrier returned -1
test

On some builds/systems this simple test always blocked; in other cases it worked intermittently (while still showing these error messages).
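Since the test sometimes succeeds while still printing PMI errors, checking the exit status alone is not enough. A minimal sketch of a helper that flags the PMI diagnostic lines in captured output could look like this. The function name and the regular expression are assumptions based only on the four error lines shown above, not on any Cray documentation.

```python
import re

# Matches diagnostic lines like "[PE_0]:_pmi_init:..." or "[unset]:_pmiu_daemon:...".
# The pattern is derived from the error output quoted in this thread only.
PMI_ERROR = re.compile(r"\]:_pmi\w*:")


def pmi_error_lines(output):
    """Return the lines of `output` that look like PMI error diagnostics."""
    return [line for line in output.splitlines() if PMI_ERROR.search(line)]


# Output copied from the failing run reported above
sample = """\
Tue Feb  2 19:04:23 2016: [unset]:_pmi_alps_sync:alps response not OKAY
Tue Feb  2 19:04:23 2016: [unset]:_pmiu_daemon:_pmi_alps_sync failed
Tue Feb  2 19:04:23 2016: [PE_0]:_pmi_daemon_barrier:PE pipe read failed from daemon errno = Success
Tue Feb  2 19:04:23 2016: [PE_0]:_pmi_init:_pmi_daemon_barrier returned -1
test
"""

errors = pmi_error_lines(sample)
print("%d PMI error line(s) found" % len(errors))
```

The combined stdout/stderr of the `aprun` command above can be passed to `pmi_error_lines` to tell a clean run from one that merely happened to finish.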

@boegel boegel added this to the v2.7.0 milestone Feb 3, 2016
@boegel
Member

boegel commented Feb 3, 2016

OK, thanks for the info, let's merge it in.

The Cray toolchains are already different from the others in various ways, so it's OK not having FFTW in there imho (especially since the toolchain support does very little extra with FFTW included).

Please issue a corresponding PR to update the definition of the Cray toolchain easyconfigs (or point me to it, if it's already there).

boegel added a commit that referenced this pull request Feb 3, 2016
Remove FFTW from the Cray Toolchain Definition
@boegel boegel merged commit 02f5b01 into easybuilders:develop Feb 3, 2016
@gppezzi
Contributor

gppezzi commented Feb 3, 2016

Finally, here's the list of shared libraries (.so) that python is linked against, with and without fftw.

IMO there's a lot of bloat there that python is not even using, and it only adds more trouble.

## WITHOUT FFTW
ldd `which python` 
    linux-vdso.so.1 =>  (0x00002aaaaaaab000)
    libpython2.7.so.1.0 => /apps/dora/UES/5.2.UP04/sandbox-nofftw/easybuild/software/Python/2.7.11-CrayGNU-2015.11-XC-noFFTW/lib/libpython2.7.so.1.0 (0x00002aaaaaaae000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaaafa5000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab1a9000)
    libm.so.6 => /lib64/libm.so.6 (0x00002aaaab3ad000)
    libAtpSigHandler.so.0 => /opt/cray/lib64/libAtpSigHandler.so.0 (0x00002aaaab626000)
    librca.so.0 => /opt/cray/rca/default/lib64/librca.so.0 (0x00002aaaab82c000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaaaba31000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabdad000)
    /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)

## WITH FFTW
ldd `which python`
    linux-vdso.so.1 =>  (0x00002aaaaaaab000)
    libpython2.7.so.1.0 => /apps/dora/UES/5.2.UP04/easybuild/software/Python/2.7.10-CrayGNU-5.2.82/lib/libpython2.7.so.1.0 (0x00002aaaaaaae000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaaaf9a000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab19e000)
    libm.so.6 => /lib64/libm.so.6 (0x00002aaaab3a1000)
    libfftw3f_mpi.so.mpi31.3 => /opt/cray/lib64/libfftw3f_mpi.so.mpi31.3 (0x00002aaaab61b000)
    libfftw3f_threads.so.mpi31.3 => /opt/cray/lib64/libfftw3f_threads.so.mpi31.3 (0x00002aaaab830000)
    libfftw3f.so.mpi31.3 => /opt/cray/lib64/libfftw3f.so.mpi31.3 (0x00002aaaaba36000)
    libfftw3_mpi.so.mpi31.3 => /opt/cray/lib64/libfftw3_mpi.so.mpi31.3 (0x00002aaaac4d8000)
    libfftw3_threads.so.mpi31.3 => /opt/cray/lib64/libfftw3_threads.so.mpi31.3 (0x00002aaaac6ed000)
    libfftw3.so.mpi31.3 => /opt/cray/lib64/libfftw3.so.mpi31.3 (0x00002aaaac8f3000)
    libAtpSigHandler.so.0 => /opt/cray/lib64/libAtpSigHandler.so.0 (0x00002aaaad358000)
    librca.so.0 => /opt/cray/rca/default/lib64/librca.so.0 (0x00002aaaad55e000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaaad762000)
    libmpich_gnu_49.so.3 => /opt/cray/lib64/libmpich_gnu_49.so.3 (0x00002aaaadadf000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaae033000)
    /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
    libmpich_gnu_48.so.3 => /opt/cray/lib64/libmpich_gnu_48.so.3 (0x00002aaaae251000)
    libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00002aaaae7a5000)
    librt.so.1 => /lib64/librt.so.1 (0x00002aaaae9a9000)
    libugni.so.0 => /opt/cray/ugni/default/lib64/libugni.so.0 (0x00002aaaaebb2000)
    libudreg.so.0 => /opt/cray/udreg/default/lib64/libudreg.so.0 (0x00002aaaaee25000)
    libpmi.so.0 => /opt/cray/pmi/default/lib64/libpmi.so.0 (0x00002aaaaf02f000)
    libstdc++.so.6 => /opt/gcc/4.9.2/snos/lib64/libstdc++.so.6 (0x00002aaaaf26a000)
    libgcc_s.so.1 => /opt/gcc/4.9.2/snos/lib64/libgcc_s.so.1 (0x00002aaaaf582000)
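To see at a glance which libraries are only pulled in by the FFTW build, the two listings can be compared as sets. The following is a minimal sketch; the helper name `ldd_libs` is made up for illustration, and the sample strings are abbreviated excerpts of the two listings above.

```python
def ldd_libs(ldd_output):
    """Return the set of library names found in `ldd` output."""
    libs = set()
    for line in ldd_output.splitlines():
        line = line.strip()
        # Only lines with a load address describe a library;
        # this skips headings and the command line itself.
        if "(0x" not in line:
            continue
        # The first token is the library name (or the loader path)
        libs.add(line.split()[0])
    return libs


# Abbreviated excerpts of the two listings above
without_fftw = """
    libm.so.6 => /lib64/libm.so.6 (0x00002aaaab3ad000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaaaba31000)
"""
with_fftw = """
    libm.so.6 => /lib64/libm.so.6 (0x00002aaaab3a1000)
    libfftw3.so.mpi31.3 => /opt/cray/lib64/libfftw3.so.mpi31.3 (0x00002aaaac8f3000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaaad762000)
    libpmi.so.0 => /opt/cray/pmi/default/lib64/libpmi.so.0 (0x00002aaaaf02f000)
"""

# Libraries that appear only in the build with FFTW
extra = sorted(ldd_libs(with_fftw) - ldd_libs(without_fftw))
for lib in extra:
    print(lib)
```

Feeding the full listings through the same helper would show the FFTW libraries together with the MPI, PMI and other Cray libraries that come along with them.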

@gppezzi
Contributor

gppezzi commented Feb 3, 2016

@boegel I will create a new PR for the basic toolchain and also add fftw to the external metadata file.

4 participants