Missing information in README / getting this to run following instructions #3

Closed
geerlingguy opened this issue Jun 12, 2023 · 7 comments

@geerlingguy
Contributor

geerlingguy commented Jun 12, 2023

One thing I had to dig for: I needed to run sudo ldconfig before the step "Ensure successful installation of openmpi by executing the following commands." in the README.

Edit: I also had to run sudo apt install -y gfortran before compiling MPI. Maybe the README could just include instructions that assume default paths? E.g. wget [mpi download URL], then ./configure, make -j 4 all, make install, and finally sudo ldconfig?
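For reference, a minimal sketch of that order of operations as a script. It is a dry run by default and only prints the commands. The `run` helper, the `build_openmpi` function, its two parameters, and the `tar xf` extraction step are assumptions for illustration and are not taken from the README; the download URL is deliberately left as an argument.

```shell
#!/bin/sh
# Sketch only: prints each command by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: Open MPI tarball URL (the README's [mpi download URL], not filled in here)
# $2: directory the tarball unpacks into (hypothetical parameter)
build_openmpi() {
  url=$1
  srcdir=$2
  run sudo apt install -y gfortran &&   # install before configuring MPI
  run wget "$url" &&
  run tar xf "$(basename "$url")" &&    # extraction step is an assumption
  run cd "$srcdir" &&
  run ./configure &&                    # default paths, as suggested above
  run make -j 4 all &&
  run make install &&                   # may need sudo when not running as root
  run sudo ldconfig                     # refresh the shared library cache
}
```

Calling `build_openmpi "$OPENMPI_URL" "$OPENMPI_SRCDIR"` with both variables set by hand prints the eight steps in order. Chaining with `&&` stops at the first failing step when the commands are actually executed.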

After that, when trying to make HPL with the provided Makefile, I got the error:

mpifort -DAdd_ -DF77_INTEGER=int -DStringSunStyle  -I/opt/hpl-2.3/include -I/opt/hpl-2.3/include/Altramax_oracleblis -I/opt/MyBlisDir/include/altramax  -fomit-frame-pointer -O3 -funroll-loops -W -Wall -o /opt/hpl-2.3/bin/Altramax_oracleblis/xhpl HPL_pddriver.o         HPL_pdinfo.o           HPL_pdtest.o /opt/hpl-2.3/lib/Altramax_oracleblis/libhpl.a -L/opt/MyBlisDir/lib/altramax -lblis 
--------------------------------------------------------------------------
No underlying compiler was specified in the wrapper compiler data file
(e.g., mpicc-wrapper-data.txt)
--------------------------------------------------------------------------
make[6]: *** [Makefile:76: dexe.grd] Error 1
make[6]: Leaving directory '/opt/hpl-2.3/testing/ptest/Altramax_oracleblis'
make[5]: *** [Make.top:68: build_tst] Error 2
make[5]: Leaving directory '/opt/hpl-2.3'
make[4]: *** [Makefile:73: build] Error 2
make[4]: Leaving directory '/opt/hpl-2.3'
make[3]: *** [Make.top:54: build_src] Error 2
make[3]: Leaving directory '/opt/hpl-2.3'
make[2]: *** [Makefile:72: build] Error 2
make[2]: Leaving directory '/opt/hpl-2.3'
make[1]: *** [Make.top:54: build_src] Error 2
make[1]: Leaving directory '/opt/hpl-2.3'
make: *** [Makefile:72: build] Error 2
@geerlingguy
Contributor Author

Ah, that could be from the Fortran compiler not working:

root@adlink-ampere:/opt/hpl-2.3# mpifort --version
--------------------------------------------------------------------------
No underlying compiler was specified in the wrapper compiler data file
(e.g., mpicc-wrapper-data.txt)
--------------------------------------------------------------------------

I installed gfortran with sudo apt install -y gfortran, but I'm still getting that error.

@geerlingguy
Contributor Author

I had to install gfortran and then recompile MPI, and now that is working. Adding that to the suggestions in the original comment.
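A quick way to catch this before building HPL is to ask the wrapper for its version and look for the error text quoted earlier in this thread. The sketch below assumes the wrapper prints that exact "No underlying compiler was specified" message when it was built without a Fortran compiler; `check_mpi_wrapper` is a hypothetical helper name.

```shell
#!/bin/sh
# Returns 0 if the wrapper appears to have an underlying compiler,
# 1 if it prints the "No underlying compiler" message,
# 2 if the wrapper cannot be found at all.
check_mpi_wrapper() {
  wrapper=${1:-mpifort}
  command -v "$wrapper" >/dev/null 2>&1 || return 2
  if "$wrapper" --version 2>&1 | grep -q "No underlying compiler was specified"; then
    return 1
  fi
  return 0
}
```

For example, `check_mpi_wrapper mpifort || echo "install gfortran, then rebuild MPI"` flags the situation above, where installing gfortran alone was not enough.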

@geerlingguy
Contributor Author

geerlingguy commented Jun 12, 2023

Now I'm running into:

./xhpl: error while loading shared libraries: libblis.so.4: cannot open shared object file: No such file or directory

The file seems to be there...

# ls /opt/MyBlisDir/lib/altramax
libblis.a  libblis.so  libblis.so.4

And I can confirm I compiled with make arch=Altramax_oracleblis -j

@amperelu

I am not very familiar with BLIS, but in general I use LD_LIBRARY_PATH to tell the dynamic loader where to search for .so files, especially when developing applications.

export LD_LIBRARY_PATH=/opt/MyBlisDir/lib/altramax:$LD_LIBRARY_PATH

@dneary

dneary commented Jun 12, 2023

I have not used Fortran since I was in college, and have not tried compiling HPL. Maybe @kokrysa knows more about the MPL stuff, as it relates to AI?

@rbapat-ampere
Collaborator

rbapat-ampere commented Jun 13, 2023

@geerlingguy, thanks for the recommendations.

As for the problem with BLIS, try:

export LD_LIBRARY_PATH=/usr/local/lib:/opt/MyBlisDir/lib/altramax:$LD_LIBRARY_PATH
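One detail worth noting with this pattern: if LD_LIBRARY_PATH is unset, appending `:$LD_LIBRARY_PATH` leaves a trailing empty entry, which the dynamic loader treats as the current directory. A small sketch that avoids this is below; `prepend_path` is a hypothetical helper, and the directories are the ones used in this thread.

```shell
#!/bin/sh
# Prepend a directory to a colon-separated path list without leaving
# an empty entry when the existing list is empty.
prepend_path() {
  dir=$1
  current=$2
  if [ -z "$current" ]; then
    printf '%s\n' "$dir"
  else
    printf '%s\n' "$dir:$current"
  fi
}

# Same search order as the export above: /usr/local/lib first, then BLIS.
LD_LIBRARY_PATH=$(prepend_path /opt/MyBlisDir/lib/altramax "${LD_LIBRARY_PATH:-}")
LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "$LD_LIBRARY_PATH")
export LD_LIBRARY_PATH
```

The setting only applies to the current shell session and its children, so it has to be repeated (or added to a shell profile) before running xhpl in a new terminal.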

One way to check whether the binary can resolve all the shared libraries it needs is to run ldd on it. For example:

$ ldd xhpl
        linux-vdso.so.1 (0x0000ffffb21af000)
        libblis.so.4 => not found
        libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x0000ffffb1fc0000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb1e10000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffb2176000)
        libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x0000ffffb1d40000)
        libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x0000ffffb1c20000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb1b80000)

As the snippet above shows, ldd did not find libblis.so.4, which I believe is the issue you are facing.
If you now run the export command above

export LD_LIBRARY_PATH=/usr/local/lib:/opt/MyBlisDir/lib/altramax:$LD_LIBRARY_PATH

you should see something like this.

$ ldd xhpl
        linux-vdso.so.1 (0x0000ffffb9706000)
        libblis.so.4 => /opt/MyBlisDir/lib/altramax/libblis.so.4 (0x0000ffffb94d0000)
        libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x0000ffffb9380000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb91b0000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffb96cd000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb9110000)
        libgomp.so.1 => /lib/aarch64-linux-gnu/libgomp.so.1 (0x0000ffffb90b0000)
        libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x0000ffffb8fe0000)
        libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x0000ffffb8ec0000)
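The before/after comparison can also be scripted. The sketch below filters ldd output for unresolved entries; `missing_libs` is a hypothetical helper that reads from standard input, so it can be tried on the output pasted above without having the binary at hand.

```shell
#!/bin/sh
# Print the name of every library that ldd reports as "not found".
# Expects ldd output on stdin, one library per line.
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

Running `ldd ./xhpl | missing_libs` should print libblis.so.4 before the export and nothing afterwards.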

Hope this helps.

@geerlingguy
Contributor Author

@rbapat-ampere - Indeed that was it! I am getting 985 Gflops now at 270W power consumption.
