Installation error when Julia is built from source #5

Closed
yurivish opened this issue Jun 24, 2018 · 144 comments

@yurivish

I tried installing Arpack on 0.7 beta and saw this error trying to run eigs:

julia> A = Diagonal(1:4);
WARNING: Base.Diagonal is deprecated: it has been moved to the standard library package `LinearAlgebra`.
Add `using LinearAlgebra` to your imports.
 in module Main

julia> λ, ϕ = eigs(A, nev = 2);
ERROR: error compiling saupd: could not load library ""
dlopen(.dylib, 1): image not found
Stacktrace:
 [1] aupd_wrapper(::Any, ::getfield(Arpack, Symbol("#matvecA!#24")){LinearAlgebra.Diagonal{Float64,Array{Float64,1}}}, ::getfield(Arpack, Symbol("##18#25")), ::getfield(Arpack, Symbol("##19#26")), ::Int64, ::Bool, ::Bool, ::String, ::Int64, ::Int64, ::String, ::Float64, ::Int64, ::Int64, ::Array{Float64,1}) at /Users/yurivish/.julia/packages/Arpack/Rkbg/src/libarpack.jl:42
 [2] #_eigs#17(::Int64, ::Int64, ::Symbol, ::Float64, ::Int64, ::Nothing, ::Array{Float64,1}, ::Bool, ::Any, ::Any, ::Any) at /Users/yurivish/.julia/packages/Arpack/Rkbg/src/Arpack.jl:176
 [3] (::getfield(Arpack, Symbol("#kw##eigs")))(::NamedTuple{(:nev,),Tuple{Int64}}, ::typeof(eigs), ::LinearAlgebra.Diagonal{Int64,UnitRange{Int64}}) at ./none:0
 [4] top-level scope at none:0

To see if this was solved on master, I tried installing it. But this happens when I try to develop Arpack#master:

Error: Error building `Arpack`:
│ [ Info: Downloading https://github.com/JuliaLinearAlgebra/ArpackBuilder/releases/download/v3.5.0-0.2.20/ArpackBuilder.x86_64-apple-darwin14.tar.gz to /Users/yurivish/.julia/dev/Arpack/deps/usr/downloads/ArpackBuilder.x86_64-apple-darwin14.tar.gz...
│ ┌ Warning: `wait(t::Task)` is deprecated, use `fetch(t)` instead.
│ │   caller = macro expansion at OutputCollector.jl:63 [inlined]
│ └ @ Core OutputCollector.jl:63
│ ┌ Warning: `wait(t::Task)` is deprecated, use `fetch(t)` instead.
│ │   caller = wait(::OutputCollector) at OutputCollector.jl:158
│ └ @ BinaryProvider OutputCollector.jl:158
│ ┌ Warning: `wait(t::Task)` is deprecated, use `fetch(t)` instead.
│ │   caller = wait(::OutputCollector) at OutputCollector.jl:159
│ └ @ BinaryProvider OutputCollector.jl:159
│ ┌ Warning: `wait(t::Task)` is deprecated, use `fetch(t)` instead.
│ │   caller = wait(::OutputCollector) at OutputCollector.jl:163
│ └ @ BinaryProvider OutputCollector.jl:163
│ ERROR: LoadError: LibraryProduct(nothing, ["libarpack"], :libarpack, "Prefix(/Users/yurivish/.julia/dev/Arpack/deps/usr)") is not satisfied, cannot generate deps.jl!
│ Stacktrace:
│  [1] error at ./error.jl:33 [inlined]
│  [2] #write_deps_file#134(::Bool, ::Function, ::String, ::Array{LibraryProduct,1}) at /Users/yurivish/.julia/packages/BinaryProvider/2Hlv/src/Products.jl:389
│  [3] write_deps_file(::String, ::Array{LibraryProduct,1}) at /Users/yurivish/.julia/packages/BinaryProvider/2Hlv/src/Products.jl:376
│  [4] top-level scope at none:0
│  [5] include at ./boot.jl:317 [inlined]
│  [6] include_relative(::Module, ::String) at ./loading.jl:1075
│  [7] include(::Module, ::String) at ./sysimg.jl:29
│  [8] include(::String) at ./client.jl:393
│  [9] top-level scope at none:0
│ in expression starting at /Users/yurivish/.julia/dev/Arpack/deps/build.jl:40
julia> versioninfo()
Julia Version 0.7.0-beta.0
Commit f41b1ecaec (2018-06-24 01:32 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin16.7.0)
  CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)
Environment:
  JULIA_EDITOR = subl
@andreasnoack
Member

Might be dup of #3. Did you build from source? What do you get from otool -L usr/lib/libopenblas64_.dylib?

@bicycle1885

I encountered the same problem on macOS, 0.7.0-beta.0.

~/v/julia ((v0.7.0-beta)|…) $ otool -L usr/lib/libopenblas64_.dylib
usr/lib/libopenblas64_.dylib:
        @rpath/libopenblas64_.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/local/opt/gcc/lib/gcc/6/libgfortran.3.dylib (compatibility version 4.0.0, current version 4.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
        /usr/local/opt/gcc/lib/gcc/6/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/local/lib/gcc/6/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)

Do we need to wait for the official binaries of Julia 0.7.0-beta?

@andreasnoack
Member

> Do we need to wait for the official binaries of Julia 0.7.0-beta?

Either that or build your OpenBLAS with GCC 7.
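The number to look for in the otool output is the libgfortran version: the GCC 6 build above links libgfortran.3, whereas the prebuilt Arpack binary discussed in this thread expects libgfortran.4 from GCC 7. As a minimal sketch (the helper name `libgfortran_version` is hypothetical and not part of any tool mentioned here), the check can be scripted by filtering the `otool -L` output, or `ldd` output on Linux:

```shell
# Hypothetical helper: read `otool -L` (macOS) or `ldd` (Linux) output on stdin
# and print the libgfortran soname version it mentions, e.g. 3 or 4.
libgfortran_version() {
  sed -n \
    -e 's/.*libgfortran\.\([0-9][0-9]*\)\.dylib.*/\1/p' \
    -e 's/.*libgfortran\.so\.\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Typical use would be `otool -L usr/lib/libopenblas64_.dylib | libgfortran_version`, run from the root of the Julia source tree. An empty result means the library does not link libgfortran at all.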

@bicycle1885

Thank you. I will try.

@yurivish
Author

@andreasnoack this was with a build from source. Thanks for the pointer to Viral's issue – I didn't see it since it was closed.

$ otool -L usr/lib/libopenblas64_.dylib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/objdump: 'usr/lib/libopenblas64_.dylib': No such file or directory

@andreasnoack
Member

@yurivish You'll need to adjust the path to match the usr directory within your root Julia directory.

@blakejohnson

$ otool -L usr/lib/libopenblas64_.dylib 
usr/lib/libopenblas64_.dylib:
	@rpath/libopenblas64_.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/local/opt/gcc/lib/gcc/7/libgfortran.4.dylib (compatibility version 5.0.0, current version 5.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
	/usr/local/opt/gcc/lib/gcc/7/libquadmath.0.dylib (compatibility version 1.0.0, current version 1.0.0)
	/usr/local/lib/gcc/7/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)

@bicycle1885

FYI, Julia 0.7-beta binary for macOS is now available (https://julialang.org/downloads/), which solves the problem.

@andreasnoack
Member

@blakejohnson Things are more wrong than first anticipated. The system libgfortran is apparently not in the RPATH. I'm able to get it working with DYLD_LIBRARY_PATH=/usr/local/opt/gcc@7/lib/gcc/7 julia-dev which is, of course, not a real solution, but it identifies the issue here. We will have to hear from @staticfloat what a proper solution would look like.

@RalphAS

RalphAS commented Jun 27, 2018

Another possible workaround is to copy (or maybe hard link) libgfortran.so.4 from another Julia package (e.g. SpecialFunctions) into Arpack/XXXX/deps/usr/lib and then run pkg> build Arpack. This appears to be working for me, with Julia built from source on Linux.
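The copy step can be sketched as a small shell function. This is an illustration under assumptions, not something posted in the thread: the function name `copy_libgfortran4` is hypothetical, and the source and destination directories must be supplied by the reader because the Arpack/XXXX path component differs per installation.

```shell
# Hypothetical helper: copy the first libgfortran.so.4 found under $1
# into the directory $2 (e.g. the Arpack package's deps/usr/lib).
copy_libgfortran4() {
  src_root=$1
  dest_lib=$2
  found=$(find "$src_root" -name 'libgfortran.so.4' 2>/dev/null | head -n 1)
  if [ -z "$found" ]; then
    echo "no libgfortran.so.4 under $src_root" >&2
    return 1
  fi
  mkdir -p "$dest_lib"
  # -L dereferences symlinks so that a real file lands in the destination
  cp -L "$found" "$dest_lib/libgfortran.so.4"
  echo "$dest_lib/libgfortran.so.4"
}
```

After the copy, `pkg> build Arpack` still has to be rerun as described above.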

This was referenced Jun 30, 2018
@maleadt

maleadt commented Jul 2, 2018

Just for completeness (coming from #9), this obviously also occurs on Linux when not building with GCC 7. It doesn't seem easy to work around, with e.g. Arch Linux's gcc7 package not providing libgfortran.4, and Debian Stable using GCC 6 / libgfortran.3 with no means to (safely) install GCC 7.

@andreasnoack changed the title from "Installation error on macOS" to "Installation error when Julia is built from source" on Jul 2, 2018
@simonbyrne
Collaborator

Until we fix this, could we throw a more informative error?

@juliohm

juliohm commented Jul 11, 2018

The new tag of Arpack.jl now fails to compile on Julia binaries as well:

julia> using Arpack
[ Info: Precompiling module Arpack
ERROR: LoadError: No deps.jl file could be found. Please try running Pkg.build("Arpack").
Currently, the build command might fail when Julia has been built from source
and the recommendation is to use the official binaries from julialang.org.
For more info see https://github.com/JuliaLinearAlgebra/Arpack.jl/issues/5.

Julia version:

julia> versioninfo()
Julia Version 0.7.0-beta.182
Commit feaee9ebbc (2018-07-06 12:39 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 2

@staticfloat
Member

Did you actually run Pkg.build("Arpack")?

@juliohm

juliohm commented Jul 11, 2018

I did, there is no error message, and the build.log is empty.

@andreasnoack
Member

It's the test for the deps.jl file that is wrong. See #22

@sbromberger

This may be stating the obvious, but the problem still exists with 1.4 on a Julia built from source (which is the only way I can do it right now). Is there a workaround for those of us in this unfortunate position?

@staticfloat
Member

You can try downloading the official Julia binaries, extracting the libgfortran.so.4 file, and putting it into your from-source Julia installation's lib directory. That should allow your arpack to find libgfortran.so.4 and get loaded; however all bets are off as to whether that will cause Julia to freak out or not due to there being multiple versions of certain symbols available within the process space. You'll just have to try it and find out until we have a better solution available.

@sbromberger

Thank you for the followup.

> You can try downloading the official Julia binaries, extracting the libgfortran.so.4 file, and putting it into your from-source Julia installation's lib directory.

Unfortunately I won't be able to do that due to environmental restrictions here.

I know people are working on this, but to set expectations, is there an approximate timeframe? The lack of a working eigs is causing failures in LightGraphs; if a fix is expected to take longer than a few more days, then it might make sense just to pull all uses of eigs out of the package in order to make progress in reviewing current PRs.

@staticfloat
Member

> if a fix is expected to take longer than a few more days

It will definitely be more than a few days; it's very complex to create infrastructure to support every reasonable toolchain (rather than a single reasonable toolchain, which is what we do right now); think single-digit months rather than single-digit days.

If it is the official Julia distribution itself that is the problem, you don't have to get libgfortran.so.4 from that source. You could get it from your distribution's GCC channel, you could grab it from any other piece of software on your computer, or even download the GCC 7 source and compile it yourself to get libgfortran and copy that somewhere it can be used.

If none of those solutions will work in your environment, then I'm afraid you're just going to have to wait.

@sbromberger

Thanks. That's disappointing but I'm not in a position to help other than state my use case.

> it's very complex to create infrastructure to support every reasonable toolchain

I'm curious about this. I wouldn't think that downloading and building the Julia source code would be so far outside the norm that it's not part of the testing strategy. Are people doing something else? Because the upshot is that over 600 packages are using Arpack, and none of them will work on source-built installations.

(This is not really so much a complaint as a question of whether our environment is really that strange.)

@chriselrod

Would it be feasible to support building Arpack from source as a generic fallback instead of downloading the binary?

@staticfloat
Member

staticfloat commented Jul 13, 2018

> I wouldn't think that downloading and building the Julia source code would be so far outside the norm that it's not part of the testing strategy.

Building Julia from source works just fine with these binaries, as long as you use a compiler version that is compatible with the binaries (e.g. GCC 7.x) or alternatively, are able to install the compiler support libraries (libgfortran, libgcc_s, etc...). Combine that with supporting the official Julia binaries, and we've covered a very large portion of the ecosystem. I don't have the hard numbers to tell you exactly how much of the ecosystem that covers, but anecdotally, I would guesstimate that the great majority of users use the precompiled binaries, and of those that don't, a good chunk are able to coerce their build system to be GCC 7-compatible.

This transition pain is considered worthwhile because the alternatives (using distro-provided package managers, building from source on user's machines, etc...) run into new issues constantly. It is a very large, ongoing, maintenance burden for us to constantly fix build scripts, debug user's broken toolchains, etc... It is much simpler for everyone if the binaries are built once, and simply distributed to users in the format they were going to compile down to anyway.

This of course raises the compiler compatibility issues, but I will also point out those compiler compatibility issues already existed in the "bad old days"; if you were using the official Julia binaries (as most users are) and you have to compile Arpack, you can run into similar problems. Not the exact same ones (where libraries are missing) but more subtle ones, where symbol duplication means that when Arpack tries to call a function whose functionality has changed between two compiler ABI versions, it just functions incorrectly/crashes.

> Would it be feasible to support building Arpack from source as a generic fallback instead of downloading the binary?

Ah, that gives me another idea; Seth, for your particular use case, I'm sure you can manually compile Arpack and just insert it directly into Arpack.jl's deps/usr folder. Do something like this (yes, there's an obscenely long line in these instructions, oh well). Also I suggest doing a pkg> dev Arpack first, so that we can iterate on Arpack itself.

# First, clone arpack-ng and checkout a known good gitsha
git clone https://github.com/opencollab/arpack-ng.git arpack-ng
cd arpack-ng
git checkout b095052372aa95d4281a645ee1e367c28255c947

# We'll build inside the `build` directory
mkdir build; cd build/

# We're going to install to ~/.julia/dev/Arpack/deps/usr.  Change this if you have a different target installation location
prefix=$(echo ~/.julia/dev/Arpack/deps/usr)

# We need to link against OpenBLAS; get that from Julia: (Note; this only works for Julia 0.7)
openblas_dir=$(julia -e 'using Libdl; println(abspath(dirname(Libdl.dlpath(Libdl.dlopen(Base.libblas_name)))))') 

# Use cmake to configure, with a huge number of symbol renames for the fortran code.
cmake .. -DCMAKE_INSTALL_PREFIX=$prefix -DBUILD_SHARED_LIBS=ON -DBLAS_LIBRARIES="-L$openblas_dir -lopenblas64_" -DLAPACK_LIBRARIES="-L$prefix/lib -lopenblas64_" -DCMAKE_Fortran_FLAGS="-O2 -fPIC -ffixed-line-length-none -cpp -fdefault-integer-8 -Dsaxpy=saxpy_64 -Ddaxpy=daxpy_64  -Dscopy=scopy_64 -Ddcopy=dcopy_64  -Dsgemv=sgemv_64 -Ddgemv=dgemv_64  -Dsgeqr2=sgeqr2_64 -Ddgeqr2=dgeqr2_64  -Dslacpy=slacpy_64 -Ddlacpy=dlacpy_64  -Dslahqr=slahqr_64 -Ddlahqr=dlahqr_64  -Dslanhs=slanhs_64 -Ddlanhs=dlanhs_64  -Dslarnv=slarnv_64 -Ddlarnv=dlarnv_64  -Dslartg=slartg_64 -Ddlartg=dlartg_64  -Dslascl=slascl_64 -Ddlascl=dlascl_64  -Dslaset=slaset_64 -Ddlaset=dlaset_64  -Dsscal=sscal_64 -Ddscal=dscal_64  -Dstrevc=strevc_64 -Ddtrevc=dtrevc_64  -Dstrmm=strmm_64 -Ddtrmm=dtrmm_64  -Dstrsen=strsen_64 -Ddtrsen=dtrsen_64  -Dsgbmv=sgbmv_64 -Ddgbmv=dgbmv_64  -Dsgbtrf=sgbtrf_64 -Ddgbtrf=dgbtrf_64  -Dsgbtrs=sgbtrs_64 -Ddgbtrs=dgbtrs_64  -Dsgttrf=sgttrf_64 -Ddgttrf=dgttrf_64  -Dsgttrs=sgttrs_64 -Ddgttrs=dgttrs_64  -Dspttrf=spttrf_64 -Ddpttrf=dpttrf_64  -Dspttrs=spttrs_64 -Ddpttrs=dpttrs_64  -Dsdot=sdot_64 -Dddot=ddot_64  -Dsger=sger_64 -Ddger=dger_64  -Dslabad=slabad_64 -Ddlabad=dlabad_64  -Dslaev2=slaev2_64 -Ddlaev2=dlaev2_64  -Dslamch=slamch_64 -Ddlamch=dlamch_64  -Dslanst=slanst_64 -Ddlanst=dlanst_64  -Dslanv2=slanv2_64 -Ddlanv2=dlanv2_64  -Dslapy2=slapy2_64 -Ddlapy2=dlapy2_64  -Dslarf=slarf_64 -Ddlarf=dlarf_64  -Dslarfg=slarfg_64 -Ddlarfg=dlarfg_64  -Dslasr=slasr_64 -Ddlasr=dlasr_64  -Dsnrm2=snrm2_64 -Ddnrm2=dnrm2_64  -Dsorm2r=sorm2r_64 -Ddorm2r=dorm2r_64  -Dsrot=srot_64 -Ddrot=drot_64  -Dssteqr=ssteqr_64 -Ddsteqr=dsteqr_64  -Dsswap=sswap_64 -Ddswap=dswap_64  -Dcaxpy=caxpy_64 -Dzaxpy=zaxpy_64  -Dccopy=ccopy_64 -Dzcopy=zcopy_64  -Dcgemv=cgemv_64 -Dzgemv=zgemv_64  -Dcgeqr2=cgeqr2_64 -Dzgeqr2=zgeqr2_64  -Dclacpy=clacpy_64 -Dzlacpy=zlacpy_64  -Dclahqr=clahqr_64 -Dzlahqr=zlahqr_64  -Dclanhs=clanhs_64 -Dzlanhs=zlanhs_64  -Dclarnv=clarnv_64 -Dzlarnv=zlarnv_64  -Dclartg=clartg_64 -Dzlartg=zlartg_64 
 -Dclascl=clascl_64 -Dzlascl=zlascl_64  -Dclaset=claset_64 -Dzlaset=zlaset_64  -Dcscal=cscal_64 -Dzscal=zscal_64  -Dctrevc=ctrevc_64 -Dztrevc=ztrevc_64  -Dctrmm=ctrmm_64 -Dztrmm=ztrmm_64  -Dctrsen=ctrsen_64 -Dztrsen=ztrsen_64  -Dcgbmv=cgbmv_64 -Dzgbmv=zgbmv_64  -Dcgbtrf=cgbtrf_64 -Dzgbtrf=zgbtrf_64  -Dcgbtrs=cgbtrs_64 -Dzgbtrs=zgbtrs_64  -Dcgttrf=cgttrf_64 -Dzgttrf=zgttrf_64  -Dcgttrs=cgttrs_64 -Dzgttrs=zgttrs_64  -Dcpttrf=cpttrf_64 -Dzpttrf=zpttrf_64  -Dcpttrs=cpttrs_64 -Dzpttrs=zpttrs_64  -Dcdotc=cdotc_64 -Dzdotc=zdotc_64  -Dcgeru=cgeru_64 -Dzgeru=zgeru_64  -Dcunm2r=cunm2r_64 -Dzunm2r=zunm2r_64  -DSCOPY=SCOPY_64 -DDCOPY=DCOPY_64  -DSLABAD=SLABAD_64 -DDLABAD=DLABAD_64  -DSLAMCH=SLAMCH_64 -DDLAMCH=DLAMCH_64  -DSLANHS=SLANHS_64 -DDLANHS=DLANHS_64  -DSLANV2=SLANV2_64 -DDLANV2=DLANV2_64  -DSLARFG=SLARFG_64 -DDLARFG=DLARFG_64  -DSROT=SROT_64 -DDROT=DROT_64  -DSGEMV=SGEMV_64 -DDGEMV=DGEMV_64 -Dscnrm2=scnrm2_64 -Ddznrm2=dznrm2_64 -Dcsscal=csscal_64 -Dzdscal=zdscal_64"

# Do the make install
make install

Once you've verified that the necessary files have been dumped into your Arpack's deps/usr directory, you can write your own deps/deps.jl file:

## This file autogenerated by BinaryProvider.write_deps_file().
## Do not edit.
##
## Include this file within your main top-level source, and call
## `check_deps()` from within your module's `__init__()` method
import Libdl
# Arpack.jl's build.jl declares this product as :libarpack, so the constant must carry that name
const libarpack = joinpath(dirname(@__FILE__), "usr/lib64/libarpack.so")
function check_deps()
    global libarpack
    if !isfile(libarpack)
        error("$(libarpack) does not exist, Please re-run Pkg.build(\"Arpack\"), and restart Julia.")
    end

    if Libdl.dlopen_e(libarpack) == C_NULL
        error("$(libarpack) cannot be opened, Please re-run Pkg.build(\"Arpack\"), and restart Julia.")
    end
end

Note that on my machine, the make install process put libarpack.so into deps/usr/lib64, not deps/usr/lib. You may have to change that path to match your machine's output.
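Because the install location can be either lib or lib64, it may help to check both before writing deps/deps.jl. The following is a small sketch under assumptions (the function name `find_libarpack` is hypothetical, and it only looks for the two file names libarpack.so and libarpack.dylib):

```shell
# Hypothetical helper: print the path of libarpack relative to the install
# prefix $1, checking both lib and lib64. Returns 1 if nothing is found.
find_libarpack() {
  prefix=$1
  for sub in lib lib64; do
    for name in libarpack.so libarpack.dylib; do
      if [ -f "$prefix/$sub/$name" ]; then
        echo "$sub/$name"
        return 0
      fi
    done
  done
  return 1
}
```

For example, `find_libarpack ~/.julia/dev/Arpack/deps/usr` would print `lib64/libarpack.so` on a machine like the one described above, and that relative path is what goes after `usr/` in the joinpath call of deps.jl.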

@StefanKarpinski
Contributor

StefanKarpinski commented Apr 5, 2019

One concrete issue is that Julia has certain requirements for how BLAS and Arpack are built and those requirements are often not satisfied by system libraries. We could configure Julia to work around the inconsistencies of system libraries on various systems but that just forces those inconsistencies onto our users, which we do not want to do. We have chosen instead to ship our own versions of such libraries so that we provide a maximally consistent and portable experience to our users—and they do appreciate it. Because of this philosophy, Julia code runs very much the same everywhere. The Celeste project, for example, was ported from running on Cori to running on Microsoft Azure in less than one developer-week. That would be unthinkable with your traditional HPC application that ends up being deeply coupled to the precise system it was built to run on. A large part of why that was possible was the philosophy that Julia has of trying to provide the same programming experience everywhere.

We're not unwilling to support this kind of use case but doing so is a fair amount of extra work and there is only so much time, money and manpower available so we have to pick and choose our battles. The battle we have chosen to fight first is to make Julia "just work" for the vast majority of users who do not run their code on supercomputers or need to link against specially tuned system libraries. This approach provides the most benefit to the most people. And indeed, Julia users love its package manager because they install packages and things mostly do just work.

However funding is not a fixed thing. If the national labs have special requirements, perhaps there is funding available to help get the work done to support their special use cases. There are companies (Julia Computing is one such) who can help with this kind of work and are more than willing to upstream any improvements made to Julia's build systems to the open source project.

@Keno

Keno commented Apr 5, 2019

We actively support julia users on several top HPC systems and I personally have talked extensively with the relevant sysadmins about requirements. As a result, I do think we understand the challenges involved in getting Julia running on systems of these sorts and indeed supporting this was an explicit design goal of the Pkg and BinaryBuilder work. However, that feature is not complete yet and will certainly not be started until after we have resolved JuliaLang/Pkg.jl#841. Our focus right now is to make julia work out of the box on as many systems as possible in order to reach the greatest number of users possible, including users on HPC systems. We have made enormous progress in this regard to the point that julia does tend to "just work" in most places. Yes, that means that it is currently harder to support centrally administered software repositories (whether on HPC systems or in Linux distros, etc), but that is simply an indication that we haven't gotten to that part yet, because it's fairly niche for our user community (especially as julia does tend to work for these users, even if using slightly suboptimal library installations which may or may not matter). We will be able to resolve them in due time, but in the meantime, additional suggestions that we don't know what we're doing or don't care about our users are not particularly productive.

@heroxbd

heroxbd commented Apr 5, 2019

@StefanKarpinski @Keno Thank you for your constructive responses. I understand your point now. Would you please leave this issue open, so that potential GNU/Linux distribution developers and users could find this bug easily? Denying the existence of this bug puts contributions off.

In the free software community, people volunteer: funding is good to have but not a must for the work. I have seen heroic efforts from @cdluminate to introduce Julia to the Debian community, with millions of potential users. I could envision us working toward upstreaming @cdluminate's patches to honestly solve this bug. We care about Julia and we are willing to help.

@Keno, just to comment that it is not "slightly suboptimal", the suboptimal library could easily drag performance down by 10x, e.g. in @casparvl's case.

@StefanKarpinski
Contributor

StefanKarpinski commented Apr 6, 2019

I have never seen a 10x difference in performance because of the way a library was configured or built—frankly, that’s a bit implausible. If that’s the case then either the default we ship should be much better or the “optimized” version is doing something very sketchy and wrong.

@heroxbd

heroxbd commented Apr 6, 2019

> I have never seen a 10x difference in performance because of the way a library was configured or built—frankly, that’s a bit implausible. If that’s the case then either the default we ship should be much better or the “optimized” version is doing something very sketchy and wrong.

Stefan, there is no need to limit our discussion to reconfiguring and rebuilding a library. This issue is more about substituting a generic library that Julia depends on with a much more specialized, fine-tuned and optimized one. For example, for netlib blas-reference vs BLIS, the difference will be even more than 10x on exotic platforms like Intel Xeon Phi KNL. In general a 10x difference is not an exaggeration. Scientists and engineers in supercomputing centers work full time to tune performance for a reason.

Apart from performance, portability and ease of building from source are even more important to users.

@StefanKarpinski
Contributor

StefanKarpinski commented Apr 6, 2019

C'mon, man. Nobody is arguing that Julia users should use reference BLAS. Julia has never—even from the very earliest days—shipped with a reference BLAS because we're not incompetent and we care deeply about performance. A huge amount of work has been done to find and correctly configure high-quality, high-performance, portable libraries like OpenBLAS, dSFMT, etc. We even went so far as creating our own openlibm library because no existing libm at the time met the requirements of being fast, portable, open source and liberally licensed (since then many other projects have adopted openlibm). So talking about stuff like reference BLAS is just a straw man argument and a lecture about "scientists and engineers in supercomputing centers work full time to tune performance for a reason" is just obnoxious and unwarranted. Regarding Xeon Phi KNL, yes, that's what Cori has and @Keno got Julia running at 1.5 petaflops on that machine for the Celeste project. So we're well aware of what it takes to get Julia running on HPC systems. It's fine to discuss how to make it easier to use custom libraries on HPC systems but please stick to the actual situation rather than bogus claims that what we are shipping by default is 10x slower.

@casparvl

casparvl commented Apr 8, 2019

@StefanKarpinski @Keno Thanks a lot for your constructive responses. First of all,

> additional suggestions that we don't know what we're doing or don't care about our users are not particularly productive.

if that was how my post came across, I apologize. I know that developers in general care a lot about their projects. Earlier in this thread, there was an argument that 'there are so many names for BLAS libraries', suggesting that this would be the problem - and that, I could not understand. But clearly, that is not the main issue. The main issue as I understand it now seems to be

  1. Technical

> Julia has certain requirements for how BLAS and Arpack are built

and although I don't personally grasp the details of this statement (don't these libs have standard APIs?) I trust you know far better than I do.

  2. Design choice

> We have chosen instead to ship our own versions of such libraries so that we provide a maximally consistent and portable experience to our users—and they do appreciate it.

This, clearly, is a fair design choice and far be it from me to argue with that. Apart from performance, I also encounter users with portability issues, so I understand the value of this design choice.

I know that a lot of these things are a matter of priority, because resources are always limited. And, one key factor in determining priorities is to know what is important to your users (the exact same applies to us, as an HPC center by the way). I would like my above post to be a signal that yes, linking to low level libraries could be important to a (group of) users, namely those on university clusters & HPC systems. They may not be your biggest or most important users group - you probably know that better than I do - but running on such systems has some weird implications that do not apply to 'normal' users.

For example, I do not assume that linking to system libraries will result in a 10x speedup in a typical case, but what you have to realize is that on a typical HPC system, if one can achieve a 5% speedup of all the software running on that system, that means one could save anywhere between tens of thousands and millions of USD/Euro in hardware, depending on the size of the system. Thus, even though a lot of software download instructions say 'just download the binaries, because it is easier', the HPC community tends to be willing to spend the extra effort and compile from source. In 99/100 cases, this is worthwhile because software developers themselves regularly focus on functionality, rather than performance. I'm glad to see that Julia is in the 1%.

@wsshin

wsshin commented May 3, 2019

I have a question related to this thread. On one Mac, I have been building Arpack.jl successfully by installing gcc@7 by homebrew and setting DYLD_LIBRARY_PATH as instructed in #5 (comment), but on another Mac (with the same macOS version), the same trick does not work.

By inspecting ~/.julia/dev/Arpack/deps/usr/logs/ArpackBuilder.log on both Macs, I noticed the line

-- The Fortran compiler identification is GNU 7.3.0

in the first Mac as desired, but

-- The Fortran compiler identification is GNU 8.1.0

in the second Mac. Apparently pointing DYLD_LIBRARY_PATH to gcc@7 is not forcing the second Mac to use GCC 7.

Am I missing something? Any ideas?

@staticfloat
Member

> Apparently pointing DYLD_LIBRARY_PATH to gcc@7 is not forcing the second Mac to use GCC 7.

DYLD_LIBRARY_PATH changes which libraries get loaded at runtime, but when invoking a compiler, you need to set PATH to point to where that compiler lives. Try running which -a gfortran to see where all the gfortran executables on your machine live, and set PATH so that your gfortran-7 is found first. (You may also be able to set export FC=/path/to/gfortran-7.)
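The PATH search can also be scripted. The sketch below is only an illustration (the function name `find_gfortran_major` is hypothetical): it walks PATH in order, much like `which -a`, and asks each gfortran or gfortran-N candidate for its version via `-dumpversion`, which prints either a bare major number or a full version depending on how GCC was configured.

```shell
# Hypothetical helper: print the first gfortran on PATH whose major version is $1.
find_gfortran_major() {
  want=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    IFS=$old_ifs
    for name in gfortran "gfortran-$want"; do
      exe="$dir/$name"
      [ -x "$exe" ] || continue
      # Keep only the part before the first dot, e.g. 7 from 7.3.0
      major=$("$exe" -dumpversion 2>/dev/null | cut -d. -f1)
      if [ "$major" = "$want" ]; then
        echo "$exe"
        return 0
      fi
    done
  done
  IFS=$old_ifs
  return 1
}
```

Something like `export FC=$(find_gfortran_major 7)` could then feed the FC suggestion above. Whether the build honours FC depends on whether anything is compiled at all, so this is a diagnostic aid rather than a guaranteed fix.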

@wsshin

wsshin commented May 6, 2019

@staticfloat, on both Macs the directory of the homebrew-installed gfortran is the top PATH directory. I also tried export FC=/path/to/gfortran-7 but it didn't work.

I even tried to link gfortran to gfortran-7, but the build log on the problematic Mac still says

-- The Fortran compiler identification is GNU 8.1.0

and I did pkg> build Arpack after pkg> rm Arpack, removing the ~/.julia/dev/Arpack directory, and pkg> dev Arpack to make sure that any build configuration from the previous build attempt does not affect the current build attempt.

Any ideas how to force the Fortran version to use for building Arpack?

@wsshin

wsshin commented May 9, 2019

Any ideas?

@staticfloat
Member

@wsshin So that we don't bother other people, let's open a new issue. I will briefly note that I do not think you are actually compiling anything, since the log file location leads me to believe that you are instead getting some precompiled binaries. I'm willing to bet that you compiled julia itself with GCC 8, and so it is downloading GCC 8-compatible binaries for use with Julia. In any case, please open a new issue and give as much information as possible, including how Julia was installed, what system you're running on, what compilers you have installed, etc...

@puckvg

puckvg commented Aug 8, 2019

I understand why this bug is closed, but I do want to say it's still an issue for ordinary people. I'm trying to run something on a university cluster and just enabling the prepackaged lapack doesn't get this working. I'm working through variations of the suggestions above, and I'll post when I get this fixed, but it's not easy.

In other words — this shouldn't take priority over open bugs, but if there is a chance for a better fix in the future, that would still be great.

....

OK, it seems I've resolved this issue, though my code still isn't running. I ended up downloading the arpack rpm, running extract / configure / make, and symlinking, as suggested by staticfloat above (#5 (comment)). Now I'm getting a different and probably unrelated error: julia: symbol lookup error: /n/sw/intel-cluster-studio-2017/mkl/lib/intel64/libmkl_intel_thread.so: undefined symbol: omp_get_num_procs

I'm getting the same error 'julia: symbol lookup error: /home/puck/.julia/packages/Arpack/cu5By/deps/usr/lib/libarpack.so: undefined symbol: dlamch_64_'

Did you ever fix it?

@nalimilan
Contributor

@puckvg What's your configuration?

@mauro3

mauro3 commented Dec 4, 2019

This seems to also happen on Cirrus CI:
https://cirrus-ci.com/task/5116177080123392?command=test#L69

@wi11dey

wi11dey commented Apr 19, 2022

I've found a workaround for those like @carstenbauer (and myself) who are on university clusters and can't simply make a symlink /usr/lib/libopenblas64_.so to the real location of OpenBLAS:

  1. Go to your ~/.julia folder
  2. Run find . -iname 'libarpack.so' to find where the libarpack.so that references libopenblas64_.so lives
  3. Run patchelf --replace-needed libopenblas64_.so libopenblas.so libarpack.so (patchelf is available here if your distro doesn't have it)

The process could be upstreamed into Arpack.jl and made more reliable by automatically making the libarpack.so binary reference whatever OpenBLAS_jll.libopenblas is set to, which is correct for every distro's Julia.
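The three steps above can be combined into one function. This is a sketch under assumptions: the function name `retarget_libarpack` and the `PATCHELF` override variable are hypothetical, and only the `--replace-needed` invocation comes from the steps themselves. The replacement library must still export the symbols that libarpack.so looks up, which this sketch does not verify.

```shell
# Hypothetical helper: for every libarpack.so under $1 (default ~/.julia),
# replace its libopenblas64_.so dependency with $2 (default libopenblas.so).
# Setting PATCHELF=echo turns the call into a dry run that prints the arguments.
retarget_libarpack() {
  root=${1:-$HOME/.julia}
  new_name=${2:-libopenblas.so}
  find "$root" -name 'libarpack.so' 2>/dev/null | while IFS= read -r lib; do
    "${PATCHELF:-patchelf}" --replace-needed libopenblas64_.so "$new_name" "$lib"
  done
}
```

A dry run with `PATCHELF=echo` shows which files would be touched before anything is modified.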

@carstenbauer
Member

@wi11dey FWIW, let me say that (at least for production runs) I don't build Julia from source anymore because there is almost no good reason anymore to do so. The main reason for me was to use Intel MKL but that can nowadays be easily achieved with MKL.jl.
