Recipes which depend on blas and lapack #80

Closed
jjhelmus opened this issue Mar 7, 2016 · 22 comments

Comments

@jjhelmus
Contributor

jjhelmus commented Mar 7, 2016

I have a recipe that I am looking to submit that has compiled extensions which link to blas and lapack. Does anyone have experience building packages that depend on these, or any suggestions on how to get these to build in a portable manner? I'm only looking to support Linux and OS X with the package, which hopefully makes the process a bit easier.

@pelson
Member

pelson commented Mar 7, 2016

No experience here I'm afraid. @msarahan any advice?

@jakirkham
Member

I have some experience with this. Was going to submit OpenBLAS with NumPy, SciPy, and scikit-learn support. Also can add cvxopt support, but will wait for things like FFTW ( #106 ) to go into feedstocks first. Can add numexpr to the mix, but haven't written this one yet. @abergeron and I wrote the OpenBLAS recipe in conda recipes. Though I don't think I like the way I did things in that version and am contemplating how to rewrite it after discussions with @mcg1969. He has some good insight after the solver overhaul and the public release of mkl. In general, I am open to some discussion about improving this scheme going forward.

The main idea is that one should use a feature to select the underlying BLAS/LAPACK. This is how mkl works. Possibly two features would be needed to handle cases where one might want to use a different LAPACK with the BLAS. Since mkl became publicly available, I have found that I need to require nomkl and track it in alternative BLAS/LAPACK implementations.

While features could be used for lots of different things, the main use case seems to be selecting an implementation of a common API like BLAS/LAPACK. Though I think this could be coded a little more explicitly when choosing feature names. For instance, I kind of like the idea of all BLAS features being named blas_* so there would be blas_openblas, blas_atlas, etc. Similarly this could be done with LAPACKs (i.e. lapack_*). There should be a way of implementing the existing mkl and nomkl in this schema, probably by implementing it above them, but I would like to see those become part of this scheme also. Going forward this could be instructive in terms of how features get shaped in conda for use cases like this, for example by gaining better exclusionary power (only one BLAS style feature at a time).
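
As a rough illustration of the naming idea, here is a minimal Python sketch. It assumes features are plain strings and that the blas_* prefix convention is adopted; `selected_blas` is a hypothetical helper and not part of conda.

```python
def selected_blas(features, prefix="blas_"):
    """Return the single BLAS implementation named by the features, or None.

    Raises ValueError if more than one feature carries the prefix, which is
    the "only one BLAS style feature at a time" rule from the proposal.
    """
    # Strip the prefix from every matching feature; ignore duplicates.
    chosen = sorted(f[len(prefix):] for f in set(features) if f.startswith(prefix))
    if len(chosen) > 1:
        raise ValueError("conflicting BLAS features: " + ", ".join(chosen))
    return chosen[0] if chosen else None


print(selected_blas(["blas_openblas", "lapack_openblas"]))
```

The same helper could be called with prefix="lapack_" if a separate LAPACK feature were tracked.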

Any thoughts on this?

@patricksnape
Contributor

@jakirkham I like the proposal - though I would say that as a community guideline we should direct people towards using either MKL or OpenBLAS - since most people are not dogmatic about what BLAS they use, they just want it to be fast!

I've actually struggled against this before because often recipes only require setting a key/path to the chosen BLAS library but the rest of the recipe is the same. All the more reason (IMHO) that features should be passed through to scripts in some way, perhaps as ENV variables? Would make it easier to maintain one recipe and just switch on certain feature options - not that this repo has any control over that!

@pelson
Member

pelson commented Mar 23, 2016

@patricksnape - we can change the recipe behaviour based on external env vars if we need to...

$ cat meta.yaml
package:
  {% if os.environ.get('feature') == 'foo' %}
  name: foo
  {% else %}
  name: bar
  {% endif %}

$ feature=foo conda build .
BUILD START: foo--0

$ conda build .
BUILD START: bar--0

@jakirkham
Member

Nice proposal @pelson. Is there already a spec on how we add and select these different features? Like a yaml file we add them to or something? If not, we should definitely work on one before we start adding this stuff (my opinion at least 😄).

@patricksnape
Contributor

@pelson Awesome. This is the kind of stuff that would make a killer little blog post somewhere - really useful information, thanks. These kinds of 'best practices' would be great to compile in the Wiki or somewhere similar.

@183amir
Contributor

183amir commented Apr 12, 2016

Hey guys, assuming that we acquire a license for mkl, how do we compile against mkl?
I mean, do we install it in our CI machines? How do we handle the license and serial number? If we compile with mkl, can we distribute our binaries?
Would we need extra recipes for packages that build against mkl? How different would those be?

@jakirkham
Member

These are all great questions. Probably the first step would be to talk to people at Continuum and see how they handle this problem. After that we may need to talk with someone at Intel to understand the licensing constraints in this situation.

@msarahan
Member

Yes, you'd need to install it with the license key on the CI machines. I don't think you can bundle it outside of their installer (this would probably defeat their licensing software, if it would work at all). I don't know how permissible it is to install something like that in a docker image - or even if it's possible to do their activation process in an unattended way. This is really a question for Intel. FWIW, with docker, you could install MKL in one image, and then create linked images with docker-compose so that you don't need the overhead of MKL for every build.

Packages that link against MKL don't need to be different other than providing a way of adding mkl as a dependency. This is the default in Anaconda, but it would be nicer to make a set of features for BLAS, and then choose one. This would give mutual exclusivity, which might be what you want. These schemes are not completely supported in conda right now, though. I think the ideal would be some jinja2 placeholder for blas, and which blas gets used is a matter of configuration.
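
To make the placeholder idea concrete, here is a small Python sketch. It uses the standard library's string.Template as a stand-in for Jinja2, and the recipe fragment, the "blas" configuration key, and the openblas default are all assumptions made for illustration rather than anything conda supports.

```python
from string import Template

# Hypothetical recipe fragment; ${blas} stands in for a Jinja2 placeholder.
REQUIREMENTS = Template("requirements:\n  run:\n    - numpy\n    - ${blas}\n")


def render_requirements(config):
    """Fill the BLAS placeholder from configuration.

    Falls back to openblas when no BLAS is configured (an assumed default).
    """
    return REQUIREMENTS.substitute(blas=config.get("blas", "openblas"))


print(render_requirements({"blas": "mkl"}))
```

The recipe text stays the same for every BLAS; only the configuration passed in changes.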

@msarahan
Member

Also, something I didn't express clearly: the hard issue here is installation of MKL on the build side of the story. Redistribution of runtimes is permissible (please check the license yourself, but I'm pretty sure this is the case.)

@183amir
Contributor

183amir commented Apr 13, 2016

I also wanted to know whether, if we link against numpy, we should compile with the same mkl version that numpy was compiled with.

@jakirkham
Member

Basically, let's stick with nomkl for now. The problem is we don't get headers for the mkl case. We can look into what Intel would allow us to do with mkl.

Ideally though I would like to have us use something like OpenBLAS. This is already what nomkl means on Linux. Accelerate on Mac has well known issues so it would be wise to switch to OpenBLAS. On Windows, there is no nomkl so we should make it OpenBLAS too. The first step would be to add a working OpenBLAS package. I will try to get on this soon.

Once we have that we can discuss the implications for features and feedstocks going forward.

@mcg1969
Contributor

mcg1969 commented Apr 13, 2016

The problem here is that features really don't provide mutual exclusivity by themselves. In theory you can install the nomkl feature and still have the MKL libraries installed. And to be honest we ought to allow OpenBLAS and MKL to be installed in the same environment; we just need to make sure that only one is selected within a given process.

We need the key-value functionality for features that we discussed in another thread, or my metapackage solution we discussed elsewhere. Consider this second option for a moment. We would create a set of metapackages with the name "python_blas". One version of this package would be built for each blas variant: python_blas-mkl-0.tar.bz2 for MKL, python_blas-openblas-0.tar.bz2 for OpenBLAS, etc.

Then any package that needs to link to BLAS directly builds different variants for each BLAS type, and includes the corresponding version of the python_blas package as a dependency, as well as the BLAS library itself. Having to include both is a bit messy, so an alternative would be for python_blas itself to have those dependencies built in. This complicates the versioning strategy a bit but it's still doable.

This approach utilizes the natural mutual exclusivity offered by package names to ensure that only one BLAS is being used in a given Python dependency chain.
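
The exclusivity argument can be sketched in a few lines of Python. This is a toy model and not how conda's solver works: the environment is a plain dict, `install` is a hypothetical helper, and the package names are only examples.

```python
def install(env, name, blas):
    """Install `name` built against `blas`; env maps package name -> variant.

    Only one python_blas variant can exist under that name, so a package
    built against a different BLAS is rejected.
    """
    # The first install pins python_blas; later installs must match the pin.
    pinned = env.setdefault("python_blas", blas)
    if pinned != blas:
        raise RuntimeError(
            f"{name} needs python_blas={blas}, but python_blas={pinned} is installed"
        )
    env[name] = blas
    return env


env = {}
install(env, "numpy", "mkl")
install(env, "scipy", "mkl")
try:
    install(env, "cvxopt", "openblas")
except RuntimeError as err:
    print(err)
```

Because the conflict is expressed through a package name, nothing here stops the MKL and OpenBLAS libraries themselves from both being present in the environment.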

@jakirkham
Member

And to be honest we ought to allow OpenBLAS and MKL to be installed in the same environment; we just need to make sure that only one is selected within a given process.

Sorry to derail this a bit. I have heard you bring this up before, @mcg1969, but I'm having trouble understanding why one would want to do this. What is the use case here? Are there some cases you have encountered where this is helpful?

@mcg1969
Contributor

mcg1969 commented Apr 13, 2016

I'm thinking of cases like, say, Python & R in the same environment. Why require them both to link to the same BLAS? I mean, sure, it would be convenient, but it puts more burden on the builders and the users. If it happens to be convenient to use MKL by default with Python and OpenBLAS by default with R, well, why not?

The argument is more clear with C runtime libraries. It's not easy to ensure that every program you're using in an environment links to the same C runtime. But obviously we need to make sure that every Python package does.

@jjhelmus
Contributor Author

Redistribution of runtimes is permissible

Permissible yes, allowed with some/many open source licenses, no. Refer to the Intel EULA for details but from what I recall the last time the NumPy developers talked about distributing MKL linked NumPy binaries the sticky points were:

  • Section 3(C), which includes a restriction against "reverse engineer, decompile, or disassemble the Materials;" and specifically excludes the use of MKL with software which would become subject to an "Excluded License", which explicitly mentions GPL, LGPL, MPL, and the CPL. BSD licensed software seems to be alright but the resulting binary would not be BSD.
  • The indemnification of Intel against any damages in section 8.

<Free Software Soapbox>
I will further add that any software distributed with the MKL license does not meet the definition of Free Software as it restricts at least two of the four essential freedoms, specifically the freedom to study how the program works (freedom 1) and the freedom to distribute modified versions to others (freedom 3).
</Free Software Soapbox>

@jakirkham
Member

I'm thinking of cases like, say, Python & R in the same environment. Why require them both to link to the same BLAS? I mean, sure, it would be convenient, but it puts more burden on the builders and the users. If it happens to be convenient to use MKL by default with Python and OpenBLAS by default with R, well, why not?

I am concerned about performance issues here, but would need to think a bit more to come up with a reasonable example.

Though, as @jjhelmus mentions, licensing is a concern. I do regularly interact with GPL programs that need a BLAS and NumPy. IANAL, but I feel like having MKL around is a murky area. Especially as there would be linkage through to the GPL'd library.

The argument is more clear with C runtime libraries. It's not easy to ensure that every program you're using in an environment links to the same C runtime. But obviously we need to make sure that every Python package does.

Doesn't this case become a slippery slope? How are we sure that some C/C++ library isn't later getting used by some Python library with C/C++ bindings? It seems like it would be very hard to avoid this interface between other C runtimes from ever showing up.

@mcg1969
Contributor

mcg1969 commented Apr 13, 2016

I'm not going to address the licensing concerns. We would have these same packaging & dependency issues with BLAS or C runtimes no matter what the licenses are.

I am concerned about performance issues here, but would need to think a bit more to come up with a reasonable example.

Performance issues certainly matter, but convenience does too. We can't necessarily control who is building every package we would like to use. So as long as things don't break to run the two BLAS versions in separate processes, we should not be preventing conda from installing them into the same environment.

How are we sure that some C/C++ library isn't later getting used by some Python library with C/C++ bindings?

Well, if dependencies aren't set correctly, there's nothing we can do. What we want here is the ability to get those dependency relationships right, and hopefully we can instruct people to do so, or find ways to automate those determinations in conda build.

But the ship has sailed on multiple C runtimes. We simply cannot synchronize on a single C runtime within conda environments involving mixed platforms like Python, R, lua, node, etc.

@mcg1969
Contributor

mcg1969 commented Apr 13, 2016

In fact, the C runtime problem is really the controlling example here. We don't have a choice but to get that one right, and if it helps us solve the BLAS problem too, even better.

@183amir
Contributor

183amir commented May 4, 2016

Hey guys, I went ahead and tried to compile bob.math with mkl and it seems to be working. Here is my recipe:

{% set version = "2.0.3" %}

package:
  name: bob.math
  version: {{ version }}

source:
  fn: bob.math-{{ version }}.zip
  url: https://pypi.python.org/packages/source/b/bob.math/bob.math-{{ version }}.zip
  md5: 0f010af6ce20fe6614570edff94e593f

build:
  number: 3
  skip: true  # [win]
  script: python -B setup.py install --single-version-externally-managed --record record.txt
  script_env:
   - LD_LIBRARY_PATH
   - LIBRARY_PATH
   - MIC_LD_LIBRARY_PATH
   - NLSPATH
   - CPATH

requirements:
  build:
  - python
  - setuptools
  - bob.core
  - boost
  - cmake
  - numpy x.x
  - pkg-config

  run:
  - python
  - bob.core
  - boost
  - numpy x.x

test:
  requires:
  - nose

  imports:
  - bob
  - bob.math

  commands:
  - nosetests -sv bob.math

about:
  home: http://github.com/bioidiap/bob.math
  license: Modified BSD License (3-clause)
  summary: LAPACK and BLAS interfaces for Bob

extra:
  recipe-maintainers:
  - 183amir

I installed mkl in our docker image with a student license.
At first I tried to add mkl as a feature and dependency, but it looks like that is not what Continuum does anymore.
To make the environment variables available I ran something like: source /intel/mkl/bin/mklvars.sh -v intel64
What I don't like about this recipe is this part:

build:
  script_env:
   - LD_LIBRARY_PATH
   - LIBRARY_PATH
   - MIC_LD_LIBRARY_PATH
   - NLSPATH
   - CPATH

I guess having a package like mkl-nonfree would make the process a lot easier.
I think if we want to add mkl in conda-forge, we need to create a private repository of mkl-nonfree and not upload it on anaconda.org but make it available in our CI builds so it can be used during compile time. Probably we have to apply for the open source license.

@jakirkham
Member

So we have a way to do BLAS now and it works ok. Details can be found in this hackpad. There is certainly room for growth, but it will probably involve an enhancement proposal (once that framework is ironed out).

@jakirkham
Member

Closing, but feel free to discuss more as appropriate.

@jakirkham jakirkham removed their assignment Mar 29, 2018

7 participants