
Lightened up the requirement on recipes to define numpy x.x as runtime #650

Merged
merged 1 commit into from
Jun 1, 2016

Conversation

pelson
Contributor

@pelson pelson commented Oct 26, 2015

Rolls back a little of #573 so that recipes are a little more friendly to write.
Currently one must effectively specify numpy x.x as both run and build dependencies in order to build a recipe which has a numpy ABI dependency against anything other than the latest numpy. With this change that is reduced to just specifying that as a runtime dependency.

@mwcraig
Contributor

mwcraig commented Oct 28, 2015

@pelson -- I think the only remaining problem with this approach is that the numpy version does not end up pinned in the build. For example, if the build requirement is numpy but the runtime requirement is numpy x.x then in the build environment numpy resolves to the latest version, 1.10.1, even though the package's build_id and dependencies list a different version of numpy (e.g. 1.9* if CONDA_NPY=1.9).

That is, of course, fixable in this approach. It comes up during the build here: https://github.com/pelson/conda-build/blob/xx_build_dep_remove/conda_build/build.py#L377-L380

Somewhere in ms_depends there needs to be a check of the numpy requirement listed in the run section if the dependencies for the build are being set up.
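
The check described above could be sketched roughly as follows. This is an illustrative toy, not conda-build's real API: the function names, signatures, and the spec-string handling are all assumptions made for the example.

```python
# Hypothetical sketch of the check described above: while assembling
# *build* dependencies, consult the *run* section to see whether the
# recipe requests pinning via "numpy x.x", and if so substitute the
# version series selected by CONDA_NPY.
def run_requests_numpy_pinning(run_specs):
    """Return True if the run section contains a 'numpy x.x' spec."""
    return any(spec.split()[:2] == ["numpy", "x.x"] for spec in run_specs)

def resolve_build_spec(build_spec, run_specs, conda_npy):
    """Rewrite a bare 'numpy' build spec to the pinned series when the
    run section asks for pinning; leave every other spec untouched."""
    name = build_spec.split()[0]
    if name == "numpy" and run_requests_numpy_pinning(run_specs):
        return "numpy %s*" % conda_npy  # e.g. 'numpy 1.9*'
    return build_spec

print(resolve_build_spec("numpy", ["numpy x.x"], "1.9"))    # numpy 1.9*
print(resolve_build_spec("numpy", ["numpy >=1.7"], "1.9"))  # numpy
```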

In #653 I pulled the check out into a new method of MetaData at https://github.com/mwcraig/conda-build/blob/numpy-build-id-x.x/conda_build/metadata.py#L426-L438 . build would need to change to run here, and the check needn't be pulled out into a separate method, of course.

Then in ms_depends I added a numpy-specific check at https://github.com/mwcraig/conda-build/blob/numpy-build-id-x.x/conda_build/metadata.py#L426-L438

It doesn't matter to me what specific implementation of your approach is used...in other words, you can feel free to grab the code from my PR (or I can do a PR on your PR, changing the x.x from build to run) or implement the numpy check in some other way, but numpy does need to be special-cased in the build based on whether numpy x.x is the runtime numpy dependency.

@pelson
Contributor Author

pelson commented Oct 29, 2015

I think the only remaining problem with this approach is that the numpy version does not end up pinned in the build.

Aha! Thanks @mwcraig that was the info I needed. Thanks for digging on this.

So logically, in my opinion, the behaviour we want is:

requirements:
   run:
     - numpy x.x

means pin the built distribution's runtime dependency to the numpy version that we built against.

And:

requirements:
  build:
    - numpy >1.7

meaning pick a numpy which satisfies the numpy condition and adheres to the definition of the special CONDA_NPY environment variable/conda build flag.

This leaves the definition of:

requirements:
   build:
     - numpy x.x

with no meaning, IMHO.
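
The selection rule described above for a build spec like numpy >1.7 can be sketched as a toy resolver. This is not conda's actual solver; the function and its version handling are simplified illustrations of the proposed behaviour.

```python
# Toy illustration of the rule above: a build spec such as "numpy >1.7"
# is solved normally, but if CONDA_NPY is set, the chosen version must
# also fall within that series.
def select_numpy(candidates, floor, conda_npy=None):
    def key(v):
        return tuple(int(p) for p in v.split("."))
    ok = [v for v in candidates if key(v) > key(floor)]
    if conda_npy:
        ok = [v for v in ok if v.startswith(conda_npy + ".")]
    return max(ok, key=key) if ok else None

versions = ["1.7.2", "1.9.3", "1.10.1"]
print(select_numpy(versions, "1.7"))         # 1.10.1 (latest satisfying >1.7)
print(select_numpy(versions, "1.7", "1.9"))  # 1.9.3 (constrained by CONDA_NPY)
```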

Is this making sense, and does this have any flaws (apart from the fact that this isn't exactly what is implemented in this PR yet)?

@ilanschnell - it would be useful to get your input on this also.

Thanks,

@msarahan
Contributor

+1 from me. You have my thanks for your efforts on this.

@mwcraig
Contributor

mwcraig commented Oct 29, 2015

I think this makes sense, @pelson.

I'd be inclined to add a few things:

  • numpy x.x in build should raise an error that explains the fix; maybe a really strong warning would do, but I personally notice red lights on the CI services, not warnings, and it seems better to prevent an incorrect build from happening than to allow it to proceed.
  • I'll try to cook up some more recipe tests, including one that checks the numpy version between build and run. I can move over the tests from #653 (Change handling of numpy x.x so that it is only needed as a build requirement), too. If you want to proceed on your own with this, feel free of course.
  • Can we fix the docs too? It would be helpful to announce the change on the conda list and have a place we can point people to.

@pelson
Contributor Author

pelson commented Oct 30, 2015

If you want to proceed on your own with this, feel free of course.

Not at all. Please feel free to submit a PR against my branch, and we can work on this together.

Can we fix the docs too?

Yes please. Are you happy to do that part as well https://github.com/conda/conda-docs?

@mwcraig
Contributor

mwcraig commented Nov 2, 2015

Yes please. Are you happy to do that part as well https://github.com/conda/conda-docs?

Sure, but won't get to it for a few days...

@mwcraig
Contributor

mwcraig commented Nov 2, 2015

Not at all. Please feel free to submit a PR against my branch, and we can work on this together.

Got a PR in for tests, didn't get to this.

@mwcraig
Contributor

mwcraig commented Nov 2, 2015

@pelson -- any feelings on how one ought to specify a pinned numpy build with bdist_conda? It used to be controlled by CONDA_NPY (I think), but now that numpy x.x is needed to trigger pinning I don't think it works (need to re-check in the morning).

Am thinking of adding an option like --pin-numpy or --numpy-xx.

@stuarteberg
Contributor

Guys, I'm sorry I'm late to this party, but I guess I don't understand why this PR would be a good thing.

The simplest thing would be to just treat x.x as a synonym for {{ environ.get('NPY_VER') }}* (but don't forget that * at the end...). If your package compiles against the C API, then this would do the trick:

requirements:
  build:
    - numpy x.x,>=1.9 # Require exactly the version specified by NPY_VER,
                       # but also enforce other version requirements we have.
                       # If the user gives a bad value for --numpy, then this 
                       # line will self-conflict, as expected.
  run:
    - numpy x.x # Obviously, the run-time requirement needs 
                # to match exactly the version we built with.

That strikes a nice balance between terse and explicit.

Above, @pelson writes:

So logically, in my opinion, the behaviour we want is:

requirements:
  run:
    - numpy x.x

means pin the built distribution's runtime dependency to the numpy version that we built against.

...I love it!

And:

requirements:
 build:
   - numpy >1.7

meaning pick a numpy which satisfies the numpy condition and adheres to the definition of the special CONDA_NPY environment variable/conda build flag.

...I don't love it. This adds "magic" behavior behind the scenes -- for numpy only -- that isn't obvious to the uninitiated. If x.x is simply defined as I suggested above, then the build requirements are explicit and terse -- just three extra characters:

requirements:
  build:
    - numpy >1.7,x.x

I think this proposal is compatible with the existing uses of x.x. I'm just suggesting that x.x be explicitly allowed as part of a complicated match spec involving >=, |, ,, etc.

@mwcraig
Contributor

mwcraig commented Nov 2, 2015

More feedback is definitely better, and good point about not breaking existing recipes. I also see the virtue of requiring explicit over implicit.

Question about your suggestion, @stuarteberg: should the specification below raise an error in your proposal?

requirements:
  build:
    - numpy >1.7
  run:
    - numpy x.x

This currently produces a broken build in the released conda-build. A specification like the one above will not pin the dependency during the build despite advertising in the build_id and the runtime requirements that the version has been pinned.

I don't have strong feelings about whether x.x is also required to be in the build section but definitely want to make sure that a specification like that above either raises an error or really pins numpy in the build.

@stuarteberg
Contributor

should the specification below raise an error in your proposal?

requirements:
 build:
   - numpy >1.7
 run:
   - numpy x.x

Under what I just proposed, no. Buggy recipes like the above may lead to a broken binary. But that's true whether we're talking about numpy or any other package. For instance, this recipe has the same problem:

requirements:
  build:
    - hdf5     >=1.8.14
  run:
    - hdf5     1.8.14

I'm certainly not opposed to having conda raise an error if it sees a recipe like that, but it strikes me as a separate issue from the one in this thread.

As I said, I know I'm late to this topic, and realize I haven't put as much thought into this as the others in this thread. In particular, I guess I don't see what advantage the x.x feature brings over the template syntax I mentioned above (except that it's way easier on the eyes). Am I missing something?

@mwcraig
Contributor

mwcraig commented Nov 3, 2015

The numpy x.x feature was merged in #573, released in v1.18.0...as currently written you get a broken package unless you use this spec:

requirements:
  build:
    - numpy x.x
  run:
    - numpy x.x

In v1.18.[01] you can make two mistakes:

  • If you omit the x.x in build then you build against the latest numpy, not the one in CONDA_NPY, though the version in CONDA_NPY is what is listed as the requirement in the built package.
  • If you omit the x.x in the run section then the package is built against the version of numpy in CONDA_NPY, but the runtime requirement is just numpy with no pinning.

So in the current released version it is easy to make a mistake by leaving out x.x in one of the places, relatively hard to detect the mistake, and (I think?) impossible to indicate what range of numpy versions the package supports.

This PR and #653 are attempts to prevent those mistakes (this is the better option of those two, I think).

That said, the more I think about your suggestion x.x,>1.7, the more I like it.

I do think numpy will always be treated differently because it has been treated differently from any other python package from the beginning in conda. Wheels have the problem (for packages with extensions that compile against numpy) that there is no way to indicate a wheel has been compiled against a particular version of numpy, but conda packages allow that.

Which is a long-winded way of saying that I think numpy does need to be treated a little differently than other packages in terms of checking the build/run specifications with each other.

@stuarteberg
Contributor

I can see how giving numpy special treatment makes sense. The concerns we should keep in mind are:

  1. It would be a shame to clutter the conda code base with a bunch of lines like if spec.name == 'numpy'
  2. New conda-build users look at existing recipes as examples for building their own packages. Many packages use numpy, so they are likely to encounter whatever syntax we decide upon here. I'm afraid new users will be frustrated if they try to copy examples that use numpy, since those examples don't generalize well to other packages.

Regarding item 2., it now strikes me that x.x is an unfortunate choice of identifier for this particular purpose. Something like - numpy NPY_VER would have made it clear to the reader that this is a special feature that applies only to numpy. I mean, x.x looks almost like a regular version, doesn't it? After all, I think the following wacky strings are all valid versions according to numpy: 0.1a2, 1.post3, 0.dev2, 2.rc3. Unless you've memorized the version regex code, would you realize that x.x might be something "special"? OK, probably. But still...
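
The point about x.x reading like an ordinary version can be demonstrated with a deliberately naive pattern (not conda's actual version regex, which is an assumption I'm explicitly not reproducing here):

```python
import re

# To a simple "letters, digits and dots" check, x.x is indistinguishable
# from the genuinely odd-but-valid versions listed above.
looks_like_version = re.compile(r"[A-Za-z0-9.]+").fullmatch

for s in ["0.1a2", "1.post3", "0.dev2", "2.rc3", "x.x"]:
    print(s, bool(looks_like_version(s)))  # every one of these passes
```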

(Jeez, I'm a real Debbie Downer, aren't I? I'm not trying to be a pain, just thinking out loud here...)

BTW:

(I think?) impossible to indicate what range of versions of numpy the package supports.

FWIW, you can always add additional constraints to a package's requirements by simply adding a new requirement [edit: for a package you already listed]. All the requirements are just bagged up and sent to the solver. So, this actually works:

requirements:
  build:
    - numpy x.x
    - numpy >=1.9

In some sense, I suppose the "and" operator (,) is really redundant, since this is always an option. But in general, I think putting the requirements on a single line is nicer to read.
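
The "bagged up and ANDed" behaviour described above can be sketched like this (an illustration of the idea, not conda's real spec-merging code):

```python
# Every requirement line is handed to the solver; multiple lines for the
# same package are simply ANDed, so two lines are equivalent to one
# comma-joined match spec.
def combined_constraints(specs):
    merged = {}
    for spec in specs:
        name, _, constraint = spec.partition(" ")
        merged.setdefault(name, []).append(constraint)
    return {name: ",".join(parts) for name, parts in merged.items()}

print(combined_constraints(["numpy x.x", "numpy >=1.9"]))
# {'numpy': 'x.x,>=1.9'}
```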

@pelson
Contributor Author

pelson commented Nov 3, 2015

Great feedback @stuarteberg - thanks for the comments!

I'm also of the opinion that numpy shouldn't be treated specially, I'm just trying to find a route which gets us there without breaking a number of workflows.

meaning pick a numpy which satisfies the numpy condition and adheres to the definition of the special CONDA_NPY environment variable/conda build flag.

This adds "magic" behavior behind the scenes -- for numpy only -- that isn't obvious to the uninitiated. If x.x is simply defined as I suggested above, then the build requirements are explicit and terse

The point of my statement was that if you want to specify the magic CONDA_NPY you still can, but otherwise it will get solved for you as any other dependency is.

The only point that I think we have contention on is the build requirement on numpy x.x. I do not believe it belongs in the recipe definition - I couldn't explain it to a novice (or even an expert) and to me that indicates it is a wart. Do you have a motivation for preserving it?

For instance, this recipe has the same problem:

requirements:
  build:
    - hdf5     >=1.8.14
  run:
    - hdf5     1.8.14

I'm certainly not opposed to having conda raise an error if it sees a recipe like that, but it strikes me as a separate issue from the one in this thread.

So we've got to the point where it is worth talking about removing that numpy special case. In my view the recipe above is clear as day (yet obviously not very sensible), and the proposal I put forwards extends very nicely to being able to define "baked" versions for anything (of course, I'm not proposing that in this PR):

requirements:
  build:
    - hdf5     >=1.8.14
  run:
    - hdf5     x.x

FWIW, you can always add additional constraints to a package's requirements by simply adding a new requirement

This is cool - I hadn't realised that was possible 👍

@mwcraig
Contributor

mwcraig commented Nov 16, 2015

FWIW, you can always add additional constraints to a package's requirements by simply adding a new requirement [edit: for a package you already listed]. All the requirements are just bagged up and sent to the solver. So, this actually works:

FYI, this is mostly true, but breaks in one case (will open a separate issue for it because it shouldn't break):

requirements:
  run:
    - numpy >=1.9
    - numpy x.x  # this MUST come first for the build string to be constructed properly.

@mwcraig
Contributor

mwcraig commented Nov 21, 2015

@stuarteberg @pelson -- what do you think about a different approach to pinning the numpy version to CONDA_NPY:

requirements:
  pin-to-environment:
    # List the packages here whose version should be pinned to an environmental
    # variable.
    - numpy  # version set by CONDA_NPY
  build:
    - numpy >=1.9
  run:
    - numpy >=1.9

Right now we are trying to kludge into an otherwise straightforward list of requirements additional information about how some specific dependencies are to be handled. I think this simplifies dependency handling (numpy is no longer an odd special case) and is more explicit, making recipes easier to read/write.

This route could easily be extended to any package with a convention that the environmental variable is expected to be CONDA_PACKAGENAME, with the caveat that numpy and python also respect CONDA_NPY and CONDA_PY.
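
A helper implementing that naming convention might look like the following. The names here (pin_env_var, pinned_spec) and the exact spec format produced are hypothetical, made up for this sketch:

```python
import os

# Packages listed under pin-to-environment would be pinned to
# CONDA_<PACKAGENAME>, with numpy and python keeping their
# historical variables CONDA_NPY and CONDA_PY.
_SPECIAL = {"numpy": "CONDA_NPY", "python": "CONDA_PY"}

def pin_env_var(package):
    return _SPECIAL.get(package, "CONDA_" + package.upper().replace("-", "_"))

def pinned_spec(package, environ=os.environ):
    """Return a pinned match spec if the env var is set, else the bare name."""
    version = environ.get(pin_env_var(package))
    return "%s %s*" % (package, version) if version else package

print(pin_env_var("numpy"))                        # CONDA_NPY
print(pin_env_var("hdf5"))                         # CONDA_HDF5
print(pinned_spec("numpy", {"CONDA_NPY": "1.9"}))  # numpy 1.9*
```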

If you think this is worth pursuing I could try to write a PR to implement a couple of days from now.

@stuarteberg
Contributor

(The following is more-or-less a restatement of my earlier opinion. I don't mean to be difficult, and I won't be offended if you guys come to a different conclusion. :-)

I think the jinja template feature already provides everything we need, without the need to extend the meta.yaml schema.

For a compiled package that uses numpy, I now use this syntax, which already works:

requirements:
  build:
    - numpy {{NPY_VER}}*,>=1.9 # Build against whichever numpy version was specified by CONDA_NPY,
                               # but fail if the user tries to use CONDA_NPY < 1.9
  run:
    - numpy {{NPY_VER}}* # Must run against the same version of numpy we built with

For a pure-python package, I use this:

requirements:
  run:
    - numpy >=1.9

In the former case, it's clear to the reader that the numpy version is pinned, and that the pinned version came from the environment at build time.

(Side note: Since the weird x.x syntax is already used in some existing recipes, the easiest thing to do is just to make it a synonym for {{NPY_VER}}. IMHO, x.x is not as clear as NPY_VER, but we should support it for backwards-compatibility reasons.)

On a related note: According to the conda docs, we can also use jinja templates to make a nicely descriptive build string:

build:
  number: 1
  string: np{{CONDA_NPY}}py{{CONDA_PY}}_{{PKG_BUILDNUM}}

Alas, that doesn't quite work due to two different bugs in conda-build. I've already submitted pull requests fix them both (#662 and #667). Once they are accepted, jinja templates in conda-build will behave exactly as conda's documentation already claims they do.
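
As a toy illustration of what that template is meant to expand to, here is the same substitution done with plain str.format standing in for jinja (the CONDA_* values are illustrative, not taken from a real build):

```python
# The build-string template from the snippet above, rendered from the
# CONDA_* variables that conda-build exposes to the template context.
template = "np{CONDA_NPY}py{CONDA_PY}_{PKG_BUILDNUM}"
context = {"CONDA_NPY": "110", "CONDA_PY": "27", "PKG_BUILDNUM": "1"}
print(template.format(**context))  # np110py27_1
```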

@mwcraig
Contributor

mwcraig commented Nov 30, 2015

Sorry for the delay, @stuarteberg -- I had not realized jinja templating already worked in recipes!

A couple of questions about the method you are using right now:

  • Does the build string still get automagically generated by conda? Am guessing no, since you mention generating the build string in the recipe. Not that current build string generation is flawless...
  • Does the generated package actually pin the numpy version, and does that change if you list the additional version specs in run instead of build (or if you list it in both)?

I'm coming around to the view that the current x.x notation is not ideal...it is not intuitive, it currently has to be put in both run and build or you end up with a broken package, and its behavior differs from that of the other "special" packages: python, R, and perl.

I'm not sold on your version yet because it seems really verbose, and suffers from the same ambiguity that x.x does in terms of whether the spec needs to be in build, run, or both, and where the additional specifications belong.

Ideally I'd like to not have to think about any of that...I would just list numpy and python requirements like any others.

PS The fact that multi-line specifications can be used has turned out to be insanely helpful. It is the only way to express both x.x and a range in the numpy dependency and the only way to pin the python version and list a range of compatible versions in the recipe. Rolling either up into one line fails.

@stuarteberg
Contributor

Does the build string still get automagically generated by conda?

I just did a quick test. It appears that the automagic build string is activated if you define the CONDA_NPY environment variable (or provide --numpy=foo.bar). So, it depends on how you invoked conda-build, not on the contents of meta.yaml.

Does the generated package actually pin the numpy version, and does that change if you list the additional version specs in run instead of build (or if you list it in both)?

In the example I gave, yes, the package requires a specific numpy version at runtime. Adding additional specs merely constrains the requirements even further. Suppose I built my example from above via conda build --numpy=1.10. After jinja preprocessing, the meta.yaml file looks like this:

requirements:
  build:
    - numpy 1.10*,>=1.9
  run:
    - numpy 1.10*

Duplicating all the build constraints in the run section wouldn't hurt, but it is needless.

I'm not sold on your version yet because it seems really verbose, and suffers from the same ambiguity that x.x does in terms of whether the spec needs to be in build, run, or both, and where the additional specifications belong.

As I tried to show in my earlier post (compiled package vs. pure-python package), how the numpy spec should be constrained in build vs. run is not something that can be guessed in advance by conda. Generally, the beauty of conda is that it doesn't try to guess what your package requirements are -- it just gives you, the expert on your own package, the power to express those requirements succinctly. Conda doesn't know if a package compiles against numpy or merely uses numpy's python API. Therefore, the run requirement can't be automagically filled in.

The following examples are all legitimate requirements for different use-cases involving a numpy dependency:

For a package that compiles against numpy, but doesn't use any recently-added functions:

requirements:
  build:
    - numpy {{NPY_VER}}*
  run:
    - numpy {{NPY_VER}}*

For a package that compiles against numpy, and happens to use functions that were only added in version 1.9:

requirements:
  build:
    - numpy {{NPY_VER}}*,>=1.9
  run:
    - numpy {{NPY_VER}}*

For a package that uses numpy through the python API (no compilation needed), and furthermore calls numpy functions that were only added recently:

requirements:
  run:
    - numpy >=1.9

In some cases, you might have the exact same scenario, but for some reason the build script also needs numpy. This might not be common, but is certainly possible.

requirements:
  build:
    - numpy >=1.9
  run:
    - numpy >=1.9

In fact, it's even theoretically possible (though very unlikely) that some wacky package might require numpy as part of some special build script, but then not actually use numpy at run time:

requirements:
  build:
    - numpy >=1.9

As far as I can tell, there just isn't any way for conda to guess what your package needs. It is incumbent on the recipe author to understand the package requirements and encode them into meta.yaml correctly.

astrojuanlu added a commit to astrojuanlu/poliastro that referenced this pull request Dec 7, 2015
@pelson
Contributor Author

pelson commented Jan 29, 2016

This PR has been rail-roaded a little by a pretty lengthy discussion on how a better pinning mechanism might look. I completely agree with aiming for a better solution, but I don't think it needs to hold up this obvious improvement to conda-build. At the end of the day, I suspect x.x will be removed, but I don't necessarily believe we have arrived at the best solution so far. That is a discussion best placed for a mailing-list, I believe.

In the meantime, @ilanschnell / @kalefranz / other conda-build developers, are you prepared to accept this incremental (non breaking) improvement in how recipe definitions are done, or shall I close the PR?

@msarahan
Contributor

+1 as an incremental fix. What's up with the GIT_DESCRIBE_TAG making Travis fail?

@pelson
Contributor Author

pelson commented Jan 29, 2016

What's up with the GIT_DESCRIBE_TAG making Travis fail?

I've assumed that wasn't me, but happy to look into it if you think it is introduced by this PR.

@stuarteberg
Contributor

This PR has been rail-roaded a little by a pretty lengthy discussion on how a better pinning mechanism might look.

Fair enough. Sorry for my part in that.

BTW, in the intervening time since this PR was written, there have been some changes to metadata.py in the master branch. I think your branch here needs to be rebased. For example, metadata.get_contents() was moved/renamed to metadata.MetaData._get_contents(). The diff view here shows the old version.

@pelson
Contributor Author

pelson commented Jan 29, 2016

Thanks @stuarteberg - I hadn't spotted that when I rebased. Diffs are looking much better now 👍

@msarahan
Contributor

msarahan commented May 8, 2016

@pelson I think this is fine. I'm very sorry it has taken so long to get here. Would you like me to merge it, or hold off until after the build customization discussions tomorrow? I think whatever comes out of that meeting will probably supersede this PR, but this PR has still been conceptually valuable.

@jakirkham
Member

Personally, I think we should hold off until the discussion tomorrow.

@pelson
Contributor Author

pelson commented Jun 1, 2016

I dusted this PR down. Hopefully CI will go green.

@msarahan
Contributor

msarahan commented Jun 1, 2016

Closing this to consolidate discussion in #966, where I'm trying to unify the implementation of these ideas. Thank you everyone for your contributions to this discussion.

@msarahan msarahan closed this Jun 1, 2016
@msarahan
Contributor

msarahan commented Jun 1, 2016

Sorry @pelson just saw your note... Reopening.

@msarahan msarahan reopened this Jun 1, 2016
@pelson
Contributor Author

pelson commented Jun 1, 2016

I think the CI is crying wolf - I can't see that the failures are related to this PR, but may be wrong.

@msarahan
Contributor

msarahan commented Jun 1, 2016

Yes, we have failures elsewhere. I'm well-acquainted. =(

This looks good. Thanks @pelson
