Support x.x for all dependencies. #728

Closed · wants to merge 5 commits

Conversation

@johanneskoester (author)

Motivation

Currently, using x.x to allow pinning of dependencies to externally provided versions works for Python, Perl, R and numpy.
However, the functionality is critical in order to be able to pin dependencies in case of ABI incompatibilities. E.g., this mechanism can be used to build for particular boost versions.
This PR modifies conda-build to support this mechanism for all dependencies.

Changes

  1. The CLI is changed to provide an additional argument --versions, which allows pinning versions for arbitrary packages, e.g. --versions boost=1.57.
  2. The config object is changed to allow storage of these versions.
  3. The function ms_depends is changed to generalize the current pinning mechanism.
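As a usage sketch, a build could then be pinned against an externally chosen boost (the --versions flag below is the one proposed in this PR and is not part of released conda-build):

```shell
# Proposed usage (flag from this PR, not released conda-build):
# pin the boost dependency to the externally provided 1.57 series
conda build --versions boost=1.57 my-recipe/
```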

Outlook

The new implementation could be used as well to handle the previous mechanism that was specific for Python, Perl, R and numpy. However, I left the old code for backwards compatibility.

Johannes Köster on behalf of the Bioconda team.

@pelson

pelson commented Jan 18, 2016

Really interesting addition. Nice @johanneskoester. I'm personally not a huge fan of the current x.x implementation, but the capability it provides is extremely valuable, so this addition only stands to strengthen that value 👍.

For me, this would need a good deal more unit tests before I had confidence that the changes function, and continue to function, in the expected way.

Great stuff!

@johanneskoester (author)

Glad you like it! I don't know if I have the time to implement more tests. Is CA interested in adding tests? I think a lot of packages outside of Bioconda would benefit from this as well. We need this functionality quite urgently in Bioconda.

@msarahan

I think this is a good thing. Pinging @kalefranz and @csoja for comment on whether CA can fill in on writing tests (if so, on what timescale?)

@stuarteberg

This is a reasonable proposal to solve a real problem. Just for discussion, I want to point out an alternative.

As pointed out above, there are currently four packages that already support the special x.x syntax (python, numpy, perl, R). One can define x.x for each package in one of two ways:

  • via a command-line option, e.g. conda build --python=3.5 mypackage
  • via an environment variable, e.g. CONDA_PY=35 conda build mypackage

If you've done that, then you can use x.x in your recipe like so:

requirements:
  build:
    - python >=3.4
  run:
    - python x.x

or, even better (IMHO), you can avoid writing x.x and use a jinja template, which is slightly more explicit:

requirements:
  build:
    - python >=3.4,{{PY_VER}}*
  run:
    - python {{PY_VER}}*

But --python, CONDA_PY, and PY_VER are all hard-coded names, and we need a more general mechanism that can apply to other packages, too. As @johanneskoester points out, there are a lot of other packages that we probably want to pin down in our recipes.

It's worth mentioning that jinja templates can also be used to achieve our goals here. Taking inspiration from conda/conda#1959, we could define a jinja template with our preferred versions in it, and include it in meta.yaml:

{% import '$CONDA_DEFAULT_ENV/versions.yaml' as versions %}

package:
  name: mypackage
  version: 0.1

requirements:
  build:
    - boost >=1.55,{{versions.boost}}*

  run:
    - boost {{versions.boost}}*

And your versions.yaml file should look something like this:

{% set python = '3.5' %}
{% set numpy = '1.9' %}
{% set boost = '1.57' %}

For huge stacks, the number of versions one might want to pin could get large, so keeping them organized in a file might be more manageable than keeping them on the command-line. They're also readily diff-able.

Admittedly, abusing the CONDA_DEFAULT_ENV variable for this purpose is a little awkward. If we pursue this idea, perhaps it would be nice to add an alternative mechanism for the user to specify the location of versions.yaml (or other jinja templates).

@johanneskoester (author)

@stuarteberg I like your alternative proposal. For us, it is important to be able to pin to multiple versions. Hence, it would be necessary to build against different versions.yaml files.

I wasn't aware of the jinja support (is this documented somewhere?).
It seems to me that everything we want is already possible without changes to conda?

Defining a file with versions and making this flexible without hardcoding a path inside CONDA_DEFAULT_ENV:

{% import environ['versions'] as versions %}

package:
  name: mypackage
  version: 0.1

build:
  string: boost{{versions.boost}}_0

requirements:
  build:
    - boost >=1.55,{{versions.boost}}*

  run:
    - boost {{versions.boost}}*

In addition we can already pin directly:

{% set boost = environ['BOOST_VER'] %}

package:
  name: mypackage
  version: 0.1

build:
  string: boost_{{boost}}_0

requirements:
  build:
    - boost >=1.55,{{boost}}*

  run:
    - boost {{boost}}*

Am I right?
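The environment-variable approach above can be exercised with a short, self-contained sketch (this is an illustration, not conda-build's own code; conda-build exposes the process environment to recipe templates as an `environ` mapping, and BOOST_VER is the hypothetical variable from the example):

```python
import os
from jinja2 import Template

# Illustrative sketch: an environment variable drives the version pin.
# BOOST_VER is a hypothetical name taken from the example above.
os.environ["BOOST_VER"] = "1.57"

recipe = Template(
    "requirements:\n"
    "  run:\n"
    "    - boost {{ environ['BOOST_VER'] }}*\n"
)
# conda-build passes the environment into the template context as `environ`
rendered = recipe.render(environ=os.environ)
print(rendered)  # the run requirement comes out as "- boost 1.57*"
```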

johanneskoester added a commit to bioconda/bioconda-recipes that referenced this pull request Jan 25, 2016
We are about to find a general solution for pinning to ABIs [here](conda/conda-build#728 (comment)).
@stuarteberg

I wasn't aware of the jinja support (is this documented somewhere?).

Good question. I thought it was documented explicitly, but now I can't seem to find it. You can see implicit reference to jinja support in the examples for git environment variables.

Defining a file with versions and make this flexible without hardcoding to some path in CONDA_DEFAULT_ENV:

{% import environ['versions'] as versions %}

Unfortunately, it's not quite that simple. By default, jinja doesn't let you import using absolute paths. Every template you wish to use must be locatable on jinja's search path. We can support absolute paths by adding / to the search path. That's easy enough -- we just need to add jinja2.FileSystemLoader('/') to this list in metadata.py. Then your idea would work like this:

$ versions=/path/to/versions.jinja conda build mypackage
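The effect of adding FileSystemLoader('/') can be demonstrated in isolation with a small sketch (file paths and names below are illustrative, not conda-build's own):

```python
import os
import tempfile
import jinja2

# Sketch of the proposed change: with FileSystemLoader('/') on the search
# path, a template can be imported by absolute path (POSIX paths assumed).
tmpdir = tempfile.mkdtemp()
versions_path = os.path.join(tmpdir, "versions.jinja")
with open(versions_path, "w") as f:
    f.write("{% set boost = '1.57' %}")

env = jinja2.Environment(
    loader=jinja2.ChoiceLoader([jinja2.FileSystemLoader("/")])
)
# Import the versions file by its absolute path, as a recipe would.
meta = env.from_string(
    "{% import '" + versions_path + "' as versions %}"
    "boost {{ versions.boost }}*"
)
print(meta.render())  # boost 1.57*
```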

In addition we can already pin directly:

Yes, your example will work perfectly, as far as I can tell. Environment variables are loaded directly into the jinja context, so you can simplify your example like so:

package:
  name: mypackage
  version: 0.1

build:
  number: 0
  string: boost{{BOOST_VER}}_{{PKG_BUILDNUM}}

requirements:
  build:
    - boost >=1.55,{{BOOST_VER}}*

  run:
    - boost {{BOOST_VER}}*
$ BOOST_VER=1.57 conda build mypackage

(OK, the PKG_BUILDNUM variable might not work correctly until the next conda-build release (1.18.3). It's been working in the conda-build master for a couple months now...)

BTW, if you forget to define BOOST_VER in your environment, you'll see an error:

Error: Failed to render jinja template in /tmp/mypackage/meta.yaml:
'BOOST_VER' is undefined
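That error can be reproduced in a few lines, assuming strict undefined handling (which the message above suggests conda-build configures):

```python
import jinja2

# Minimal reproduction of the "'BOOST_VER' is undefined" error, assuming
# the renderer uses StrictUndefined (an assumption based on the message).
env = jinja2.Environment(undefined=jinja2.StrictUndefined)
template = env.from_string("boost {{ BOOST_VER }}*")
try:
    template.render()  # BOOST_VER was never supplied
except jinja2.exceptions.UndefinedError as err:
    message = str(err)
    print(message)  # 'BOOST_VER' is undefined
```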

@johanneskoester (author)

Thanks, that sounds good and much more flexible than the x.x approach.

@msarahan

Is this then something that really needs documentation more than code changes? Should we write up @stuarteberg's ideas here, make his change to metadata.py, and call it good?


@johanneskoester (author)

I would support that, and I am fine with abandoning my PR in favor of @stuarteberg's proposal.

@pelson

pelson commented Jan 26, 2016

We just need to bear in mind that the proposed Jinja approach is not entirely equivalent. Jinja variables are expanded at MetaData creation time, whereas the x.x form is expanded at MetaData.ms_depends time. Essentially, that prevents any tool inspecting a recipe from determining whether an environment-dependent variable is being used. Tools such as conda-build-all currently rely on this: they identify the x.x marker to see whether a recipe depends on a pinned numpy version, and adjust the appropriate variable to decide whether multiple versions should be built against.
I can come up with workarounds for such a situation, but none of them are particularly pleasant or maintainable.

A proposal for how such a tool might do this inspection would be an interesting use case to consider.

@stuarteberg

Should we write up @stuarteberg's ideas here, make his change to metadata.py, and call it good?

The suggested change was:

we just need to add jinja2.FileSystemLoader('/') to this list in metadata.py.

I wonder if we should also add jinja2.FileSystemLoader(os.getcwd()). That would allow both of these:

VERSIONS_FILE=/absolute/path/to/versions.jinja conda build mypackage
VERSIONS_FILE=./relative/path/to/versions.jinja conda build mypackage

Thinking of jinja templates in general, it occurs to me that we might want the ability to use selectors in the imported templates, too. That is, something like this:

{% set python = '3.5' %}
{% set numpy = '1.9' %}
{% set boost = '1.57' %} # [not win]
{% set boost = '1.58' %} # [win]

It shouldn't be hard. I think we just need to implement our own 'loader' that is aware of selectors. I think this is all that's needed in MetaData._get_contents() (but I haven't tested this):

class FilteredLoader(jinja2.BaseLoader):
    """
    A pass-through for the given loader, except that the loaded source is
    filtered according to any metadata selectors in the source text.
    """
    def __init__(self, unfiltered_loader):
        self._unfiltered_loader = unfiltered_loader
        self.list_templates = unfiltered_loader.list_templates

    def get_source(self, environment, template):
        contents, filename, uptodate = self._unfiltered_loader.get_source(environment, template)
        return select_lines(contents, ns_cfg()), filename, uptodate

loaders.append(jinja2.FileSystemLoader('/'))
loaders.append(jinja2.FileSystemLoader(os.getcwd()))

# wrap every loader (use a list, not a lazy map iterator, on Python 3)
loaders = [FilteredLoader(loader) for loader in loaders]
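The idea can be exercised end to end with a self-contained sketch; select_lines here is a simplified stand-in for conda-build's real selector filter, and the template/config names are illustrative:

```python
import jinja2

def select_lines(text, config):
    """Simplified stand-in for conda-build's selector filter: drop a line
    whose trailing '# [expr]' selector evaluates to False; otherwise keep
    the line with the selector comment stripped."""
    kept = []
    for line in text.splitlines():
        if "# [" in line:
            expr = line.rsplit("# [", 1)[1].rstrip().rstrip("]")
            if not eval(expr, {}, dict(config)):
                continue
            line = line.rsplit("# [", 1)[0].rstrip()
        kept.append(line)
    return "\n".join(kept)

class FilteredLoader(jinja2.BaseLoader):
    """Pass-through loader that selector-filters the loaded source
    before jinja2 parses it."""
    def __init__(self, unfiltered_loader, config):
        self._loader = unfiltered_loader
        self._config = config

    def get_source(self, environment, template):
        contents, filename, uptodate = self._loader.get_source(
            environment, template)
        return select_lines(contents, self._config), filename, uptodate

# A versions template with platform selectors, served from memory.
source = (
    "{% set boost = '1.57' %}  # [not win]\n"
    "{% set boost = '1.58' %}  # [win]\n"
)
env = jinja2.Environment(loader=FilteredLoader(
    jinja2.DictLoader({"versions.jinja": source}), {"win": False}))
pinned = env.from_string(
    "{% import 'versions.jinja' as versions %}boost {{ versions.boost }}*")
print(pinned.render())  # boost 1.57*
```

With `win` set to True in the config mapping, the same recipe would instead render against boost 1.58.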

@stuarteberg

We just need to bear in mind that the proposed Jinja approach is not entirely equivalent. Jinja variables are expanded at MetaData creation time, whereas the x.x form is expanded at MetaData.ms_depends time. Essentially, that prevents any tool inspecting a recipe from determining whether an environment-dependent variable is being used.

This is obviously related to the discussion in #650. For better or worse, I'm bringing the same biases to this discussion. :-)

If I were rewriting conda from scratch, I would leave out the x.x feature for several reasons.** But the nice thing about x.x is -- for those who need to write recipes that support more than one configuration of dependency versions -- that it provides a standard-ish way to do it. That means that other tools (e.g. conda-build-all) have a standard string to look for when it comes to introspecting recipes. If we start recommending the jinja solution, then users are free to use all sorts of unique conventions for how the imported version variables are named. (Which makes the tool-maker's life more difficult.)

In my opinion, jinja templates are explicit and more generalizable without relying on updates to the conda code base when we need to support new packages (or languages) with strange version semantics. But I will readily acknowledge that @pelson's opinion on this may be better informed, from his experience with the conda-forge project.

If you'd like to merge this PR, I have no complaints. It does seem carefully written and does not preclude jinja-based approaches for those of us who would prefer that instead.


**My complaints about x.x are as follows:

  • The name x.x is poorly chosen.

    • For new users, it is not even close to self-explanatory.
    • It seems to imply a MAJOR.MINOR convention, such that mypackage 1.2.0 is compatible with mypackage 1.2.1, and conversely, mypackage 1.2 is not compatible with mypackage 1.3. For some packages, like numpy and python, that's true. For others (e.g. hdf5), it's not true at all. I really wish we lived in a world where all packages adhered to such a convention, but that's not the world we live in.
  • It adds complexity to the conda-build code base to handle special cases related to specific packages (so far, numpy, python, perl, R). Originally, that made sense, because conda didn't yet support templated recipe files, and keeping N different versions of all python/numpy packages was just not practical. But if conda is going to take over the world (as we all hope it will), we need to be really strict about avoiding special cases in the conda code base itself, and push all package-specific logic into the recipes themselves.

  • It violates the "explicit is better than implicit" mantra. What happens if a user types conda build --numpy=1.9.3 mypackage for a recipe that uses - numpy x.x? To know the answer, you'll have to read the conda source code. With a jinja template variable (or an unadulterated environment variable), the user can just look at her own recipe templates to see what - numpy {{numpy_version}} means.

  • As discussed in Lightened up the requirement on recipes to define numpy x.x as runtime #650, it isn't obvious how to simultaneously use x.x and constrain the build requirements in a recipe. For instance, this is not allowed:

      requirements:
        build:
          - numpy >1.7,x.x

    (However, as mentioned in Lightened up the requirement on recipes to define numpy x.x as runtime #650, there happens to be a non-obvious workaround.)

if any(i in v for i in ',|>!<'):
    break
res.append(ms.name + v.strip('*'))

A reviewer commented on the diff above:

The build_id() (and thus the default build string) will now mention any packages that have a specified version. But in this implementation, the order of the package names in the build string will depend on the order in which they happened to be listed in meta.yaml. Instead -- just for consistent naming conventions -- should they be sorted before they get added to the build_id?

A second reviewer replied:

IMHO, yes.

@msarahan

I am generally in favor of @stuarteberg's proposal, because I find the readability of things like:

  • numpy {{NPY_VER}}*,>=1.9

vastly superior to x.x.

Regarding @pelson's point about a tool for introspection, could such a tool inspect the template itself, rather than its filled-in content? That seems like an easier target.

@stuarteberg

Regarding @pelson's point about a tool for introspection, could such a tool inspect the template itself, rather than its filled-in content? That seems like an easier target.

I don't want to speak for @pelson, but here are my thoughts: Yes, such a tool could be written to inspect a recipe according to a given standard. But the problem he faces now is that the jinja and/or environment variable method allows anyone to implement their own conventions for how package versions are injected into meta.yaml. Presumably, he can make his tool follow any one convention we throw at him. He can check specific environment variables, or he can read a particular template file, etc. But he can't be expected to handle everything people might think of.

I don't think that's a show-stopper. It just means that the developers of "matrix build" tools such as conda-build-all will have to be explicit in their documentation about which conventions your "configurable" recipe must adhere to if you hope to use their tools. I'm thinking something like "to use this tool, you must explicitly hard-code all of your dependency versions OR set them as jinja template variables with the following naming conventions..."

@jakirkham (Member)

cc @ukoethe

@jankatins

See also conda-forge/conda-forge.github.io#157, where I argue that low-level libraries like boost should get a package name change to make them co-installable.

@msarahan

msarahan commented Jun 1, 2016

@JanSchulz, please consolidate discussion to #966 where I'm trying to consolidate implementation of these ideas.

Closing this - but thank you everyone for your contributions to this discussion.

@msarahan msarahan closed this Jun 1, 2016
This conversation was locked as resolved due to inactivity on May 15, 2022.