
WIP: Build customization #848

Closed
wants to merge 6 commits into from

Conversation

ukoethe
Contributor

@ukoethe ukoethe commented Mar 25, 2016

Conda-build Customization

This PR implements a number of enhancements that greatly simplify customization of the build and run requirements of a recipe. The general idea is this:

  • the recipe's build requirements should state source compatibility constraints
  • constraints on build dependency binaries are added at build time via customization
  • constraints on run requirements are computed at build time

The PR is not necessarily meant to be merged as-is, but should primarily serve as a solid basis for discussion of customization ideas.

Note: This PR intentionally omits the problems of features, which will be the subject of another proposal.

Option --build-config

This option allows the user to pass a build config file to conda-build via

conda build --build-config /path/to/config.yaml   foo

config.yaml is similar to meta.yaml, but only contains the section requirements/build (everything else is currently ignored). The contents of this file are simply concatenated to the recipe's build requirements. If this results in multiple lines referring to the same package, all corresponding constraints are joined and must hold simultaneously. This behavior is already realized in conda's version resolution algorithm, so no additional logic needs to be implemented.

Semantically, recipes state the specific requirements of particular packages, whereas config files state requirements to be shared by a family of builds (e.g. (meta)packages that activate desired track_features declarations). In an automated deployment system, config files can be auto-generated from a build matrix.
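A hypothetical config.yaml of this shape (the package names and pins are invented for illustration) might look like:

```yaml
# Only requirements/build is read; everything else is currently ignored.
# These constraints are concatenated to the recipe's own build requirements.
requirements:
  build:
    - python 3.5*
    - zlib 1.2*
```

In an automated build matrix, one such file per configuration could be auto-generated, as described above.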

Option --bootstrap

This option allows the user to pass a bootstrap environment to conda-build via

conda build --bootstrap some_env_name   foo

This behaves like --build-config, but the config file is automatically constructed from the given environment: All packages currently installed in some_env_name are considered as build requirements of the package to be built, in precisely the version found in the environment. This approach leverages the full power of conda's version resolution logic to free the user from the tedious and error-prone task of maintaining config files manually.
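Sketched in Python, the construction of such a config from an environment might look like this (the paths, helper name, and exact spec format are assumptions for illustration, not the PR's actual code):

```python
import json
import os
import tempfile

def bootstrap_requirements(env_prefix):
    """Hypothetical sketch: turn the packages installed in an environment
    into exact build requirements ('name version build'), as --bootstrap
    is described to do."""
    specs = []
    meta_dir = os.path.join(env_prefix, 'conda-meta')
    for fname in sorted(os.listdir(meta_dir)):
        if fname.endswith('.json'):
            with open(os.path.join(meta_dir, fname)) as f:
                meta = json.load(f)
            specs.append('%s %s %s' % (meta['name'], meta['version'], meta['build']))
    return {'requirements': {'build': specs}}

# Demo with a fake environment prefix:
prefix = tempfile.mkdtemp()
os.makedirs(os.path.join(prefix, 'conda-meta'))
with open(os.path.join(prefix, 'conda-meta', 'jpeg-8d-0.json'), 'w') as f:
    json.dump({'name': 'jpeg', 'version': '8d', 'build': '0'}, f)

config = bootstrap_requirements(prefix)
print(config['requirements']['build'])  # ['jpeg 8d 0']
```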

Jinja2 variable {{ installed }}

This variable is a dictionary whose keys are the package names found in the current _build environment, and the corresponding values contain the complete metadata from _build/conda-meta/package_name.json. This information can be used to configure run requirements in meta.yaml at build time with up-to-date information from the build itself:

requirements:
  build:
    - zlib
    - jpeg 

  run:
    - zlib  {{ installed['zlib']['version'] }}
    - jpeg  {{ installed['jpeg']['version'] }}

This pins the run requirements to the exact dependency versions present during the build. Of course, strict run requirements like this are a bit too pessimistic to be practical, so it is desirable to suitably relax the constraints. A solution using jinja filters is described in the next section. Another good idea would be to adapt the "compatible release" operator ~= from PEP 440 to conda-build, so that one could write

  run:
    - zlib  ~={{ installed['zlib']['version'] }}
    - jpeg  ~={{ installed['jpeg']['version'] }}

to permit all versions that are considered compatible with the build dependency.
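A minimal sketch of how the {{ installed }} dictionary could be assembled from _build/conda-meta (file layout and keys assumed from the description above; this is not the PR's actual implementation):

```python
import json
import os
import tempfile

def load_installed(build_prefix):
    """Map each package name to the metadata read from
    <build_prefix>/conda-meta/<package>.json, mirroring how the
    {{ installed }} variable is described."""
    installed = {}
    meta_dir = os.path.join(build_prefix, 'conda-meta')
    for fname in os.listdir(meta_dir):
        if not fname.endswith('.json'):
            continue
        with open(os.path.join(meta_dir, fname)) as f:
            meta = json.load(f)
        installed[meta['name']] = meta
    return installed

# Demo with a fake _build prefix:
prefix = tempfile.mkdtemp()
os.makedirs(os.path.join(prefix, 'conda-meta'))
with open(os.path.join(prefix, 'conda-meta', 'zlib-1.2.8-0.json'), 'w') as f:
    json.dump({'name': 'zlib', 'version': '1.2.8', 'build': '0'}, f)

installed = load_installed(prefix)
print(installed['zlib']['version'])  # 1.2.8
```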

Jinja2 configuration callback

If the recipe directory contains a file jinja_config.py that defines a function jinja_config(jinja_env), this function is called by conda-build just before asking jinja to parse meta.yaml. This allows the user to add additional variables and filters to the jinja namespace. A useful application of this capability is a filter that approximates the behavior of the compatible release operator ~= by transforming a version number like 1.2.3 into a version constraint like 1.2*,>=1.2.3. A simple implementation of the callback function might look like this:

def jinja_config(jinja_env):
    # Called by conda-build just before meta.yaml is handed to jinja.

    def compatible_versions(version):
        # Approximates PEP 440's '~=': '1.2.3' -> '1.2*,>=1.2.3'.
        v = version.split('.')
        if len(v) > 1:
            return v[0] + '.' + v[1] + '*,>=' + version
        else:
            # Single-component versions are returned unchanged.
            return version

    jinja_env.filters['compatible'] = compatible_versions

and could be used in meta.yaml via the pipe operator |

  run:
    - zlib  {{ installed['zlib']['version'] | compatible }}
    - jpeg   {{ installed['jpeg']['version'] | compatible }}

The callback mechanism would provide recipe designers with a very useful tool for experimentation with jinja magic. Eventually, the best ideas will be implemented natively within conda-build to make them conveniently accessible to everyone. (Then, the configuration callback will mainly serve as a last resort.)

@groutr groutr added pending::discussion contains some ongoing discussion that needs to be resolved prior to proceeding enhancement labels Mar 25, 2016
@stuarteberg
Contributor

This PR is a great leap forward for the related issues of (1) configuring build requirement versions without hard-coding them into the recipe and (2) "pinning" runtime requirements so they match the build versions.

Here are some more detailed comments on each section:

Option --build-config

Specifying build requirements in a config file is a good idea. I like your proposal for the config file syntax. One point worth discussing: In your current implementation (IIUC), the config file is permitted to contain selectors (e.g. # [unix]), but not jinja templates. For example, this wouldn't be allowed in the build-config:

build:
  requirements:
    - python {{ PY_VER }}*

Obviously, the whole point of the build-config is that it's supposed to be an alternative to environment variables like PY_VER, so users probably shouldn't be mixing-and-matching configuration mechanisms like that anyway. Still, it's worth thinking about what the "right thing" is here.

Option --bootstrap

Now this is even better. Instead of requiring the user to curate an exact set of build requirements in a build-config, you let the user install the ones he cares about and let conda itself resolve the other versions. Love it. (And BTW, I was surprised to see how little code was necessary to implement this feature. Nice.)

Jinja2 variable {{ installed }}

This makes a lot of sense, and seems very powerful!

Regarding the proposed (but not yet implemented) "compatible release" operator (~=): Although it's nice to reuse "standard" syntax where possible, I don't know if it's valuable in this case. How many recipe authors will be familiar enough with PEP440 to recognize what ~= means? Furthermore, PEP440 applies only to python packages (well, the subset of python packages that comply with it). I guess my initial reaction is to prefer jinja filters, as you propose in the next section.

Jinja2 configuration callback

I like that this gives us the power to outsource recipe-specific logic into the recipe itself. Seems very powerful.

@msarahan
Contributor

Pinging @mcg1969 - we have discussed if there are good ways of moving away from features, and he'd be very interested in this discussion.

Am I understanding correctly that the --bootstrap option is similar in spirit to #741? I'm a little skeptical of the value of this. If people are meticulous about creating an env for any given project and minimizing the scope of that env, then I think this will work. Otherwise, I think it stands to collect more packages than are actually necessary for a package to be built. Does this support or supplant #741, or is it something completely different?

I like the Jinja {{ installed }} idea a lot. That syntax is a LOT more clear than the x.x stuff, and I like your "compatible release" idea. Like @stuarteberg, I think the Jinja2 configuration callback is probably the better way to go than introducing the ~= syntax into the recipe itself.

To help flesh out where this PR fits, I'd like to come up with concrete use cases where conda-build falls down right now, and how this PR addresses them. Some ideas to start with (but please add more):

  • Matching packages in an ecosystem that have all been built with the same compiler. Currently done with features. Does this PR address the same need?
  • Matching packages that have been built with differing compiler optimization options. Can the Jinja2 configuration callback logic define compatible fallback optimization selections?
  • Pinning versions that a package is built with exactly, as pin dependencies when building a package #741 does
  • Where do features fit with this PR? What purpose would they be needed for, if any?

@pelson
Contributor

pelson commented Mar 28, 2016

TL;DR: Let's put the order of the resolve and build phases right before we come up with the user interface sugar.

Pinning versions that a package is built with exactly, as #741 does

If you want to be able to do this in the metadata using jinja, there is only one option - you must resolve the dependencies by the second pass of reading meta.yaml.

In exactly the same way, if you want to resolve the build_id (aka build/string) then you either have to provide configuration options (e.g. CONDA_NPY, CONDA_PY, --build-config) or you have to resolve the build dependencies.

In #747 I have implemented the changes necessary to resolve the build dependencies before the build takes place. As I said in that PR, the pinning scheme is academic - it is trivial to implement any of them once the resolve-then-build order is reversed. Attempting to implement anything without reversing that order will result in needing more configuration, as we simply won't have all of the context needed to pin appropriately.

I'm not against any of the jinja implemented by this PR, though I do dislike the addition of more configuration (akin to CONDA_PY and CONDA_NPY IMO). Incidentally, in #747 I just added meta to the jinja context and was able to build the following recipe as one might expect *:

package:
  name: foobar
  version: 1

requirements:
  build:
    - python

  run:
{% for _, pkg in meta.resolve_build_deps().items() %}
    - {{ pkg.name }} {{ pkg.version }}*
{% endfor %}

* Maybe it isn't true that this is what one might expect - this example resolves all dependencies, and so includes the version of Python as well as the versions of zlib, tk, etc. How we implement the user interface, I'm all ears 👂

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

we have discussed if there are good ways of moving away from features

When preparing this PR, I didn't have the intention to move away from features (in fact, I'm working on another PR that addresses features). If you have any insights on how the proposed enhancements may serve to make features simpler or unnecessary, I'd be very interested to hear them.

Am I understanding that the --bootstrap option similar in spirit to #741?

IIUC, there are two major differences between this PR and #741:

  1. pin dependencies when building a package #741 addresses pinning on the client side during conda install, whereas the proposed options --build-config and --bootstrap are meant to simplify configuration of the _build environment during conda build. Although related, these are two different tasks.
  2. pin dependencies when building a package #741 works by adding pin... statements to recipes, whereas this PR aims at configuration without changing recipes.

I'm a little skeptical on the value of this [the --bootstrap option].

The aim of the --bootstrap option is to lower the entry barrier for occasional conda-build users (for example, people providing plug-ins for a larger project like ilastik). These people could simply create an environment where their new package is the only missing ingredient, and then call conda build --bootstrap to make sure that this package fits in exactly as intended. This is much simpler than setting up the appropriate config.yaml file, and much better than hard-coding all requirements in the plugin's recipe. On the other hand, power users might prefer config files, especially if they can be autogenerated by some master build process.

@mcg1969
Contributor

mcg1969 commented Mar 29, 2016

When preparing this PR, I didn't have the intention to move away from features (in fact, I'm working on another PR that addresses features). If you have any insights on how the proposed enhancements may serve to make features simpler or unnecessary, I'd be very interested to hear them.

Well, features have been overused, by us, and because they complicate the solution process they have been the source of many errors and difficulties, as we all know. For instance, the vc9/vc10/vc14 features used by the Windows versions of the python package were unnecessary, and they're being replaced with a proper packaging of the VC runtimes.

I would definitely want to participate in any discussion that considers standardizing or recommending a particular use of features, so we can make sure that it doesn't conflict with the solver's current objectives. And my preference will always be to avoid the use of features if an alternate strategy can accomplish the same thing.

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

The Jinja2 configuration callback is probably the better way to go than introducing the ~= syntax

I fully agree. Taking the idea of the compatible filter one step further, one could add an (optional) parameter format to the filter that controls exactly (i.e on a per-package basis) how the version number is transformed into the compatibility constraint. I haven't thought about suitable format syntax yet, but taking the 'x.x' idea for the moment, one might write

requirements:
  run:
    - python {{ installed['python']['version'] | compatible('x') }} # '3.3.5' => '3*,>=3.3.5'
    - zlib {{ installed['zlib']['version'] | compatible('x.x') }}   # '1.2.3.1' => '1.2*,>=1.2.3.1'
    - bar {{ installed['bar']['version'] | compatible('x.x.x') }}   # '0.9.4.2' => '0.9.4*,>=0.9.4.2'
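A sketch of what such a parameterized compatible filter could look like (the format-string semantics are an assumption based on the examples above; the number of 'x' components controls how many leading version parts are kept):

```python
def compatible(version, fmt='x.x'):
    """Hypothetical parameterized 'compatible' filter: keep as many
    leading version components as there are 'x' parts in fmt, and emit
    a glob constraint plus a lower bound at the exact build version."""
    keep = fmt.count('x')
    parts = version.split('.')
    if keep >= len(parts):
        # Nothing to relax; pin the version as-is.
        return version
    return '.'.join(parts[:keep]) + '*,>=' + version

print(compatible('3.3.5', 'x'))        # 3*,>=3.3.5
print(compatible('1.2.3.1', 'x.x'))    # 1.2*,>=1.2.3.1
print(compatible('0.9.4.2', 'x.x.x'))  # 0.9.4*,>=0.9.4.2
```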

This can be further improved if recipe designers add a new field compatibility to their meta.yaml, containing the appropriate format spec for the compatibility guarantees the respective package gives for subsequent versions, resulting in

requirements:
  run:
    - bar {{ installed['bar']['version'] | compatible(installed['bar']['compatibility']) }}
      #  retrieve format string from bar's metadata:  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

(This syntax is too verbose, but can easily be abbreviated by a convenience function if desired, for example by storing the full compatibility constraint in the metadata.)

One can even think about a science fiction idea like a compatibility database at anaconda.org which is continuously fed with test results from the CI servers about what fits or does not fit together. The compatible filter can then consult the database to insert compatibility information that actually has been tested empirically.

@msarahan
Contributor

If you have any insights on how the proposed enhancements may serve to make features simpler or unnecessary, I'd be very interested to hear them.

I think what this might do to alleviate the need for features is make the compiler version and runtime stuff line up more explicitly via dependencies alone, rather than features.

@mcg1969 @ukoethe Features serve an important role; one that was not clear to me until I expressed it to myself this way:

Features ensure that runtimes, build flags (optimization), or other compatibility-breaking aspects all line up within a single process.

This is the same sort of role that WinSxS fills, I think. I do not feel that we overused features (we did, however, overconstrain their meaning to say that only one runtime could exist in an environment at once). We do have to enforce that for a given Python process, only one runtime is in use.

The room for improvement that I see is: how do you handle two (or more) different processes in one environment with the same class of feature, but different values? This is where I think we could learn from WinSxS or similar efforts.

Anyway, that's not part of this PR. I'll look forward to future discussions, perhaps on @ukoethe's features PR.

@msarahan
Contributor

@ukoethe do you have any reply to @pelson's phase ordering point?

more configuration (akin to CONDA_PY and CONDA_NPY IMO).

@pelson I see this as a generalization of the CONDA_PY stuff, which I would argue is not very extensible at all. Any environment variable needs new handling code. I'm not completely clear on how this config file would handle collisions, but that's an implementation detail - and a problem that the CONDA_PY stuff shares anyway. We'll still have to support CONDA_PY and CONDA_NPY for legacy reasons, but IMHO, this is a better way.

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

I'd like to come up with concrete use cases

conda build already offers the options --python, --numpy, --perl, and --R to configure the versions of these packages in the _build environment. However, this approach doesn't scale well if you need to add support for additional packages or properties (e.g. compiler, BLAS variant, AVX acceleration), and fails if something outside the predefined set is needed. The option --build-config provides exactly the same functionality as the options mentioned above, except that a single uniform mechanism suffices to configure versions for any package you want.

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

@ukoethe do you have any reply to @pelson's phase ordering point?

To be honest, I don't quite understand @pelson's question. My PR doesn't touch phase ordering at all. The options --build-config and --bootstrap apply to the resolve phase only, and they don't involve any jinja magic. In contrast, the jinja variable {{ installed }} is only available during the second pass over meta.yaml and is therefore unusable in the build requirements section. For technical reasons (jinja can ignore missing variables, but not missing filters), the jinja_config() callback is invoked both before the first and second pass over meta.yaml, but I expect it to be most useful in the second pass.

I like @pelson's idea to expose the entire MetaData object to jinja. It mostly serves the same purpose as my {{ installed }} variable. For example, his loop to pin run requirements can be replicated in terms of {{ installed }} almost identically (Edit: This code is just meant to illustrate the capabilities of the {{ installed }} variable - I do not recommend to write recipes this way, see @stuarteberg's comment below):

  run:
{% for pkg, data in installed.items() %}
    - {{ pkg }} {{ data['version'] }}*   # optionally: add the 'compatible' filter
{% endfor %}

We should discuss if we want both or should prefer either one. For one thing, the {{ installed }} variable may be easier to understand/document because it simply reflects the contents of _build/conda-meta verbatim.

I also think that the 'x.x' syntax or something similar can be implemented in the format string of the compatible filter I outlined above. This would have the advantage that conda-build doesn't need to be changed to support the desired functionality. @pelson, would you mind checking this possibility?

@mcg1969
Contributor

mcg1969 commented Mar 29, 2016

Features ensure that runtimes, build flags (optimization), or other compatibility-breaking aspects all line up within a single process.

I think features can be a tool to accomplish this goal but they're certainly not sufficient.

For instance, we can accomplish most of this with custom build strings and dependencies, much in the same way that we do for Python versions and NumPy versions. Features make it simpler to select for a particular build dependency, but in theory it could be done by hand.

I'd feel better about it if features were not just keys but optionally key-value pairs. So for instance, we could do blas:openblas, blas:mkl, or crt:vc10, etc. An environment could only have one value for a particular feature class.

@mcg1969
Contributor

mcg1969 commented Mar 29, 2016

(Ah, but if we implement this option we have to be careful to differentiate by language. That is, we can't have one crt feature for all programs in the environment, otherwise Python and R would have to be compiled against the same runtime. So we'd want, say, python_crt:vc10 or somesuch.)

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

For instance, we can accomplish most of this with custom build strings and dependencies

This would be possible, but requires a big effort to standardize the structure and contents of build strings (e.g. is py35np110 the same as np110py35?). IIRC, you said elsewhere that this is not planned.

I'd feel better about it if features were not just keys but optionally key-value pairs.

This is exactly the suggestion I was going to make. It may even be sufficient to change only the track_features: declarations into key-value pairs, while the features: declarations are kept in order to maintain backwards compatibility. (However, I'm not sure whether this simplification would cover the case where Python and R with different crts are placed in the same environment.)

@mcg1969
Contributor

mcg1969 commented Mar 29, 2016

This would be possible, but requires a big effort to standardize the structure and contents of build strings (e.g. is py35np110 the same as np110py35?). IIRC, you said elsewhere that this is not planned.

The only requirement build strings have is to provide uniqueness. They could be a hash key for all I care. It's the dependencies that provide the distinguishing features for the solver.

As for your track_features/features suggestion---they are tied too closely together and depend on each other so what you propose wouldn't work. But we can just define a punctuation mark, like the colon, to define "key/value" versions of features, moving ahead. So features without a colon would work the same way as they currently do.
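The colon convention could be sketched like this (hypothetical helper names; plain features keep their current, value-free meaning, and an environment may carry at most one value per feature class):

```python
def parse_feature(feature):
    """Hypothetical parsing rule: a colon splits a feature into a
    (class, value) pair; features without a colon keep their current
    meaning and are modeled as their own class with value None."""
    if ':' in feature:
        key, _, value = feature.partition(':')
        return key, value
    return feature, None

def check_environment(features):
    """Reject an environment that carries two different values for the
    same feature class (e.g. blas:mkl together with blas:openblas)."""
    seen = {}
    for f in features:
        key, value = parse_feature(f)
        if key in seen and seen[key] != value:
            raise ValueError('conflicting values for feature %r: %r vs %r'
                             % (key, seen[key], value))
        seen[key] = value
    return seen

print(parse_feature('blas:mkl'))   # ('blas', 'mkl')
print(parse_feature('debug'))      # ('debug', None)
print(check_environment(['blas:mkl', 'python_crt:vc10']))
```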

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

The only requirement build strings have is to provide uniqueness.

For build strings to become unique, a canonical ordering of their parts (including future extensions) is needed. This is one way of standardization, isn't it?

As for your track_features/features suggestion---they are tied too closely together and depend on each other so what you propose wouldn't work.

I'm not so pessimistic before actually trying. I thought about reversing the order of key and value into

track_features:
  vc14:  crt

Then, a loop for feature in track_features produces the same output as if feature tags were given in a sequence rather than a mapping. But of course, you have a much better understanding of the complications.
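The backward-compatibility argument can be checked directly in Python: iterating a mapping yields its keys, so a loop written for the current sequence form sees the same feature tags when the field becomes a mapping:

```python
# Current sequence form vs. the proposed {tag: class} mapping form.
track_features_list = ['vc14']
track_features_map = {'vc14': 'crt'}

# A loop over either form produces the same tags.
assert list(track_features_list) == list(track_features_map)

for feature in track_features_map:
    print(feature)  # vc14
```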

@mcg1969
Contributor

mcg1969 commented Mar 29, 2016

The point is that track_features and features are tied together tightly in the conda metadata itself. Even if we improve conda-build we have to contend with the way the current metadata is structured. If we change the way track_features works, we are, unavoidably, changing how features works too.

But this isn't pessimism at all. We can implement a key/value system for features in a way that works alongside the existing, value-free approach. I'm actually quite happy to let everyone else hash out how to implement the YAML side of this, I can take on how it's done in the Python dictionaries :-)

@mingwandroid
Contributor

@mcg1969
(Ah, but if we implement this option we have to be careful to differentiate by language. That is, we can't have one crt feature for all programs in the environment, otherwise Python and R would have to be compiled against the same runtime. So we'd want, say, python_crt:vc10 or somesuch.)

R uses vc6, yeah, really old.

Here I think we may want to come up with another concept of link-groups in meta.yaml. You can combine as many different CRTs as you want, even in a single recipe, either as build or run dependencies, provided they communicate with each other via pipes or stdio and are isolated into different executables. The problems come when you try to link them together (you can get away with it only in very limited situations).

I'm keen to hash out the YAML side of crt versioning for mingw-w64 packages, since I want to mark a whole lot of them as being vc6 soon.

@stuarteberg
Contributor

I like @pelson's idea to expose the entire MetaData object to jinja.

We've discussed this previously, and I still have the same minor concern: If we expose the actual MetaData object to the jinja templates, we're blurring the line between "public" parts of the MetaData API and "private" implementation details. If a conda developer merely renames a variable in metadata.py, it could potentially break recipes in the wild.

FWIW, I would not be opposed to adding MetaData.meta to the jinja context, since that's just the parsed yaml data.

@stuarteberg
Contributor

For example, his loop to pin run requirements can be replicated in terms of {{ installed }} almost identically:

 run:
{% for pkg, data in installed.items() %}
   - {{ pkg }} {{ data['version'] }}*   # optionally: add the 'compatible' filter
{% endfor %}

I could be missing something, but wouldn't this be a bad idea? This would make all implicit (indirect) dependencies of a package explicit dependencies. For packages near the top of a development stack, the list of run requirements would get HUGE. The problem with that is that, every now and then, a package might drop a dependency between versions while staying API/ABI-compatible with previous versions.

"Real" cases of this (where a package's true requirements really changed) are not so common, but there is an "accidental" case which is more likely: If a recipe erroneously includes a run requirement that it didn't really need, the fix is simple: Just delete that requirement, bump the build number, and rebuild the package. But if some downstream package used the above looping code, it will also have to be rebuilt. This "accidental" scenario seems even more likely when meta-packages are involved.

@ukoethe
Contributor Author

ukoethe commented Mar 29, 2016

I'm keen to hash out the YAML side of crt versioning for mingw-w64 packages, since I want to mark a whole lot of them as being vc6 soon.

I don't quite understand this. Why would you mark mingw-w64 as being vc6? As long as only C and Fortran are concerned, mingw-w64 binaries are compatible with many (all?) versions of Visual Studio. For example, I use mingw-w64 in my Visual Studio 2012 build in two (not uncommon) situations: to compile Fortran in scipy and other packages (MS doesn't provide a Fortran compiler), and to compile C libraries that only come with a make (or automake) build system, such as openblas and iconv. This works surprisingly well. (Eventually, I will try to use mingw-w64 throughout, but this is another story.)

@stuarteberg
Contributor

To respond to @kalefranz:

a recipe should explicitly define exactly what it is building, rather than implicitly relying on an undefined and blackbox environment

This is already impossible, because the exact versions that are chosen for a build will always depend on things defined outside the recipe. For instance, even a simple recipe like the following depends on external state:

requirements:
  run:
    - python 2.7*
    - numpy {{NPY_VER}}*
  • The exact numpy version depends on what the user passed via --numpy=X.Y (or, after this PR, some other configuration mechanism).
  • For both python and numpy the selected version depends on what happened to be available on anaconda.org at the moment the recipe was built. (Not to mention which channels were available.)
  • And even if you could somehow guess the build version, you still don't know which build number the user ended up with.

If you really wanted to eliminate all sources of "non-determinism" in the recipe, you would have to require all recipes to specify their dependencies in full (including exact version and build string):

requirements:
  run:
    - python 2.7.9 1
    - numpy 1.9.1 py27_0

... but that's obviously not viable. Like it or not, the build will always depend on some external state.

Question 1: How can a recipe permit customization of build requirements...

"Environment variables" with robust in-line jinja2 filter support.

Fine by me.

Question 2: ... meta.yaml needs to somehow refer to the versions we're building against.

I fully support parameterization via jinja2, along with the use of jinja2 filters.

Sounds good.

Question 3: ... how should we relax the version spec ...

I'm uncomfortable coupling a recipe's definition to the behavior of the conda solver.

I don't think I understand this point; no one is proposing to depend on the solver per se. The correct runtime spec depends on the build version, but not on the particular algorithm that selected that build version in the first place.

Anyway, from the example you gave in Question 2, it sounds like we're on the same page.

Question 4: How should the build string be specified?

If the build string is specified by the author, that should override conda's default, as is the case now. And this should support variable expansion via jinja2.

Works for me!

Question 4.5: Is it a requirement ... to predict the tarball filename with a naive parse ... ?

I think the answer here is no.

Works for me, but we need some buy-in from @pelson on this point. Even though predictable tarball names were never guaranteed in conda-build recipes, I think conda-build-all was written with exactly that assumption. Do I understand that correctly?

Since conda-forge depends on conda-build-all, it seems that we have a choice to make. IIUC, these are the options we have:

  • Abandon this assumption: Forbid uses of jinja templates in package/version and build/string, and also deprecate __conda_version__.txt, __conda_buildstr__.txt, etc.
  • Come up with a way to fix conda-smithy and/or conda-build-all so that it doesn't depend on this assumption
  • Leave things as they are, and just declare that conda-forge will only support recipes that don't violate the tarball name assumption.

Question 5: Should conda-build itself perform "matrix" builds... ?

Conda build should support build matrixes.

I still think this is a mistake, but I guess it won't negatively affect those of us who choose not to use it. I'd be interested to hear what the other conda-build users in this thread think, but if they remain silent, I'll give up and spend my energy on the other questions.

Question 6: Should we constrain the usage of jinja templates...?

...we need to enforce early and full validation of a recipe...

I think we can agree that this question is really orthogonal to the other concerns in this thread. In the interest of wrapping up this discussion as soon as possible, I propose that we open a new issue for Question 6 and discuss it there.

@ukoethe
Contributor Author

ukoethe commented Apr 28, 2016

@kalefranz
If I understand correctly, you want to restrict the use of jinja to prevent recipe authors from making stupid mistakes. However, in the history of computing, the successful strategy has always been to empower the users, even if this means that some people shoot themselves in the foot (think of C/C++, for example).

@stuarteberg
Contributor

In the context of this PR, what would be your advice to the conda forge folks with regard to how they've currently parameterized their builds? For example, the single h5py package

IIUC, the files you linked to will barely need to change. Right now they call conda build several times, but they configure some environment variables before each call. After this PR, they'll need to write a build-config file or maybe create a bootstrap env to achieve the same thing. (Whatever we decide.)

As far as conda-forge goes, I think we need to get those folks involved in the discussion about Question 4.5.

@kalefranz
Contributor

If I understand correctly, you want to restrict the use of jinja to prevent recipe authors from making stupid mistakes.

I don't think that's quite right. I want jinja2 use to be restricted to values of key/value pairs, so that meta.yaml follows a schema I can validate early. That way, later code execution can concern itself with more narrowly scoped logic and not have to itself worry about inputs being valid.

@ukoethe
Contributor Author

ukoethe commented Apr 28, 2016

@kalefranz

I want jinja2 use to be restricted to values of key/value pairs, so that meta.yaml follows a schema I can validate early.

Exactly when and how would you like to validate meta.yaml? Before or after the first pass, you can only check syntactic correctness, but jinja variables can still fail to resolve into what was intended, even if only key/value pairs are allowed. After the second pass, all jinja has been resolved, so it doesn't matter if it was restricted or not. Since nothing serious happens until after the second pass, isn't a check at that point enough? In case of error, it might be sufficient to display the parsed meta.yaml to inform users what went wrong.
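
To make the two-pass timing concrete, here is a stdlib sketch using string.Template as a stand-in for jinja (purely illustrative; conda-build uses jinja2): the first pass fills what is known up front and leaves the rest unresolved, and only after dependency resolution can the second pass produce the final text, so validation of the resolved values can only happen at that point.

```python
from string import Template

recipe = "requirements:\n  build:\n    - python $py_ver\n    - numpy $np_ver"

# Pass 1: fill in what we already know. safe_substitute leaves unknown
# variables (here $np_ver) untouched instead of raising an error.
pass1 = Template(recipe).safe_substitute(py_ver="2.7*")

# ... dependency resolution happens here, determining np_ver ...

# Pass 2: now everything can be resolved.
pass2 = Template(pass1).safe_substitute(np_ver="1.11*")

assert "$np_ver" in pass1   # still unresolved after the first pass
assert "$" not in pass2     # fully rendered after the second pass
```

Before the second pass, only syntactic checks are possible; semantic mistakes in the substituted values surface only in the fully rendered text.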

@stuarteberg
Contributor

Guys, this PR already touches on several interlocking issues that need to be addressed as a whole. But thankfully, Question 6 is not one of them -- we can address it separately, in a different thread. Let's continue that discussion in #857.

Meanwhile, we can focus on Questions 1-5.

@msarahan
Contributor

Here's a "fun" use case, that I hope provides @kalefranz with some motivation to see this through quickly:

I need to build Qt5's webengine. Webengine is Chrome for Qt. Webengine supports only MSVC>=2013, and requires Python 2.7 as a build tool.

I can force the compiler to be MSVC 2015, BUT:

  • If I build with --python=2.7, my shared libraries are all vc9 featured
  • If I list python=2.7 as a build dependency, same story
  • I can't build with --python=3.5, because Webengine will have none of it.

This PR, or something like it, is the only way out of this situation. We desperately need to decouple the compiler from Python, and we need to be able to explicitly control which libraries packages need. The feature system is completely blocking me here.

I will try to install a system Python 2.7 outside of conda to get this done - but the recipe will not be a fully reproducible conda recipe with that.

CC @csoja - this is a hard block on qt5.

@stuarteberg
Contributor

stuarteberg commented Apr 29, 2016

@msarahan:

I will try to install a system Python 2.7 outside of conda to get this done

I agree it would be nice to actually list python in the recipe build requirements, but since that's not workable (for now), can't you just install python 2.7 into a sibling environment as part of your build.sh script? At least you can avoid using yum or apt-get or whatever.

# my-recipes/qt5/build.sh

# Discard any tmpenv left over from a previous run, then create a fresh
# sibling environment carrying the Python 2.7 build tool.
conda remove -y -n tmpenv --all || true
conda create -y -n tmpenv python=2.7

# Put tmpenv's python 2.7 ahead of everything else on PATH.
conda_root=$(conda info --root)
export PATH=$conda_root/envs/tmpenv/bin:$PATH

# Now build...
./configure --yada-yada
...

# Clean up the helper environment.
conda remove -y -n tmpenv --all

@msarahan
Contributor

@stuarteberg good idea - that's very related to what @pelson proposed in the meeting just now, and I think it is the right approach, at least until we have these feature things straightened out.

@jakirkham
Member

jakirkham commented Apr 29, 2016

I think there is a really good use case for this stuff, but maybe this should be a new issue. Just to cite some examples before it moves: needing to build with an old Python (e.g. scons) and with a newer VC, needing to decompress an xz source file (e.g. in a VS 2008 environment), etc. Some non-build environment from which we can grab executables would be really nice for this stuff.

@jakirkham
Member

I'd rather have to know that I have to add a compiler as part of the recipe--and have the build fail with a helpful error if I don't--rather than having conda-build magically use whatever compiler it finds.

@kalefranz, I think this is a valid concern, and I know of cases where I do want to control which compiler is used. Though maybe the solution is not to package the compiler per se, but to have some metapackage that finds and verifies a compiler matching certain constraints (e.g. gcc, C++11 support, etc.), failing to install if they are not met, and then sets CC and CXX (possibly other environment variables) if they are. We should discuss this more with any other concerned parties in some other issue, though right now I can't think of where this should live.
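
To sketch what such a metapackage's verification step might do (entirely hypothetical; no such conda feature exists today): locate g++, check its version against a constraint (GCC gained -std=c++11 support in 4.8), and export CXX only when the check passes.

```python
import os
import re
import shutil
import subprocess

def version_tuple(s):
    """'4.8.5' -> (4, 8, 5); ignores any non-numeric suffix."""
    return tuple(int(p) for p in re.findall(r"\d+", s))

def find_cxx11_gcc(minimum="4.8"):
    """Return the path to a g++ new enough for C++11, or None."""
    gxx = shutil.which("g++")
    if gxx is None:
        return None
    out = subprocess.check_output([gxx, "-dumpversion"], text=True).strip()
    if version_tuple(out) < version_tuple(minimum):
        return None
    return gxx

# A post-link script could fail the install when no compiler qualifies,
# and otherwise advertise the verified compiler to downstream builds:
compiler = find_cxx11_gcc()
if compiler is not None:
    os.environ["CXX"] = compiler
```

The appeal of this shape is that the constraint lives in package metadata while the system compiler itself stays unpackaged.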

@msarahan
Contributor

msarahan commented May 3, 2016

Great minds think alike: @mwiebe has proposed another idea along these lines: #911 - I hope Mark can join us on Monday morning.

@msarahan
Contributor

msarahan commented May 8, 2016

I have re-read the issues in preparation for tomorrow. Thanks everyone for your civil and insightful comments.

I see the value in @pelson's request to have dependency resolution earlier in the process. I would like to separate rendering of the meta.yaml from the build, and have done so in #908. That probably needs further work to move the dependency resolution. It might need one further jinja pass:

  1. render well-defined things, like package version, that don't depend on anything else
  2. render source-dependent template variables
    2.a. resolve dependencies
  3. render dependency-containing template variables (pinning, for example)

Regardless, I want meta.yaml template rendering as its own complete, self-contained step, and I want source downloading and dependency resolution to be done efficiently (not multiple times) as much as reasonably achievable. The fact that rendering the recipe and building it both need the source code is a little messy, but nothing we can't work past. Also, rendering needs to be smart enough to recognize when it needs the source code to complete a meta.yaml, and when it does not. Same story for dependency resolution. Each of these is a fairly slow operation.

If anyone sees a way to achieve this without the extra jinja pass, that would be good too, but one more jinja pass is a small price to pay to have all of this work nicely.

I hope that this might unify #747 and #848. If so, let's use the meeting tomorrow to plan around use cases, such as:

  1. pinning
  2. the desire to support a Python 2.7 ecosystem on Windows with a compiler other than VS 2008
  3. how work such as @mcg1969's notion of variants can be supported: WIP: key:value "variant" support conda#2427

@jakirkham
Member

I must say I am very happy to see things really starting to coalesce around rendering. Thanks for seeing ( #908 ) through, @msarahan. While it may seem a bit tangential at first, it is crucial both to providing the flexibility people want from recipes and to letting recipes sit in automated pipelines where other tools must reason about various aspects of the created packages in advance. Hopefully, we can now put the other pieces together around this important functionality.

@stuarteberg
Contributor

@msarahan:

If anyone sees a way to achieve this without the extra jinja pass, that would be good too

I guess I don't understand why a third pass is necessary.

  1. render well-defined things, like package version, that don't depend on anything else

At the moment, even the package version cannot always be determined in advance, but I don't think that's a problem per se. The only section that needs to be independently "well-defined" in advance is source.

  2. render source-dependent template variables
    2.a. resolve dependencies

IIUC, there are no circumstances in which the build dependencies aren't already known (or could be known) after the first parse, even under the current proposal. I think this render step isn't needed.

  3. render dependency-containing template variables (pinning, for example)

Moving on:

I would like to separate rendering of the meta.yaml from the build, and have done so in #908. That probably needs further work to move the dependency resolution.

Sounds good. Currently, the build proceeds as follows (quoted from above):

  1. Parse meta.yaml (with possibly missing jinja)
  2. Download source
  3. Resolve all versions for build step
  4. Populate _build environment
  5. Parse meta.yaml (final parse)
  6. Build the package
  7. Resolve all versions for test step
  8. Populate _test environment using run (and test) requirements
  9. Run tests.

If I understand you correctly, we want to finish the last render step before populating the _build environment. That should be doable; we just need to swap steps 4 and 5, like so:

  1. ...
  2. ...
  3. Resolve all versions for build step
  4. Parse meta.yaml (final parse)
    (4.a. If desired, just print the rendered meta.yaml and exit.)
  5. Populate _build environment
  6. ...

Now we can't use this PR exactly as written because the installed variable is populated from the contents of _build/conda-meta/*.json (from the already-installed _build environment). Fair enough; maybe we can just populate a special dict with the build versions (between steps 3 and 4).
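
For illustration, this is roughly how an installed dict can be derived from the conda-meta records of an already-populated environment (the helper name is made up, but the name/version/build fields really do exist in conda-meta JSON records); populating the same dict from the resolver's output instead would move the information ahead of the final parse:

```python
import glob
import json
import os
import tempfile

def installed_from_conda_meta(env_prefix):
    """Map package name -> (version, build) from an env's conda-meta records."""
    installed = {}
    for path in glob.glob(os.path.join(env_prefix, "conda-meta", "*.json")):
        with open(path) as f:
            rec = json.load(f)
        installed[rec["name"]] = (rec["version"], rec["build"])
    return installed

# Demo against a fake environment prefix:
prefix = tempfile.mkdtemp()
os.mkdir(os.path.join(prefix, "conda-meta"))
with open(os.path.join(prefix, "conda-meta", "numpy-1.11.0-py27_0.json"), "w") as f:
    json.dump({"name": "numpy", "version": "1.11.0", "build": "py27_0"}, f)

print(installed_from_conda_meta(prefix))  # {'numpy': ('1.11.0', 'py27_0')}
```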

msarahan added a commit to msarahan/conda-build that referenced this pull request May 22, 2016
@msarahan
Contributor

msarahan commented Jun 1, 2016

Closing this to consolidate discussion to #966 where I'm trying to consolidate implementation of these ideas. Thank you everyone for your contributions to this discussion.

@msarahan msarahan closed this Jun 1, 2016
@msarahan msarahan mentioned this pull request Jul 6, 2016
@kenodegard kenodegard added type::feature request for a new feature or capability and removed type::enhancement labels Jan 19, 2022
@github-actions github-actions bot added the locked [bot] locked due to inactivity label Jan 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 20, 2023