WIP: Build customization #848
Conversation
This PR is a great leap forward for the related issues of (1) configuring build requirement versions without hard-coding them into the recipe and (2) "pinning" runtime requirements so they match the build versions. Here are some more detailed comments on each section:
Pinging @mcg1969 - we have discussed if there are good ways of moving away from features, and he'd be very interested in this discussion. Am I understanding that the --bootstrap option similar in spirit to #741? I'm a little skeptical on the value of this. If people are being meticulous about creating an env for any given project and minimizing the scope of that env, then I think this will work. Otherwise, I think it stands to collect more packages than are actually necessary for a package to be built. Does this support or supplant #741, or is it something completely different? I like the Jinja {{ installed }} idea a lot. That syntax is a LOT more clear than the x.x stuff, and I like your "compatible release" idea. Like @stuarteberg, I think the Jinja2 configuration callback is probably the better way to go than introducing the ~= syntax into the recipe itself. To help flesh out where this PR fits, I'd like to come up with concrete use cases where conda-build falls down right now, and how this PR addresses them. Some ideas to start with (but please add more):
TLDR; Let's put the order of the
If you want to be able to do this in the metadata using jinja, there is only one option - you must resolve the dependencies by the second pass of reading In exactly the same way, if you want to resolve the build_id (aka In #747 I have implemented the changes necessary to resolve the build dependencies before the build takes place. As I said in that PR, the pinning scheme is academic - it is trivial to implement any of them once the I'm not against any of the jinja implemented by this PR, though I do dislike the addition of more configuration (akin to CONDA_PY and CONDA_NPY IMO). Incidentally, in #747 I just added
* Maybe it isn't true that this is what one might expect - this example resolves all dependencies, and so includes the version of Python as well as the version of zlib, tk etc. As for how we implement the user interface, I'm all ears 👂
When preparing this PR, I didn't have the intention to move away from features (in fact, I'm working on another PR that addresses features). If you have any insights on how the proposed enhancements may serve to make features simpler or unnecessary, I'd be very interested to hear them.
IIUC, there are two major differences between this PR and #741:
The aim of the |
Well, features have been overused, by us, and because they complicate the solution process they have been the source of many errors and difficulties, as we all know. For instance, the I would definitely want to participate in any discussion that considers standardizing or recommending a particular use of features, so we can make sure that it doesn't conflict with the solver's current objectives. And my preference will always be to avoid the use of features if an alternate strategy can accomplish the same thing. |
I fully agree. Taking the idea of the
This can be further improved when recipe designers add a new field
(this syntax is too verbose, but can easily be abbreviated by a convenience function if desired, for example by storing the full compatibility constraint in the metadata). One can even think about a science-fiction idea like a compatibility database at anaconda.org which is continuously fed with test results from the CI servers about what fits or does not fit together.
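The concrete name of the proposed recipe field was lost from this transcript; a hypothetical sketch of the idea (the field name `compatible_back_to` is invented purely for illustration):

```yaml
package:
  name: libfoo
  version: 1.4.2

# Hypothetical field: the recipe author declares the oldest version this
# build is API/ABI-compatible with, so downstream recipes could derive a
# constraint like 'libfoo >=1.2,<2' automatically.
compatible_back_to: 1.2
```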
I think what this might do to alleviate the need for features is making the compiler version and runtime stuff line up more explicitly with only dependencies, rather than features. @mcg1969 @ukoethe Features serve an important role; one that was not clear to me until I expressed it to myself this way:
This is the same sort of role that WinSxS fills, I think. I do not feel that we overused features (we did, however overconstrain their meaning to say that only one runtime could exist in an environment at once). We do have to enforce that for a given Python process, only one runtime is in use. The room for improvement that I see is: how do you handle two (or more) different processes in one environment with the same class of feature, but different values? This is where I think we could learn from WinSxS or similar efforts. Anyway, that's not part of this PR. I'll look forward to future discussions, perhaps on @ukoethe's features PR. |
@ukoethe do you have any reply to @pelson's phase ordering point?
@pelson I see this as a generalization of the CONDA_PY stuff, which I would argue is not very extensible at all. Any environment variable needs new handling code. I'm not completely clear on how this config file would handle collisions, but that's an implementation detail - and a problem that the CONDA_PY stuff shares anyway. We'll still have to support CONDA_PY and CONDA_NPY for legacy reasons, but IMHO, this is a better way. |
To be honest, I don't quite understand @pelson's question. My PR doesn't touch phase ordering at all. The options I like @pelson's idea to expose the entire MetaData object to jinja. It mostly serves the same purpose as my
We should discuss if we want both or should prefer either one. For one thing, the I also think that the 'x.x' syntax or something similar can be implemented in the format string of the |
I think features can be a tool to accomplish this goal but they're certainly not sufficient. For instance, we can accomplish most of this with custom build strings and dependencies, much in the same way that we do for Python versions and NumPy versions. Features make it simpler to select for a particular build dependency, but in theory it could be done by hand. I'd feel better about it if features were not just keys but optionally key-value pairs. So for instance, we could do |
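The concrete example was elided from this transcript; a sketch of what key/value features might look like in a recipe (syntax invented for illustration):

```yaml
build:
  features:
    - vc=14        # one key/value pair instead of distinct vc9/vc10/vc14 feature names
  track_features:
    - vc=14
```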
(Ah, but if we implement this option we have to be careful to differentiate by language. That is, we can't have one |
This would be possible, but requires a big effort to standardize the structure and contents of build strings (e.g. is
This is exactly the suggestion I was going to make. It may even be sufficient when only the |
The only requirement build strings have is to provide uniqueness. They could be a hash key for all I care. It's the dependencies that provide the distinguishing features for the solver. As for your |
For build strings to become unique, a canonical ordering of their parts (including future extensions) is needed. This is one way of standardization, isn't it?
I'm not so pessimistic before actually trying. I thought about reversing the order of key and value into
Then, a loop |
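The loop referenced above was elided from this transcript; a hypothetical Jinja2 sketch, assuming the `{{ installed }}` dictionary proposed in this PR, might look like:

```yaml
requirements:
  run:
  {% for name, meta in installed.items() %}
    - {{ name }} {{ meta['version'] }}
  {% endfor %}
```

Note that this makes every indirect build dependency an explicit run requirement, which is exactly the trade-off debated in the following comments.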
The point is that But this isn't pessimism at all. We can implement a key/value system for features in a way that works alongside the existing, value-free approach. I'm actually quite happy to let everyone else hash out how to implement the YAML side of this, I can take on how it's done in the Python dictionaries :-) |
R uses vc6, yeah, really old. Here I think we may want to come up with another concept of I'm keen to hash out the YAML side of crt versioning for mingw-w64 packages, since I want to mark a whole lot of them as being vc6 soon. |
We've discussed this previously, and I still have the same minor concern: If we expose the actual FWIW, I would not be opposed to adding |
I could be missing something, but wouldn't this be a bad idea? This would make all implicit (indirect) dependencies of a package explicit dependencies. For packages near the top of a development stack, the list of run requirements would get HUGE. The problem with that is that, every now and then, a package might drop a dependency between versions while staying API/ABI compatible with previous versions. "Real" cases of this (where a package's true requirements really changed) are not so common, but there is an "accidental" case which is more likely: If a recipe erroneously includes a run requirement that it didn't really need, the fix is simple: just delete that requirement, bump the build number, and rebuild the package. But if some downstream package used the above looping code, it will also have to be rebuilt. This "accidental" scenario seems even more likely when meta-packages are involved.
I don't quite understand this. Why would you mark mingw-w64 as being |
To respond to @kalefranz:
This is already impossible, because the exact versions that are chosen for a build will always depend on things defined outside the recipe. For instance, even a simple recipe like the following depends on external state:

```yaml
requirements:
  run:
    - python 2.7*
    - numpy {{NPY_VER}}*
```

If you really wanted to eliminate all sources of "non-determinism" in the recipe, you would have to require all recipes to specify their dependencies in full (including exact version and build string):

```yaml
requirements:
  run:
    - python 2.7.9 1
    - numpy 1.9.1 py27_0
```

... but that's obviously not viable. Like it or not, the build will always depend on some external state.
Fine by me.
Sounds good.
I don't think I understand this point; no one is proposing to depend on the solver per se. The correct runtime spec depends on the build version, but not on the particular algorithm that selected that build version in the first place. Anyway, from the example you gave in Question 2, it sounds like we're on the same page.
Works for me!
Works for me, but we need some buy-in from @pelson on this point. Even though predictable tarball names were never guaranteed in conda-build recipes, I think Since conda-forge depends on conda-build-all, it seems that we have a choice to make. IIUC, these are the options we have:
I still think this is a mistake, but I guess it won't negatively affect those of us who choose not to use it. I'd be interested to hear what the other conda-build users in this thread think, but if they remain silent, I'll give up and spend my energy on the other questions.
I think we can agree that this question is really orthogonal to the other concerns in this thread. In the interest of wrapping up this discussion as soon as possible, I propose that we open a new issue for Question 6 and discuss it there. |
@kalefranz |
IIUC, the files you linked to will barely need to change. Right now they call As far as conda-forge goes, I think we need to get those folks involved in the discussion about Question 4.5. |
I don't think that's quite right. I want jinja2 use to be restricted to values of key/value pairs, so that meta.yaml follows a schema I can validate early. That way, later code execution can concern itself with more narrowly scoped logic and not have to itself worry about inputs being valid. |
Exactly when and how would you like to validate |
Guys, this PR already touches on several interlocking issues that need to be addressed as a whole. But thankfully, Question 6 is not one of them -- we can address it separately, in a different thread. Let's continue that discussion in #857. Meanwhile, we can focus on Questions 1-5. |
Here's a "fun" use case that I hope provides @kalefranz with some motivation to see this through quickly: I need to build Qt5's webengine. Webengine is Chrome for Qt. Webengine supports only MSVC>=2013, and requires Python 2.7 as a build tool. I can force the compiler to be MSVC 2015, BUT: This PR, or something like it, is the only way out of this situation. We desperately need to decouple compiler from Python, and we need to be able to explicitly control which libraries packages need. The feature system is completely blocking me here. I will try to install a system Python 2.7 outside of conda to get this done - but the recipe will not be a fully reproducible conda recipe with that. CC @csoja - this is a hard block on qt5.
I agree it would be nice to actually list python in the recipe. In the meantime, a build script can manage its own temporary environment:

```shell
# my-recipes/qt5/build.sh
conda remove -y -n tmpenv --all || true
conda create -y -n tmpenv python=2.7
conda_root=$(conda info --root)
export PATH=$conda_root/envs/tmpenv/bin:$PATH

# Now build...
./configure --yada-yada
...
conda remove -y -n tmpenv --all
```
@stuarteberg good idea - that's very related to what @pelson proposed in the meeting just now, and I think it is the right approach, at least until we have these feature things straightened out. |
I think there is a really good use case for this stuff, but maybe this should be a new issue. Just to cite some examples before it moves: needing to build with an old Python (e.g. scons) and with a newer VC, needing to decompress an xz source file (e.g. in a VS 2008 environment), etc. Some non-build environment from which we can grab executables would be really nice for this stuff.
@kalefranz, I think this is a valid concern and have known cases where I do want to control which compiler is used. Though maybe the solution is not to package the compiler per se, but to have some metapackage that finds and verifies a compiler matches certain constraints (e.g.
I have re-read the issues in preparation for tomorrow. Thanks everyone for your civil and insightful comments. I see the value in @pelson's request to have dependency resolution earlier in the process. I would like to separate rendering of the meta.yaml from the build, and have done so in #908. That probably needs further work to move the dependency resolution. It might need one further jinja pass:
Regardless, I want meta.yaml template rendering as its own complete, self-contained step, and I want source downloading and dependency resolution to be done efficiently (not multiple times) as much as reasonably achievable. The fact that rendering the recipe and building it both need the source code is a little messy, but nothing we can't work past. Also, rendering needs to be smart enough to recognize when it needs the source code to complete a meta.yaml, and when it does not. Same story for dependency resolution. Each of them are fairly slow operations. If anyone sees a way to achieve this without the extra jinja pass, that would be good too, but one more jinja pass is a small price to pay to have all of this work nicely. I hope that this might unify #747 and #848. If so, let's use the meeting tomorrow to plan around use cases, such as:
I must say I am very happy to see things are really starting to coalesce around rendering. Thanks for seeing ( #908 ) through, @msarahan. While it does seem a bit tangential at first, it seems very crucial to both providing the flexibility that people are wanting from recipes and yet still allowing them to sit in automated pipelines where other tools must reason about various aspects of the created packages in advance. Hopefully, we can now put the other pieces together around this important functionality. |
I guess I don't understand why a third pass is necessary.
At the moment, even the package version cannot always be determined in advance, but I don't think that's a problem per se. The only section that needs to be independently "well-defined" in advance is
IIUC, there are no circumstances in which the
Moving on:
Sounds good. Currently, the build proceeds as follows (quoted from above):
If I understand you correctly, we want to finish the last render step before populating the
Now we can't use this PR exactly as written because the |
Closing this to consolidate discussion in #966, where I'm trying to consolidate the implementation of these ideas. Thank you everyone for your contributions to this discussion.
Conda-build Customization
This PR implements a number of enhancements that greatly simplify customization of the build and run requirements of a recipe. The general idea is this:
The PR is not necessarily meant to be merged as-is, but should primarily serve as a solid basis for discussion of customization ideas.
Note: This PR intentionally omits the problems of features, which will be the subject of another proposal.
Option `--build-config`

This option allows the user to pass a build config file to conda-build. `config.yaml` is similar to `meta.yaml`, but only contains the section `requirements/build` (everything else is currently ignored). The contents of this file are simply concatenated to the recipe's build requirements. If this results in multiple lines referring to the same package, all corresponding constraints are joined and must be valid simultaneously. This behavior is already realized in conda's version resolution algorithm, and there is no need for any additional logic to be implemented. Semantically, recipes state the specific requirements of particular packages, whereas config files state requirements to be shared by a family of builds (e.g. (meta)packages that activate desired `track_features` declarations). In an automated deployment system, config files can be auto-generated from a build matrix.
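A sketch of what such a config file and its invocation could look like (the package names and the exact CLI shape are illustrative, since this PR is a work in progress):

```yaml
# config.yaml -- build requirements shared across a family of builds
requirements:
  build:
    - python 3.5*
    - numpy 1.10*
    - zlib 1.2*
```

It would then be passed to conda-build, e.g. as `conda build --build-config config.yaml my-recipe`.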
Option `--bootstrap`

This option allows the user to pass a bootstrap environment to conda-build. It behaves like `--build-config`, but the config file is automatically constructed from the given environment: all packages currently installed in `some_env_name` are considered as build requirements of the package to be built, in precisely the version found in the environment. This approach leverages the full power of conda's version resolution logic to free the user from the tedious and error-prone task of maintaining config files manually.
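A usage sketch of the proposed flag (the environment name and exact CLI shape are illustrative):

```shell
# Curate a bootstrap environment once, then build against exactly those versions
conda create -y -n qt5-bootstrap python=2.7 zlib
conda build --bootstrap qt5-bootstrap ./my-recipe
```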
Jinja2 variable `{{ installed }}`

This variable is a dictionary whose keys are the package names found in the current `_build` environment, and the corresponding values contain the complete metadata from `_build/conda-meta/package_name.json`. This information can be used to configure run requirements in `meta.yaml` at build time with up-to-date information from the build itself. This pins the run requirements to the exact dependency versions present during the build. Of course, strict run requirements like this are a bit too pessimistic to be practical, so it is desirable to suitably relax the constraints. A solution using jinja filters is described in the next section. Another good idea would be to adapt the "compatible release" operator `~=` from PEP 440 to conda-build, permitting all versions that are considered compatible with the build dependency.
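The original recipe snippets were elided from this transcript; a hypothetical fragment showing both the exact pin and the proposed `~=` relaxation (the package name is illustrative):

```yaml
requirements:
  run:
    # exact pin taken from the build environment
    - zlib {{ installed['zlib']['version'] }}
    # or, with the proposed compatible-release operator
    - zlib ~={{ installed['zlib']['version'] }}
```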
Jinja2 configuration callback

If the recipe directory contains a file `jinja_config.py` that defines a function `jinja_config(jinja_env)`, this function is called by conda-build just before asking jinja to parse `meta.yaml`. This allows the user to add additional variables and filters to the jinja namespace. A useful application of this capability is a filter that approximates the behavior of the compatible release operator `~=` by transforming a version number like `1.2.3` into a version constraint like `1.2*,>=1.2.3`; such a filter would be applied in `meta.yaml` via the pipe operator `|`.

The callback mechanism would provide recipe designers a very useful tool for experimentation with jinja magic. Eventually, the best ideas will be implemented natively within conda-build to make them conveniently accessible to everyone. (Then, the configuration callback will mainly serve as a last resort.)
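A minimal sketch of such a callback, assuming the hook name and signature described above; the filter name `compatible` and its fallback behavior for short versions are illustrative choices, not part of the PR:

```python
# jinja_config.py (placed in the recipe directory)

def compatible_release(version):
    """Approximate PEP 440's ~= by turning '1.2.3' into '1.2*,>=1.2.3'."""
    parts = version.split(".")
    if len(parts) < 2:
        # No minor component to pin against; fall back to an exact match.
        return version
    prefix = ".".join(parts[:-1])
    return "{0}*,>={1}".format(prefix, version)

def jinja_config(jinja_env):
    # Called by conda-build before meta.yaml is parsed; register the filter
    # so a recipe can write e.g.:
    #   - zlib {{ installed['zlib']['version'] | compatible }}
    jinja_env.filters["compatible"] = compatible_release
```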