Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A new YAML based format for "conda-build" files #54

Merged
merged 13 commits into from
Oct 30, 2023
325 changes: 325 additions & 0 deletions cep-20.1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,325 @@
# A new recipe format – part 1

<table>
<tr><td> Title </td><td> A new recipe format </td>
<tr><td> Status </td><td> Proposed </td></tr>
<tr><td> Author(s) </td><td> Wolf Vollprecht &lt;wolf@prefix.dev&gt;</td></tr>
<tr><td> Created </td><td> May 23, 2023</td></tr>
<tr><td> Updated </td><td> October 20, 2023</td></tr>
<tr><td> Discussion </td><td>https://github.com/conda-incubator/ceps/pull/54</td></tr>
<tr><td> Implementation </td><td>https://github.com/prefix-dev/rattler-build</td></tr>
</table>

## Abstract

We propose a new recipe format that is heavily inspired by conda-build. The main
change is a pure YAML format without arbitrary Jinja or comments with semantic
meaning.

## Motivation

The conda-build format has grown over the years to become quite complex.
Unfortunately it has never been formally "specified" and it has grown some
features over time that make it hard to parse as straightforward YAML.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one really important motivating factor that isn't mentioned in the motivation section is increasing the ease by which we can parse and modify these files (making them more machine readable friendly). I know that there is currently a desire to be able to do this, but the old format holds us back. It's very hard to parse and multiple attempts have already been made to try to do this if I'm not mistaken.

I know it's mentioned below, but it might be worth emphasizing here briefly too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1000

The CEP attempts to introduce a subset of the conda build format that allows for
fast parsing and building of recipes.

### History

A discussion was started on what a new recipe spec could or should look like.
The fragments of this discussion can be found here:
https://github.com/mamba-org/conda-specs/blob/master/proposed_specs/recipe.md
The reason for a new spec are:

- Make it easier to parse ("pure yaml"). conda-build uses a mix of comments and
jinja to achieve a great deal of flexibility, but it's hard to parse the
recipe with a computer
- iron out some inconsistencies around multiple outputs (build vs. build/script
and more)
- remove any need for recursive parsing & solving
wolfv marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recursive parsing and solving was mainly to support things like inheriting specs, like run_constrained, from explicitly specified dependencies. Is there any kind of explicit spec for whether/how packages add/impose dependencies on sections outside of the one where they are specified?

What about the pin_compatible jinja? I'm getting the sense that it will be cut in this spec. I'm not sure whether you could impose limits on it to avoid the need to be recursive. It is very useful, so I hope it can stay somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In boa (and rattler-build) we have "deferred" evaluation for the pins, so that still works. We just require that the name of the pin is available, in order to build the proper topologically sorted build-graph :)

A dependency list in rattler-build contains an "enum" of type:

https://github.com/prefix-dev/rattler-build/blob/665c981a392a6ebeba5842eee1d968b8753b2dcd/src/render/dependency_list.rs#L22-L27

Indeed, pin_subpackage is not yet implemented but quite trivial to achieve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the package name it isn't known. For example cross-compilation usage or compilers themselves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakirkham in these cases you can infer those variables from the build-time config though (or variant config) so no recursion needed :)

- cater to needs for automation and dependency tree analysis via a determinstic
format

## Major differences with conda-build

- no full Jinja2 support: no block support `{% set ...` support, only string
interpolation. Variables can be set in the toplevel "context" which is valid
YAML (all new features should be native to YAML specs)
- Jinja variable syntax is changed to begin with `${{` so that it becomes valid
wolfv marked this conversation as resolved.
Show resolved Hide resolved
YAML, e.g. `- ${{ version }}`
- Selectors use a YAML dictionary with `if / then / else` (vs. comments in
conda-build) and are only allowed in lists (dependencies, scripts, ...). The
syntax looks like:
```yaml
- if: win
then: this
else: that # optional
```
jaimergp marked this conversation as resolved.
Show resolved Hide resolved
- for inline values, the Jinja ternary operator can be used, e.g. `number: ${{ 0
jaimergp marked this conversation as resolved.
Show resolved Hide resolved
if linux else 100 }}`

## Selectors

Selectors in the new spec are only allowed in lists and take an explicit `if /
then / else` syntax.

For example the following `script` section:

```yaml
script:
- if: unix
then: |
# this is the unix script
- if: win
then: |
@rem a script for batch
```

The same could have been expressed with an `else`:

```yaml
script:
- if: unix
then: |
# this is the unix script
else: |
@rem a script for batch
```

This is a valid YAML dictionary. Selector if statements are simple boolean
expressions and follow Python syntax. The following selectors are all valid:

```
win and arm64
(osx or linux) and aarch64
something == "test"
```

If the value of a selector statement is a list, it extends the "outer" list. For
example:

```yaml
build:
- ${{ compiler('cxx') }}
- if: unix
then:
- make
- cmake
- pkg-config
```

evaluates for `unix == true` to a list with elements `[${{ compiler('cxx') }},
make, cmake, pkg-config]`.

### Preprocessing selectors

You can add selectors to any item, and the selector is evaluated in a
preprocessing stage. If a selector evaluates to `true`, the item is flattened
into the parent element. If a selector evaluates to `false`, the item is
removed.

```yaml
source:
- if: not win
then:
# note that we omit the `-`, both is valid
jaimergp marked this conversation as resolved.
Show resolved Hide resolved
url: http://path/to/unix/source
sha256: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
else:
- url: http://path/to/windows/source
sha256: 06f961b802bc46ee168555f066d28f4f0e9afdf3f88174c1ee6f9de004fc30a0
```

Because the selector is a valid Jinja expression, complicated logic is possible:

```yaml
source:
- if: win
then:
url: http://path/to/windows/source
- if: (unix and cmp(python, "2"))
then:
url: http://path/to/python2/unix/source
- if: unix and cmp(python, "3")
then:
url: http://path/to/python3/unix/source
```

Lists are automatically "merged" upwards, so it is possible to group multiple
items under a single selector:
wolfv marked this conversation as resolved.
Show resolved Hide resolved

```yaml
test:
commands:
- if: unix
then:
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfigVersion.cmake
- if: win
then:
- if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
- if not exist %LIBRARY_PREFIX%\lib\cmake\xtensor\xtensorConfig.cmake (exit 1)
- if not exist %LIBRARY_PREFIX%\lib\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)

# On unix this is rendered to:
test:
commands:
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfigVersion.cmake
```

## Templating with Jinja

The spec supports simple Jinja templating in the `recipe.yaml` file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth elaborating what "simple" means in this context, what wouldn't be covered by this?

Would this also work with MiniJinja for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple means:

  • Nothing that could go between {% ..., means no for loops, no set, no if / else etc.
  • Everything that can go between {{ ... }} is still supported, including the pipe operator, so that {{ version | lowercase }} and things work. They can be assigned to helper variables inside the context, e.g.
context:
  version: "0.1.2.POST1234"
  lower_version: "{{ version | lower }}"  # evaluates to 0.1.2.post1234
  • Inside the selectors, the Jinja expression language is used which only knows a few primitive types (str, int, float, ...).

This works fantastically with MiniJinja (which is what we use in rattler-build). However, MiniJinja would support much more than what we are currently supporting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to preserve the meta.yaml filename and cause conda-build or rattler-build to detect the format from its contents?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly since we are doing a bit of a redesign here, think it is worth asking whether we still want Jinja to play a role here. In other words, if we were not already using Jinja today, how would we have solved these use cases? How do others approach YAML templating? Could we think of pure YAML solutions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, what does jinja accomplish that we can't do with the other new features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well other new features would also have to be implemented and I am not sure we would necessary end up with something simpler.
I think there is a future where the entire build tool looks a bit different (and the recipe as well) and wolfi-os / melange have a nice approach: https://github.com/wolfi-dev/os/blob/main/libidn2.yaml

But I am not sure how to proceed here - we could either go down that road right now or build something that is more reasonable for people to port existing recipes to and then iterate in a next step on the recipe format (when we have the necessary infrastructure nicely implemented in rattler-build).


You can set up Jinja variables in the context YAML section:

```yaml
context:
name: "test"
version: "5.1.2"
major_version: ${{ version.split('.')[0] }}
```

Later in your `recipe.yaml` you can use these values in string interpolation
with Jinja. For example:

```yaml
source:
url: https://github.com/mamba-org/${{ name }}/v${{ version }}.tar.gz
```

Jinja has built-in support for some common string manipulations.

In the new spec, complex Jinja is completely disallowed as we try to produce
YAML that is valid at all times. So you should not use any `{% if ... %}` or
similar Jinja constructs that produce invalid YAML. We also do not use the
standard Jinja delimiters (`{{ .. }}`) because that is confused by the YAML
parser as a dictionary. We follow Github Actions and others and use `${{ ... }}`
instead:

```yaml
package:
name: {{ name }} # WRONG: invalid yaml
name: ${{ name }} # correct
```

Jinja functions work as usual. As an example, the `compiler` Jinja function will
look like this:

```yaml
requirements:
build:
- ${{ compiler('cxx') }}
```

## Shortcomings

Since we deliberately limit the amount of "Jinja" that is allowed in recipes
there will be several shortcomings.

For example, using a `{% for ... %}` loop is prohibited. After searching through
the conda-forge recipes with the Github search, we found for loops mostly used
in tests.

In our view, for loops are a nice helper, but not necessary for many tasks: the
same functionality could be achieved in a testing script, for example. At the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that it's generally possible to move stuff to test scripts, this is IMO very disadvantageous:

  • far away from meta.yaml (if they use the "magic" test script names, they're not even mentioned and then often not up-to-date because people forget to look at them)
  • duplication between platforms (aside from the pain of writing loops in batch)

For example, it would be a real pain to rewrite the following

        {% for each_lib in core_libs + core_cpp_libs + vendored_libs %}
        # presence of shared libs
        - test -f $PREFIX/lib/lib{{ each_lib }}.so              # [linux]
        - test -f $PREFIX/lib/lib{{ each_lib }}.dylib           # [osx]
        - if not exist %LIBRARY_BIN%\{{ each_lib }}.dll exit 1  # [win]
        - if not exist %LIBRARY_LIB%\{{ each_lib }}.lib exit 1  # [win]

        # absence of static libs (unix)
        - test ! -f $PREFIX/lib/lib{{ each_lib }}.a             # [unix]
        {% endfor %}

        # binaries
        {% for each_lang in binaries_plugin_langs %}
        - test -f $PREFIX/bin/grpc_{{ each_lang }}_plugin                    # [unix]
        - if not exist %LIBRARY_BIN%\grpc_{{ each_lang }}_plugin.exe exit 1  # [win]
        {% endfor %}

        # pkg-config (no metadata for vendored libs)
        {% for each_lib in core_libs %}
        - pkg-config --print-errors --exact-version "{{ core_version }}" {{ each_lib }}
        {% endfor %}
        {% for each_lib in core_cpp_libs %}
        - pkg-config --print-errors --exact-version "{{ version }}" {{ each_lib }}
        {% endfor %}

to the point that those tests would just not be written (or rather: maintained) as well as now (I moved them out of the heavily outdated test scripts for exactly that reason).

Now, it's possible that this is still a worthwhile trade-off compared to what "pure YAML" gets us, but it will reduce the quality of our testing (at least, as someone who's added a lot of test coverage to feedstocks I got involved with, I believe it will strongly impact my output that front).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two options:

  • We can allow Jinja (plus the values from context) inside the script text blocks. I wouldn't mind that too much since it doesn't affect the recipe execution at all.
  • You can rewrite your tests to take advantage of the new test helpers in boa. In that case it would look something like this:
      exists:
        lib:
          - gpr
          - grpc
          - grpc_unsecure
          - grpc++
          - grpc++_unsecure
          - address_sorting
          - upb
        glob:
          # absence of static libs (unix)
          - ~ **/*.a
        bin:
          - grpc_cpp_plugin
          - grpc_csharp_plugin
          - grpc_node_plugin
          - grpc_objective_c_plugin
          - grpc_php_plugin
          - grpc_python_plugin
          - grpc_ruby_plugin
        pkg_config: 
          - grpc
          - grpc++
          - grpc_unsecure
          - grpc++_unsecure
      commands:
        # final CMake test to compile examples
        - ./test_grpc.sh   # [unix]
        - ./test_grpc.bat  # [win]

Note that this is implemented in a prototype-y fashion in boa and not yet implemented in rattler, and we'd spec this out in another CEP. Of course, there is duplication of the core libs. We could think of ways of being able to refer to lists that are defined elsewhere to make that less repetitive but IMO that can come later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also hypothetically allow lists to be defined in the context, and then use some other syntax to extend other lists. For example:

context:
  core_libraries: ["grpc", "a", "b"...]

tests:
  exists:
    lib:
      - @ core_libraries

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the comment I left in the other thread was meant for both conversations - sorry if that wasn't clear.

I think it'll be manageable in the way that you propose. Being able to define lists would be nice, but isn't critical IMO.

same time we also plan to formalize a more powerful testing harness ([prototyped
in
boa](https://github.com/mamba-org/boa/blob/main/docs/source/recipe_spec.md#check-for-file-existence-in-the-final-package)).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks very nice, but it's critically missing a way to check for the absence of files (which is crucial for many recipes, especially multi-output ones).

It would actually be a nice improvement over what conda currently does to check for absence of files in the output under testing, rather than just looking at the path in the prefix. That's because a given path might be satisfied -- intentionally -- by another output; for example, it's currently not possible to check1 that in a lib + bin + bindings setup, the binary output (that's depending on the lib) does not accidentally package any library artefacts.

Footnotes

  1. without an extra output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just came up with a syntax to test the absence (negation) of the statement by using the ~ prefix. This shoudl be fairly straightforward to add. And as you suggested, these functions would operate on the paths.json of the package (and not the environment).

But on the other hand this is a bit off topic since this is not the right CEP for that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. conda doesn't do anything -- what people do is write ugly bash scripts that need special knowledge to handle Windows vs. Unix ... which is why testing is generally pretty bad on conda-forge. Another thing that we want to fix :) (but off-topic).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the gitignore syntax (e.g. ! for negation)? Devs are already familiar with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ! has a special meaning in YAML so you'd have to quote it ... :) YAML is a complex spec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy if there's a way to test absence of files, syntax is secondary to me. That said:

YAML is a complex spec

I'm not deeply enough in this whole discussion, but I'd be thrilled to move to TOML as well. One thing that scares me about the sel / if discussion is that YAML breaks horrifically if you get the indentation levels wrong within a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since many folks are already coming from Python, think significant whitespace is already an accepted thing

Not to say moving to a simpler markup language might be beneficial for us, but that seems like a separate discussion altogether. Right now Wolf is trying to just move use from 2 meta languages and 1 markup language to just 1 markup language, which would be a huge improvement over the status quo. Maybe a future discussion could be moving from 1 markup language to another (once we are fully in 1 :)


This could be used instead of a for loop to check the existence of shared
libraries or header files cross-platform (instead of relying on Jinja templates
as done
[here](https://github.com/conda-forge/opencv-feedstock/blob/2fc7848655ca65419050336fe38fcfd87bec0649/recipe/meta.yaml#L131)
or
[here](https://github.com/conda-forge/boost-cpp-feedstock/blob/699cfb6ec3488da8586833b1500b69133f052b6f/recipe/meta.yaml#L53)).

Other uses of `for` loops should be relatively easy to refactor, such as
[here](https://github.com/conda-forge/libiio-feedstock/blob/1351e5846b753e4ee85624acf3a14aee4bcf321d/recipe/meta.yaml#L45-L51).

However, since the new recipe format is "pure YAML" it is very easy to create
and pre-process these files using a script, or even generating them with Python
or any other scripting language. That means, many of the features that are
currently done with Jinja could be done with a simple pre-processing step in the
future.

Another option would be to allow "full" Jinja inside the test script text blocks
(as long as it doesn't change the structure of the YAML).

## Examples

### xtensor

Original recipe [found
here](https://github.com/conda-forge/xtensor-feedstock/blob/feaa4fd8015f96038168a9d67d69eaf06a36d63f/recipe/meta.yaml).

```yaml
context:
name: xtensor
version: 0.24.6
sha256: f87259b51aabafdd1183947747edfff4cff75d55375334f2e81cee6dc68ef655

package:
name: ${{ name|lower }}
version: ${{ version }}

source:
fn: ${{ name }}-${{ version }}.tar.gz
url: https://github.com/xtensor-stack/xtensor/archive/${{ version }}.tar.gz
sha256: ${{ sha256 }}

build:
number: 0
# note: in the new recipe format, `skip` is a list of conditional expressions
# but for the "YAML format" discussion we pretend that we still use the
# `skip: bool` syntax
skip: ${{ true if (win and vc14) }}
wolfv marked this conversation as resolved.
Show resolved Hide resolved

requirements:
build:
- ${{ compiler('cxx') }}
- cmake
- if: unix
then: make
host:
- xtl >=0.7,<0.8
run:
- xtl >=0.7,<0.8
run_constrained:
- xsimd >=8.0.3,<10

test:
commands:
- if: unix
then:
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfigVersion.cmake
- if: win
then:
- if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfig.cmake (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)

about:
home: https://github.com/xtensor-stack/xtensor
license: BSD-3-Clause
license_family: BSD
license_file: LICENSE
summary: The C++ tensor algebra library
description: Multi dimensional arrays with broadcasting and lazy computing
doc_url: https://xtensor.readthedocs.io
dev_url: https://github.com/xtensor-stack/xtensor
jaimergp marked this conversation as resolved.
Show resolved Hide resolved

extra:
recipe-maintainers:
- some-maintainer
```