Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support annotations in bzlmod's pip extension #1213

Closed
jsharpe opened this issue May 5, 2023 · 23 comments · Fixed by #1278
Closed

Support annotations in bzlmod's pip extension #1213

jsharpe opened this issue May 5, 2023 · 23 comments · Fixed by #1278
Labels
type: bzlmod bzlmod work

Comments

@jsharpe
Copy link
Contributor

jsharpe commented May 5, 2023

🚀 feature request

Description

The pip extension is currently missing support for the annotation API.

Describe the solution you'd like

Probably this needs to be implemented as a new tag class on the extension to define annotations for packages loaded from pip because you can't load the macro that is needed for the package_annotation.

Describe alternatives you've considered

I've attempted to put the code in a custom extension but obviously this isn't an ideal solution.

@chrislovecnm
Copy link
Collaborator

@jsharpe can you provide more context?

@jsharpe
Copy link
Contributor Author

jsharpe commented May 8, 2023

The issue is you can't load package_annotation in a MODULE.bazel file and so you can't construct the arguments to pass to the annotations parameter.

for instance you can't easily translate the following WORKSPACE snippet:

load("@rules_python//python:pip.bzl", "package_annotation", "pip_parse")

ANNOTATIONS = {
    "mpi4py": package_annotation(
        additive_build_content = """\
cc_library(
    name = "hdrs",
    hdrs = glob(["site-packages/mpi4py/include/mpi4py/*.h"]),
    includes = ["site-packages/mpi4py/include"],
    visibility = ["//visibility:public"],
)
""",
    ),
}

pip_parse(
    name = "pypi",
    annotations = ANNOTATIONS,
    python_interpreter_target = interpreter,
    requirements_lock = "//python:requirements_lock.txt",
)

@chrislovecnm
Copy link
Collaborator

chrislovecnm commented May 8, 2023

It got yah. If we modify the following example to use bzlmod, is that example an accurate test?

https://github.com/bazelbuild/rules_python/tree/main/examples/pip_repository_annotations

The fix is pretty doable. A lot like how we fixed the interpreter. We need a tag class that calls a repository rule that creates a BUILD file with the package_annotations in it. We then reference that Label in the pip_parse.

The following is the repository rule that we used to implement the interpreter.

def _host_hub_impl(repo_ctx):

@jsharpe
Copy link
Contributor Author

jsharpe commented May 9, 2023

Yes, updating that example will be sufficient.

@chrislovecnm chrislovecnm added the type: bzlmod bzlmod work label May 17, 2023
@chrislovecnm
Copy link
Collaborator

@jsharpe do you always have to use package_annotation with annotations?

We would need a new tag class that uses a repository rule to write out a file with the JSON-encoded contents like package_annotation does. We would pass to pip.parse a dict of labels:strings, and then flip the dict around to build the contents.

@chrislovecnm
Copy link
Collaborator

chrislovecnm commented Jun 14, 2023

I’m thinking that we have annotation files. The only thing that the rule package annotation is doing is json encoding the data. Rules rust is also using json files to set the annotations. @rickeylev thoughts?

@jsharpe
Copy link
Contributor Author

jsharpe commented Jun 14, 2023

I'd be happy with annotation files - I only make use of the additive_build_content option of package_annotation; I don't have any need for the other options; I guess the approach that gazelle's go generation would be the more general approach of applying diffs to the generated BUILD files?

@jsharpe
Copy link
Contributor Author

jsharpe commented Jun 14, 2023

In fact I'd go as far to say I don't want to be writing BUILD file content as a string in my MODULE.bazel file - especially as it all has to be in a single file.

@rickeylev
Copy link
Collaborator

Yeah, +1 to specifying a file. jsharpe makes good points.

Applying a more general patch is an interesting idea, too, though I'm not sure what modifications (i.e. not just appending content) people would want to apply?

For the particular case of adding a cc_library for e.g. headers, I think ideally we'd somehow generate that as part of the regular rule, though I'm not sure what, if anything, in a wheel tells us that info.

@rickeylev
Copy link
Collaborator

Another idea: Allow a wheel to have a build file. Basically check if it has a BUILD.bazel file within it. If so, use that instead of generating one.

@chrislovecnm
Copy link
Collaborator

Another idea: Allow a wheel to have a build file. Basically check if it has a BUILD.bazel file within it. If so, use that instead of generating one.

Good interesting idea, but that is much more work than adding the annotations as files. What are the benefits?

@chrislovecnm
Copy link
Collaborator

chrislovecnm commented Jun 15, 2023

@jsharpe and @rickeylev, we have a couple of different options.

Currently, the annotations attribute is a string_dict. But I recommend that have a new attribute name for bzlmod.

Either we put all of the annotations into a file that contains a JSON encoded dict[whl_name] that has the values of the annotations, or we create an attribute that is a string_dict, and we have a key that is the whl_name and a value is the name of the file that contains the JSON encoded data for the wheels annotations.

Either

annotation_files["requests"] = ':requests-annotations.json

or

annotation_file = ":all-of-my-annotations.json"

So either we put everything in a file that we can decode, or we have multiple files as values in a dict keyed by the wheel name.

Thoughts?

@rickeylev
Copy link
Collaborator

Good interesting idea, but that is much more work than adding the annotations as files. What are the benefits?

It's not a lot of work on our end, not technical-wise. The benefit is the owning library gets control over how their library appears when used with Bazel. SO if a library wants to expose an extra cc_library() file of your headers, or more discrete targets, or other some such, the owning library can define it once instead of every user having to inject extra build file content themselves.

As a bonus, the build file is automatically versioned with their release, so it can reflect any particulars of that whl version. Doing the equiv in our pip rules would probably be a pain (a user would have to have some way to associate particular build content to a version)

dict[name, file] vs single json file

Between those two choices, a single json file sounds more appealing, I think? My thinking is, if you have multiple pip versions, you'll have to specify the annotation for each pip version. The content is probably the same the vast majority of the time.

That said, writing build file content within a json file doesn't sound very appealing. I mean it works. That's all that's in the json file, right? Just a name -> build content map?

Maybe a list of files, where the filename is the whl name? Can we take a directory as input? Does glob() work in MODULE files?

@jsharpe
Copy link
Contributor Author

jsharpe commented Jun 16, 2023

That said, writing build file content within a json file doesn't sound very appealing. I mean it works. That's all that's in the json file, right? Just a name -> build content map?

Maybe a list of files, where the filename is the whl name? Can we take a directory as input? Does glob() work in MODULE files?

Strongly agree with this point - I'd prefer my build file content used in annotations to sit in a file that will have editor support for formatting and linting which json will not.

@arrdem
Copy link
Contributor

arrdem commented Jun 16, 2023

I would not bet on wheels ever having BUILD.bazel files embedded in them. The majority of our deps will probably be Bazel-oblivious forever.

I'm strongly in favor of adopting rules_go style arbitrary patchfiles phased between laying down the template BUILD and WORKSPACE resources and performing pip actions. Part of the value of how rules_go does it is you can patch stuff if you have to without vendoring it properly.

Definitely against trying to embed BUILD content in a JSON document. Even in the small I agree with @jsharpe it'd be extremely awkward to work with.

@chrislovecnm
Copy link
Collaborator

chrislovecnm commented Jun 16, 2023

So we have a lot of functionality being put into an "annotation." We have:

def package_annotation(
additive_build_content = None,
copy_files = {},
copy_executables = {},
data = [],
data_exclude_glob = [],
srcs_exclude_glob = []):
"""Annotations to apply to the BUILD file content from package generated from a `pip_repository` rule.
[cf]: https://github.com/bazelbuild/bazel-skylib/blob/main/docs/copy_file_doc.md
Args:
additive_build_content (str, optional): Raw text to add to the generated `BUILD` file of a package.
copy_files (dict, optional): A mapping of `src` and `out` files for [@bazel_skylib//rules:copy_file.bzl][cf]
copy_executables (dict, optional): A mapping of `src` and `out` files for
[@bazel_skylib//rules:copy_file.bzl][cf]. Targets generated here will also be flagged as
executable.
data (list, optional): A list of labels to add as `data` dependencies to the generated `py_library` target.
data_exclude_glob (list, optional): A list of exclude glob patterns to add as `data` to the generated
`py_library` target.
srcs_exclude_glob (list, optional): A list of labels to add as `srcs` to the generated `py_library` target.

This is then handled using the wheel_installer.py https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/tools/wheel_installer/wheel_installer.py

We have built quasi patch like functionality. I say quasi since we are editing parts of the BUILD.bazel file as generated and not applying a full patch. For instance, we do this with

data_exclude = list(
set(
[
"**/* *",
"**/*.py",
"**/*.pyc",
"**/*.pyc.*", # During pyc creation, temp files named *.pyc.NNNN are created
# RECORD is known to contain sha256 checksums of files which might include the checksums
# of generated files produced when wheels are installed. The file is ignored to avoid
# Bazel caching issues.
"**/*.dist-info/RECORD",
]
+ data_exclude
)
)
the data exclude.

We need a format that the wheel_installer.py can read, basically deserialize.

@chrislovecnm
Copy link
Collaborator

I'm thinking, unless we want to do a bunch of work, I can create an extension that looks like this:

# You call this for each of your wheel annotations
annotation.create(
    wheel_name = "foo",
    additive_build_content = "",
    copy_files = {},
   # etc
)

You can then do:

pip.parse(
    annotation_files = { "foo" : "@wheel_annotations:foo.json"},
)

That will give us the same functionality that we have now. If we want to rework this, then we are modifying wheel_installer.py.

@chrislovecnm
Copy link
Collaborator

Not to myself that we need a name parameter. We may have the same wheel name in different hubs or python versions.

@rickeylev
Copy link
Collaborator

Oh geeze, I didn't know annotations had so many settings! I thought they were just extra build content.

More thoughts.


An extension isn't a bad idea. It's one of the few ideas that lets us capture all those args so they can be used later. With the wheel name in the annotation.create() call, I don't think we need the annotation_files arg to be a map?

An annoying catch, though: the extension is going to create the repos in the rules_python namespace. So all the names have to be unique across modules (again).


Another idea: specify the path to a bzl file. When we generate the BUILD file, we load that path and pass through the various args.


How about a json config, but we make build_file_content a path to a file? e.g.

# foo.json
{
  "additive_build_file": "extra.txt"
}

The value could even be a label; it just needs to be resolveable by rctx.path.


Allow passing either a json config file, or just a path to a file. If it's not a json file, assume it's additive build content and treat it as such. I'm assuming only specifying additive build content is the common case.

@chrislovecnm
Copy link
Collaborator

@rickeylev, the only thing I can see that makes sense as a flat file is build contents. Otherwise, we have lists and dicts.

@chrislovecnm
Copy link
Collaborator

PTAL at #1278

@rickeylev
Copy link
Collaborator

I'm strongly in favor of adopting rules_go style arbitrary patchfiles phased between laying down the template BUILD and WORKSPACE resources and performing pip actions.

@arrdem Where is this shown? I couldn't find this in the rules_go docs.

@jsharpe
Copy link
Contributor Author

jsharpe commented Jun 20, 2023

I'm strongly in favor of adopting rules_go style arbitrary patchfiles phased between laying down the template BUILD and WORKSPACE resources and performing pip actions.

@arrdem Where is this shown? I couldn't find this in the rules_go docs.

This is a gazelle feature rather than rules_go AFAIK..

github-merge-queue bot pushed a commit that referenced this issue Jul 10, 2023
This commit implements a bzlmod extension that allows users to create
"annotations" for wheel builds. The wheel_builder.py accepts a JSON file
via a parameter called annotations; this extension creates those JSON
files. The pip extension accepts a Label -> String dict argument of the
JSON files.

This feature is renamed to `whl_mods` because the JSON files are handled
differently
and the name "annotations" is uninformative. This modifies the creation
of the BUILD
files and their content, and is much more than just adding some notes
about a whl.

The whl_mod extension wheel names and the wheel names in pip must match.

Closes: #1213
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bzlmod bzlmod work
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants