Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow cc_toolchain to be defined across multiple repositories #7746

Open
nathaniel-brough opened this issue Mar 18, 2019 · 25 comments
Open

Allow cc_toolchain to be defined across multiple repositories #7746

nathaniel-brough opened this issue Mar 18, 2019 · 25 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: support / not a bug (process)

Comments

@nathaniel-brough
Copy link
Contributor

Description of the problem / feature request:

Currently the workflow to define a cc_toolchain is a good fit for a monorepo. However it is difficult to have a toolchain configuration span across multiple repos.

Why would I like to be able to have a toolchain span multiple repos?
I would like to share a configurable toolchain as an open source project. Therefore I would like to be able to download and use a "CcToolchainConfigInfo" provider rule from an external repository (e.g. using http_archive).

Feature requests: what underlying problem are you trying to solve with this feature?

To my knowledge this is not currently possible to span a toolchain config across multiple repositories. This is due to the fact that tool_path from "@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl" is relative to the package directory. The problem with this is that when you load a toolchain config rule from an external repository the relative paths are relative to where the rule is instantiated rather than where the rule is defined.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Follow the tutorial on configuring c++ toolchains. Then attempt to load and use cc_toolchain_config from a different workspace. Doing so will result in an error where the given toolpath cannot be found.

What operating system are you running Bazel on?

Manjaro Linux Rolling Release

What's the output of bazel info release?

release 0.23.0

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

N/A

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

I have found that repository rules are a potential workaround. This works by passing through the attrs from the repository_rule through to a toolchain config rule. This bypasses the problem of relative paths by instantiating the config rule in the given repository. For our particular use case this is a fairly heavy weight solution as it involves creating a new repository for each of our targets. We currently have about 10 different configurations for different cc_toolchain rules, as we target a number of different embedded devices.

Any other information, logs, or outputs that you want to share?

I think that a potential solution would be to include a new rule that can reference a target rather than a relative path. e.g.

@toolchain_workspace//:BUILD

sh_binary(
    name = "gcc",
    srcs = ["gcc_wrapper.sh"],
)

@toolchain_workspace//:cc_toolchain_config.bzl

def _impl(ctx):
    tool_paths = [
        # Create a new toolchain target rule 
        tool_target(
            name = "gcc",
            target = ctx.attr._gcc_executable,
        ),
    # Truncated
    ]
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        # Truncated
        tool_paths = tool_paths,
    )

cc_toolchain_config = rule(
    implementation = _impl,
    attrs = {
        "_gcc_executable" = attr.label(default="@toolchain_workspace//:gcc"),
        "foo" = attr.string(),
    },
    provides = [CcToolchainConfigInfo],
)

This should support the following usage;
//:BUILD

load("@toolchain_workspace//:cc_toolchain_config.bzl", "cc_toolchain_config")
cc_toolchain(
    name = "my_toolchain",
    toolchain_config = ":my_config",
)
cc_toolchain_config(
   name = "my_config",
   foo = "foo_param",
)
@aiuto aiuto added untriaged team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. labels Mar 18, 2019
@dslomov dslomov added team-Configurability platforms, toolchains, cquery, select(), config transitions and removed team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. labels Mar 21, 2019
@programmablereya programmablereya added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Mar 21, 2019
@katre katre added team-Rules-CPP Issues for C++ rules untriaged and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions labels Mar 21, 2019
@katre katre removed their assignment Mar 21, 2019
@katre
Copy link
Member

katre commented Mar 21, 2019

Re-assigning to team-rules-cpp for triage: this is strictly part of the setup of cc toolchains, and doesn't impact toolchain resolution at all.

@scentini scentini added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Mar 28, 2019
@scentini
Copy link
Contributor

cc/ @hlopko

@hlopko
Copy link
Member

hlopko commented Mar 29, 2019

I think with Crosstool in Starlark you cat get get quite far, and the last step can be simplified. I played with this a bit. I have a cc_toolchain_config rule:

cc_toolchain_config = rule(
    implementation = _impl,
    attrs = {
        "cpu": attr.string(mandatory = True),
        "compiler": attr.string(),
        "header": attr.label(allow_single_file = True, mandatory = True,),
        "gcc": attr.label(allow_single_file = True, mandatory = True,),
    },
    provides = [CcToolchainConfigInfo],
)

And I instantiate this rule like this:

cc_toolchain_config(
    name = "local",
    cpu = "k8",
    compiler = "compiler",
    header = "@tools//:my/cool/header/header.h",
    gcc = "@tools//:my/cool/gcc.sh",
)

Now this allows me for example to get the path of the package where header is for isystem:

include_directories = [ctx.file.header.owner.package]

This also allows me to get the path to the gcc no matter which external repository it is by:

ctx.file.gcc.path

The problem is that path are assumed to be relative to the package in which cc_toolchain_config target is, as you correctly pointed out. Another problem is that paths must be normalized (#5809) so you cannot walk up from the package to a different repository.

Is my understanding correct that solving this last problem is what this issue is about?

@nathaniel-brough
Copy link
Contributor Author

The problem is that path are assumed to be relative to the package in which cc_toolchain_config target is, as you correctly pointed out.

I believe that is the problem

Another problem is that paths must be normalized (#5809) so you cannot walk up from the package to a different repository.

I did try using un-normalized paths and failed running into the same problem as in the pull request you mentioned

@nathaniel-brough
Copy link
Contributor Author

To answer your question; yes, I do believe your understanding of the problem is correct

@deeglaze
Copy link
Contributor

deeglaze commented Apr 8, 2019

The Asylo project (github.com/google/asylo) has a similar use case. We have a toolchain in a repository @com_google_asylo_toolchain that inserts include paths to the @com_google_asylo workspace by duplicating the -I flags across asylo/path and external/com_google_asylo/asylo/path prefixes. The includes are to supplemental POSIX and system header files that trampoline into our runtime for actions like secure I/O and crypographically-secure randomness from /dev/urandom.

I ran into the relative path problem a year ago, and resolved it with a repository_rule that takes the absolute path to the toolchain installation directory as an attribute and then creates a symlink so that paths can stay relative https://github.com/google/asylo/blob/master/asylo/bazel/asylo_deps.bzl#L102

Splitting a toolchain across repos is still preferable to extra machinery around the POSIX and system includes to break the cyclic dependency since the solutions I've come up with all involve having to reinstall the toolchain whenever we add more POSIX support.

@hlopko
Copy link
Member

hlopko commented Apr 17, 2019

@deeglaze if it's only for include paths, cannot you use something like include_directories = [ctx.file.header.owner.package] in your C++ toolchain? That way you might be able to decouple @com_google_asylo from knowing where include directories from @com_google_asylo_toolchain are. But I admit I don't think I understand exactly what is the problem.

@ghost
Copy link

ghost commented Apr 23, 2019

The problem is that path are assumed to be relative to the package in which cc_toolchain_config target is, as you correctly pointed out. Another problem is that paths must be normalized (#5809) so you cannot walk up from the package to a different repository.

Is my understanding correct that solving this last problem is what this issue is about?

@hlopko So, where do we go from here? :)

I noticed that CcToolchainProviderHelper.resolveIncludeDir has special support for path prefixes like %workspace%/ and %package(...)/. Would it make sense for CppToolchainInfo to support the same? I haven't tried it yet, but I think if CppToolchainInfo handled %workspace%/ similarly, then "%workspace%/" + ctx.file.gcc.path could do the trick.

Update: I happened upon 473ea10 and realized it's a bit more involved than I thought.

@nathaniel-brough
Copy link
Contributor Author

Is there a particular reason why relative paths are used, or is it more of a side effect of the original CROSSTOOL configurations being in proto rather than starlark?

The behaviour seems a little odd and from my perspective doesn't feel consistent with the rest of bazel's starlark API.

I would be happy to contribute towards a fix. However I have limited understanding of bazel internals and rudimentary experience with java so it may be a long time before I get to it.

Any ideas on a general approach for a fix, that might minimise breaking changes and avoid introducing other issues?

@hlopko
Copy link
Member

hlopko commented May 15, 2019

Relative paths are just a side effect of the history.

I also think this is odd, but so far we didn't find enough motivation to fix it. So there are 2 use cases:

  1. we need to be able to express absolute paths somehow for non hermetic toolchains (unless we decide we don't want to support non hermetic toolchains, which is too deep question for a github issue reply :)
  2. we need to be able to express relative paths - for these using Artifacts (e.g. ctx.file.foo) makes a lot of sense. But it will be confusing on another level. Now you have to pass your 'gcc' artifact to CcToolchainConfigInfo rule for the command line, and also to 'cc_toolchain' for action inputs. And it's hard to unify those 2 uses - we need to be able to configure tool paths using features, and we definitely don't want to teach cc_toolchain about all the intricacies of CcToolchainConfigInfo.

If @silvergasp you want to contribute smth, then adding file attribute to https://source.bazel.build/bazel/+/8bd7e4822ed96615ccac77cc40ca96eddc74c510:tools/cpp/cc_toolchain_config_lib.bzl;l=421 and using that in C++ rules might make sense. @scentini wdyt?

@nathaniel-brough
Copy link
Contributor Author

I have finally had a bit of time to look into this a little further. Mostly just trying to familiarise myself with the source code. I would just like to consolidate my current understanding before I start putting any significant amount of time into this.

  1. The ToolInfo provider that you suggested I add a file attribute to, is used for only for the ActionConfigInfo provider (Can you confirm this?).
  2. This path is then retrieved by the c++ rules through cc_common.get_tool_for_action() (here), and executed using ctx.actions.run() (here).
  3. I would suggest that I attempt an implementation using the new skylark API. Implementing the rules_cc and modifying cc_common.get_tool_for_action() to handle both file and path attributes, with some form of conflict resolution (If the user decides to define both). Would I be on the right track here or am I missing something significant?

I admit I did look through the current java implementations for the cc rules and found it too difficult to follow in the time I have available for this.

Is the plan to support both native cc_rules rules and skylark cc_rules, or to move exclusively to skylark?

Are there any other updates on this so far? I would like to avoid duplicating effort :)

@katre
Copy link
Member

katre commented Aug 1, 2019

Cc @scentini since @hlopko is still out.

@scentini
Copy link
Contributor

scentini commented Aug 1, 2019

There is no ongoing effort on this just yet, so feel free to go ahead.
Long term plan is to implement the native rules in Starlark, but that is O(years) :)
Note that currently (close to) all of the cc rules are implemented in Java.

  1. Correct.
  2. The cc_common.get_tool_for_action() is exposed to Starlark, but it is actually implemented in Java (here) and can be used in custom Starlark code, like the example you found, but is also extensively used by the native rules. So, you can't really touch only the Starlark part.
    However, looks like you don't really have to modify that method. Instead you can modify the constructor for the ToolInfo provider, which is here, to take into account a new, file attribute, and to obtain the path from that attribute, if specified.

@elklein
Copy link
Contributor

elklein commented Feb 10, 2020

I took a look into implementing this. I have it working locally, but I believe my implementation is incomplete. The part I'm unsure about is the interaction between the CcToolchainFeatures.Tool class and the CToolChain.Tool protobuf. I've added an Artifact member to CcToolchainFeatures.Tool, but I'm not sure how to serialize/deserialize that into and out of the protobuf. I can guess how you might get a string out of it to serialize, but when it comes to deserializing via the private constructor, I really have no idea. I'm assuming that the protobuf code is for sending the toolchain config across the wire for remote execution, but... I'm just guessing there. Obviously this is going to be incomplete without that bit.

Any suggestions on how to proceed? I'm happy to cook up a pull request so you can see what I've done, but obviously it wouldn't be a final product yet.

@oquenchil oquenchil added type: support / not a bug (process) P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed P2 We'll consider working on this in future. (Assignee optional) labels Nov 19, 2020
@jheaff1
Copy link
Contributor

jheaff1 commented Mar 30, 2021

Is there any update on this?

It appears that rules_foreign_cc does not work with non-hermetic toolchains (and wrapper scripts provided to the tool_paths required by cc_common.create_cc_toolchain_config_info). See #8438 (comment)

@a-pushkin
Copy link

New year, new check for updates ;)
It would be great to be able to reference tools from external packages without wrapper scripts.

@nathaniel-brough
Copy link
Contributor Author

This has mostly been addressed in #10967.

I think there is a little bit more work to do before closing this;

  • Improve the docs, around defining tool path/targets using the action_config/tool functionality added in Add optional tool_path_origin to Tools in C++ crosstool #10967.
  • Update the cc_toolchain tutorial to prefer using the new action_config based approach
  • Potentially deprecate the tool_path field altogether in favour of the action_config's? It seems rather strange to have two entirely separate mechanisms to define tool paths/targets.

I'd be happy to create a PR updating the docs/tutorials if that is something that would be considered useful?

@nathaniel-brough
Copy link
Contributor Author

An additional point would be that bazelbuild/rules_cc#72 would probably need to be merged to match the main Bazel repository as well.

@kalcutter
Copy link

I am confused how to use this. My understanding is, now you can use tool(tool = ...) instead of tool_path(). I tried something like:

cc_common.create_cc_toolchain_config_info(
    ...
    compiler = tool(tool = ...),
)

But when do this I get:

Error in create_cc_toolchain_config_info: in call to create_cc_toolchain_config_info(), parameter 'compiler' got value of type 'struct', want 'string'

@nathaniel-brough
Copy link
Contributor Author

You need to use the tool constructor with an action_config.

I answered a StackOverflow question earlier today on this, if you are looking for an example.

https://stackoverflow.com/a/73505313/4955829

@nathaniel-brough
Copy link
Contributor Author

Personally I think the tool constructor could probably entirely replace the tool_path constructor. I think having both just gets confusing, with very little added value.

@bedbad
Copy link

bedbad commented Feb 17, 2023

I agree. The architecture of toolchain config is unduly affected by the gnu/binutils way which in turn was set due to the Unix way having a single separate executable for every singular task. Other toolchains, like zig, which is a very good match for Bazel uses zig approach and writing 4 - 7 wrapper script files to then put them in the tool_paths shouldn't be the default approach of Bazel user for something as common as specifying toolchain for cross-compilation.

@fmeum
Copy link
Collaborator

fmeum commented Feb 17, 2023

The recently created design doc at https://docs.google.com/document/d/1-etGNsPneQ8W7MBMxtLloEq-Jj9ng1G-Pip-cWtTg_Y/edit?usp=sharing by the author of the issue should probably be mentioned here.

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 23, 2024
@fmeum
Copy link
Collaborator

fmeum commented Apr 23, 2024

Between https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313 and https://github.com/bazelbuild/rules_cc/blob/main/cc/toolchains/README.md, this appears to be mostly resolved (although still under-documented).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: support / not a bug (process)
Projects
None yet
Development

No branches or pull requests