Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aliases fail to be equivalent when matching toolchains and platforms #14605

Closed
cpsauer opened this issue Jan 20, 2022 · 13 comments
Closed

Aliases fail to be equivalent when matching toolchains and platforms #14605

cpsauer opened this issue Jan 20, 2022 · 13 comments
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug

Comments

@cpsauer
Copy link
Contributor

cpsauer commented Jan 20, 2022

Hi, wonderful Bazel folks. Thanks for all the great work you do, including extending official support for platforms.

Update: This wasn't at all the issue I thought it was. Please read through the discussion to see where it went.

This is a bug report at the intersection of aliases, platforms, and toolchains.
It shouldn't be high priority, but it definitely does seem like unintended behavior that pins people to a deprecated alias.

I think the easiest way to explain is via an example to reproduce the issue:

Update: Use the reproducing workspace in this comment below.

When you run it, you'll see the following output:

Building a macOS binary rule targeting that cpu (--macos_cpus=arm64) fails with:

No matching toolchains found for types @bazel_tools//tools/cpp:toolchain_type.

And building with --toolchain_resolution_debug=@bazel_tools//tools/cpp:toolchain_type to debug:

INFO: ToolchainResolution:     Type @bazel_tools//tools/cpp:toolchain_type: target platform //:macos-arm64: Rejected toolchain @local_config_cc//:cc-compiler-darwin_arm64; mismatching values: aarch64

...which indicates that @platforms//cpu:arm64 is failing to match @platforms//cpu:aarch64, which, recall, is an alias to it. (There's the bug.)

And indeed, changing the config setting to match (the deprecated alias) @platforms//cpu:aarch64 works. Strange!

I'm seeing this on the latest macOS (12.1) with the latest Bazel rolling (6.0.0-pre.20220112.2).

Thanks so much for taking a peek!
Chris
(ex-Googler)

@aiuto aiuto added team-Configurability platforms, toolchains, cquery, select(), config transitions untriaged labels Jan 20, 2022
@gregestren
Copy link
Contributor

Aliases are perpetual bug fodder. :( Thanks for this report. Usually alias problems, annoying as they are, are pretty straightforward to fix.

@gregestren gregestren self-assigned this Jan 26, 2022
@cpsauer
Copy link
Contributor Author

cpsauer commented Jan 29, 2022

Thanks for being awesome, Greg :)

As an aside, John and I were going to get in touch about the platform mappings I'd cooked up that led to this. Parallel to the iOS convo we'd had, but he'd responded in Android land. If you'd be interested on the iOS side or more broadly, just shoot me an email, too, and I'll keep you looped in. See #9012 (comment)

@gregestren
Copy link
Contributor

gregestren commented Feb 1, 2022

I'm having trouble replicating this. I tried reducing to a minimal repro and every combo worked for me. See https://gist.github.com/gregestren/f740512e322c51cca160a5937e763f1c.

Can you trace through your build logic more precisely? Particularly:

  • Do you think platform mappings are related (I suspect not, but still worth checking)
  • Where are you consuming the config_setting?
  • Is the config_setting really triggering this? That doesn't look like an error for config_setting matching.
  • Who exactly is setting aarch64?

@gregestren gregestren added the P1 I'll work on this now. (Assignee required) label Feb 1, 2022
@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 4, 2022

Uh oh! Sorry, Greg for not getting you enough info on round 1.

A combined response to the the first two questions:

Platform mappings are where I was seeing the bug, and where I was consuming the config_setting. Sounds like you've checked the other cases, so platform_mappings is my new hypothesis.

  • Who exactly is setting aarch64?

My understanding is that this comes from the setup of the default macOS toolchain. If you're on gLinux (wow, I didn't know Goobuntu had been replaced!) then we should be able to setup a parallel example for android NDK toolchains. NDK example, switching everything in the above to android from macOS, including setting --fat_apk_cpu=arm64-v8a would do it, I should think.

  • Is the config_setting really triggering this? That doesn't look like an error for config_setting matching.

^ Sorry--not quite sure what you're asking with this last one.

Thanks,
Chris

@gregestren
Copy link
Contributor

I'm sorry, but I'm still having trouble reproducing. I'm pretty sure I've confirmed aliases are followed for platform/toolchain resolution (so a platform with aarch64 matches a toolchain with arm64 or vice versa).

To move this forward, can you provide precise reproduction steps? Like a sample git project one could check out and a specific build command?

Re: my earlier questions, I'm confused where config_setting fits into the above. The config_setting declares a set of conditions that a select() may match again. It shouldn't appear in the destination of a platform mapping. So for:

flags:
    --cpu=macos_arm64
           //:macos-arm64

I'd expect //:macos-arm64 to be a platform(), not a config_setting().

I've tried reproducing with:

platform(
    name = "macos-arm64",
    constraint_values = [
        "@platforms//os:osx",
        "@platforms//cpu:arm64",
    ],
)

And I never see the value aarch64 in --toolchain_resolution_debug output: only arm64.

It's possible our default toolchains are configured differently (I'm doing this from a Mac too but it could default to a different architecture). But I'd expect --ios_multi_cpus to override system defaults.

Also FYI I can run:

$ bazel query @local_config_cc//:cc-compiler-darwin_arm64 --output=build
apple_cc_toolchain(
  name = "cc-compiler-darwin_arm64",
  all_files = "@local_config_cc//:osx_tools_darwin_arm64",
  compiler_files = "@local_config_cc//:osx_tools_darwin_arm64",
  strip_files = "@local_config_cc//:osx_tools_darwin_arm64",
  objcopy_files = "@local_config_cc//:empty",
  as_files = "@local_config_cc//:osx_tools_darwin_arm64",
  ar_files = "@local_config_cc//:osx_tools_darwin_arm64",
  linker_files = "@local_config_cc//:osx_tools_darwin_arm64",
  dwp_files = "@local_config_cc//:empty",
  supports_param_files = True,
  toolchain_identifier = "darwin_arm64",
  toolchain_config = "@local_config_cc//:darwin_arm64",
)

and

$ bazel query @local_config_cc_toolchains//:cc-toolchain-darwin_arm64-aarch64 --output=build
toolchain(
  name = "cc-toolchain-darwin_arm64-aarch64",
  exec_compatible_with = ["@platforms//os:osx", "@platforms//cpu:aarch64"],
  toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
  target_compatible_with = ["@platforms//os:osx", "@platforms//cpu:aarch64"],
  toolchain = "@local_config_cc//:cc-compiler-darwin_arm64",
)

The top command shows the definition of the toolchain in question. The bottom shows the platform / constraint bindings applied to that toolchain. That's helpful for basic diagnostics, at the very least. I had to dig through some magic to identify the correct constraint label @local_config_cc_toolchains//:cc-toolchain-darwin_arm64-aarch64 because Bazel specially sets up C++ toolchains with custom logic.

Sorry I can't more faithfully reproduce. I'm lowering this to a P3 since there's no clear actionable yet. Happy to up-prioritize if we can consolidate a reproduction.

@gregestren gregestren added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P1 I'll work on this now. (Assignee required) labels Feb 16, 2022
@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 16, 2022

On it.

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 16, 2022

Have verified in our main repo, where I first observed it, using the latest rolling. Building a minimal example.

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 17, 2022

ooookay, after a bunch of fiddling (and some dinner), I've got a pretty small, self-contained workspace that reproduces the issue.

Grab platform_mapping_alias_repro.zip
Run bazel run :hello --toolchain_resolution_debug=@bazel_tools//tools/cpp:toolchain_type


But it's got a weird and unexpected twist--and a mea culpa. There was a surprising WORKSPACE change required to get into the problem case: I needed to load gRPC's extra deps to cause aarch64 to appear in --toolchain_resolution_debug, and for the above messages to appear. (Marked with a comment in the last line of WORKSPACE. Maybe worth toggling it!)

Absolutely on me for not creating a self-contained example from the get go--and I'm sorry for the time of yours I spent trying to reproduce, Greg. In retrospect, there's no way anyone would have figured it out without systematically cutting my existing repo down. So thanks for asking me to create an example when you did. And I also compounded the issue with some typo-level mistakes (like, for example copying over my constraint_value definition instead of my platform definition--we declare both in parallel because we need both). Anyway, I'm so sorry about those mistakes. I did intend well, though, I promise. I'd slimmed down the issue in my original workspace to what I thought was the minimal reproducing target. It just didn't occur that something else in WORKSPACE would be affecting the apple toolchains. I've also updated the original post to point down here to avoid confusing future readers.


Weirdness and mistakes as they were, I still think that this is a bug along the lines of the original report? Shouldn't those two match and be equivalent?

(If useful: The results of the toolchain queries you ran looked the same to me. And adding grpc_extra_deps() basically just changed arm64->aarch64 under --toolchain_resolution_debug.)

@pcjanzen
Copy link
Contributor

grpc_extra_deps pulls in go, which pulls in platforms at 681f1ee03, which is old enough to have unique constraints for aarch64 and arm64 (they are not in fact aliases of each other). try adding your own declaration of platforms at v0.0.3 or later.

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 17, 2022

🤦‍♂️ Man, that's got to be my biggest ever case of "not the bug I thought it was"

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 17, 2022

Okay, so before we close this down with tag: facepalm....

@pcjanzen, sounds like you might have known about this issue previously--or have had a good way of identifying it. I'm keen to learn about either :) So, do you know if there's an existing issue on this? I'd like to follow, or if not create one or help get a fix in. Unlikely that I'm the last person to run into this. And if you have a good way of quickly introspecting things like this, I'd love to learn it! (I can indeed go it and find the BUILD file with the separate definitions, but I don't know of a way to see there versioning or import path besides going through all the source.)

Update: Didn't see an issue, so I tossed up bazel-contrib/rules_go#3071 for consideration. Seems like the root, root cause is that maybe() doesn't protect against redefining built-in workspaces like platforms. However, if the expectation is that most users should instead bring in platforms manually, lmk and I'll close that one down.

@cpsauer cpsauer closed this as completed Feb 18, 2022
@pcjanzen
Copy link
Contributor

If you happen to be the sort of person who vaguely remembers things like this then it's pretty simple to bazel query --output build @platforms//cpu/... and confirm or deny your suspicions, then grep around in the output root for the offending repository. If not, look on the bright side, you're probably fun at parties.

Not sure what the right answer is here, other than just encouraging your dependencies to stay up-to-date. It seems like Bazel is doing the right thing by providing an implementation of last resort.

@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 18, 2022

Heh, well, thank you for your incredible cross-ref. Both fun and funny at this party. I appreciate your style of humor...and usually tend towards the same :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug
Projects
None yet
Development

No branches or pull requests

4 participants