Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse rustc target names #1413

Merged
merged 12 commits into from
Mar 13, 2025
Merged

Parse rustc target names #1413

merged 12 commits into from
Mar 13, 2025

Conversation

madsmtm
Copy link
Collaborator

@madsmtm madsmtm commented Feb 24, 2025

Parse target names instead of using a pre-generated list. Allows target names like those used by certain Linux distributions, even when compiling outside build.rs.

I feel fairly confident that the target parsing is complete, and that the parsing approach taken (described in the code, but roughly "parse arch, then env/ABI, then OS, and lastly vendor") is correct.
Unfortunately, since this information is now parsed instead of pre-generated, we cannot just update the information automatically with a PR. Instead, I've added a CI step that opens an issue comment in #1426 whenever a nightly rustc target names fails to parse correctly. This isn't as nice, but it's at least workable.

Note that ensuring we construct the same LLVM triples as rustc is a lot more work; I've started on it a little bit, but it will have to be postponed (it probably also requires a lot of tweaks to rustc to really be nice). So for now, I opted to keep a pre-generated list of rustc to LLVM triples.

Fixes #1297.
Fixes #1317.
Fixes #1405.
Fixes #1407.

But keep using a generated LLVM/Clang triple mapping for now.
@NobodyXu
Copy link
Collaborator

Hmmm the solution here is more complex than I'd imagined.

I was thinking of simply splitting target, and replace alpine/* with unknown if os is linux

@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

I was thinking of simply splitting target, and replace alpine/* with unknown if os is linux

Yeah, and that approach would solve the Alpine/Chimera issues, but won't solve the problem people have otherwise been having regarding Rust bootstrapping and custom/new targets, see rust-lang/rust#137064 / rust-lang/rust#137460 (bootstrap has downgraded to cc v1.1.22 because of this, which breaks a lot of other things).

Personally, I'm still a little bit on the fence about this, it'd be really nice if we could just only support upstream targets. But ultimately, I don't think that's a good decision for the Rust project as a whole.

@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

WDYT @ChrisDenton? I get the feeling you're more in touch with other rustc devs than I, what's your general take?

@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

For completeness, I'll note that there are other options, I just dislike them even more than this:

  • Bootstrap sets TARGET, CARGO_CFG_TARGET_ARCH, CARGO_CFG_TARGET_VENDOR, CARGO_CFG_TARGET_OS, CARGO_CFG_TARGET_ENV and CARGO_CFG_TARGET_ABI before invoking cc.
  • We use the pre-generated list, and fall back to very simple "split into three/four parts, and assume arch-vendor-os/arch-vendor-os-env" if not available there.

@NobodyXu
Copy link
Collaborator

but won't solve the problem people have otherwise been having regarding Rust bootstrapping and custom/new targets, see rust-lang/rust#137064 / rust-lang/rust#137460 (bootstrap has downgraded to cc v1.1.22 because of this, which breaks a lot of other things).

I got it, but it feels like this PR has hard coded so many stuff and make maintaining harder

I still prefer some of the information to be generated from rustc or somewhere else instead of hardcoding everything, maybe we can modify gen-target-info to check env/abi etc for non-straight forward mappings and put them in a sorted array?

@ChrisDenton
Copy link
Member

From a rustc perspective, we do want cc-rs to work with new/custom targets as well as possible. Though that doesn't necessarily mean accepting any arbitrary target string. There could be at least some rules if it helps.

Also there are perhaps some optimizations here. How much of the target info do we actually use? I suspect we only care about the targets we have special cases for, e.g. targert_os=="windows" etc.

Discerning the llvm target is a bit trickier. Though I think in part it's made worse by rustc itself not always being consistent (or even accurate, particularly for tier 3 targets). That is something that could be improved on the rustc side.

I'd also add that llvm deals with parts of the triple it doesn't understand by replacing them with "unknown". I'm not sure if we can/should take advantage of that but I thought I should note it.

madsmtm and others added 2 commits February 28, 2025 11:48
Co-authored-by: Jiahao XU <30436523+NobodyXu@users.noreply.github.com>
Co-authored-by: Jiahao XU <30436523+NobodyXu@users.noreply.github.com>
@madsmtm
Copy link
Collaborator Author

madsmtm commented Feb 28, 2025

I got it, but it feels like this PR has hard coded so many stuff and make maintaining harder

I don't like it either, but it really seems like the only option to me.

I still prefer some of the information to be generated from rustc or somewhere else instead of hardcoding everything, maybe we can modify gen-target-info to check env/abi etc for non-straight forward mappings and put them in a sorted array?

Hmm, not a bad idea, though it wouldn't help to reduce the majority of the work here, it would only really help for this part of the parsing (e.g. parse_arch is still necessary, as is parse_envabi).

The bigger problem is that "non-straight forward mappings" is hard to define, since the mapping isn't defined at all!

From a rustc perspective, we do want cc-rs to work with new/custom targets as well as possible. Though that doesn't necessarily mean accepting any arbitrary target string. There could be at least some rules if it helps.

I doubt cc-rs has the sway to say no to a proposed target name, that's the purview of t-compiler, no? But yes, I'd really like some rules for target names!

Also there are perhaps some optimizations here. How much of the target info do we actually use? I suspect we only care about the targets we have special cases for, e.g. targert_os=="windows" etc.

Probably true, I think target_env isn't really used for anything other than on Windows (msvc/gnu) and for musl atm. But we'll likely need all the information to generate the right LLVM triple, and I suspect it'll be useful for the CROSS_COMPILE prefix too in the future.

Discerning the llvm target is a bit trickier. Though I think in part it's made worse by rustc itself not always being consistent (or even accurate, particularly for tier 3 targets). That is something that could be improved on the rustc side.

Indeed, and I plan to do that in the future (but wanted to land this PR first).

@ChrisDenton
Copy link
Member

I doubt cc-rs has the sway to say no to a proposed target name, that's the purview of t-compiler, no? But yes, I'd really like some rules for target names!

It has as much sway as the rest of the project, The best thing would be to make an MCP (or maybe it'd require an RFC) to specify some constraints we'd want or otherwise define how to choose target tuples for new targets. Otherwise it would be a case of monitoring new target MCPs and raising objections if their naming isn't great (which would be more hassle at the end of the day).

@madsmtm
Copy link
Collaborator Author

madsmtm commented Mar 1, 2025

CC @workingjubilee, you've been fairly involved with rustc target names. Could you verify my assumptions about the target names (and if you have time, look over the parsing code)?

Core assumptions about rustc target names:

  • Each target name has 2, 3 or 4 components.
  • 2-components names are of the format arch-os.
  • 3-components names are of either the format arch-vendor-os or arch-os-env+abi.
    • To differentiate these, we inspect the last component, and try to parse the env/ABI - and if that fails, assume that the target is of the format arch-vendor-os.
    • I guess we could try to parse the OS instead, but the list of OSes is long and may be changed by custom toolchains, so that seems more likely to cause problems? Besides, we already have logic to split the env/ABI, might as well re-use it for this.
  • 4-components names are of the format arch-vendor-os-env+abi.

@madsmtm madsmtm changed the title Parse target triples Parse rustc target names Mar 1, 2025
jhpratt added a commit to jhpratt/rust that referenced this pull request Mar 4, 2025
…ieyouxu

Set `target_vendor = "openwrt"` on `mips64-openwrt-linux-musl`

OpenWRT is a Linux distribution for embedded network devices. The target name contains `openwrt`, so we should set `cfg(target_vendor = "openwrt")`.

This is similar to what other Linux distributions do (the only one in-tree is `x86_64-unikraft-linux-musl`, but that sets `target_vendor = "unikraft"`).

Motivation: To make correctly [parsing target names](rust-lang/cc-rs#1413) simpler.

Fixes rust-lang#131165.

CC target maintainer `@Itus-Shield`
tgross35 added a commit to tgross35/rust that referenced this pull request Mar 4, 2025
…ieyouxu

Set `target_vendor = "openwrt"` on `mips64-openwrt-linux-musl`

OpenWRT is a Linux distribution for embedded network devices. The target name contains `openwrt`, so we should set `cfg(target_vendor = "openwrt")`.

This is similar to what other Linux distributions do (the only one in-tree is `x86_64-unikraft-linux-musl`, but that sets `target_vendor = "unikraft"`).

Motivation: To make correctly [parsing target names](rust-lang/cc-rs#1413) simpler.

Fixes rust-lang#131165.

CC target maintainer ``@Itus-Shield``
@madsmtm
Copy link
Collaborator Author

madsmtm commented Mar 13, 2025

Update: I've now filed rust-lang/compiler-team#850, to try to make rustc target triples more consistent and easier to parse.

I'd still prefer to not have to call out to separate processes to determine information like this, but I guess if we can avoid it in build.rs, I'd probably be okay with invoking $RUSTC --print cfg --target $target - for some reason I've been very convinced while making this issue that this wasn't an option, but now I don't see why it isn't, apart from performance? Maybe because it will require changes in bootstrap to make it correctly set $RUSTC, and other projects could similarly be slightly incorrect because they don't set that? Uncertain (EDIT: See #1430).

Regarding LLVM triples, Chris filed rust-lang/compiler-team#846 to request a flag that we could use to get the triple. Meanwhile, I've begun the work of normalizing a few LLVM triples (EDIT: see #1431), to make them more consistent with rustc's target information (I'd prefer that route, but it depends a bit on how consistent I can make the LLVM triples that rustc uses).

Copy link
Collaborator

@NobodyXu NobodyXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I thought it was waiting for review from @workingjubilee and others so I didn't comment

sorry as I should have provide this feedback much earlier

@madsmtm
Copy link
Collaborator Author

madsmtm commented Mar 13, 2025

Cool. I've opened #1430 to discuss using rustc --print cfg instead, and #1431 to track the construction of LLVM triples.

(And yeah, I think it was waiting, but I felt that two weeks was long enough ;) ).

@madsmtm
Copy link
Collaborator Author

madsmtm commented Mar 13, 2025

I merged the changes from #1429, but I explicitly did not translate wali to unknown; this means that the test-rustc-targets.yml workflow will fail, which will give me a chance to test it (I will activate it manually after this is merged, and then file a follow-up PR that resolves the issue).

@madsmtm madsmtm merged commit 968da78 into rust-lang:main Mar 13, 2025
73 checks passed
@madsmtm madsmtm deleted the target-parsing branch March 13, 2025 11:59
@q66
Copy link

q66 commented Mar 13, 2025

i can't quite read this whole thing, but how does the rust triple to llvm triple translation work? e.g. just translating to unknown generic triple would not be quite correct because e.g. for chimera triples, the rust triples strictly match the llvm triples (e.g. x86_64-chimera-linux-musl is for both rust and for llvm)

@madsmtm
Copy link
Collaborator Author

madsmtm commented Mar 13, 2025

i can't quite read this whole thing, but how does the rust triple to llvm triple translation work? e.g. just translating to unknown generic triple would not be quite correct because e.g. for chimera triples, the rust triples strictly match the llvm triples (e.g. x86_64-chimera-linux-musl is for both rust and for llvm)

We try a static list, and fall back to constructing the LLVM triple from the target info, see src/target/llvm.rs. See also #1431, the fall-back logic isn't quite correct yet, but it should be for x86_64-chimera-linux-musl.

@github-actions github-actions bot mentioned this pull request Mar 14, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 21, 2025
…try>

Bump boostrap `cc` to 1.2.17 and `cmake` to 0.1.54

The `cc` version in `bootstrap` was reverted down to 1.1.22 in rust-lang#137460 (previously at 1.2.0). The offending issue has since then been resolved in rust-lang/cc-rs#1413, and a new version of `cc` has been released in rust-lang/cc-rs#1435, so let's try to update the version again.

See [the changelog](https://github.com/rust-lang/cc-rs/blob/d9dd20e376368c7535f6ef89b809098f5f203c1a/CHANGELOG.md) for exact details on what has changed here.

r? jieyouxu who tried this last in rust-lang#137022.
`@rustbot` label T-bootstrap
try-job: dist-apple-various
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants