New rustc and Cargo options to allow path sanitisation by default #3127
This seems reasonable to me, including the step of (eventually) making this default to enabled in release builds. That'll allow people to turn it on in debug if they need to ship debug builds, or conversely, turn it off in release mode if they have a specific need for absolute paths in release builds.
Finally, soon I can remove my 2km long
text/3127-trim-path.md
Outdated
Some interactions with compiler-intrinsic macros need to be considered, though these are entirely down to `rustc`'s implementation of
`--remap-path-prefix`:
1. Path (of the current file) introduced by [`file!()`](https://doc.rust-lang.org/std/macro.file.html) *will* be remapped. **Things may break** if
the code interacts with its own source file at runtime by using this macro.
If you want an example of things breaking to link to: rust-num/num-traits#139
We could just not apply remapping when building `build.rs`, since it'll be up to the build script to maintain privacy and reproducibility anyway.
If `file!()` is relative to `CARGO_MANIFEST_DIR`, then `build.rs` should be able to fix these paths, and it's not even difficult: `manifest_dir.join(relative_or_absolute_file_path)` just works.
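To illustrate why the `join` call handles both cases, here is a minimal sketch. The helper name `resolve_source_path` is hypothetical; the only behaviour relied on is that `Path::join` discards the base when its argument is absolute.

```rust
use std::path::{Path, PathBuf};

/// Resolve a path reported by `file!()` against the package root.
/// `manifest_dir` would typically be read from the `CARGO_MANIFEST_DIR`
/// environment variable; `resolve_source_path` is a hypothetical helper name.
pub fn resolve_source_path(manifest_dir: &Path, file: &str) -> PathBuf {
    // `Path::join` replaces the base entirely when `file` is absolute,
    // so a remapped (relative) path and an unremapped (absolute) path
    // both end up pointing at a real location on disk.
    manifest_dir.join(file)
}
```

With a relative `file` the result lives under `manifest_dir`; with an absolute one it is returned unchanged, so a build script would not need to know whether remapping was active.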
The fix in `num` was to just emit a relative path, so I assume Cargo already treats relative paths as relative to `CARGO_MANIFEST_DIR` (though I see that isn't documented on https://doc.rust-lang.org/nightly/cargo/reference/build-scripts.html#rerun-if-changed).
text/3127-trim-path.md
Outdated
E.g. `/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs` ->
`/rustc/1.52.1/library/core/src/result.rs`
2. Path to the working directory will be replaced with `.`. E.g. `/home/username/crate/src/lib.rs` -> `./src/lib.rs`.
3. Path to packages outside of the working directory will be replaced with `[package name]-[version]`. E.g. `/home/username/deps/foo/src/lib.rs` -> `foo-0.1.0/src/lib.rs`
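The three rules quoted above amount to an ordered list of prefix substitutions. The sketch below is only an illustration of that idea, not how Cargo or `rustc` implement it: `remap_path` is a hypothetical helper, and matching whole path components (rather than raw string prefixes) is an assumption made here so that `/home/username/crate2` is not caught by the `/home/username/crate` rule.

```rust
use std::path::{Path, PathBuf};

/// Apply the first matching `(from, to)` prefix rule to `path`.
/// Hypothetical helper; rules are tried in the order given.
pub fn remap_path(path: &Path, rules: &[(&str, &str)]) -> PathBuf {
    for (from, to) in rules {
        // `Path::strip_prefix` compares whole components, so a rule for
        // `/a/crate` does not match `/a/crate2/...`.
        if let Ok(rest) = path.strip_prefix(*from) {
            return Path::new(*to).join(rest);
        }
    }
    // No rule matched: leave the path untouched.
    path.to_path_buf()
}
```

Fed the sysroot, working-directory and package prefixes from the examples above, this yields the right-hand sides shown in the hunk.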
This somewhat assumes uniqueness of a crate name + version, but it's possible to have two crates with the same name and version being included in the build from different registries, or git repos, or path sources (though cargo has issues with some overlaps).
foo v0.1.0 (/tmp/tmp.NPvWf4JOsU/foo)
├── futures-core v0.3.15 (/tmp/tmp.NPvWf4JOsU/bar)
└── futures-core v0.3.15
One option would be to use the "hashed index url" as a leading directory, so in this case, assuming `foo` is from crates.io, it would be `github.com-1ecc6299db9ec823/foo-0.1.0/src/lib.rs`. Git dependencies similarly have a "hashed git url" that could be used, e.g. `futures-rs-b0bea7d4c3745ece` for https://github.com/rust-lang/futures-rs.
Path dependencies I'm not sure about. It could be possible to generate a similar hash based on the actual path, but while that would alleviate privacy issues it would still have reproducibility problems. It's (currently) not possible to have two path dependencies with overlapping name+version, so maybe just a leading segment such as `path/` to distinguish them from non-path dependencies would work?
error: package collision in the lockfile: packages bar v0.1.0 (/tmp/tmp.NPvWf4JOsU/bar) and bar v0.1.0 (/tmp/tmp.NPvWf4JOsU/bar2) are different, but only one can be written to lockfile unambiguously
Alternatively, this could be declared as a non-issue since it will probably never actually occur, but that should be documented so when it does happen to someone they have a better chance of figuring out what's happening.
The referenced issue rust-lang/rust#40552 has a more detailed remapping scheme:
- For code from crates.io or mirrors, the root name is [crates.io]. Example: C:\Users\username\.cargo\registry\src\github.com-1ecc6299db9ec823\winapi-0.2.8\ will be mangled to [crates.io]\winapi-0.2.8\
- For code from remote git repository, the root name is [username@git server name]. Example: https://github.com/rust-lang/rust will be mangled to [rust-lang@github.com]/rust
- For code from local filesystem, the root name is [local]. Example: D:\workspaces\foobarng\ will be mangled to [local]\foobarng\
- For code from the crate itself, the root name is [crate]. Example: C:\Users\username\Documents\foobar\ will be mangled to [crate]\foobar\
We could take a similar approach here.
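As a rough sketch of that scheme, the four kinds of source can be modelled as an enum that picks the root name. The type and function names (`Source`, `root_name`, `mangle`) are hypothetical, and joining with `/` regardless of host is an assumption made for brevity; the examples in the issue use the host's own separator.

```rust
/// Where a piece of source code came from (hypothetical type).
pub enum Source<'a> {
    /// crates.io or one of its mirrors.
    Registry,
    /// A remote git repository, e.g. user `rust-lang` on host `github.com`.
    Git { user: &'a str, host: &'a str },
    /// Somewhere else on the local filesystem.
    Local,
    /// The crate currently being compiled.
    CurrentCrate,
}

/// Root name used in place of the host-specific prefix.
pub fn root_name(source: &Source) -> String {
    match source {
        Source::Registry => "[crates.io]".to_string(),
        Source::Git { user, host } => format!("[{}@{}]", user, host),
        Source::Local => "[local]".to_string(),
        Source::CurrentCrate => "[crate]".to_string(),
    }
}

/// Join the root name and the remaining path with `/` (an assumption).
pub fn mangle(source: &Source, rest: &str) -> String {
    format!("{}/{}", root_name(source), rest)
}
```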
There was some discussion about what happens when two paths are mapped to the same thing by `--remap-path-prefix`: rust-lang/rust#83813 (comment). For reproducibility reasons the stable hash (used to generate the stable crate hash) of a source path uses the remapped path if available, so there is a chance of collision. There was some discussion around whether we should simply error out when this happens. We should probably keep this in mind here.
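To make the collision concern concrete, the following sketch groups source paths by their remapped form and reports any remapped path that more than one distinct original maps to. `find_collisions` is a hypothetical helper for illustration only; whether a real implementation should error out in this case is the open question discussed above.

```rust
use std::collections::BTreeMap;

/// Given `(original, remapped)` pairs, return every remapped path that is
/// shared by more than one distinct original path, together with those
/// originals. Hypothetical helper, not part of rustc or Cargo.
pub fn find_collisions<'a>(pairs: &[(&'a str, &'a str)]) -> Vec<(&'a str, Vec<&'a str>)> {
    // A BTreeMap keeps the output order deterministic.
    let mut by_remapped: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(original, remapped) in pairs {
        let originals = by_remapped.entry(remapped).or_default();
        // The same original listed twice is not a collision.
        if !originals.contains(&original) {
            originals.push(original);
        }
    }
    by_remapped
        .into_iter()
        .filter(|(_, originals)| originals.len() > 1)
        .collect()
}
```

For instance, two copies of `futures-core` 0.3.15 from different sources would both remap to `futures-core-0.3.15/src/lib.rs` under a plain name-plus-version rule, and the helper would flag that remapped path.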
With regard to one of the stated drawbacks:
This seems like a fairly concerning drawback to me. I do a lot of profiling with binaries compiled in release mode, and find it extremely useful that the source code is displayed in the profiling tools. It sounds to me like what this is saying is that workflow will break, and I will have to know to disable
But debugging a release binary is not that uncommon in my experience. So it sounds to me like we are making that experience worse by default. I'm not sure that's the right trade off to make, although it is somewhat murky.
Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files? I distribute executables, but keep dSYM bundles private. This way I can still symbolicate and debug crashes, but I'm not shipping bloated executables or exposing symbol information in the executable itself. Trimming of paths in the executables, like paths from
I know some people use
That seems like a really good plan. We could start by having trim-path available, and then enable trim-path by default on platforms where we have separated debug symbols by default. People can also still enable it themselves, and we can consider the tradeoffs for enabling it by default in release mode if not generating debug symbols.
text/3127-trim-path.md
Outdated
1. Path to the source files of the standard and core library will begin with `/rustc/[rustc version]`.
E.g. `/home/username/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs` ->
`/rustc/1.52.1/library/core/src/result.rs`
"x86_64-unknown-linux-gnu" seems like useful information.
It isn't sensitive info, and it probably won't break reproducibility, because changing the target will probably change the binary in other ways.
Can we add the target to the compiler version?
Would it also help to add the rustc commit hash to the compiler version?
(We don't want the build date, because that would break reproducibility.)
text/3127-trim-path.md
Outdated
With `trim-path` option enabled, the compilation process will not introduce any absolute paths into the build output. Instead, paths containing
certain prefixes will be replaced with something stable by the following rules:
The rules don't seem to cover modules with the `#[path]` attribute, which might point to a file outside the working directory.
https://doc.rust-lang.org/reference/items/modules.html#the-path-attribute
If the path is still inside the current project directory (where `Cargo.toml` lives), it will still be caught by the "Path to packages outside of the working directory" rule. If it points to some arbitrary location on the local file system then we can't reasonably expect Cargo to automatically sanitise it.
Co-authored-by: teor <teor@riseup.net>
Something that came up in today's @rust-lang/cargo meeting: we should set a variable
Reading over this, one idea I had is that this could be a nice opportunity to leverage community norms and such to make "linkifying" backtraces easier. For example, given a path coming out of a Rust backtrace, it'd be awesome to be able to generate an HTTP URL to the exact copy of the source code (with a line number) that the code references for some paths. An RFC would be, in theory, a great place to specify that two canonical sources of Rust code have well-defined paths in "remapped debuginfo situations":
I think it would be a lot more difficult to automatically translate paths to URLs for other sources of code (e.g. general git repositories, path dependencies, etc), but if we could specify what happens to rustc and crates.io-sourced code, that'd be nice! We could then leave a different default remapping source for all other crates that don't fall in the crates.io category, such as Also, when reading this RFC, I think that we'll also want
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
@alexcrichton One thing I'm slightly annoyed about a path-like prefix (which was proposed in the pre-RFC) is the differences between path separators on Windows vs *nix. That's purely aesthetic and we could format it differently depending on the host, or we could simply not bother and say Regarding linking sysroot and crates.io sources to a URL, I assume that's something the backtrace library can do at runtime? I'm not quite sure how
My general idea is that any project which wants to linkify backtraces in Rust code can automatically linkify Rust standard library source code and dependencies coming from crates.io since those are hosted at canonical locations. Cargo is uniquely positioned with a flag like this to set precedent for how everyone should do this, so everyone canonicalizes around the same way that filenames look. For issues like path separators, we'd canonicalize on one style. For Windows/Unix, we'd still just canonicalize on one style (the files won't exist locally anyway). Some of this may require rustc support rather than "purely just a flag from Cargo", but my point is more general in that this is an opportunity ripe for the picking in an RFC like this, I think. Note that I'm not expecting libraries like For
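A sketch of what such linkifying could look like for standard library paths, assuming they are remapped to `/rustc/<ref>/...` as in the RFC's example, where `<ref>` is something GitHub can resolve (a version tag or commit hash). The function name `linkify_rustc_path` and the choice of a GitHub `blob` URL are assumptions for illustration; nothing in this thread fixes that scheme.

```rust
/// Turn a remapped standard-library path plus a line number into a URL
/// pointing at the rust-lang/rust repository. Hypothetical helper;
/// returns `None` for paths that do not start with `/rustc/<ref>/`.
pub fn linkify_rustc_path(path: &str, line: u32) -> Option<String> {
    let rest = path.strip_prefix("/rustc/")?;
    // The first segment is the version or commit the toolchain was built from.
    let (git_ref, file) = rest.split_once('/')?;
    if git_ref.is_empty() || file.is_empty() {
        return None;
    }
    Some(format!(
        "https://github.com/rust-lang/rust/blob/{}/{}#L{}",
        git_ref, file, line
    ))
}
```

Paths from other sources (git repositories, path dependencies) fall through to `None`, which matches the point that only canonically hosted code can be linked automatically.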
Coming back to this after a month of busy uni stuff - GCC and Clang both have three separate path remapping flags:
Seeing the feedback on this RFC, I think people want granular treatment of the two different types of paths (debuginfo vs macro expansion), similar to what GCC and Clang have been doing for a while. If rustc were to have both
And the defaults under
This way only the paths that will be contained in the binary will be affected, leaving debuginfo potentially untouched unless they are also embedded in the binary. I think this is nice and neat and should be the end product we are looking for. But
I can't tell from the thread: will this be an explicit opt-out (always trim in --release) or opt-in (a flag is always needed)? It seems most people add trim-path in Go anyway, so maybe it makes sense for it to be the default in --release, with no extra flag needed, unless paths are wanted for some reason.
Heuristically undo path prefix mappings. Because the compiler produces better diagnostics if it can find the source of (potentially remapped) dependencies. The new test fails without the other changes in this PR. Let me know if you have better suggestions for the test directory. I moved the existing remapping test to be in the same location as the new one. Some more context: I'm exploring running UI tests with remapped paths by default in rust-lang/rust#105924 and this was one of the issues discovered. This may also be useful in the context of rust-lang/rfcs#3127 ("New rustc and Cargo options to allow path sanitisation by default").
@ehuss Apologies for the silence. I've been pretty busy over the past 6 months but now I'm freed up a lot more, so I'll pick this up again and respond more promptly so people don't have to build up contexts repeatedly. Regarding the rationale on the available options, I've simply bucketed all sources of absolute paths into several composable groups. They are very granular for the purpose of use-case discovery. I cannot provide use-cases on all of the individual options at this stage because I don't know them yet. I did make it clear in the RFC that not all options are aiming to be stabilised. In fact, I fully anticipate most of the I'll continue the discussion regarding
I'm not going to block on this point, but I think it is strange to propose changes that we expect to discard. For example, if there are no use cases for I would ask that when possible, add a sentence as to why the option exists. Perhaps in the form "This option would be used for situations when …". It looks like
Nudge @Eh2406 @michaelwoerister @nikomatsakis @pnkfelix @wesleywiser for concerns or approval. The voting comment already got collapsed #3127 (comment) 😅
- `macro` - apply remappings to the expansion of `std::file!()` macro. This is where paths in embedded panic messages come from
- `diagnostics` - apply remappings to printed compiler diagnostics
- `unsplit-debuginfo` - apply remappings to debug information only when they are written to compiled executables or libraries, but not when they are in split debuginfo files
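To show how composable scopes like these might be consumed, here is a minimal parsing sketch. It covers only the three scopes quoted in this hunk; the `Scope` enum, the `parse_scopes` function and the comma-separated syntax are assumptions for illustration, not the interface of any actual `rustc` flag.

```rust
/// The three remapping scopes quoted above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    /// Expansion of `std::file!()`, and hence embedded panic messages.
    Macro,
    /// Printed compiler diagnostics.
    Diagnostics,
    /// Debuginfo written into executables or libraries, not split files.
    UnsplitDebuginfo,
}

/// Parse a comma-separated list such as `macro,diagnostics`.
/// The list syntax is an assumption made for this sketch.
pub fn parse_scopes(value: &str) -> Result<Vec<Scope>, String> {
    let mut scopes = Vec::new();
    for name in value.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let scope = match name {
            "macro" => Scope::Macro,
            "diagnostics" => Scope::Diagnostics,
            "unsplit-debuginfo" => Scope::UnsplitDebuginfo,
            other => return Err(format!("unknown scope `{}`", other)),
        };
        // Ignore repeated scopes so the result behaves like a set.
        if !scopes.contains(&scope) {
            scopes.push(scope);
        }
    }
    Ok(scopes)
}
```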
@davidtwco, do you know if we can actually implement this (when using the LLVM backend)? I.e. can we control what paths show up in what context? As far as I know, we produce a single LLVM metadata description and then LLVM takes care of splitting things apart, right?
Checked my box. Thanks for keeping at this so persistently, @cbeuw!
Nominating this RFC for T-compiler, let's reload the context here and see if it can be approved (voting comment)
🔔 This is now entering its final comment period, as per the review above. 🔔
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon.
Huzzah! The @rust-lang/compiler and @rust-lang/cargo teams have decided to accept this RFC. To track further discussion, subscribe to the tracking issue here:
IRLO pre-RFC thread: https://internals.rust-lang.org/t/pre-rfc-cargo-profile-setting-to-sanitise-host-dependent-absolute-paths-enabled-by-default-for-release-builds/14504
Relevant GitHub issue and discussions: rust-lang/rust#40552
Rendered