Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would you consider a new informational advisory class, "distributes-executable"? #1737

Open
mulkieran opened this issue Aug 3, 2023 · 46 comments

Comments

@mulkieran
Copy link
Contributor

The category would be something like "distributes-executable". This came up for me recently because of the issue here: serde-rs/serde#2538, which caused me difficulty in packaging. The particular difficulty here was that our project distributes a vendor tarfile upstream. Then, downstream, the tarfile is used by the packaging system. It would have been helpful to be forewarned about what was going to turn out to be a packaging problem downstream. As it was, the remediation had to occur in the downstream packaging, (or there would have to be a new upstream release).

@alex
Copy link
Member

alex commented Aug 3, 2023 via email

@tarcieri
Copy link
Member

tarcieri commented Aug 3, 2023

For examples of additional categories we might consider adding, you can look at the CWE database: https://cwe.mitre.org/

@mulkieran
Copy link
Contributor Author

mulkieran commented Aug 3, 2023

This seems to go beyond the scope of what's appropriate for a security advisory DB IMO.

advisory-db has a category "unmaintained", which I think goes even further beyond security than my proposed category. I think its mission has already crept, witness the "unmaintained" category, and it might be reasonable for it to creep in this direction, also.

@tarcieri
Copy link
Member

tarcieri commented Aug 3, 2023

unmaintained isn't a "category", it's a class of informational advisory, which is a non-security advisory

@mulkieran mulkieran changed the title Would you consider a new category, "distributes-executable"? Would you consider a new informational advisory class, "distributes-executable"? Aug 3, 2023
@mulkieran
Copy link
Contributor Author

unmaintained isn't a "category", it's a class of informational advisory, which is a non-security advisory

Thanks! I've updated the title appropriately.

@tarcieri
Copy link
Member

tarcieri commented Aug 3, 2023

The existing informational = "notice" class may be more appropriate for this use case

@mulkieran
Copy link
Contributor Author

Ok! I've given that a try.

@jirutka
Copy link

jirutka commented Aug 12, 2023

This seems to go beyond the scope of what's appropriate for a security advisory DB IMO.

I disagree; downloading some binary code from the internet and running it during build without the user’s knowledge and consent is a serious security issue that opens the door to various attacks.

@jhpratt
Copy link
Contributor

jhpratt commented Aug 18, 2023

I've strongly disagree with the decision to close this. Distributing an executable that is run without the user knowing is inherently a security vulnerability. Doubly so when the executable cannot be reproduced, as is the case with serde.

@Shnatsel
Copy link
Member

Shnatsel commented Aug 18, 2023

This is being actively discussed on the PR adding the actual advisory: #1738

Reopening since there is actually work being done on an instance of this, which may set a precedent.

@Shnatsel Shnatsel reopened this Aug 18, 2023
@dtolnay
Copy link
Contributor

dtolnay commented Aug 19, 2023

Thanks for reopening. I think this is a worthwhile proposal.

Does the executable in "distributes-executable" refer specifically to "a file that can immediately be executed", or generally to "artifact containing executable machine code"?

For example which of the following would it be useful for such a notice to cover?—

  1. A shell script containing ASCII shell code, with executable bit set, run at build time
  2. A precompiled dynamically linked .so / .dll that gets linked into the target artifact, or into a build script or macro (https://docs.rs/crate/windows_x86_64_msvc/0.48.5/source/lib/windows.0.48.5.lib)
  3. Any other non-human-readable artifact with security ramification potential

From @mulkieran's description at the very top of this issue, it's not clear which one might have been their immediate motivation. The discussion of vendor tarfiles makes it sound like number 1 is the closest, and would have been equally problematic and worth notifying about?

Others particularly on the PR seem more concerned with number 3.

Obviously number 2 has been standard practice for a long time.

If "distributes-executable" is intended to encompass more than one of these, it would be a good idea to affirm the choice to use a single advisory class for them all, or clarify the names to distinguish what different people are interested in knowing.

@kayabaNerve
Copy link
Contributor

I'd appreciate an informational advisory for any non-transparent piece of code placed on the system. This would not require it being labelled executable, yet would include everything from executables to libraries to base64-encoded bash in a non-executable txt file.

@BlackHoleFox
Copy link
Contributor

  1. A precompiled dynamically linked .so / .dll that gets linked into the target artifact, or into a build script or macro (https://docs.rs/crate/windows_x86_64_msvc/0.48.5/source/lib/windows.0.48.5.lib)

The .lib files used in the windows crates aren't executable code (that is an annoyingly different type of .lib), they're just import libraries that help the linker. Basically just the equivalent of Apple's .tbd definition files as another reference point. So vendoring something like it doesn't pose any theoretical risks to runtime execution, unlike a staticlib .lib or a .dll. Throw it into IDA for example, there's no executable code inside. Due to all that, windows is not really the best example of including precompiled libraries in the ecosystem.

@dtolnay
Copy link
Contributor

dtolnay commented Aug 19, 2023

I'm told that is not a useful distinction. One must do work to ascertain "doesn't pose any theoretical risks to runtime execution", because that is not possible to tell from the crate's human-readable source code in build.rs or lib.rs.


Screenshot from 2023-08-18 23-18-01

@ChrisDenton
Copy link

That said, it may be possible to automate scanning lib files to make sure they're not doing anything more than defining imports.

@dtolnay
Copy link
Contributor

dtolnay commented Aug 19, 2023

The purpose of the informational advisory would be to call attention that setting up such automation is necessary for this crate. No different from precompiled executable code where you may want automation to recompile from source and compare.

@Shnatsel
Copy link
Member

I wonder if a database is even a good solution for this.

We have a security advisory database because it's currently infeasible to reliably detect security issues in an automated manner. By contrast, distributing an executable should be reasonably easy to detect automatically and alert the users without waiting on humans to notice it and file an advisory. Human-filed advisories also invariably miss more obscure crates that humans didn't bother reviewing.

I believe an automated tool that detect binary blobs of executable types in the downloaded code is a better solution for this problem than a database of known instances of it. https://github.com/EmbarkStudios/cargo-deny springs to mind.

@Nemo157
Copy link
Contributor

Nemo157 commented Aug 19, 2023

I believe an automated tool that detect binary blobs of executable types in the downloaded code is a better solution for this problem than a database of known instances of it. https://github.com/EmbarkStudios/cargo-deny springs to mind.

See EmbarkStudios/cargo-deny#43

@joshka
Copy link

joshka commented Aug 20, 2023

For examples of additional categories we might consider adding, you can look at the CWE database: https://cwe.mitre.org/

Some CWEs that are possibly relevant:

@kayabaNerve
Copy link
Contributor

CWE-348 sounds exactly like the serde discussion.

@traviscross
Copy link

Here's the relevant Fedora policy:

https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#prebuilt-binaries-or-libraries

(Other distributions have roughly similar policies and cultural tendencies.)

This matches when I would expect to see an advisory.

@traviscross
Copy link

I wonder if a database is even a good solution for this.... I believe an automated tool that detect binary blobs of executable types in the downloaded code is a better solution for this problem than a database of known instances of it. https://github.com/EmbarkStudios/cargo-deny springs to mind.

Automation is always good, but there are many ways for pre-built binary code to be included (even non-maliciously) that are difficult to check for automatically. Also, if the local policy is to reject pre-built binaries and to abort if the dependency tree would contain them, it's preferable to do this before downloading the problematic binaries.

The hope, I think, is that this practice is or will become uncommon enough that maintaining a database is not too onerous.

@RalfJung
Copy link
Contributor

I disagree; downloading some binary code from the internet and running it during build without the user’s knowledge and consent is a serious security issue that opens the door to various attacks.

Looking closer into the issue, I don't think that's what is happening here? If I understand the code in lib_precompiled.rs correctly, the binary is already part of the crate package. That's not great, but is is a lot better than if the binary was downloaded from the internet -- this way, the binary is part of the integrity checking (with checksums in lockfiles) and we can be sure all of us are getting the same binary. That's worth a lot.

So the issue really is that we cannot inspect what that binary does. "distributes-executable" doesn't quite capture that since arguably if it was a Python script, having the build script run that isn't all that different from just executing the build script itself. "distributes-executable-without-source" or "distributes-unauditable-executable" or so captures the issue better.

@soqb
Copy link

soqb commented Aug 20, 2023

@RalfJung please bare with me as i havent had the time to totally analyse the issue:

IIUC, the binary is embedded directly into the .crate file that is uploaded to crates.io. unlike traditional Rust dependencies, where code can be validated from either its repository or from its local distribution (unpacked .crate in ~/.cargo), in the serde_derive case, the source can only be verified online. there's no way to verify that serde is actually using the code that appears in the github repository and not some secret hack that publishes some other code to crates.io. we just have to trust @dtolnay and their password protection to not fuck us completely.

@ChrisDenton
Copy link

@soqb that is why people are discussing the feasibility of reproducible builds. You can verify by running the same steps and getting the same output.

@soqb
Copy link

soqb commented Aug 20, 2023

@soqb that is why people are discussing the feasibility of reproducible builds. You can verify by running the same steps and getting the same output.

i think checksums are too volatile for this. on a local machine the environment might just be too different to reasonably reproduce the exact same binary. (is this true?- i don't really have any knowledge in this area)

users could theoretically use manual methods to test for every vulnerability/attack surface in the database. this project is simply a best-effort trusted global cache of security concerns.

@RalfJung
Copy link
Contributor

RalfJung commented Aug 20, 2023

IIUC, the binary is embedded directly into the .crate file that is uploaded to crates.io.

Yes. Exactly like the source code for the derive macro.

there's no way to verify that serde is actually using the code that appears in the github repository

Correct. And I said exactly that in my comment.

But above someone claimed that serde would be "downloading some binary code from the internet and running it during build". That is definitely not happening. The only download that is happening is cargo downloading the crate file from crates.io, same as it does for all other crates. All code that is being run was downloaded (and its checksum verified) by cargo, same as for all other crates. I was merely correcting that false statement (which stood uncontested as far as I can see). I do still agree that there is a problem here, but the problem is less big than I originally thought.

Even if we don't like what serde is doing, we surely shouldn't spread or accept false claims about what it is doing.

@tarcieri
Copy link
Member

tarcieri commented Aug 20, 2023

Reviewing these CWEs I don't think any apply:

CWE-829: Inclusion of Functionality from Untrusted Control Sphere

This seems to be about embedding 3rd party code that winds up as trusted as the first-party program, and that was unexpected (i.e. the third party code wasn't properly reviewed or intended for that safety domain). But in this case, there is no third party code as both the source and binary originate with the author.

CWE-494: Download of Code Without Integrity Check
The product downloads source code or an executable from a remote location and executes the code without sufficiently verifying the origin and integrity of the code.

The binary is distributed with the crate and its integrity is checked in the same manner as the source code (SHA-256 digest in Cargo.lock)

CWE-348: Use of Less Trusted Source

The example is using an untrusted HTTP header instead of a trusted one. I don't think this applies.

@soqb
Copy link

soqb commented Aug 20, 2023

@RalfJung am i misunderstanding it, or is the checksum (at least sometimes) generated by crates.io based on purely the .crate that is uploaded? IIUC, the scope of the checksum is limited to verifying the continuous integrity of downloads from crates.io and makes no guarantee about how accurate crates.io is to the crate's code in its public repository.

@tarcieri
Copy link
Member

@soqb https://docs.rs allows you to view/link to canonical crate source code.

GitHub is a 3rd party service so it’s not possible to control what code is hosted there.

@soqb
Copy link

soqb commented Aug 20, 2023

@soqb https://docs.rs allows you to view/link to canonical crate source code.

nothing is "canonical" about this source code. in serde_derive's case, all the source is included, even though the source we unpack from the .crate file is demonstrably not used since the precompiled binary may be used instead. on targets where the binary is used the source code shown on docs.rs has no relation to the code that is executed.

GitHub is a 3rd party service so it’s not possible to control what code is hosted there.

yes, and in serde_derive's it's also not possible to verify the code locally from the .crate file and therefore we must simply trust the integrity of @dtolnay's pipeline.

@RalfJung
Copy link
Contributor

@soqb yes it's a general problem that the contents of the crates file don't have to have anything to do with the github repo. That applies to all crates though. It also completely misses the point I have been making.

@soqb
Copy link

soqb commented Aug 20, 2023

@soqb yes it's a general problem that the contents of the crates file don't have to have anything to do with the github repo. That applies to all crates though. It also completely misses the point I have been making.

that's not the issue here, though im glad we're on the same page. there is no way for a consumer of the serde_derive crate to be sure at all what the crate might do when we use its macros. that doesn't apply to crates without embedded binary dependencies (since we have access to their source and compile them locally) and thus somewhere we should warn users.

@GoldsteinE
Copy link

GoldsteinE commented Aug 20, 2023

@soqb docs.rs correctly shows that serde_derive uses precompiled binary: https://docs.rs/serde_derive/1.0.183/src/serde_derive/lib_precompiled.rs.html#37

In fact, lib_from_source.rs is not shown on docs.rs, identifying that it’s not used on x86_64-linux-gnu.

@soqb
Copy link

soqb commented Aug 20, 2023

@soqb docs.rs correctly shows that serde_derive uses precompiled binary: https://docs.rs/serde_derive/1.0.183/src/serde_derive/lib_precompiled.rs.html#37

that's true. what it cannot tell us is what that binary does, and so the source on docs.rs incomplete and as such cannot at least in the conception that pertains to this issue be called "canonical".

@RalfJung
Copy link
Contributor

RalfJung commented Aug 20, 2023

though im glad we're on the same page.

I take it then you are agreeing with my comment? Your replies very much indicate that you think some claim I am making in that comment is wrong, but you are not providing any arguments that would carry evidence against anything I said (you are only bringing up arguments against things I never said -- for instance, I never said that we could know what that binary does, I only said that we know that everyone gets the same binary), so this is a confusing and frustrating interaction.

Please don't reply to people in a way that suggests they have made claims they never made.

@soqb
Copy link

soqb commented Aug 20, 2023

"downloading some binary code from the internet and running it during build"

this is what i have a problem with. you claimed serde_derive isn't vulnerable in this way, but if you don't disagree with anything i made in my most recent comment is this not still a potential attack surface? (even if only for dependants who don't have serde_derive pinned)

yes crates.io guaranteed everyone will get the same binary for a given version, but i never hinged my argument on that.

@traviscross
Copy link

Let's keep in mind the precedent we're setting. We might all trust dtolnay. But I doubt that we would act with such respect and restraint if very many other crate authors were to do this.

Pre-built binaries (from sources they have not already decided explicitly to trust for those) are a supply-chain security policy violation for many users (and distributors). Among other things, it breaks the auditing model of cargo-vet and other tools.

We should ask, if other crate authors started to do this, would we consider it reasonable for users to have to recheck when updating Cargo.lock all added or updated direct and transitive dependencies for pre-built binaries, and then have to try to reproduce each from source to verify its integrity with respect to the source code?

While dtolnay shipping a pre-built binary may not in-and-of-itself be a major security issue, the precedent we set as a community could be. Ensuring supply-chain security and auditing dependencies with crates.io is already difficult. If it became common for crate authors to ship pre-built binaries, it would become completely infeasible.

This is bigger than this one crate. We should set aside our trust for dtolnay and consider the effect of our actions in general.

@kayabaNerve
Copy link
Contributor

This isn't to mention the complexity of checking each binary especially if there is no standard on how to. I have an environment I'm treating as security critical, and am already planning to check all crates used weren't published dirty, and do match a publicly available commit on the stated repo (barring documented exceptions). From there, the natural next step would be to check the commit is signed.

With binaries, especially ones not included on Git, this model becomes botched as all publications are dirty. Even if they are included on Git, I have to look up how each party implemented reproducible builds and add such pipelines to my integrity checks.

At least being aware of the crates which pull this mess down onto me, especially ones which only newly introduce this mess, would be greatly appreciated.

@RalfJung
Copy link
Contributor

RalfJung commented Aug 21, 2023

yes crates.io guaranteed everyone will get the same binary for a given version, but i never hinged my argument on that.

If you didn't then you had no reason to quote me in your reply. Others here did. I was replying to them. If you didn't think that serde was downloading code from the internet then you didn't have any reason to reply to my poist.

I never disagreed with anything you said, except with your framing your statements as if they would somehow be counterarguments to mine. Someone (not you!) said "serde did A", I replied "no serde is not doing A", and ever since then you kept saying "but serde is doing B" with A != B. You can see how that's not helpful, right? Just stop it, please.

I will unsubscribe and not reply here further. This thread is not constructive.

@ijackson
Copy link
Contributor

I have just filed #1750, about handling intentional but surprising crate behaviours.

@couchand
Copy link

Revisiting this after a bit of calm, I'd like to highlight these two comments from the conversation on #1738:

@LikeLakers2: The way I see it, an executable being in a crate where it doesn't need to be, is just as much of a security issue as an unmaintained library. That is to say, neither case directly implies a security issue, but both introduce uncertainty as to the safety of the crate.

@mulkieran: AFAIU the crate broke a security-related rule in a bunch of distributions and packagers had to seek work-arounds[1][2]. These rules are generally based on the assumption that a crate that packages an executable is not secure. And this brings the subject right back to security again.

[1] https://src.fedoraproject.org/rpms/rust-serde_derive/c/689eea59064df8f2f15240c7464a91349e2d78d4?branch=rawhide
[2] https://src.fedoraproject.org/rpms/stratisd/c/928d848180d054ca81766bff4247dda192a33fc8?branch=rawhide

These seem like practical reasons to introduce an informational advisory of the type considered here. In particular, using the mentioned distribution and packaging security rules to guide the criteria would help make it actionable.

@tarcieri
Copy link
Member

In this particular case, the binary has been mostly-reproduced by third parties with innocuous discrepancies. I don't see anything actionable here, especially since the crate has stopped distributing the executable.

Completely unmaintained crates are concerning because if any security vulnerabilities were to be discovered, there is no one there to respond to them, making it an ongoing problem (or at least, potential problem). If someone were to show up and start maintaining them, we would withdraw such an advisory.

Given this issue seems to be addressed upstream, I see no reason to further pursue it. If something similar happens again, we can reconsider.

@LikeLakers2
Copy link

LikeLakers2 commented Nov 27, 2023

@tarcieri Responding since your comment seems to directly respond to parts of my comment that was quoted.

I find the results of reproducing the included binary to be irrelevant here. This is an advisory database: it advises people of a potentially precarious situation. Whether that be security vulnerabilities, or simply that a crate is unmaintained, it's not the database's role to make a determination on whether that situation is harmful or harmless. That is for the user to decide.

That's why I compared it to the unmaintained advisories - while it does invite concern, a crate being unmaintained does not necessarily imply a security concern. In much the same way, an executable inside a crate invites concern, but is not necessarily a security concern. Some users may consider both situations to be security concerns - or they may consider only one (or even neither) to be security concerns; in any case, that choice should be made by them, not us.

Still, if you want to wait until it comes up again (though I'm of the mind that such an advisory class should still be implemented for the future), I will not stop you.


P.S. While I say that the results of reproducing the binary are irrelevant here, I do still think distributes-executable advisories would benefit from listing whether the binary has been reproduced or not. This is an informational database, and if we can bring more information to the user about precarious situations, that's all the better.

@tarcieri
Copy link
Member

With a binary reproducibility story, the security implications of a binary are equivalent to those of the original source code it was built from, since reproducibility in effect proves equivalence between the binary and its source code.

An unreproducible binary is a mystery and so are its security implications, and that's much more concerning.

@LikeLakers2
Copy link

LikeLakers2 commented Nov 27, 2023

But some users consider included binaries to be a security concern, regardless of whether the binary is reproducible or not. It matters not if the binary is reproducible - what matters here is if the binary is included.

Plus, if the binary isn't reproducible, we'd be putting out an advisory anyways - just with the disadvantage of having put it out several days (or even weeks) after everyone already found out.

But if we still want to consider reproducibility when it comes to whether an advisory is relevant, then consider this: Put out a "distributes-unreproduced-executable" advisory immediately after the executable is discovered, then withdraw it once the executable has been reproduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests