Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type_id is not sufficiently collision-resistant #129014

Open
RalfJung opened this issue Aug 12, 2024 · 7 comments
Open

type_id is not sufficiently collision-resistant #129014

RalfJung opened this issue Aug 12, 2024 · 7 comments
Labels
C-bug Category: This is a bug. C-discussion Category: Discussion or questions that doesn't represent real issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Aug 12, 2024

This is a re-post of #10389, attempting to summarize the current state of the discussion since that issue had too many comments to still be useful.

The soundness of functions like downcast relies on the type_id of two different types never being equal. Currently, the type_id is a 128-bit hash of the full type identity, computed specifically via SipHash-1-3 with an all-zero key. This is not a strong enough hash function for this purpose.

The lang team decided that relying on a full (non-truncated) cryptographic hash is fine -- we don't have to guarantee soundness against an infinite-resource attacker that can generate collisions in cryptographic hash functions.
However, SipHash even in its default configuration (SipHash-2-4) is not a cryptographic hash, as clarified by its author1 -- it is a pseudo-random function (PRF); that means it assumes a secret key, but Rust hard-codes an all-zero key and in fact has no way to keep a key secret. By standards for cryptographic hash functions, SipHash-2-4 with an all-zero-key is weaker than MD5, and generating a collision for that is pretty easy these days. SipHash-1-3 is even weaker, we should thus expect it to be a matter of hours to create a collision, if someone really tried.

Generating a concrete example of an unsoundness from that is a bit more tricky since one would have to find a Rust type generating the collision, but it seems fairly clear that the bar of "full (non-truncated) cryptographic hash" is not met by the current type_id implementation. The point of this issue is to determine how the compiler implementation can best satisfy the soundness expectation set by the lang team.

If you instead want to argue that the lang team should change its mind, please open a new issue and gather arguments in favor of that position, so that a summary of all the arguments for either option can be brought to the lang team for discussion. Also, this issue is only about type_id. According to this comment, the only other case where a hash collision could cause unsoundness is with incremental builds; see #129016 for the issue tracking that. All other hashes are actively checked for collisions by Rust. That said, it is not clear to me whether that also covers dynamic linking scenarios -- if someone knows about hash-related soundness concerns for that case, please file a new issue.

Possible solutions

We should do one of the following:

  1. switch to a stronger hash function, or
  2. switch to a different scheme that doesn't rely on collision-resistance of the hash function.

Worth noting is that a cryptographic hash function these days must output at least 256 bits to be considered worth its salt. The lang team has not spelled out their exact definition of "cryptographic hash function", but the standard definition includes collision resistance, and in fact collisions are exactly what we are most worried about here, so it seems reasonable to assume that this is part of the lang team intent. A lang team member mentioned BLAKE3 as a candidate, further corroborating this claim. So we'd have to switch to BLAKE3 or SHA2 or something like that.

Therefore, in both cases will we need more than the currently available 128 bits of a TypeId to obtain the desired level of collision resistance. To avoid further increasing the size of TypeId, the most likely scheme would be to include a pointer in TypeId that points to the remaining information -- either a 256-bit hash, or a string (null-terminated or with leading length information, to obtain a "thin" pointer), or something like that.

#95845 explores what this could look like without depending on a hash function: TypeId becomes a pair of a (not-soundness-critical) hash and a pointer. If the hashes are different, we can quickly conclude "inequal". If the pointers are equal, we can quickly conclude "equal". (Many linkers should be able to deduplicate the data the pointers point to, making the "equal" optimization even work cross-crate in many cases.) The same approach could easily be used without a full type name, by storing a pair of a low-quality hash for quick "inequal" checks and a high-equality hash behind the pointer to provide soundness.

Relying on the linker to deduplicate is unlikely to work since not all platforms have linkers that can do that (in particular for dynamic linking). C++ has a similar problem to solve and, at least on Windows, seems to do something like the first of these options: compare pointers, and fall back to comparing strings.

Therefore, the next step here seems to be for the compiler team to decide whether to use a hash function or to include the type name in the binary. The latter has an existing implementation, but there were concerns about leaking type names (and the paths leading to those types) in the binary.

Footnotes

  1. In the original paper, they even write: "We comment that SipHash is not meant to be, and (obviously) is not, collision-resistant."

@RalfJung RalfJung added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness C-bug Category: This is a bug. I-compiler-nominated Nominated for discussion during a compiler team meeting. labels Aug 12, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 12, 2024
@jieyouxu jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 12, 2024
@the8472
Copy link
Member

the8472 commented Aug 12, 2024

If you instead want to argue that the lang team should change its mind, please open a new issue and gather arguments in favor of that position

I have opened #129030 which somewhat goes in that direction, though from a different angle.

@Noratrieb Noratrieb removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 13, 2024
@apiraino
Copy link
Contributor

Discussed in T-compiler triage meeting on Zulip. The topic needs more time, is worth a dedicated design meeting. Not only about the options we have but also about defining the threat model we are trying to protect against.

@briansmith
Copy link
Contributor

All other hashes are actively checked for collisions by Rust.

When a collision is detected, compilation will fail, right? If so, the issue is reduced from unsoundness to denial of service; that's definitely a less-serious issue but still problematic.

@RalfJung
Copy link
Member Author

This issue is about type_id; if you are concerned about other hashes please open a new issue or discuss that in #129030.

@briansmith
Copy link
Contributor

So we'd have to switch to BLAKE3 or SHA2 or something like that.

The discussion from the Bazel team regarding switching from SHA-256 to BLAKE3 maybe be useful. On modern x86-64 and ARMv8 hardware the fastest SHA-256 implementations that use SHA-256-specific CPU instructions should be notably faster than BLAKE3.

@apiraino
Copy link
Contributor

Filed a t-compiler design meeting to discuss this as the topic deserves. @RalfJung feel free to add some more context if you wish

@rustbot label -I-compiler-nominated

@rustbot rustbot removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Aug 22, 2024
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +C-discussion

@rustbot rustbot added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. C-discussion Category: Discussion or questions that doesn't represent real issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants