type_id is not sufficiently collision-resistant #129014

RalfJung · 2024-08-12T16:59:33Z

This is a re-post of #10389, attempting to summarize the current state of the discussion since that issue had too many comments to still be useful.

The soundness of functions like downcast relies on the type_id of two different types never being equal. Currently, the type_id is a 128-bit hash of the full type identity, computed specifically via SipHash-1-3 with an all-zero key. This is not a strong enough hash function for this purpose.
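To make the dependency concrete, here is a minimal sketch of the kind of check a downcast performs. The helper name `sketch_downcast_ref` is hypothetical and is not the standard library's code; it only illustrates that the unsafe cast is justified by nothing other than the `TypeId` comparison.

```rust
use std::any::{Any, TypeId};

// Hypothetical helper for illustration only; the real logic lives in the
// standard library's `dyn Any` methods.
fn sketch_downcast_ref<T: Any>(value: &dyn Any) -> Option<&T> {
    // `type_id()` on `&dyn Any` dispatches dynamically to the concrete type.
    if value.type_id() == TypeId::of::<T>() {
        // SAFETY: sound only under the assumption that equal TypeIds imply
        // equal types. A hash collision between two distinct types would
        // turn this cast into a reinterpretation of unrelated memory.
        Some(unsafe { &*(value as *const dyn Any as *const T) })
    } else {
        None
    }
}

fn main() {
    let boxed: Box<dyn Any> = Box::new(42u32);
    // Matching type: the cast is taken.
    assert_eq!(sketch_downcast_ref::<u32>(&*boxed), Some(&42u32));
    // Different type: the TypeIds differ, so no cast happens.
    assert!(sketch_downcast_ref::<i32>(&*boxed).is_none());
    // The standard library method agrees with the sketch.
    assert_eq!(boxed.downcast_ref::<u32>(), Some(&42u32));
}
```

If two distinct types ever shared a `TypeId`, the first branch would be taken for the wrong `T`, which is exactly the unsoundness this issue is about.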

The lang team decided that relying on a full (non-truncated) cryptographic hash is fine -- we don't have to guarantee soundness against an infinite-resource attacker that can generate collisions in cryptographic hash functions.
However, SipHash even in its default configuration (SipHash-2-4) is not a cryptographic hash, as clarified by its author¹. It is a pseudo-random function (PRF): it assumes a secret key, but Rust hard-codes an all-zero key and in fact has no way to keep a key secret. By the standards for cryptographic hash functions, SipHash-2-4 with an all-zero key is weaker than MD5, and generating a collision for MD5 is pretty easy these days. SipHash-1-3 is even weaker, so we should expect that creating a collision would be a matter of hours if someone really tried.

Generating a concrete example of an unsoundness from that is a bit more tricky since one would have to find a Rust type generating the collision, but it seems fairly clear that the bar of "full (non-truncated) cryptographic hash" is not met by the current type_id implementation. The point of this issue is to determine how the compiler implementation can best satisfy the soundness expectation set by the lang team.

If you instead want to argue that the lang team should change its mind, please open a new issue and gather arguments in favor of that position, so that a summary of all the arguments for either option can be brought to the lang team for discussion. Also, this issue is only about type_id. According to this comment, the only other case where a hash collision could cause unsoundness is with incremental builds; see #129016 for the issue tracking that. All other hashes are actively checked for collisions by Rust. That said, it is not clear to me whether that also covers dynamic linking scenarios -- if someone knows about hash-related soundness concerns for that case, please file a new issue.

Possible solutions

We should do one of the following:

switch to a stronger hash function, or
switch to a different scheme that doesn't rely on collision-resistance of the hash function.

Worth noting is that a cryptographic hash function these days must output at least 256 bits to be considered worth its salt. The lang team has not spelled out their exact definition of "cryptographic hash function", but the standard definition includes collision resistance, and in fact collisions are exactly what we are most worried about here, so it seems reasonable to assume that this is part of the lang team intent. A lang team member mentioned BLAKE3 as a candidate, further corroborating this claim. So we'd have to switch to BLAKE3 or SHA2 or something like that.
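As a rough illustration of where the 256-bit figure comes from, the generic birthday bound can be written out as follows. This assumes an ideal hash function with an n-bit output and no structural weakness; it is a back-of-the-envelope estimate, not a statement about any specific function.

```latex
% Expected number of hash evaluations Q(n) before a collision is found
% in an ideal n-bit hash (birthday bound):
\[
  Q(n) \approx \sqrt{\tfrac{\pi}{2} \cdot 2^{n}} \approx 1.25 \cdot 2^{n/2}
\]
% For the output sizes discussed here:
\[
  Q(128) \approx 2^{64}, \qquad Q(256) \approx 2^{128}
\]
```

So even an ideal 128-bit hash offers only about 64 bits of collision resistance, while a 256-bit output is what provides roughly 128 bits.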

Therefore, in both cases we will need more than the currently available 128 bits of a TypeId to obtain the desired level of collision resistance. To avoid further increasing the size of TypeId, the most likely scheme would be to include a pointer in TypeId that points to the remaining information -- either a 256-bit hash, or a string (null-terminated or with leading length information, to obtain a "thin" pointer), or something like that.

#95845 explores what this could look like without depending on a hash function: TypeId becomes a pair of a (not-soundness-critical) hash and a pointer. If the hashes are different, we can quickly conclude "inequal". If the pointers are equal, we can quickly conclude "equal". (Many linkers should be able to deduplicate the data the pointers point to, making the "equal" optimization even work cross-crate in many cases.) The same approach could easily be used without a full type name, by storing a pair of a low-quality hash for quick "inequal" checks and a high-quality hash behind the pointer to provide soundness.
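The comparison order described above can be sketched roughly as follows. The type `SketchTypeId`, its fields, and the use of a `&'static str` as the pointed-to data are all assumptions made for illustration; this is not the layout from #95845, and the real `TypeId` is not defined this way.

```rust
/// Hypothetical stand-in for the proposed pair of hash and pointer.
#[derive(Clone, Copy, Debug)]
struct SketchTypeId {
    /// Cheap hash used only as a fast path; never relied on for soundness.
    hash: u64,
    /// Full type identity behind a pointer (here: a name string).
    name: &'static str,
}

impl PartialEq for SketchTypeId {
    fn eq(&self, other: &Self) -> bool {
        // Fast path 1: different hashes mean different types.
        if self.hash != other.hash {
            return false;
        }
        // Fast path 2: same address and length mean the same data.
        if self.name.as_ptr() == other.name.as_ptr()
            && self.name.len() == other.name.len()
        {
            return true;
        }
        // Slow path: compare the data itself. Only this step carries
        // soundness, so a collision in `hash` is harmless.
        self.name == other.name
    }
}

impl Eq for SketchTypeId {}

fn main() {
    // A forced hash collision between two different (made-up) type names.
    let a = SketchTypeId { hash: 7, name: "crate_a::Foo" };
    let b = SketchTypeId { hash: 7, name: "crate_b::Bar" };
    assert_ne!(a, b);

    // Same identity stored at a different address, as can happen when the
    // linker does not deduplicate: the slow path still reports equality.
    let copy: &'static str = Box::leak(String::from("crate_a::Foo").into_boxed_str());
    let a2 = SketchTypeId { hash: 7, name: copy };
    assert_eq!(a, a2);
}
```

The slow path is only reached when the hashes match but the pointers differ, so in the common cases the comparison stays cheap.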

Relying on the linker to deduplicate is unlikely to work since not all platforms have linkers that can do that (in particular for dynamic linking). C++ has a similar problem to solve and, at least on Windows, seems to do something like the first of these options: compare pointers, and fall back to comparing strings.

Therefore, the next step here seems to be for the compiler team to decide whether to use a hash function or to include the type name in the binary. The latter has an existing implementation, but there were concerns about leaking type names (and the paths leading to those types) in the binary.

¹ In the original paper, they even write: "We comment that SipHash is not meant to be, and (obviously) is not, collision-resistant."


the8472 · 2024-08-12T23:05:00Z

If you instead want to argue that the lang team should change its mind, please open a new issue and gather arguments in favor of that position

I have opened #129030 which somewhat goes in that direction, though from a different angle.

apiraino · 2024-08-15T15:10:41Z

Discussed in the T-compiler triage meeting on Zulip. The topic needs more time and is worth a dedicated design meeting, not only about the options we have but also about defining the threat model we are trying to protect against.

briansmith · 2024-08-21T18:55:22Z

All other hashes are actively checked for collisions by Rust.

When a collision is detected, compilation will fail, right? If so, the issue is reduced from unsoundness to denial of service; that's definitely a less-serious issue but still problematic.

RalfJung · 2024-08-21T19:04:39Z

This issue is about type_id; if you are concerned about other hashes please open a new issue or discuss that in #129030.

briansmith · 2024-08-21T19:59:01Z

So we'd have to switch to BLAKE3 or SHA2 or something like that.

The discussion from the Bazel team regarding switching from SHA-256 to BLAKE3 may be useful. On modern x86-64 and ARMv8 hardware the fastest SHA-256 implementations that use SHA-256-specific CPU instructions should be notably faster than BLAKE3.

apiraino · 2024-08-22T15:24:56Z

Filed a t-compiler design meeting to discuss this as the topic deserves. @RalfJung feel free to add some more context if you wish

@rustbot label -I-compiler-nominated

apiraino · 2024-11-12T15:16:06Z

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +C-discussion

RalfJung · 2025-01-28T16:59:18Z

Quoting from the compiler team discussion meeting summary:

Meeting took place on Zulip.

The summary: Out of the options suggested, seems that switching to not rely on hashing for soundness could give us the best results, also taking into account an increased size and perf. cost. Instead use a pair of a (not-soundness-critical) hash and a pointer but still use hashing as an optimization.

The meeting didn't set any actionables so I'm not sure about them. You can probably talk to us, T-compiler.

(David also posted a brief comment about not treating it as a security concern because - so far - we have no concrete threat-model scenario)

RalfJung · 2025-01-28T17:03:56Z

That sounds to me like an implementation of that scheme (basically a revival of #95845) should be accepted? Or would an MCP still be required?

The main remaining concern seems to be @the8472 who is concerned about inconsistent decisions in this space. However, I don't see why that would block a PR if the change can be demonstrated to have negligible downsides. The upsides are that we can confidently stabilize type_id in const, which is a regularly requested feature.

the8472 · 2025-01-28T17:19:03Z

Yeah, as I said in #129030 (comment)

AIUI one of the proposed fixes is storing all the type names in a binary. This will increase binary sizes which is a real issue for embedded systems. If there's a fix that's free in terms of the produced output then I agree that it's not productive to argue further.

at that point the threat model discussion would become only about general policy, not the typeid issue.

bjoernager · 2025-01-28T20:21:36Z

If the memory concerns are problematic enough, would adding a compiler flag (e.g. unsafe-type-id) that always makes use of hashes similar to the current implementation be an appropriate mitigation?

(Assuming the conditional layout/representation of the type from such a flag wouldn't pose any issues.)

fu5ha · 2025-05-03T22:18:58Z

Hi, I'm interested in working on implementation for this, be it a revival of #95845, something similar but storing a 256-bit hash behind the pointer rather than the full mangled string, or simply increasing the efficacy of the hashing algorithm while leaving the result as a 128-bit hash inline. However, it seems ill advised to begin on implementing any of these without agreement on what the threat model is, and therefore using that to inform which option is the best to move forward with (thus hopefully preventing the implementation from again being stuck in limbo due to lack of consensus on the intended goals of the change). For example, if we decide that we don't care about protecting against engineered collisions, only against "legitimate" ones, it could change our outlook on what design we could accept for a hash function.

Would it be possible to add that conversation to the docket at some point in the near future?

If the memory concerns are problematic enough, would adding a compiler flag (e.g. unsafe-type-id) that always makes use of hashes similar to the current implementation be an appropriate mitigation?

This seems like a poor compromise to me, and also requires writing, maintaining, and ensuring two different paths function as expected. If the memory concerns are problematic enough, we should probably try to find an alternate solution.

the8472 · 2025-05-04T21:28:07Z

However, it seems ill advised to begin on implementing any of these without agreement on what the threat model is

That's #129030 and it's currently waiting on t-lang.

fu5ha · 2025-05-04T22:52:51Z

Sorry, yes, perhaps I should have posted there but the idea of my post was more or less to inquire about t-lang having the necessary discussion as you say.

RalfJung · 2025-05-07T18:09:56Z

This also recently came up in a const-eval Zulip discussion: supporting type_id in const would be a lot simpler if we made TypeId a pointer to some opaque blob of data that we can then enforce is never actually accessed in const-eval and Miri. That data can then easily be a large hash or whatever, and it is also easy to change later.

This does come at a cost though, since ensuring two type_id are different requires loading through that pointer.

RalfJung added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness C-bug Category: This is a bug. I-compiler-nominated Nominated for discussion during a compiler team meeting. labels Aug 12, 2024

rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 12, 2024

jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 12, 2024

RalfJung mentioned this issue Aug 12, 2024

Collisions in type_id #10389

Closed

the8472 mentioned this issue Aug 12, 2024

Language vs. implementation threat models and implications for TypeId collision resistance #129030

Open

Noratrieb removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 13, 2024

apiraino mentioned this issue Aug 22, 2024

type_id is not sufficiently collision-resistant rust-lang/compiler-team#774

Closed

rustbot removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Aug 22, 2024

engusmaze mentioned this issue Sep 23, 2024

Add new builtin: @typeId ziglang/zig#19858

Closed

weihanglo mentioned this issue Oct 10, 2024

Improve resolver speed rust-lang/cargo#14663

Merged

rustbot added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 12, 2024

MarcGuiselin mentioned this issue Jan 25, 2025

Tracking Issue for const fn type_id #77125

Open

4 tasks
