`BufferBackend`: internal method has unsound signature #49
Comments
I guess another option would be to pick an encoding for lengths that always produces valid UTF-8, e.g. a length encoding that never sets the top bit of any byte. Sadly, this wastes 1 bit out of every length byte, but that is still less waste than the lookaside bit-table of the previous suggestion.
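To make the idea concrete, here is a minimal sketch of one possible layout for such a length encoding. The bit assignment (6 payload bits per byte, bit 6 as a "more bytes follow" flag, bit 7 always clear) and the names `encode_len_ascii` and `decode_len_ascii` are assumptions for illustration; they are not taken from the crate.

```rust
/// Hypothetical encoder: 6 payload bits per byte, bit 6 means "more bytes
/// follow", bit 7 is never set, so every emitted byte is ASCII.
fn encode_len_ascii(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let payload = (n & 0x3f) as u8;
        n >>= 6;
        if n == 0 {
            out.push(payload); // last byte: continuation flag clear
            return;
        }
        out.push(payload | 0x40); // continuation flag set
    }
}

/// Hypothetical decoder: returns the value and the number of bytes consumed,
/// or `None` if the input is truncated, too long, or has a top bit set.
fn decode_len_ascii(buf: &[u8]) -> Option<(u64, usize)> {
    let mut n = 0u64;
    // 11 bytes of 6 bits each are enough to cover a 64-bit value.
    for (i, &b) in buf.iter().enumerate().take(11) {
        if b & 0x80 != 0 {
            return None; // not produced by this encoding
        }
        n |= u64::from(b & 0x3f) << (6 * i);
        if b & 0x40 == 0 {
            return Some((n, i + 1));
        }
    }
    None
}
```

Compared with a 7-bit varint, each length byte carries one payload bit less, which is the waste mentioned above. The sketch only shows that the length bytes themselves can be kept ASCII.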
@reinerp Generally you are right about the soundness of the API. However, since this is an internal API that is only used correctly internally, it is at least not an issue that can be attacked right now. Please correct me if I am wrong. Still, it would be great if we could improve the situation here. Maybe just flagging it as
I dug deeper and it seems that the problem is not just internal to the
This SAFETY comment (string-interner/src/backend/buffer.rs, line 102 in 95574e2) is correct for `index` values that were returned by this `BufferBackend`, but when `index` is untrusted (could be provided arbitrarily) I think the invariant breaks.

I'll phrase this adversarially with an "attacker", although it also applies to ordinary bugs. The failure case is that `index` points into the middle (rather than the beginning) of an attacker-controlled string. The attacker can arrange for the bytes at this index to decode into a valid varlen, such that the decoded `str_len` is any attacker-controlled value. In particular, they can arrange for `str_len` to be longer than the rest of the string it sits in the middle of, such that `str_bytes` contains some of the varlen bytes from the next string, which are not guaranteed to be valid UTF-8. This then breaks the invariants of `from_utf8_unchecked`.

Sadly, the only fix I can currently see is a bit-table on the side which indicates where the start of every string is. This would allow you to safely validate that `index` indeed points to the beginning of a string rather than the middle.
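The failure case and the bit-table check can be reproduced in a small model. This is a sketch under stated assumptions, not the crate's code: `ModelBackend`, `span`, and `resolve_checked` are hypothetical names, the length prefix is assumed to be a LEB128-style varint, and the safe `str::from_utf8` stands in for `from_utf8_unchecked` so that the invalid slice is reported instead of causing undefined behaviour.

```rust
use std::str;

// Assumed LEB128-style length prefix; the crate's real encoding may differ.
fn encode_len(mut n: usize, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

// Returns the decoded length and the number of prefix bytes consumed.
fn decode_len(buf: &[u8]) -> Option<(usize, usize)> {
    let mut n = 0u64;
    for (i, &b) in buf.iter().enumerate().take(9) {
        n |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((n as usize, i + 1));
        }
    }
    None
}

/// Hypothetical model: length-prefixed strings in one buffer, plus a side
/// table with one bit per buffer byte, set where a length prefix begins.
#[derive(Default)]
struct ModelBackend {
    buffer: Vec<u8>,
    starts: Vec<u64>,
}

impl ModelBackend {
    fn push(&mut self, s: &str) -> usize {
        let index = self.buffer.len();
        encode_len(s.len(), &mut self.buffer);
        self.buffer.extend_from_slice(s.as_bytes());
        // The buffer only grows, so this resize never truncates the table.
        self.starts.resize(self.buffer.len() / 64 + 1, 0);
        self.starts[index / 64] |= 1u64 << (index % 64);
        index
    }

    fn is_start(&self, index: usize) -> bool {
        self.starts
            .get(index / 64)
            .map_or(false, |w| (w & (1u64 << (index % 64))) != 0)
    }

    /// The bytes an unvalidated lookup would hand to `from_utf8_unchecked`.
    fn span(&self, index: usize) -> Option<&[u8]> {
        let (len, used) = decode_len(self.buffer.get(index..)?)?;
        let start = index + used;
        self.buffer.get(start..start.checked_add(len)?)
    }

    /// Lookup that first consults the bit-table, so only indices returned
    /// by `push` are ever decoded.
    fn resolve_checked(&self, index: usize) -> Option<&str> {
        if !self.is_start(index) {
            return None;
        }
        str::from_utf8(self.span(index)?).ok()
    }
}
```

For example, after `push("ab\u{5}cd")` followed by `push(&"x".repeat(200))`, the forged index `3` lands on the `0x05` byte inside the first string. `span(3)` then covers `c`, `d`, the two varlen bytes `0xC8 0x01` of the second string, and one `x`, which `str::from_utf8` rejects. `resolve_checked(3)` returns `None` before any decoding happens, at the cost of one extra bit per buffer byte.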