Skip to content

array types (fixed-size arrays on 32-bit, HashMap, Vec) can be large enough that indexing is unsound #18726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
thestinger opened this issue Nov 7, 2014 · 9 comments · Fixed by #26955
Labels
A-collections Area: `std::collections` A-type-system Area: Type system I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. P-medium Medium priority

Comments

@thestinger
Copy link
Contributor

In a 32-bit process running on a 64-bit operating system, it's possible to allocate a Vec<u8> of length 1u32 << 31 or greater. The maximum value where uint as int produces a positive number is (1u32 << 31) - 1), so int is not large enough for offset operations to the end of these vectors. Everything from the language's built-in slice indexing operations to the push method on Vec<T> will perform an invalid negative (backwards) offset as the getelementptr instruction uses a signed offset.

@thestinger thestinger added A-libs I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. labels Nov 7, 2014
@thestinger
Copy link
Contributor Author

I think the solution to this problem is defining the maximum size of an object or array type to be the maximum positive value representable by int. This would make using int for these cases correct, and the only loss would be the inability to have an object larger than half the address space which does not seem important. This would be a clear value to use as the upper bound on type sizes (which is now implemented, but without a language defined upper bound) and could also be an upper bound on the maximum size returned by memory allocators. It could be implemented deep inside jemalloc's huge allocation handling where it would have no performance impact.

@thestinger
Copy link
Contributor Author

The non-jemalloc memory allocation support would need to branch in allocate and reallocate, but it would be optimized out in all but cases where the sizes are dynamic (like vectors). Custom allocators would also need to consider this issue, but it can be very clearly laid out in the documentation.

@thestinger thestinger added the A-type-system Area: Type system label Nov 7, 2014
@thestinger thestinger changed the title array types (HashMap, Vec) can become large enough that indexing is unsound array types (fixed-size arrays on 32-bit, HashMap, Vec) can be large enough that indexing is unsound Nov 7, 2014
@petrochenkov
Copy link
Contributor

The problem is not exactly about size of allocations, for example a large bit container like Bitv can allocate less than int::MAX bytes of memory, but its index difference will still overflow. So the language requirement about size of allocations needs to be supplemented with library requirements on sizes of containers.

@thestinger
Copy link
Contributor Author

@petrochenkov: This issue is about undefined behaviour from pointer arithmetic overflows, including in low-level language primitives like struct field offsets and indexing of fixed-size arrays. Fixing it requires both lowering the max object size on 32-bit and dealing with pointer arithmetic overflows in collections. The Bitv integer overflow issues are entirely separate from this, as are integer overflows in collections of zero-size types.

@thestinger
Copy link
Contributor Author

Rust, like C and C++, uses the inbounds marker on pointer arithmetic instructions for performance reasons. Pointer arithmetic instructions that wrap have undefined behaviour with inbounds. Rust needs to bound the size of all types to prevent these overflows via the basic language primitives.

That's what the pull request I opened about this does, and now the remaining problems with pointer arithmetic overflows in library collections need to be fixed. It can either be done by adding code deep inside the allocator where it will have zero overhead or by adding pervasive overflow checks that will be hard to get right and will hurt performance. I don't think bounding individual memory allocations from liballoc to half of the address space is a major sacrifice to make for simplicity and performance.

@petrochenkov
Copy link
Contributor

@petrochenkov: I see, that means I misinterpreted the "unsound" in the title, i.e. overflow in Bitv is still dangerous, but it is not unsound in the memory safety sense.

@zwarich
Copy link

zwarich commented Nov 13, 2014

Pointer arithmetic instructions that wrap have undefined behaviour with inbounds.

Strictly speaking, certain uses of the resulting poison value have undefined behavior, not the pointer arithmetic itself.

@pnkfelix
Copy link
Member

P-high, should not block 1.0.

@pnkfelix pnkfelix added P-medium Medium priority and removed I-nominated labels Nov 13, 2014
bors added a commit that referenced this issue Nov 19, 2014
This fixes the gap in the language definition causing #18726 by defining
a clear bound on the maximum size for libraries to enforce.

Closes #18069
@kmcallister kmcallister added the A-collections Area: `std::collections` label Jan 16, 2015
bors added a commit that referenced this issue Jul 17, 2015
Per the top level comment:

A low-level utility for more ergonomically allocating, reallocating, and deallocating a
a buffer of memory on the heap without having to worry about all the corner cases
involved. This type is excellent for building your own data structures like Vec and VecDeque.
In particular:

* Produces heap::EMPTY on zero-sized types
* Produces heap::EMPTY on zero-length allocations
* Catches all overflows in capacity computations (promotes them to "capacity overflow" panics)
* Guards against 32-bit systems allocating more than isize::MAX bytes
* Guards against overflowing your length
* Aborts on OOM
* Avoids freeing heap::EMPTY
* Contains a ptr::Unique and thus endows the user with all related benefits

This type does not in anyway inspect the memory that it manages. When dropped it *will*
free its memory, but it *won't* try to Drop its contents. It is up to the user of RawVec
to handle the actual things *stored* inside of a RawVec.

Note that a RawVec always forces its capacity to be usize::MAX for zero-sized types.
This enables you to use capacity growing logic catch the overflows in your length
that might occur with zero-sized types.

However this means that you need to be careful when roundtripping this type
with a `Box<[T]>`: `cap()` won't yield the len. However `with_capacity`,
`shrink_to_fit`, and `from_box` will actually set RawVec's private capacity
field. This allows zero-sized types to not be special-cased by consumers of
this type.

Edit: 
fixes #18726 and fixes #23842
@Gankra
Copy link
Contributor

Gankra commented Jul 18, 2015

EAT IT SOUNDNESS HOLES

On Fri, Jul 17, 2015 at 8:42 PM, bors notifications@github.com wrote:

Closed #18726 #18726 via #26955
#26955.


Reply to this email directly or view it on GitHub
#18726 (comment).

dlrobertson pushed a commit to dlrobertson/rust that referenced this issue Nov 29, 2018
This fixes the gap in the language definition causing rust-lang#18726 by defining
a clear bound on the maximum size for libraries to enforce.

Closes rust-lang#18069
lnicola pushed a commit to lnicola/rust that referenced this issue Dec 23, 2024
fix: Reduce applicability of unnecessary_async assist
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collections` A-type-system Area: Type system I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. P-medium Medium priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants