replace "heap" with "dynamic allocation" in the documentation #18226
Comments
To be fair, the term |
I tend to agree, to me the idea of a "heap" conjures images of Java. On the other hand, the "heap vs. stack" dichotomy, imagined or not, is quite popular. I'm leery of giving people the impression "lol rust allocates its stack on the HEAP? nah I'll go back to a real lang like c++ thx". |
You've put forward an argument against a different proposal than the one here. The term heap should be completely removed from the documentation because it distracts from the high level semantics and doesn't actually have a basis in reality. Rust doesn't use the data section referred to as the heap (dss) for either stacks or dynamic memory allocations. The dss section only exists for legacy reasons anyway. |
C and C++ don't mention the terms stack or heap at all. The language and library APIs are defined in terms of the high level semantics and the low-level details are left to the implementation. Those low-level details do not really include a distinction between dynamic allocations of stacks for new threads and other dynamic allocations. |
While I agree that a C-style language specification, if there was such a document, shouldn't talk about "the stack" and "the heap", the former is a significant component of almost all ABI-s, being the scratch space given to functions, and "heap" is just the short term for "memory managed by the process-global dynamic allocator" (I haven't seen anyone really claim that "the heap" means "the dss section"), so these are terms low-level developers are familiar with, and documentation aimed at these developers should talk about how Rust uses them. |
@arielb1: Stacks are dynamic allocations and come from the same OS API as other memory allocations. The usage of an allocator in between these allocations and the OS API is an implementation detail that's likely going to change soon. It doesn't make sense for Rust to call mmap and munmap directly or allow the POSIX thread API to call it because it increases memory fragmentation and decreases performance. It really only needs to deal directly with |
General purpose documentation should concentrate on semantics, not implementation details. Documentation of the implementation details or performance characteristics has no use case for familiar but inaccurate ways of describing the functionality. |
The term isn't used by C and C++ themselves because it's a platform-specific implementation detail. That's how the heap is defined by the Linux kernel and the userspace documentation. The way Rust's documentation uses the term is inaccurate, misleading and confusing. If you want to talk about implementation details, then lying about how it actually works isn't a great start. For example, the glibc malloc implementation has a |
Of course, the stack, the heap, static allocations, and manually mmapped zones are all regions of virtual memory that are used in the same way. However, from within a function, memory from these areas certainly isn't allocated through the same API-s: stack allocations are a pointer bump, heap allocations are function calls, arena allocations are different function calls (which may eventually allocate from other regions), etc. (Inside a function, there is also memory accessible via a reference, which can come from anywhere without the function needing to know about it.) Also, the |
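To make the "pointer bump vs. function call" contrast above concrete, here is a minimal Rust sketch. The function name `roundtrip_through_allocator` is made up for illustration; `std::alloc::alloc` and `std::alloc::dealloc` are the standard library's entry points to the global allocator. It says nothing about where the allocator gets its memory from, only that the request is an explicit call, while the local variable needs none.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Stores `value` in memory requested from the global allocator,
/// reads it back, and releases the memory again.
fn roundtrip_through_allocator(value: u64) -> u64 {
    // A plain local: space comes from the current call frame,
    // with no allocator call involved.
    let local_copy = value;

    let layout = Layout::new::<u64>();
    unsafe {
        // Explicit function call into the global allocator.
        let ptr = alloc(layout) as *mut u64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(local_copy);
        let out = ptr.read();
        // Explicit function call to give the memory back.
        dealloc(ptr as *mut u8, layout);
        out
    }
}

fn main() {
    assert_eq!(roundtrip_through_allocator(42), 42);
    println!("value survived the round trip through the allocator");
}
```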
@arielb1: The documentation is quite clear that
You're conflating the usage pattern of the memory after it's allocated with obtaining the memory allocation. Putting local variables on the stack or pushing elements to a vector with reserved capacity is not a memory allocation. The stack itself is a dynamic memory allocation and the call stack manages that memory as a stack data structure. |
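The claim about reserved capacity can be checked with a small Rust sketch. The helper name `pushes_reuse_buffer` is hypothetical; the only assumption is the documented guarantee that `Vec` does not reallocate while its length stays within its capacity.

```rust
/// Pushes `n` elements into a vector whose capacity was reserved up front
/// and reports whether the backing buffer stayed the same throughout.
fn pushes_reuse_buffer(n: usize) -> bool {
    // The single allocation happens here.
    let mut v: Vec<u64> = Vec::with_capacity(n);
    let ptr_before = v.as_ptr();
    let cap_before = v.capacity();

    for i in 0..n {
        // Each push writes into memory that was already reserved.
        v.push(i as u64);
    }

    v.as_ptr() == ptr_before && v.capacity() == cap_before
}

fn main() {
    assert!(pushes_reuse_buffer(1024));
    println!("buffer unchanged while pushing within reserved capacity");
}
```

Here the pushes only use memory that already exists, which is the sense in which the comment compares them to placing locals in a call frame.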
The documentation is incredibly clear that there's a distinction between the heap (managed by
Another case where the term "heap" is mentioned:
Or just look in |
This isn't true. It uses free lists for more than just managing memory in the heap (dss) because it will make secondary arenas with |
I got this distinction from musl libc, which manages the |
On Linux, heap refers to the data section managed by |
And if you set |
No, we're not arguing about definitions. I'm talking about dynamic memory allocations and you're talking about how the memory is used after it's allocated. It's possible to use a |
The call stack is a very simple data structure implemented on top of a dynamic memory allocation. It doesn't appear magically without calling into the general purpose allocator or going directly through the operating system API (which I expect is slower than calling The general purpose documentation should not be getting into these implementation details, and I am going to enforce accuracy in any implementation documentation. That leaves no place for an incorrect distinction between stack and "heap" allocations, because it's not how things are done. It will be even more blatantly incorrect when there's support for allocators. |
@thestinger I think you're right about "heap" being not ideal, but I don't think "dynamic allocation" is much better, except perhaps in a narrow formal terminology sense. |
All I know is that there used to be accurate high-level documentation on boxes in the tutorial. The new documentation gets bogged down in the implementation details rather than sticking to the language semantics and it isn't correct about those implementation details. I don't think it should get into this stuff at all because it changes from week to week but if it is going to then it needs to be accurate. |
@thestinger I like the idea of doubling down on "box" as terminology, especially since it's what the syntax says: |
The |
@thestinger I confess I am somewhat confused. To me, the terms "heap" and "stack" have basically nothing to do with the source of the memory, and everything to do with the usage pattern of the memory. That is, the call stack refers to the big hunk of memory we reserve to store function activations. This is used according to a stack discipline (first in, last out). The heap, in contrast, refers to memory that is allocated and freed in some other pattern that does not follow a stack discipline. That is, it is not first in, last out. Naturally this requires a more complicated data structure to track what memory is free than a stack does. I don't believe that, in common usage, the word heap is specifically tied to the DSS segment (or any other). Heap vs stack seems like a useful distinction and one that is very meaningful to Rust. In particular, the stack is tied to the language in the form of lifetimes and so on. When you say "in the documentation", I guess you mean things like the guide and so forth? I can see the argument for being abstract in a language reference, but I think we are permitted some license in the tutorial. Being overly generic and abstract can make text quite hard to understand. All that said, I could believe that we can use the word "box" rather than "heap" in many places. |
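As a rough illustration of the usage-pattern distinction described above, the following Rust sketch (function names are made up for the example) shows a boxed value outliving the call that created it, and boxes being released in an order that is not first in, last out.

```rust
/// Returns a boxed value that outlives the call that created it.
fn make_boxed(n: u32) -> Box<u32> {
    // `local` belongs to this call's frame and is gone on return.
    let local = n + 1;
    // The boxed copy stays alive until the Box itself is dropped.
    Box::new(local)
}

/// Creates three boxes and releases them in a non-LIFO order.
fn free_out_of_order() -> u32 {
    let a = make_boxed(1);
    let b = make_boxed(2);
    let c = make_boxed(3);

    // Released first even though it was created first.
    drop(a);
    let sum = *b + *c;
    drop(c);
    drop(b);
    sum
}

fn main() {
    assert_eq!(free_out_of_order(), 7);
    println!("boxes were released in a non-stack order");
}
```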
It can refer to the call stack and dynamic allocation, but referring to a heap is neither clear nor accurate. Pushing a stack frame onto the call stack isn't memory allocation, it's usage of a fixed amount of pre-allocated memory. It's comparable to pushing and popping from a vector. The memory allocated for the call stack is not any easier to track than memory allocated for a vector. In fact, I expect that both will end up using the same allocator code path in the future.
Lifetimes are tied to the call stack, but that call stack is a high-level language concept managing memory spread out across registers, the current thread's call stack and other dynamically allocated memory like |
I haven't lived long enough to have off-hand references to historical usage, but "heap" is certainly frequently used to refer to any pool of memory from which dynamic allocations are carved out. Using it to refer to some Linux section is a nonstandard and specialized meaning which can be useful, but has little chance of confusing anyone using Rust, especially on any other platform. To wit: OS X frequently uses the term "heap" despite not having such a section. So does Windows. jemalloc and tcmalloc refer to profiling the area of memory they manage as "heap profiling". The term for overflowing a malloced buffer is heap overflow. Wikipedia's memory management article refers to any pool of memory used for allocation as "the heap". Even the Linux kernel has a bunch of references to heap allocation from other heaps. FWIW, I have literally never heard of .dss, nor is what sbrk manages usually described as "the heap", see one, two, three. |
@comex: The areas managed by jemalloc will include our call stacks in the future. The glibc malloc documentation explicitly refers to the sbrk heap as the heap as does the kernel ABI (as |
By doing that, you have allocated your stack from the heap (you requested a portion of the jemalloc heap, and now you are using it as a stack) instead of directly from the kernel. There is no conflict here, any more than there is if you allocate an arena from the global heap and then start allocating sub-objects out of that. What is important is what data structure you directly obtain a particular piece of memory from, as this determines its lifetime semantics. (You could just focus on the lifetime semantics, but anything that dynamically manages memory is a "heap" and stacks are stacks, so you may as well just say the words.) |
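The arena case mentioned above can be sketched in Rust as follows. `Arena` is a hypothetical bump allocator written for this example, not a type from the standard library: one buffer is obtained from the global allocator, and sub-allocations are carved out of it by advancing an offset.

```rust
use std::ops::Range;

/// Hypothetical bump arena: a single buffer from the global allocator,
/// handed out in pieces by advancing `used`.
struct Arena {
    buf: Vec<u8>,
    used: usize,
}

impl Arena {
    fn new(size: usize) -> Self {
        // The only request made to the global allocator.
        Arena { buf: vec![0u8; size], used: 0 }
    }

    /// Reserves `len` bytes and returns their range, or `None` if the
    /// arena is exhausted. No allocator call happens here.
    fn alloc(&mut self, len: usize) -> Option<Range<usize>> {
        let end = self.used.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let range = self.used..end;
        self.used = end;
        Some(range)
    }

    fn bytes_mut(&mut self, r: Range<usize>) -> &mut [u8] {
        &mut self.buf[r]
    }
}

/// Fills a 16-byte arena and reports (bytes used, whether it is now full).
fn demo() -> (usize, bool) {
    let mut arena = Arena::new(16);
    let a = arena.alloc(8).expect("first block fits");
    let b = arena.alloc(8).expect("second block fits");
    arena.bytes_mut(a)[0] = 1;
    arena.bytes_mut(b)[0] = 2;
    (arena.used, arena.alloc(1).is_none())
}

fn main() {
    assert_eq!(demo(), (16, true));
    println!("sub-allocations came out of one parent allocation");
}
```

Objects placed in the arena live exactly as long as the arena, which is the point about the data structure you obtain memory from determining its lifetime semantics.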
@comex: I can accept heap as a near synonym for address space but I don't think it's a sensible way to refer to anything not on a call stack. I don't see why a 2MiB vector used as a LIFO data structure would qualify as a heap allocation but the same 2MiB allocation used as a call stack would not. |
The allocation of the vector is a heap allocation (i.e. call to In that context, I could store an element in the vector, in a new place in the arena, in a new local variable (stored on the stack), in a new Now, vectors are typically used for more than just a place to store things (otherwise they would be called "arenas"), so putting things into vectors is typically not called "allocation". When programs want somewhere to put things, the "classical" places are either a stack allocation or a heap allocation. |
The use of "heap" to mean "unordered pool of free memory" predates the existence of such things as jemalloc and dss sections by decades. The documentation is conveying the semantics accurately; it's just using a broader and much more established definition of the word "heap". The fact the word is sometimes used in a narrower sense in other contexts doesn't make this use inaccurate. |
@jpetkau: Stack allocations come from the same pool of memory. It also completely ignores the possibility of having custom allocators, which is intended. |
Virtual memory / anonymous memory mappings are what makes the term so inaccurate now. |
This was brought up in a discuss thread as well: http://internals.rust-lang.org/t/newcomer-to-rust-my-experience/1816/28
With the stack and the heap chapter, exactly what we mean in these contexts has been addressed. |
The documentation should be conveying the language semantics, not an inaccurate analogy of the implementation details. Rust never makes use of the `dss` section (aka the heap) by default as jemalloc prefers obtaining all memory via anonymous memory mappings. Stacks are dynamic allocations coming from the same operating system API as `Box<T>` and `Rc<T>`. It's all obtained via `mmap` and `VirtualAlloc`. The classical distinction between a static call stack and a dynamic heap doesn't really exist on modern operating systems since multi-threading means stacks are dynamic memory allocations.
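One way to see that a thread's stack is requested at run time is the standard library's `std::thread::Builder::stack_size`. The sketch below uses made-up names (`STACK_BYTES`, `depth_sum`, `run_on_sized_stack`) and only shows that the stack size is an ordinary runtime parameter. How the platform actually obtains that memory is not visible from this code and is assumed to be as described above.

```rust
use std::thread;

/// Stack size requested for the new thread (2 MiB, chosen for illustration).
const STACK_BYTES: usize = 2 * 1024 * 1024;

/// Recursive sum 1 + 2 + ... + n, so each level uses one call frame.
fn depth_sum(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        n + depth_sum(n - 1)
    }
}

/// Spawns a thread whose stack is sized at run time and runs the
/// recursion on it.
fn run_on_sized_stack() -> u64 {
    let handle = thread::Builder::new()
        .stack_size(STACK_BYTES)
        .spawn(|| depth_sum(1_000))
        .expect("failed to spawn thread");
    handle.join().expect("thread panicked")
}

fn main() {
    assert_eq!(run_on_sized_stack(), 500_500);
    println!("recursion ran on a stack requested at run time");
}
```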