Memory management in the C runtime #805
-
These are very interesting ideas... We really need to pursue this...
-
It is an interesting observation that memory management in the runtime can influence the performance of the reaction code. My guess is that this is due to memory fragmentation. Most often, malloc finds a suitable memory area very quickly, but if memory becomes fragmented due to frequent malloc and free, it can become slower. Given that the C target is rather static (all parameters are known at compile time), wouldn't we be able to compute the memory size required by the runtime in advance? That said, I would be hesitant to implement static allocation of a fixed memory area for the reasons you outline above.

I think, however, that it could be a good idea to separate runtime memory from user memory. Normally, this is done with memory pools and a custom allocator. The custom allocator would allocate a larger piece of memory, let's say 1 MiB, and then serve any memory allocations directly from the pool. Should there be no more space in the pool, it would allocate another 1 MiB chunk to increase the pool size. Obviously there is overhead in this solution, since the memory pool needs to be managed. This overhead can be reduced if the pool is intended to only allocate objects of a fixed size, like a node in the priority queue. I implemented a memory pool for fixed-size objects in the C++ runtime in the hope of speeding up some dynamic data structures, but I found that it performed worse than a plain …
-
In the real-time meeting today @erlingrj had an interesting comment about bounding the size of the event queue that I did not think of in my original post. I think the idea is that if we do not allow multiple events to be scheduled in one reaction invocation, then the number of events on the event queue could be bounded? One problem that might need to be solved is that one event could result in multiple different reactions being triggered, each of which might schedule an event. So I think that the event queue should steadily grow in a program like this:
Unfortunately, I was not able to verify that an out-of-memory error occurs because the memory usage grows so slowly (like, only 80 megabytes after one hour?). #1464 might be tangentially related to this since it pertains to the dropping of events.
-
Counterexample:
Event queue grows without bound.
-
As briefly proposed today at our meeting, I propose we try a bottom-up approach. Let us build 2-3 simple examples that we can statically analyze and that can result in a possible static schedule. For this I would start by using features that are all visible at the LF level: e.g., explicit timers, and no scheduling of events in the future, only at the same time instant.
-
I did not mean timers only. I think it is fine for a reaction triggered by a timer event to trigger an output reaction at the same time instant; that is just a dependency between reactions/reactors that can be statically scheduled. The event queue is not necessarily bounded by the number of timers, since they can have different periods. We need to look at the hyperperiod of the timers.
-
The evidence that motivates this is limited, but since it might be relevant to various aspects of the C target, it seemed okay to start this discussion sooner rather than later.
Problems
Initial Motivation
While optimizing the C runtime, I was surprised to find that I had introduced performance problems for benchmarks in which more than 99% of the time is spent executing user-supplied C code (a reaction body or function in a preamble).
In the case of SortedLinkList, 99.7% of cycles were spent traversing the list, and yet bad memory management in a version of the runtime (my mistake; recently fixed by Soroush) caused execution time to increase by 20-60%, depending on the hardware. In the case of NQueens, 99.5% of cycles were spent accessing arrays that were dynamically allocated in a preamble function, but changes to the runtime made it about 20% slower on one machine. The extreme hardware dependence here might be a problem.
I am unsure if I understand these performance issues correctly. Additionally, the first one was due to a mistake on my part that needed to be fixed anyway. However, the instruction counts, L1 cache miss counts, and branch misprediction counts that I got from perf did not seem to explain the issues, nor did the last-level cache miss counts that were estimated by a cache simulator (cachegrind). Furthermore, both cases seem to show complex interactions between the runtime and apparently unrelated C functions written by the user. This is why my tentative guess is that a) these problems arise from heap fragmentation, and b) they could potentially be a design consideration that goes beyond one or two isolated programming errors. (I would welcome and appreciate other interpretations of the data, of course, and can share more details if anyone thinks it relevant.)
Concrete Proposal
This might not be quite the right idea, but hopefully it conveys my point:
It would be nice if we could set aside a known, fixed quantity of memory for the runtime, and then give the user a pristine heap (or a pristine area of the heap) that is "like new" -- unused and unfragmented by the runtime. The user can then write reactions and functions that use malloc, calloc, realloc, etc., with as clear an understanding of the state of the heap as if they were writing pure C.
It could work like this: … malloc and friends. lfc could print a message summarizing what the runtime is using space for. … malloc and does not directly interact with the OS.

Challenges: … (assert).

Potential Advantages

Potential Disadvantages
Possibly related: #793. Additionally, @lhstrh pointed out that some of the EECS 149 students who ran into memory limitations may have something to say about this topic.