Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memmgr performance bug caused by high number of free vmas #2033

Closed
adrianlut opened this issue Oct 15, 2024 · 7 comments · Fixed by #2052
Closed

memmgr performance bug caused by high number of free vmas #2033

adrianlut opened this issue Oct 15, 2024 · 7 comments · Fixed by #2052

Comments

@adrianlut
Copy link
Contributor

adrianlut commented Oct 15, 2024

Description of the problem

While benchmarking the Hyrise DBMS inside Gramine, a student and I found the following performance issue: Hyrise regularly allocates a lot of memory with small mmap calls and then calls munmap on them. This creates a lot of free vmas in the memmgr.h free list (possibly millions).

The problem occurs due to the CHECK_LIST_HEAD macro in list.h, which is called in both free_mem_obj_to_mgr() and get_mem_obj_from_mgr_enlarge() (i.e. on most interactions with the memmgr). It traverses the whole free list to check its correctness while the memmgr is locked, blocking the whole memory system for other threads. If the free list contains millions of entries and calls to mmap are common, this prevents any useful work from happening.

From looking at the CHECK_LIST_HEAD macro, I think this is not intended to happen in release mode since the asserts in the loop are then replaced with (void)0. However, our performance tests show that the issue occurs both in debug and release mode. I guess that although the loop does not contain useful work, it is not removed by the compiler.

Steps to reproduce

(I will follow up with code to reproduce the issue in the coming days if necessary)

  1. Compile and use Gramine from master
  2. Compile and use Hyrise TPC-H benchmark from master
  3. Run benchmark with scale factor 5, scheduler activated, 8 threads, and 8 simulated clients without Gramine
  4. Run benchmark with the same settings in Gramine

Alternative: Write a micro-benchmark

  1. Call mmap and munmap and measure the required time as baseline
  2. Fill the memmgr free list with $10^6$ items (call mmap $10^6$ times and then unmap all mapped memory)
  3. Call mmap and munmap and measure their time with a long free list.

I guess the thread-local vma cache will probably interfere with such a simple benchmark design, but the goal should be clear.

Expected results

Throughput of Hyrise benchmark when running with Gramine > 50% of throughput running without Gramine

Alternative micro-benchmark result: latency of mmap and munmap is independent of free list length.

Actual results

Throughput of Hyrise when running with Gramine is approximately 3% of throughput running without Gramine. According to the included perf functionality of Gramine debug builds, 80 % of runtime is spent in memory management/bookkeeping functions.

Alternative micro-benchmark result: latency of mmap and munmap depends on free list length.

Gramine commit hash

91c90b4

@adrianlut
Copy link
Contributor Author

Suggested solution

Surround the call to CHECK_LIST_HEAD with #ifdef DEBUG here and here.

@mkow
Copy link
Member

mkow commented Oct 15, 2024

@adrianlut: Hi, would you like to contribute that change? If not they I can do that and reference you in the commit for the idea ;)

@adrianlut
Copy link
Contributor Author

Sure, I'm happy to contribute. I'll create a pull request.

@dimakuv
Copy link

dimakuv commented Oct 15, 2024

Wow, great find, @adrianlut. I'm surprised we haven't noticed this before.

By the way, there is another PR that may be of interest to you (I wrote it): #1795

@adrianlut
Copy link
Contributor Author

adrianlut commented Oct 15, 2024

Hi @dimakuv, thank you! I guess you haven't noticed this because having millions of vmas in the free list is rather uncommon, isn't it?

Also, thank you for the pointer to your PR. This could potentially further decrease the Gramine performance penalty for Hyrise. Although I think that the bookkeeping operations in memory management are mostly write operations. I'll give it a try if I find the time.

adrianlut added a commit to adrianlut/gramine that referenced this issue Nov 6, 2024
Fixes the memmgr performance bug described in gramineproject#2033 by changing the `CHECK_LIST_HEAD` macro in `common/include/list.h` in release mode to `(void)0`.

Signed-off-by: Adrian Lutsch <adrian.lutsch@cs.tu-darmstadt.de>
@adrianlut
Copy link
Contributor Author

adrianlut commented Nov 6, 2024

I finally came arround to work on this.

I again thought about solution options and now I am wondering if just #ifdef DEBUG is actually the right solution.

The main problem I see with this solution is the fact that it still runs the check in debugoptimized. Thus, if others want to profile their application in gramine-sgx, this performance bug might be triggered and performance will be very different between release to debugoptimized. I for myself would have to fully remove the check to continue my work.

One option I see is only including the check in debug builds and not in debugoptimized. Is there an option to only compile code in debug mode and not in debugoptimized? As far as I understand the meson configuration, this is not possible.

Another option would be to create a special "sanity check" compile option that is deactivated by default and could be activated for testing purposes, but this increases the scope of fixing this simple bug a lot.

@mkow
Copy link
Member

mkow commented Nov 6, 2024

I think this is intended (or at least not a big problem), debugoptimized is not 1-1 as fast as release builds and will never be, it's just a faster version of the debugable build. So, at least from my side, I'm fine with just adding #ifdef DEBUG. All other asserts are also compiled-in in debugoptimized builds.

adrianlut added a commit to adrianlut/gramine that referenced this issue Nov 7, 2024
Fixes the memmgr performance bug described in gramineproject#2033 by changing the `CHECK_LIST_HEAD` macro in `common/include/list.h` in release mode to `(void)0`.

Signed-off-by: Adrian Lutsch <adrian.lutsch@cs.tu-darmstadt.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants