
Plans to support ARM64 as architecture? #33

Closed
alschult-saildrone opened this issue May 27, 2020 · 5 comments · Fixed by #50

Comments

@alschult-saildrone

alschult-saildrone commented May 27, 2020

I would love to use tcmalloc on an ARM64 system, but it does not seem to be officially supported, and when I try to run it I get the following:

external/com_google_tcmalloc/tcmalloc/system-alloc.cc:525] MmapAligned() failed (size, alignment) 33554432 33554432 @ 0x417480 0x416eb8 0x405528 0x415614 0x415410 0x427b4c 0x426bc8 0x7fa12d4144 0x 0x
external/com_google_tcmalloc/tcmalloc/arena.cc:31] FATAL ERROR: Out of memory trying to allocate internal tcmalloc data (bytes, object-size) 131072 48 @ 0x4055a8 0x415614 0x415410 0x427b4c 0x426bc8 0

After a bit of debugging, it looks like every time tcmalloc calls mmap with a hint, it gets back the same address (which doesn't match the hint). For example (with some extra logging):

external/com_google_tcmalloc/tcmalloc/system-alloc.cc:507] mmap (result, hint, size) 0x7fb5bad000 0x1df184000000 33554432 @ 0x46e2cc 0x46f600 0x46d468 0x46cc48 0x4231d4 0x41d534 0x469364 0x469138 0x 
external/com_google_tcmalloc/tcmalloc/system-alloc.cc:507] mmap (result, hint, size) 0x7fb5bad000 0x76520000000 33554432 @ 0x46e2cc 0x46f600 0x46d468 0x46cc48 0x4231d4 0x41d534 0x469364 0x469138 0x 0
external/com_google_tcmalloc/tcmalloc/system-alloc.cc:507] mmap (result, hint, size) 0x7fb5bad000 0x7c0a8000000 33554432 @ 0x46e2cc 0x46f600 0x46d468 0x46cc48 0x4231d4 0x41d534 0x469364 0x469138 0x 0

The system I'm testing this on has a 4.9 kernel, so I can't try MAP_FIXED_NOREPLACE (which was added in Linux 4.17). I also don't understand why mmap always returns the same address; I'm not sure whether this is ARM64-specific or something about my particular platform (clang-9, Ubuntu 18.04, kernel 4.9.140-tegra, running on a Jetson Nano).
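For context, an address passed to mmap without MAP_FIXED is only a hint, and the kernel is free to place the mapping elsewhere. Kernels older than 4.17 also ignore the unknown MAP_FIXED_NOREPLACE flag and treat the address as a plain hint, so portable code has to compare the returned address with the requested one in either case. The sketch below shows that check; `MmapAtHint` is a hypothetical helper (not tcmalloc's MmapAligned), and the fallback flag value is assumed to match the generic Linux header value used on x86-64 and arm64.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdint>

// Older libc headers may not define MAP_FIXED_NOREPLACE (Linux 4.17+).
// Assumption: 0x100000 is the value from the generic Linux headers.
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

// Hypothetical helper: map `size` bytes exactly at `hint`, or return nullptr.
void* MmapAtHint(void* hint, std::size_t size) {
  void* result = mmap(hint, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                      -1, 0);
  if (result == MAP_FAILED) {
    // Newer kernels: EEXIST (range already mapped) or ENOMEM (out of range).
    return nullptr;
  }
  if (result != hint) {
    // Older kernels ignore the flag and may place the mapping elsewhere,
    // so release the misplaced mapping rather than use it.
    munmap(result, size);
    return nullptr;
  }
  return result;
}
```

A hint above the top of the user address space can never be honored, so this helper returns nullptr for it on both old and new kernels.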

In any case, are there plans to support ARM64, and if not, any thoughts on what may be going on here?

@alschult-saildrone
Author

NOTE: I am currently building with the SMALL_BUT_SLOW option enabled, because I first thought the default 1 GB mapping might be causing the issue, but I get the same problem with either option.

@ckennelly
Collaborator

As a general note, ARM64 support would be desirable, but it's not yet ready. Our per-CPU cache implementation relies on hand-written assembly and needs to be ported to ARM.

Do you know how your page table is configured with your kernel build? https://kernel.org/doc/Documentation/arm64/memory.txt has a discussion of 39- versus 48-bit address spaces.

As an optimization, we "tag" sampled objects during allocation by using an upper bit of the pointer. The tag makes it easy to distinguish sampled objects from unsampled ones. My suspicion is that we are unable to allocate memory with the right bit pattern (possibly due to a constrained 39-bit address space), and so we fail. One way of testing this hypothesis is to change kAddressBits in tcmalloc/common.h to 39 (instead of 48) for the ARM configuration.
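The hypothesis can be made concrete with a small constexpr sketch. Purely for illustration, it assumes the tag is the single bit at position kAddressBits - 1; `TagBit`, `FitsInUserSpace` and `IsTaggedSampled` are hypothetical helpers, not tcmalloc functions, and tcmalloc's real tag layout may differ. It shows why a 48-bit setting cannot work on a 39-bit kernel while a 39-bit setting can.

```cpp
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit target");

// Hypothetical tag position: the highest bit below the assumed address width.
// tcmalloc's actual tag bit and encoding may differ.
constexpr std::uintptr_t TagBit(int address_bits) {
  return std::uintptr_t{1} << (address_bits - 1);
}

// True if `addr` lies below the top of a `va_bits`-bit user address space.
constexpr bool FitsInUserSpace(std::uintptr_t addr, int va_bits) {
  return addr < (std::uintptr_t{1} << va_bits);
}

// True if `addr` carries the hypothetical "sampled" tag.
constexpr bool IsTaggedSampled(std::uintptr_t addr, int address_bits) {
  return (addr & TagBit(address_bits)) != 0;
}

// kAddressBits = 48: tagged addresses start at 2^47, far above the top of a
// 39-bit user space (2^39), so the kernel can never return one.
static_assert(!FitsInUserSpace(TagBit(48), 39), "48-bit tag is unreachable");
// kAddressBits = 39: the tag is bit 38, which a 39-bit kernel can hand out.
static_assert(FitsInUserSpace(TagBit(39), 39), "39-bit tag is reachable");
```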

@alschult-saildrone
Author

alschult-saildrone commented May 27, 2020

Thanks for the quick reply! Based on the boot output, I think it is using 39-bit addresses:

[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffffff8000000000 - 0xffffff8008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffffff8008000000 - 0xffffffbebfff0000   (   250 GB)
[    0.000000]       .text : 0xffffff8008080000 - 0xffffff8008f60000   ( 15232 KB)
[    0.000000]     .rodata : 0xffffff8008f60000 - 0xffffff80095e0000   (  6656 KB)
[    0.000000]       .init : 0xffffff80095e0000 - 0xffffff8009e40000   (  8576 KB)
[    0.000000]       .data : 0xffffff8009e40000 - 0xffffff800a11b808   (  2927 KB)
[    0.000000]        .bss : 0xffffff800a11b808 - 0xffffff800a1b3dbc   (   610 KB)
[    0.000000]     fixed   : 0xffffffbefe7fd000 - 0xffffffbefec00000   (  4108 KB)
[    0.000000]     PCI I/O : 0xffffffbefee00000 - 0xffffffbeffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffffffbf00000000 - 0xffffffc000000000   (     4 GB maximum)
[    0.000000]               0xffffffbf00000000 - 0xffffffbf03fc8000   (    63 MB actual)
[    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc0ff200000   (  4082 MB)
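The boot log and the earlier mmap log can be checked against each other with a little arithmetic. The sketch assumes, as on arm64 kernels of this era built without KASAN, that the first kernel address (the start of `modules`) equals -(2^VA_BITS) and that user space spans [0, 2^VA_BITS). Under those assumptions it derives VA_BITS from the log and compares the reported hints with the top of user space; the helper and constant names are hypothetical.

```cpp
#include <cstdint>

// Assumption: the lowest kernel virtual address equals -(2^VA_BITS), so its
// lowest set bit (counting from bit 0) is VA_BITS.
constexpr int VaBitsFromKernelStart(std::uint64_t kernel_start) {
  int bits = 0;
  while (bits < 64 && ((kernel_start >> bits) & 1) == 0) ++bits;
  return bits;
}

constexpr std::uint64_t kModulesStart = 0xffffff8000000000;  // boot log
constexpr int kVaBits = VaBitsFromKernelStart(kModulesStart);
// Assumption: user space spans [0, 2^VA_BITS).
constexpr std::uint64_t kUserLimit = std::uint64_t{1} << kVaBits;

// Hints and the returned address, copied from the mmap log in this issue.
constexpr std::uint64_t kHints[] = {0x1df184000000, 0x76520000000,
                                    0x7c0a8000000};
constexpr std::uint64_t kReturned = 0x7fb5bad000;

constexpr bool AllHintsAboveUserLimit() {
  for (std::uint64_t hint : kHints) {
    if (hint < kUserLimit) return false;
  }
  return true;
}

static_assert(kVaBits == 39, "boot log implies a 39-bit address space");
static_assert(AllHintsAboveUserLimit(), "every logged hint is out of range");
static_assert(kReturned < kUserLimit, "returned address is in range");
```

Every logged hint lies at or above 0x8000000000, while the returned 0x7fb5bad000 sits just below it. That would be consistent with the kernel discarding each unreachable hint and falling back to its own placement, which could explain why the same address came back every time.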

I'll give it a try to change the address bits and see if it works.

In the short term I'm interested in using tcmalloc for its heap profiling capabilities, so even if the assembly code is not complete, do you expect tcmalloc to work correctly (albeit not at maximum performance) on ARM64?

@alschult-saildrone
Author

BTW, the change to 39 bits seems to work (at least for the hello_main test program). I'm going to try it with our main application. If you think we're likely to run into other problems given that ARM64 isn't officially supported, please let me know. Thanks again!

ckennelly linked a pull request on Oct 19, 2020 that will close this issue
@westlaker

Hello, I have just compiled Snort 3.1 on an Arm64 machine. With tcmalloc enabled, the snort binary won't run; it gets stuck (see the attached log file, snort_strace_tmalloc.txt). Without tcmalloc enabled, it works fine.

Any idea?


3 participants