
LSan is almost unusable on AArch64. #703

Closed · chefmax opened this issue Aug 2, 2016 · 20 comments
chefmax commented Aug 2, 2016

LSan is very slow even on trivial hello-world leaks. This happens because AArch64 uses SizeClassAllocator32; in particular, here is the ForEachChunk method of SizeClassAllocator32, which LSan calls several times:

  // Iterate over all existing chunks.
  // The allocator must be locked when calling this function.
  void ForEachChunk(ForEachChunkCallback callback, void *arg) {
    for (uptr region = 0; region < kNumPossibleRegions; region++)
      if (possible_regions[region]) {
        uptr chunk_size = SizeClassMap::Size(possible_regions[region]);
        uptr max_chunks_in_region = kRegionSize / (chunk_size + kMetadataSize);
        uptr region_beg = region * kRegionSize;
        for (uptr chunk = region_beg;
             chunk < region_beg + max_chunks_in_region * chunk_size;
             chunk += chunk_size) {
          // Too slow: CHECK_EQ((void *)chunk, GetBlockBegin((void *)chunk));
          callback(chunk, arg);
        }
      }
  }

For a 39-bit VMA, kNumPossibleRegions is calculated as:

SANITIZER_MMAP_RANGE_SIZE / kRegionSize

which equals 2^47 / 2^20 == 2^27, so we get a very busy loop of 2^27 iterations in SizeClassAllocator32.

The problem I faced when trying to fix this is that the actual VMA size (39, 42, or 48 bits) is detected dynamically at runtime, while the allocator needs kSpaceSize to be a compile-time constant. I wonder whether we could fix this in some way? Ideally it would be nice to use SizeClassAllocator64, but that seems to be impossible according to #246. Any ideas?

dvyukov commented Aug 2, 2016

We could add a bitmap that says which regions are non-empty.
If a bit covers 1000 regions, the number of loop iterations is reduced to 2^17. If we also check a whole word of the bitmap at once, that further reduces the number of iterations to 2^11.
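
A minimal sketch of this bitmap idea, with hypothetical names and the constants from this thread (2^27 regions, one bit per 1024 regions, 64-bit words); the real allocator would set a bit whenever a covered region is mapped:

    #include <cstdint>

    constexpr uint64_t kNumPossibleRegions = 1ULL << 27;
    constexpr uint64_t kRegionsPerBit = 1024;  // one bit covers 1024 regions -> 2^17 bits
    constexpr uint64_t kBitmapWords = kNumPossibleRegions / kRegionsPerBit / 64;  // 2^11 words

    static uint64_t non_empty[kBitmapWords];  // bit set => some covered region is in use

    template <typename Callback>
    void ForEachNonEmptyRegion(Callback cb) {
      for (uint64_t w = 0; w < kBitmapWords; w++) {  // at most 2^11 iterations
        if (!non_empty[w]) continue;                 // one load skips 64 * 1024 regions
        for (int b = 0; b < 64; b++) {
          if (!(non_empty[w] & (1ULL << b))) continue;
          uint64_t first = (w * 64 + uint64_t(b)) * kRegionsPerBit;
          for (uint64_t r = first; r < first + kRegionsPerBit; r++)
            cb(r);  // only these candidates still need a per-region check
        }
      }
    }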

kcc commented Aug 2, 2016

Before trying to fix this problem, please remind me: do we really need to support VMA 39 and 42?
If we only support 48, things become instantly much simpler.

chefmax commented Aug 2, 2016

> Before trying to fix this problem, please remind me: do we really need to support VMA 39 and 42?
> If we only support 48, things become instantly much simpler.

Well, I hit this problem trying to run LSan on my 39-bit target mobile device. I'm not sure our customers will switch to 48 bits soon, and it would be nice for us to use LSan now. AFAIK Fedora uses a 42-bit VMA, but I'm not sure they have tried to use LSan there.

kcc commented Aug 2, 2016

Sad. The current code effectively makes the 48-bit case slow because we have to support the smaller cases.

I think the next possible solution is to build an iterator inside ByteMap.
For FlatByteMap this would be a simple thing, but in TwoLevelByteMap it would skip the empty levels.
(The idea is not tested.)
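
A minimal sketch of what such an iterator could look like, assuming a TwoLevelByteMap-shaped container (the class shape here is hypothetical, and untested just like the suggestion above); the point is that a null first-level entry lets the loop skip an entire level in one comparison:

    #include <cstdint>

    template <uint64_t kSize1, uint64_t kSize2>
    class TwoLevelByteMapSketch {
     public:
      // Visits only non-zero entries; calls cb(index, value).
      template <typename Callback>
      void ForEachNonZero(Callback cb) const {
        for (uint64_t i = 0; i < kSize1; i++) {
          const uint8_t *level2 = map1_[i];
          if (!level2) continue;  // empty level: kSize2 entries skipped at once
          for (uint64_t j = 0; j < kSize2; j++)
            if (level2[j]) cb(i * kSize2 + j, level2[j]);
        }
      }

     private:
      uint8_t *map1_[kSize1] = {};  // second-level tables, allocated lazily on write
    };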

chefmax commented Aug 2, 2016

Yeah, this sounds complicated; AArch64 is so problematic... Can't we calculate kNumPossibleRegions on demand in SizeClassAllocator32 (say, before allocating the first region)?

kcc commented Aug 2, 2016

Sure we can, but what I'd really love to do is simply switch to SizeClassAllocator64 for the 48-bit VMA, and that can't be done at run time.

chefmax commented Aug 2, 2016

Perhaps we could pick the primary allocator dynamically? Say, in the InitializeAllocator routine. This would imply some space overhead, though.

kcc commented Aug 2, 2016

This will have both space and time overhead.

morehouse commented

@kcc: What's the status of this now?

kcc commented Feb 16, 2019

Looks like the problem is still there:

clang -fsanitize=leak ~/llvm/compiler-rt/test/lsan/TestCases/sanity_check_pure_c.c && time ./a.out

real    0m18.315s
user    0m18.305s
sys     0m0.009s

I am tempted to disable LSan on AArch64 until someone fixes it.

sebpop commented Feb 21, 2019

@ebahapo and I will be looking at this bug.

sebpop commented Feb 22, 2019

> simply switch to SizeClassAllocator64 for the 48-bit VMA

@kcc, do you happen to know a better way to get the size of the VMA other than cat /proc/meminfo | grep VmallocTotal? Maybe this is available in a more standard way from /sys?

kcc commented Feb 22, 2019

Not an expert here. I would either read /proc/meminfo or /proc/self/maps.
I guess you can also probe with mmap.
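
A hedged sketch of the mmap-probing approach (not sanitizer code; MAP_FIXED_NOREPLACE needs Linux 4.17+ and a libc that defines it): request one page just below each candidate 2^bits boundary and see whether the kernel can place it there:

    #include <cstdio>
    #include <cstdint>
    #include <sys/mman.h>

    static int ProbeVmaBits() {
      for (int bits = 48; bits >= 39; bits--) {
        void *hint = (void *)((1ULL << bits) - 0x10000);  // just below 2^bits
        void *p = mmap(hint, 4096, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
        if (p == MAP_FAILED) continue;  // above TASK_SIZE (or hint already mapped)
        munmap(p, 4096);
        return bits;  // the kernel accepted a mapping just below 2^bits
      }
      return -1;
    }

    int main() {
      std::printf("usable address space is at least %d bits\n", ProbeVmaBits());
    }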

eugenis commented Feb 25, 2019 via email

sebpop commented Feb 26, 2019

Yes, GetMaxVirtualAddress would work. Thanks, @eugenis, for the suggestion.

sebpop commented Feb 27, 2019

I tried to implement this:

> Perhaps we could pick the primary allocator dynamically? Say, in the InitializeAllocator routine.

Switching between the 32- and 64-bit allocators at run time would require too many changes to the current implementation of the allocators. So I followed this path instead:

> I am tempted to disable LSan on AArch64

The following patch disables LSan on aarch64 for 39- and 42-bit VMAs, and enables the 64-bit allocator by default on aarch64 machines with a 47-bit VMA:

diff --git a/compiler-rt/lib/lsan/lsan.cc b/compiler-rt/lib/lsan/lsan.cc
index 5412b9ac05a..6e4f75eaee9 100644
--- a/compiler-rt/lib/lsan/lsan.cc
+++ b/compiler-rt/lib/lsan/lsan.cc
@@ -84,6 +84,15 @@ extern "C" void __lsan_init() {
   CHECK(!lsan_init_is_running);
   if (lsan_inited)
     return;
+#if defined(__aarch64__)
+  // Check whether the address space is 47-bit, that is large enough for the
+  // 64-bit allocator.
+  if (GetMaxVirtualAddress() < (((uptr) 1 << 48) - 1)) {
+    Report("WARNING: LSan is disabled on AArch64 on machines with 39 and 42-bit"
+           " virtual address spaces.");
+    return;
+  }
+#endif
   lsan_init_is_running = true;
   SanitizerToolName = "LeakSanitizer";
   CacheBinaryName();
diff --git a/compiler-rt/lib/lsan/lsan_allocator.h b/compiler-rt/lib/lsan/lsan_allocator.h
index 2d6b21a0301..ad479e66b94 100644
--- a/compiler-rt/lib/lsan/lsan_allocator.h
+++ b/compiler-rt/lib/lsan/lsan_allocator.h
@@ -49,8 +49,7 @@ struct ChunkMetadata {
   u32 stack_trace_id;
 };

-#if defined(__mips64) || defined(__aarch64__) || defined(__i386__) || \
-    defined(__arm__)
+#if defined(__mips64) || defined(__i386__) || defined(__arm__)
 static const uptr kRegionSizeLog = 20;
 static const uptr kNumRegions = SANITIZER_MMAP_RANGE_SIZE >> kRegionSizeLog;
 template <typename AddressSpaceView>
@@ -72,7 +71,7 @@ struct AP32 {
 template <typename AddressSpaceView>
 using PrimaryAllocatorASVT = SizeClassAllocator32<AP32<AddressSpaceView>>;
 using PrimaryAllocator = PrimaryAllocatorASVT<LocalAddressSpaceView>;
-#elif defined(__x86_64__) || defined(__powerpc64__)
+#elif defined(__x86_64__) || defined(__powerpc64__) || defined(__aarch64__)
 # if defined(__powerpc64__)
 const uptr kAllocatorSpace = 0xa0000000000ULL;
 const uptr kAllocatorSize  = 0x20000000000ULL;  // 2T.

On a 47-bit VMA aarch64 machine, the time spent in LSan dropped from 2.5s to 0.02s when running

clang -fsanitize=leak compiler-rt/test/lsan/TestCases/sanity_check_pure_c.c && time ./a.out

The patch passes ninja check-all on that same aarch64 machine.
Should I submit this patch for review through LLVM's Phabricator?

kcc commented Feb 27, 2019

This patch will fix standalone LSan (-fsanitize=leak) but not LSan bundled with ASan (-fsanitize=address). Right?

sebpop commented Feb 27, 2019

You are right; I just tried with ASan and it takes 2.5 seconds vs. 0.02s with LSan.
Do you have recommendations on how to fix ASan as well?
Would a similar patch be acceptable to avoid registering LSan with atexit() when running ASan on aarch64 39/42-bit VMAs?

kcc commented Feb 27, 2019

> Would a similar patch be acceptable

Emm... what will happen on 47-bit? It will still have the Allocator32, right?
Also, this will affect how the LLVM tests run (they will fail on some aarch64 boxes and pass on others).

sebpop commented Apr 4, 2019

I just sent a patch for review, https://reviews.llvm.org/D60243, which selects between allocator32 and allocator64 based on the size of the VMA at run time.

llvm-git-migration pushed a commit to llvm/llvm-project that referenced this issue Aug 20, 2019
This patch fixes google/sanitizers#703.
On a Graviton A1 aarch64 machine with a 48-bit VMA, the time spent in LSan and ASan dropped from 2.5s to 0.01s when running

clang -fsanitize=leak compiler-rt/test/lsan/TestCases/sanity_check_pure_c.c && time ./a.out
clang -fsanitize=address compiler-rt/test/lsan/TestCases/sanity_check_pure_c.c && time ./a.out

With this patch, LSan and ASan create both the 32- and 64-bit allocators and select between them at run time via a global variable, initialized at init time according to whether the allocator64 can be used in the virtual address space.

Differential Revision: https://reviews.llvm.org/D60243

llvm-svn: 369441
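
A minimal sketch of the selection scheme the commit describes, with hypothetical stand-in types (the actual patch wires this through the sanitizer allocator interface): both allocators exist in the binary, and a global initialized before the first allocation routes every call:

    #include <cstddef>
    #include <cstdlib>

    struct Alloc32 { void *Allocate(std::size_t n) { return std::malloc(n); } };  // stand-in
    struct Alloc64 { void *Allocate(std::size_t n) { return std::malloc(n); } };  // stand-in

    static Alloc32 alloc32;
    static Alloc64 alloc64;
    static bool use_allocator64;  // decided once, at init time

    void InitAllocator(int vma_bits) {
      // The 64-bit allocator reserves a huge fixed range, so it is only usable
      // when the detected address space is large enough.
      use_allocator64 = vma_bits >= 47;
    }

    void *Allocate(std::size_t n) {
      return use_allocator64 ? alloc64.Allocate(n) : alloc32.Allocate(n);
    }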
dtzWill pushed a commit to llvm-mirror/compiler-rt that referenced this issue Aug 20, 2019 (same commit message as above; git-svn mirror of r369441).

kstoimenov added a commit to llvm/llvm-project that referenced this issue Nov 1, 2022
…locator.

This change will switch SizeClassAllocator32 to SizeClassAllocator64 on ARM. This might potentially affect ARM platforms with a 39-bit address space. This addresses google/sanitizers#703, but unlike D60243 (https://reviews.llvm.org/D60243) it defaults to the 64-bit allocator.

Reviewed By: vitalybuka, MaskRay

Differential Revision: https://reviews.llvm.org/D137136

kstoimenov added a commit to llvm/llvm-project that referenced this issue Nov 2, 2022 (same commit message as above).

veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue May 28, 2024 (same commit message as above).