@bryancall bryancall commented Nov 19, 2025

Problem

Three unit tests were failing on Rocky Linux 8 with ASAN enabled in CI:

  • test_PluginDso
  • test_PluginFactory
  • test_RemapPluginInfo

The tests were passing all their assertions, but LeakSanitizer was reporting false positive memory leaks from dlopen calls, causing the tests to fail.

Root Cause

On Rocky Linux 8, ASAN's instrumentation of the dynamic linker (ld-linux-x86-64.so) causes it to report leaks during plugin initialization via dlopen, even though the code properly calls dlclose() for each handle.

The leak backtrace shows:

#0  operator new
#1  <unknown module>  (plugin_v1.so - global init)
#2  <unknown module>  (plugin_v1.so - init_array)
#3  <unknown module>  (plugin_v1.so - init)
#4  ld-linux-x86-64.so.2  (dynamic linker - call_init)
...
#16 PluginDso::load()

Why Existing Suppressions Didn't Work

The suppression file already had leak:PluginDso::load, but it wasn't catching these leaks because:

  • LSAN uses fast unwinding by default, which only captures ~5 stack frames
  • PluginDso::load is at frame #16, so it's not visible in the fast unwind
  • Using fast_unwind_on_malloc=0 would allow the existing suppression to work, but has significant performance overhead. I also had problems with the test segfaulting with fast_unwind_on_malloc=0.

Solution

Added leak:ld-linux-x86-64.so to the suppression file, which catches the leak at frame #4 where the dynamic linker appears in the shallow stack trace.
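
To illustrate why the depth of the captured stack decides which suppression applies, here is a toy model in C++. It is a sketch, not LeakSanitizer's code: `is_suppressed` and `example_stack` are hypothetical names, matching is simplified to a substring test against each captured frame, and frames #5 to #15 stay placeholders because the backtrace above elides them.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; this is not LeakSanitizer's implementation.
// A leak counts as suppressed when any captured frame contains the pattern.
bool
is_suppressed(const std::vector<std::string> &frames, std::size_t captured_depth, const std::string &pattern)
{
  const std::size_t depth = std::min(captured_depth, frames.size());
  for (std::size_t i = 0; i < depth; ++i) {
    if (frames[i].find(pattern) != std::string::npos) {
      return true;
    }
  }
  return false;
}

// Frames #0-#4 and #16 follow the backtrace in the description.
// Frames #5-#15 are elided there, so they remain placeholders here.
std::vector<std::string>
example_stack()
{
  std::vector<std::string> frames = {
    "operator new",         // #0
    "<unknown module>",     // #1
    "<unknown module>",     // #2
    "<unknown module>",     // #3
    "ld-linux-x86-64.so.2", // #4
  };
  frames.insert(frames.end(), 11, std::string("(elided frame)")); // #5-#15
  frames.push_back("PluginDso::load()");                          // #16
  return frames;
}
```

In this model, with a captured depth of 5, `is_suppressed(example_stack(), 5, "PluginDso::load")` is false while `is_suppressed(example_stack(), 5, "ld-linux-x86-64.so")` is true. Raising the depth to 17 makes the `PluginDso::load` pattern match as well, which corresponds to the slow-unwind case.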

Testing

Tested in Rocky Linux 8 Docker container with ASAN enabled:

  • ✅ All three tests now pass
  • ✅ No performance impact from slow unwinding
  • ✅ Tests still properly validate plugin loading/unloading logic

Fixes CI failures: https://ci.trafficserver.apache.org/job/master/job/os_build/46636/console

@bryancall bryancall force-pushed the fix-rocky8-plugin-tests branch 2 times, most recently from 3ae263f to 3558969 Compare November 19, 2025 16:11
@bryancall bryancall self-assigned this Nov 19, 2025
@bryancall bryancall added this to the 10.2.0 milestone Nov 19, 2025
@bryancall bryancall added ASan Address Sanitizer CI labels Nov 19, 2025
@bryancall bryancall requested a review from Copilot November 19, 2025 16:13
@bryancall
Contributor Author

Another option is to just compile these unit tests with fast_unwind_on_malloc=0. I will look into the performance differences.

Copilot AI left a comment

Pull Request Overview

This PR fixes false-positive memory leak failures in three Rocky Linux 8 unit tests (test_PluginDso, test_PluginFactory, and test_RemapPluginInfo) when running with ASAN enabled. The issue stems from ASAN's fast unwinding not capturing deep enough stack frames to match existing suppressions. The fix adds a targeted suppression for the dynamic linker itself.

Key Changes:

  • Adds leak:ld-linux-x86-64.so suppression to catch leaks at the dynamic linker level
  • Reformats existing leak:call_init suppression (moved to its own line)
  • Documents the Rocky Linux 8-specific issue with inline comments

leak:call_init
# On Rocky Linux 8, ASAN reports leaks from the dynamic linker during dlopen
# These are false positives as we properly call dlclose for each dlopen
leak:ld-linux-x86-64.so

Copilot AI Nov 19, 2025

The suppression pattern leak:ld-linux-x86-64.so is architecture-specific (x86-64) and may not work on other architectures like ARM64 or i386. Consider using a more portable pattern like leak:ld-linux or documenting that this suppression is x86-64 specific. If Rocky Linux 8 is only supported on x86-64 in CI, this may be acceptable.

Suggested change
- leak:ld-linux-x86-64.so
+ leak:ld-linux

Contributor Author

It is better to narrow the scope and only fix CI that is x86.

@bryancall bryancall marked this pull request as draft November 19, 2025 16:22
@bryancall
Contributor Author

[approve ci]

@bryancall
Contributor Author

I tried using fast_unwind_on_malloc=0 and it didn't go well. I was getting segfaults and I wasn't getting the symbol names back.

Add leak suppression for ld-linux-x86-64.so to fix false positive memory
leak reports on Rocky Linux 8 when running plugin tests with ASAN enabled.

The tests properly call dlclose() for each dlopen(), but ASAN's instrumentation
of the dynamic linker reports false positive leaks during plugin initialization.

Fixes: test_PluginDso, test_PluginFactory, test_RemapPluginInfo on Rocky Linux 8
@bryancall bryancall force-pushed the fix-rocky8-plugin-tests branch from 3558969 to 819f760 Compare November 19, 2025 19:00
@bryancall bryancall marked this pull request as ready for review November 19, 2025 19:02
@bryancall
Contributor Author

I was able to fix the segfaults; it was a dependency issue that required other CMake targets to be built first. However, with Rocky Linux 8 we would need to use an external symbolizer to convert the addresses to symbol names. So the best approach is to modify the suppression file, as this PR does.

@bryancall
Contributor Author

[approve ci autest 0]

@bryancall bryancall merged commit 734f992 into apache:master Nov 20, 2025
15 checks passed