Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix memory leak #309

Merged
merged 15 commits into from
Mar 12, 2021
Merged

Conversation

shintaro-iwasaki
Copy link
Collaborator

Pull Request Description

Problem

Resource allocation errors do not usually occur in testing, so these error handling paths are not tested. As a result, some Argobots functions leak resources, exit while keeping a lock, or cause a double-free corruption when such an error happens.

For example, ABT_init() (the default configuration) calls 17 malloc() / mmap() operations. Usually all the operations succeed (since a machine has enough memory), but Argobots is not tested if any of them fail. ABT_init() should succeed even if some of those operations fail while it "should" return an error after properly freeing already allocated resources if others fail. We want to fix it, but at the same time we should test if that fix is correct.

What this PR does

This PR fixes those issues the new testing library found. The new testing mechanism overrides malloc(), calloc(), mmap(), pthread_xxx_init(), ... functions by using dlsym() (actually dlvsym()) and simulates all the possible allocation success/failure patterns while tracking allocated resources to detect memory leak. This adds new 15 tests that cover most of the Argobots functions that incur resource allocation (including ABT_init() and ABT_thread_create()).

See test/leakcheck/rtrace.h for details.

Impact

This PR fixes several bugs, which is great, but user programs do not usually encounter those issues since those issues happen only when resource allocation fails.

This PR closes #116.

Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers

ABTD_affinity_list is an actual name used in the code.
Previous ABTD_affinity_init() asserts when memory allocation fails.
This patch changes ABTD_affinity_init() to properly handle memory
allocation error.
This patch fixes ABTD_affinity_list_create() to properly handle memory
allocation error.  Since it is tedious to recursively free all the
allocated memory, the current implementation uses a linked list to free
all just by following a link.

Note that ABTD_affinity_list is an internal object that is used by only
ABTD_affinity_init(), so extra memory overheads are negligible.
A FIFO_WAIT pool does not check return values of pthread_mutex_init()
and pthread_cond_init().  This patch fixes it.
ABTI_ktable_set() keeps holding a lock if an error (= memory allocation
failure) happens while taking a lock.  This patch fixes it.
A stackable scheduler that uses a non-automatic scheduler frees its ULT
twice, causing a double free error.  This patch fixes it.
ABT_pool_add_sched() did not reset p_sched->used when
ABTI_ythread_create_sched() fails, so ABT_sched becomes unusable.  This
patch fixes it.
ABTD_xstream_context uses several system resources such as pthread_t,
pthread_mutex_t, and pthread_cond_t.  Previously
ABTD_xstream_context_create() aborts if any resource allocation fails.
This patch adds branches to check and free them.
The specification says ABT_xstream_set_main_sched() and
ABT_xstream_set_main_sched_basic() can target a terminated xstream, but
they do not work.  This patch fixes them by adding a new path for this
case.
ABTI_mem_init() and ABTI_mem_init_local() ignored or just asserts memory
errors.  This patch fixes these functions.

Note that the callers such as ABT_xstream_create() and ABT_initialize()
will be fixed in the next commits.
When an error (e.g., memory allocation error) happens in execution
stream creation, it leaks memory and leaves given user handles modified.
This patch fixes it.
ABT_initialize() leaks memory or leaves global states broken if resource
allocation error happens during its routine.  This patch fixes it.
@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/all
test:argobots/freebsd
test:argobots/solaris
test:argobots/osx

@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/all
test:argobots/freebsd
test:argobots/solaris
test:argobots/osx

@shintaro-iwasaki shintaro-iwasaki force-pushed the pr/leakcheck branch 2 times, most recently from 79677ac to 5177eba Compare March 12, 2021 17:39
@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/freebsd

@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/freebsd

@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/all
test:argobots/solaris
test:argobots/osx

Though Linux does not need explicit pthread_mutex_destroy() or
pthread_cond_destory() for statically initialized pthread_mutex_t and
pthread_cond_t, other operating systems such as FreeBSD need them since
they lazily initialize those objects by using malloc() etc.  To avoid
memory leak, this patch adds pthread_mutex_destroy() and
pthread_cond_destory().

Note that by default Linux uses a real futex system call, not POSIX
mutex/cond, so the performance on Linux should not be affected by this
change.
New rtrace library tracks all the memory/system resources and simulates
system allocation error to see Argobots functions properly handle those
errors (e.g., no memory leak).  This patch adds 15 tests to check such.
This speeds up leakcheck tests when log is enabled as well as logging
itself.
@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/all
test:argobots/solaris
test:argobots/osx
test:argobots/freebsd

@shintaro-iwasaki
Copy link
Collaborator Author

test:argobots/all
test:argobots/solaris
test:argobots/osx
test:argobots/freebsd

@shintaro-iwasaki shintaro-iwasaki merged commit edf86c8 into pmodels:main Mar 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

malloc failures should be checked and assertion should be avoided
1 participant