Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression of thread-safety for logging macros #393

Merged
merged 4 commits into from
Oct 14, 2022

Conversation

wjwwood
Copy link
Member

@wjwwood wjwwood commented Oct 14, 2022

This came up in one of our projects using rolling and they noticed an uptick in crashes originating in logging calls.

It turns out it is related to optimizations made in #381, and is only triggered in specific scenarios, namely when some loggers have had their severity set by the user, but not all the ones being called. This triggers a code path that was part of #381 which can cause a global hash map to be added to, and sometimes grown, and this seems to be the root of many of the crashes, though it crashes in many different ways.

In this pr I added a regression test that catches these kind of problems most runs, but it isn't perfect since it is a race condition.

After some investigation and talking with @clalancette, this code block seems to be the offender:

rcutils/src/logging.c

Lines 946 to 957 in 3485f1b

// If the calculated severity is anything but UNSET, we place it into the hashmap for speedier
// lookup next time. If the severity is UNSET, we don't bother because we are going to have to
// walk the hierarchy next time anyway.
if (severity != RCUTILS_LOG_SEVERITY_UNSET) {
ret = add_key_to_hash_map(name, severity, false);
if (ret != RCUTILS_RET_OK) {
// Continue on if we failed to add the key to the hashmap.
// This will affect performance but is not fatal.
RCUTILS_SAFE_FWRITE_TO_STDERR(
"Failed to cache severity; this is not fatal but will impact performance");
}
}

After removing that, this test passes even if I use --gtest_repeat=1000 to run it 1000 times. Whereas it fails with that code uncommented almost every time. So I don't know if this makes the logging macros completely (conceptually) thread-safe, but it does undo this regression, it seems.

Signed-off-by: William Woodall <william@osrfoundation.org>
Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood wjwwood requested a review from clalancette October 14, 2022 00:33
@wjwwood wjwwood self-assigned this Oct 14, 2022
@wjwwood wjwwood requested a review from jacobperron October 14, 2022 00:33
Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood
Copy link
Member Author

wjwwood commented Oct 14, 2022

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, I'm fine with disabling this optimization for now.

However, I do think we should add additional language into the doc block for rcutils_log saying that it is expected to be thread-safe with itself. That might have saved me from introducing this problem to begin with (though the new test may also do that in the future).

With that change made, 👍 from me.

src/logging.c Outdated Show resolved Hide resolved
Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood
Copy link
Member Author

wjwwood commented Oct 14, 2022

I'm not going to re-run ci.ros2.org because the changes were docs only and the linters are run in the Rpr job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants