Fix false crash logs with regression tests #12783
Merged
bneradt
merged 1 commit into
apache:master
from
bneradt:address_regression_test_crashlog
Jan 6, 2026
When traffic_server exits normally (e.g., after regression tests complete), traffic_crashlog was incorrectly logging a crash because it detected its parent process had terminated. This happened because traffic_crashlog uses PR_SET_PDEATHSIG to wake up when traffic_server exits, but it couldn't distinguish between a crash (where crash_logger_invoke sends signal info via the pipe) and a normal exit (where the pipe is simply closed). This fix adds a poll() check on stdin to verify that crash data was actually sent before logging a crash, preventing false positive crash logs.
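The idea behind the poll() check can be sketched as follows. This is only an illustration in Python of the POSIX behavior described above, not the actual traffic_crashlog change (which lives in the C++ sources). The helper name `crash_data_pending` and the use of `os.pipe()` to stand in for the crash logger's stdin are assumptions made for the example.

```python
import os
import select


def crash_data_pending(fd, timeout_ms=0):
    """Return True if poll() reports POLLIN on fd within timeout_ms.

    Hypothetical helper: it mirrors the idea of checking stdin for
    crash data before deciding to write a crash log.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    for ready_fd, events in poller.poll(timeout_ms):
        if ready_fd == fd and events & select.POLLIN:
            return True
    return False


# Crash-like case: the parent writes signal info, then goes away.
r, w = os.pipe()
os.write(w, b"signal info")
os.close(w)
crash_seen = crash_data_pending(r)
os.close(r)

# Idle case: the pipe is still open and nothing has been written.
r2, w2 = os.pipe()
idle_seen = crash_data_pending(r2)
os.close(r2)
os.close(w2)
```

The sketch deliberately does not exercise the case where the write end is closed without any data, because what poll() reports for a bare end-of-file depends on the platform and descriptor type.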
serrislew
approved these changes
Jan 6, 2026
bneradt
added a commit
to bneradt/trafficserver
that referenced
this pull request
Jan 7, 2026
The poll() check from apache#12783 relied on POLLIN to detect crash data, but this is unreliable - on some systems poll() sets POLLIN for EOF (socket closed without data). Instead, we now verify crash data by actually reading from stdin: if read() returns 0 bytes in wait_mode, traffic_server exited normally and crash_logger_invoke was never called, so we exit without logging. This definitively distinguishes a real crash (data written then pipe closed) from a normal exit (pipe just closed).
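A minimal sketch of the read-based check described in this commit message, again in Python rather than the project's C++ and under stated assumptions: `classify_parent_exit` and `run_case` are hypothetical names, and a pipe stands in for the crash logger's stdin. The point it illustrates is that read() returning zero bytes means end-of-file with no crash data, while any payload means the parent wrote something before exiting.

```python
import os


def classify_parent_exit(fd, bufsize=4096):
    """Read once from fd and classify how the parent went away.

    Zero bytes means EOF with nothing written (normal exit);
    anything else is treated as crash data. Hypothetical helper.
    """
    data = os.read(fd, bufsize)
    if not data:
        return "normal-exit", b""
    return "crash", data


def run_case(payload):
    """Simulate a parent that optionally writes payload, then exits."""
    r, w = os.pipe()
    try:
        if payload:
            os.write(w, payload)
    finally:
        os.close(w)  # parent exit closes the write end
    try:
        return classify_parent_exit(r)
    finally:
        os.close(r)


normal = run_case(b"")
crash = run_case(b"siginfo")
```

Unlike a POLLIN check, this does not depend on how a given platform reports end-of-file through poll(), since the decision is made on the byte count that read() actually returns.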
bneradt
added a commit
to bneradt/trafficserver
that referenced
this pull request
Jan 7, 2026
The poll() check from apache#12783 relied on POLLIN to detect crash data, but this is unreliable - on some systems poll() sets POLLIN for EOF (socket closed without data). Instead, we now verify crash data by actually reading from stdin: if read() returns 0 bytes in wait_mode, traffic_server exited normally and crash_logger_invoke was never called, so we exit without logging. This definitively distinguishes a real crash (data written then pipe closed) from a normal exit (pipe just closed). (cherry picked from commit 9edd4df)
bneradt
added a commit
to bneradt/trafficserver
that referenced
this pull request
Jan 7, 2026
The poll() check from apache#12783 relied on POLLIN to detect crash data, but this is unreliable - on some systems poll() sets POLLIN for EOF (socket closed without data). Instead, we now verify crash data by actually reading from stdin: if read() returns 0 bytes in wait_mode, traffic_server exited normally and crash_logger_invoke was never called, so we exit without logging. This actually (as opposed to the previous patch) distinguishes a real crash (data written then pipe closed) from a normal exit (pipe just closed).
bneradt
added a commit
to bneradt/trafficserver
that referenced
this pull request
Jan 7, 2026
The poll() check from apache#12783 relied on POLLIN to detect crash data, but this is unreliable - on some systems poll() sets POLLIN for EOF (socket closed without data). Instead, we now verify crash data by actually reading from stdin: if read() returns 0 bytes in wait_mode, traffic_server exited normally and crash_logger_invoke was never called, so we exit without logging. This actually (as opposed to the previous patch) distinguishes a real crash (data written then pipe closed) from a normal exit (pipe just closed). (cherry picked from commit 49d8702)
bneradt
added a commit
to bneradt/trafficserver
that referenced
this pull request
Jan 7, 2026
The poll() check from apache#12783 relied on POLLIN to detect crash data, but this is unreliable - on some systems poll() sets POLLIN for EOF (socket closed without data). This tweak adds reading on the socket to make sure that there is data being sent to verify that there was indeed a crash. This actually (as opposed to the previous patch) distinguishes a real crash (data written then pipe closed) from a normal exit (pipe just closed).
bneradt
added a commit
that referenced
this pull request
Jan 8, 2026
This reverts commit dd3455c. Addressing this turned out to be more complicated than it first appeared. After more testing, I realized that while this patch removed the "false" crash log reports, it broke crash logging in general, and attempts at fixing that haven't been very fruitful. Maybe we should look into this in the future, but in the meantime I want to restore the previous state so that crash logs are at least created when there is an actual crash.