8267926: AsyncLogGtest.java fails on assert with: decorator was not part of the decorator set specified at creation. #4257
Test: I personally inspected all gtests; only this line makes _decorators bigger than before.
I will consider this case when I enable async logging for stdout/stderr.
As this is debug only this seems a reasonable workaround.
Thanks,
David
@navyxliu This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 86 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dholmes-ora, @phohensee) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Hi @dholmes-ora, I recently realized how flexible unified logging is (JDK-8267952). I also added a concurrent test to prove that it works as designed. While reasoning about how it works and about the test, I discovered two places that could cause race conditions. They are also fixed in this patch.
Could you verify this patch in Tier4 tests? Thanks,
The current failure is unrelated: the build cannot find the proper toolchain on Windows AArch64.
A few comments below - mostly typos - but I think the main race still exists.
I need more time to try and digest the other changes and understand the existing MT-safety aspects.
Meanwhile I've submitted for tiers 1-4 testing.
Thanks,
David
@@ -102,7 +102,7 @@ class AsyncLogMapIterator {
     using none = LogTagSetMapping<LogTag::__NO_TAG>;

     if (*counter > 0) {
-      LogDecorations decorations(LogLevel::Warning, none::tagset(), output->decorators());
+      LogDecorations decorations(LogLevel::Warning, none::tagset(), LogDecorators::All);
This looks odd to use LogDecorators::All - what exactly will that produce?
This is the out-of-band message; it just reports "xyz messages dropped due to async logging."
Unlike normal log messages, this ad-hoc message is enqueued while the buffer is being flushed. The synchronization I described in LogConfiguration::configure_output
only protects the enqueuing of log messages, so this message is not protected. Without LogDecorators::All, 'gtest:LogConfigurationTest.reconfigure_decorators_MT*' fails in async log mode.
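The reason a superset avoids the assert can be illustrated with a small bitmask sketch. This is only a toy model under stated assumptions: the decorator bits, `is_subset`, and `ToyDecorations` are hypothetical names invented for illustration and are not HotSpot's LogDecorators or LogDecorations. It shows that a snapshot created with every bit set can serve any decorator an output might ask for later, while a narrower snapshot cannot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decorator bits; HotSpot's real decorator list is different.
enum ToyDecorator : std::uint32_t {
  kTime   = 1u << 0,
  kUptime = 1u << 1,
  kLevel  = 1u << 2,
  kTags   = 1u << 3,
  kAll    = kTime | kUptime | kLevel | kTags
};

// True if every decorator in `requested` is also present in `materialized`.
inline bool is_subset(std::uint32_t requested, std::uint32_t materialized) {
  return (requested & materialized) == requested;
}

// A snapshot that remembers which decorators it was created with.
class ToyDecorations {
 public:
  explicit ToyDecorations(std::uint32_t created_with)
      : _created_with(created_with) {}

  // A lookup is only valid for a decorator from the creation-time set,
  // which is the condition the failing assert message describes.
  bool can_render(ToyDecorator d) const {
    return is_subset(d, _created_with);
  }

 private:
  const std::uint32_t _created_with;
};
```

In this model, a snapshot built with `kAll` answers `can_render` positively for every decorator, so a concurrent reconfiguration that widens the output's set cannot make a lookup fail.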
// if setting has changed. It guarantees that all logs either synchronous writing or enqueuing to the async buffer
// see the new tags and decorators. It's worth noting that the synchronization happens even level doesn't change.
//
// LogDecorator is a set of decorators represented in a uint. sizeof(uinit) is not greater than a machine word,
Typo: uinit
// After updating output's decorators, it's still safe to shrink all decorators of tagsets.
//
// There are 2 hazards in async logging. A flush operation guarantees to all pending messages in buffer are written
// before returning. Therefore, the hardards won't appear. It's a nop if async logging is not set.
Typo: hardards
AsyncLogWriter::flush();
As soon as this completes, new logging requests could have been enqueued, thus restoring the hazard.
This is not true. Yes, new log sites could have enqueued messages, but they must see the new tags and the new decorations. Please note that ts->set_output_level(output, level)
implies a wait_until_no_readers() if anything changes.
One trick is that I treat async-log enqueuing the same as synchronous log writing, so wait_until_no_readers
also takes effect for async logging. It guarantees that all in-progress async enqueues are done before the reader counter drops back to 0. If there are ongoing log requests with old tags or decorations, they have all been enqueued before wait_until_no_readers()
returns. That's why one flush() is enough for the two hazards.
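The argument above can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyAsyncLog`, `ToyRecord`, and their members are hypothetical names, the flush drains the buffer synchronously instead of handing it to a separate thread, and none of it is HotSpot's actual implementation. The sketch only shows the ordering being claimed: publish the new configuration, wait until no log site is mid-enqueue, then flush once.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Toy model only; these names do not match HotSpot's classes.
struct ToyRecord {
  unsigned decorators;  // configuration the log site observed
  std::string text;
};

class ToyAsyncLog {
 public:
  explicit ToyAsyncLog(unsigned decorators) : _decorators(decorators) {}

  // A log site counts as a reader from before it samples the
  // configuration until its message is in the buffer.
  void log(const std::string& text) {
    _active_readers.fetch_add(1);
    unsigned seen = _decorators.load();
    {
      std::lock_guard<std::mutex> guard(_lock);
      _pending.push_back(ToyRecord{seen, text});
    }
    _active_readers.fetch_sub(1);
  }

  // Publish the new configuration, wait for in-flight log sites,
  // then flush once so no record with the old configuration stays pending.
  void reconfigure(unsigned decorators) {
    _decorators.store(decorators);
    wait_until_no_readers();
    flush();
  }

  // Move every pending record to the written list.
  void flush() {
    std::lock_guard<std::mutex> guard(_lock);
    for (ToyRecord& r : _pending) _written.push_back(std::move(r));
    _pending.clear();
  }

  std::vector<ToyRecord> pending() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _pending;
  }

  std::vector<ToyRecord> written() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _written;
  }

 private:
  // Busy-wait until no log site is between sampling and enqueuing.
  void wait_until_no_readers() const {
    while (_active_readers.load() != 0) { /* spin */ }
  }

  std::atomic<unsigned> _decorators;
  std::atomic<int> _active_readers{0};
  mutable std::mutex _lock;
  std::vector<ToyRecord> _pending;
  std::vector<ToyRecord> _written;
};
```

In this model, a log site that sampled the old configuration incremented the reader counter before `reconfigure` stored the new value, so the wait cannot finish until that record is in the buffer, and the single flush then drains it. Records enqueued afterwards carry the new configuration.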
I put the latest patch through our tier 1-4 testing and it crashed in tier 3 on Linux-Aarch64:
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x0000ffff40820720, pid=3972473, tid=3972481
JRE version: Java(TM) SE Runtime Environment (17.0) (fastdebug build 17-internal+0-LTS-2021-05-31-2211099.david.holmes.jdk-dev4.git)
Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-internal+0-LTS-2021-05-31-2211099.david.holmes.jdk-dev4.git, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
Problematic frame:
C [libc.so.6+0x60720] flockfile+0x0
Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/a4f8fba9-f017-4328-b286-c66b6a97143d-S808/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/42993029-a18a-48c4-9a4d-108aee7b3811/runs/a38508fe-5d43-48ad-b70a-61f3e712fcd8/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_misc/scratch/0/core.3972473)
If you would like to submit a bug report, please visit:
https://bugreport.java.com/bugreport/crash.jsp
--------------- S U M M A R Y ------------
Command Line: -XX:+ExecutingUnitTests -Xlog:async
Host: AArch64, 6 cores, 46G, Oracle Linux Server release 8.3
--------------- T H R E A D ---------------
Current thread (0x0000aaadf32b1e10): Thread "AsyncLog Thread" [stack: 0x0000fffefdaa0000,0x0000fffefdca0000] [id=3972481]
Stack: [0x0000fffefdaa0000,0x0000fffefdca0000], sp=0x0000fffefdc9e510, free space=2041k
It looks like this failure is taking a bit of work to resolve. I've gone ahead and |
Mailing list message from David Holmes on hotspot-runtime-dev: On 1/06/2021 10:26 am, Xin Liu wrote:
I understand the need to not use the true decorators, but using All Available log decorators: will all of those appear on this log line? Thanks, |
Mailing list message from Liu, Xin on hotspot-runtime-dev: Hi, David, No, logoutput won't show all decorators. It just materialize It's very tricky here, but it's essential to reason how the lockless A LogDecorators a SET of decorators in bitmask. A LogDecorations There are 2 different LogDecorations instances coexist.? In any give time, set 1 is a 'subset' of set 2. This guarantees that Why configure_output() is MT-safe if users allow to change decorations There are 4 steps in configure_output(). This mechanism is intrigue. That's why I try to comment it out. I also To make current mechanism support async logging, I insert a thanks, On 5/31/21 6:27 PM, David Holmes wrote:
Mailing list message from Liu Xin on hotspot-runtime-dev: hi, David, This is awkward. I think I changed too many in a patch. Do you think it is a good idea that I revert to If it does, we can know that this new crash is caused by the change in my It seems that it crashed while executing gtest in async mode. Could you thanks, On Mon, May 31, 2021 at 10:13 PM David Holmes <dholmes at openjdk.java.net> |
Mailing list message from David Holmes on hotspot-runtime-dev: Hi Xin, On 1/06/2021 3:48 pm, Liu Xin wrote:
I didn't test just that patch, but will try to do so. It might be best to revert to the simple workaround for now to fix the
[ RUN ] LogConfigurationTest.parse_log_arguments_vm HTH, |
Mailing list message from David Holmes on hotspot-runtime-dev: On 1/06/2021 4:00 pm, David Holmes wrote:
The basic change passed tiers 1-4 testing. Thanks,
Mailing list message from David Holmes on hotspot-runtime-dev: On 2/06/2021 2:01 am, Daniel D.Daugherty wrote:
I've submitted the basic fix for tiers 5-7 testing. Xin: note you will need to merge with master and remove the test from Thanks, |
Mailing list message from Liu Xin on hotspot-runtime-dev: hi, David, Thanks. Let's see the results of tier5~7 with the basic fix. If it can Actually, I didn't change a lot in this PR. I hoisted the Yesterday, when you ran the tier3~4 test, did all failures occur on aarch64? void LogOutputList::wait_until_no_readers() const { So far, I still can't reproduce the crash you sent me yesterday on aarch64. On Tue, Jun 1, 2021 at 8:03 PM David Holmes <david.holmes at oracle.com> wrote:
Mailing list message from David Holmes on hotspot-runtime-dev: On 2/06/2021 1:02 pm, David Holmes wrote:
That testing passed. David
Hi @dholmes-ora, I have another guess: this crash site could also be caused by log rotation!
It's possible that
I don't know how to make it happen. The easiest way to verify my guess is to grep for 'Error opening log file' if you still have those files. Or you may try this revision; I removed AsyncLogGtest.java from the problem list.
I stand by the claim that one flush() can eliminate both hazards. 2 concurrent tests are provided to demonstrate Previously, the test was in bad shape, which caused unnecessary rotation and quite large (100M+) output in /tmp. I guess that's the root cause of the failure on Linux/aarch64. That is now patched. This patch not only fixes the intermittent crash in JDK-8267926, but also fixes JDK-8267952.
Sorry, I just realized a big mistake of mine: I tried to fix multiple problems in the same PR. This is actually the first revision.
This makes sense. If _decorators is set to a reference to an ephemeral value, random memory stomps may ensue.
David helped me verify the latest patch using tier1~7. I would like to ensure the regression test covers the following changes of unified logging. |
/integrate |
Thanks all reviewers! |
/sponsor |
@dholmes-ora @navyxliu Since your change was applied there have been 89 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit b09d8b9. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The root cause of the intermittent failure is that _decorators in LogDecorations
may be inconsistent with LogOutput::_decorators. This can happen when a gtest disables a
log output via set_log_config(TestLogFileName, "all=off").
Since we copy the entire LogDecorations, it's reasonable to copy _decorators as well.
LogDecorators is a uint bitmask. It's even smaller than a reference on LP64 platforms.
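A minimal sketch of the difference between holding the bitmask by reference and by copy, assuming nothing about HotSpot's real class layout: `ToyOutput`, `DecorationsByRef`, and `DecorationsByCopy` are hypothetical names used only for illustration. The referenced value stays alive in this example, so it shows the inconsistency only, not the dangling-reference case.

```cpp
#include <cassert>

// Hypothetical stand-in for the uint bitmask described above.
using ToyDecoratorBits = unsigned;

// The bitmask is no larger than a pointer-sized slot, so copying is cheap.
static_assert(sizeof(ToyDecoratorBits) <= sizeof(void*),
              "bitmask fits in a pointer-sized slot");

struct ToyOutput {
  ToyDecoratorBits decorators;
};

// Keeps a reference: later reads follow whatever the output holds now.
struct DecorationsByRef {
  const ToyDecoratorBits& decorators;
  explicit DecorationsByRef(const ToyOutput& out)
      : decorators(out.decorators) {}
};

// Keeps a copy: the bitmask stays what it was at creation time.
struct DecorationsByCopy {
  const ToyDecoratorBits decorators;
  explicit DecorationsByCopy(const ToyOutput& out)
      : decorators(out.decorators) {}
};
```

If the output is reconfigured after both objects are created, the by-reference variant reports the new bits while the by-copy variant keeps the creation-time bits, which is the consistency the copy is meant to provide.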
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4257/head:pull/4257
$ git checkout pull/4257
Update a local copy of the PR:
$ git checkout pull/4257
$ git pull https://git.openjdk.java.net/jdk pull/4257/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4257
View PR using the GUI difftool:
$ git pr show -t 4257
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4257.diff