Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] [broker] broker log a full thread dump when a deadlock is detected in healthcheck every time #22915

Closed
3 tasks done
yyj8 opened this issue Jun 15, 2024 · 0 comments · Fixed by #22916
Closed
3 tasks done
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@yyj8
Copy link
Contributor

yyj8 commented Jun 15, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

OS: Windows and Mac and Linux
JDK: 17
Pulsar version: 3.0.5

Minimal reproduce step

  1. Simulate a deadlock scenario in broker and start pulsar broker.
  2. Send a health checkpoint HTTP request: http://localhost:8080/admin/v2/brokers/health.
  3. Then on the server, you can see:
    `
    2024-06-15T11:59:30,935 - ERROR - [pulsar-web-39-3:BrokersBase@399] - Deadlocked threads detected. Thread-0(tid=101), Thread-1(tid=102)
    2024-06-15 11:59:30
    Full thread dump OpenJDK 64-Bit Server VM (17.0.6+10 mixed mode, sharing):

Threads class SMR info:
_java_thread_list=0x000001c5fd3a6b30, length=99, elements={
0x000001c5d8140e20, 0x000001c5fad7a450, 0x000001c5fad7b2c0, 0x000001c5fad938d0,
0x000001c5fad982b0, 0x000001c5fad98c60, 0x000001c5fad99610, 0x000001c5fad9e700,
0x000001c5fad9f1a0, 0x000001c5fada1ba0, 0x000001c5fac54670, 0x000001c5fb066850,
0x000001c5fb066d40, 0x000001c5fb069350, 0x000001c5fcbf3090, 0x000001c5fccdf640,
0x000001c5fce38190, 0x000001c5fce377f0, 0x000001c5fce38b30, 0x000001c5fce37cc0,
0x000001c5fce39000, 0x000001c5fce394d0, 0x000001c5fce399a0, 0x000001c5fce36e50,
0x000001c5fce37320, 0x000001c5fce35fe0, 0x000001c5fce364b0, 0x000001c5fce38660,
0x000001c5fce36980, 0x000001c5fe38d3a0, 0x000001c5fe38c060, 0x000001c5fe38e6e0,
0x000001c5fe38c530, 0x000001c5fe38dd40, 0x000001c5fe38d870, 0x000001c5fe38ca00,
0x000001c5fe38ebb0, 0x000001c5fe38f080, 0x000001c5fe38ced0, 0x000001c5fe38f550,
0x000001c5fe38e210, 0x000001c5fe38fa20, 0x000001c580defc80, 0x000001c580df4e50,
0x000001c580df0af0, 0x000001c580dee940, 0x000001c580deee10, 0x000001c580df1490,
0x000001c580df3640, 0x000001c580df44b0, 0x000001c580df2ca0, 0x000001c580df0620,
0x000001c580df4980, 0x000001c580dedfa0, 0x000001c580df3170, 0x000001c580df0fc0,
0x000001c580df0150, 0x000001c580df27d0, 0x000001c580df1960, 0x000001c58186cd70,
0x000001c58186f8c0, 0x000001c58186ba30, 0x000001c581869880, 0x000001c58186b090,
0x000001c581868a10, 0x000001c58186d710, 0x000001c58186bf00, 0x000001c58186abc0,
0x000001c58186c3d0, 0x000001c58186f3f0, 0x000001c58186ea50, 0x000001c581868ee0,
0x000001c58186b560, 0x000001c5818693b0, 0x000001c58186c8a0, 0x000001c58186d240,
0x000001c58186e0b0, 0x000001c581869d50, 0x000001c58186dbe0, 0x000001c58186e580,
0x000001c580def7b0, 0x000001c5fcb6dda0, 0x000001c5fcb72f70, 0x000001c5fcb6fa80,
0x000001c5fcb6e270, 0x000001c5fcb6f0e0, 0x000001c5fcb70dc0, 0x000001c5fcb73440,
0x000001c5fcb6ec10, 0x000001c5fcb74780, 0x000001c5fcb71290, 0x000001c5fcb6ff50,
0x000001c5fcb725d0, 0x000001c5fcb6f5b0, 0x000001c5fcb71760, 0x000001c5fcb74c50,
0x000001c5fcb71c30, 0x000001c5fcb70420, 0x000001c5819b9170
}

"main" #1 prio=5 os_prio=0 cpu=9765.63ms elapsed=24.56s tid=0x000001c5d8140e20 nid=0x2424 waiting on condition [0x00000065dc6ff000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.6/Native Method)
- parking to wait for <0x0000000700e1b638> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.6/LockSupport.java:341)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.6/AbstractQueuedSynchronizer.java:506)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.6/ForkJoinPool.java:3463)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.6/ForkJoinPool.java:3434)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.6/AbstractQueuedSynchronizer.java:1623)
at org.apache.pulsar.broker.PulsarService.waitUntilClosed(PulsarService.java:1276)
at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.join(PulsarBrokerStarter.java:281)
at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:361)

Locked ownable synchronizers:
- None

"Reference Handler" #2 daemon prio=10 os_prio=2 cpu=15.63ms elapsed=24.52s tid=0x000001c5fad7a450 nid=0x4a8c waiting on condition [0x00000065dcdff000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@17.0.6/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@17.0.6/Reference.java:253)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@17.0.6/Reference.java:215)

Locked ownable synchronizers:
- None

"Finalizer" #3 daemon prio=8 os_prio=1 cpu=0.00

.......

====> DEADLOCKS DETECTED <====

"Thread-0" Id=101 in BLOCKED on lock=java.lang.String@61c0cfe1
owned by Thread-1 Id=102
at app//org.apache.pulsar.broker.PulsarService.lambda$deadlock$7(PulsarService.java:971)
- locked java.lang.String@671c0f1e
at app//org.apache.pulsar.broker.PulsarService$$Lambda$689/0x00000008012f66b8.run(Unknown Source)
at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)

Locked synchronizers: count = 0

"Thread-1" Id=102 in BLOCKED on lock=java.lang.String@671c0f1e
owned by Thread-0 Id=101
at app//org.apache.pulsar.broker.PulsarService.lambda$deadlock$8(PulsarService.java:988)
- locked java.lang.String@61c0cfe1
at app//org.apache.pulsar.broker.PulsarService$$Lambda$690/0x00000008012f68d8.run(Unknown Source)
at java.base@17.0.6/java.lang.Thread.run(Thread.java:833)

Locked synchronizers: count = 0

`

What did you expect to see?

First detection of deadlock printing, then printing at the interval between parameter settings in org.apache.pulsar.broker.admin.impl.BrokersBase.java
private static final long LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED = 600000L;

What did you see instead?

Output full thread dump information every time.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
1 participant