This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-15401: Standalone broker on macOS often fails to start after workstation sleep or reboot #4155

Open
sijie opened this issue Apr 30, 2022 · 0 comments

Original Issue: apache#15401


Describe the bug
Often, but not always, if I start Pulsar standalone on my workstation and then close and reopen my laptop lid or restart the computer, subsequent attempts to start the standalone broker fail. The JVM process keeps running and emits some errors in its logs (see attachment), but never reaches a state in which it accepts connections.

Repeatedly restarting the broker usually clears this condition. Sometimes that doesn't work, and I have to remove all broker data files (effectively reinstalling Pulsar) to make it start.

This only seems to happen when publishes have been recorded to a persistent topic.

This does not happen when using Pulsar Standalone in Docker, or on my Linux laptop. It appears to be macOS-specific.

To Reproduce

  1. brew install apache-pulsar
  2. brew services start apache-pulsar
  3. Verify that Pulsar is connectible on localhost:6650.
  4. Publish some messages to a persistent topic such that the message data and the topic itself are not automatically deleted (i.e. at least one ledger is created and persists).
  5. Close laptop lid or otherwise induce sleep.
  6. Reopen laptop and attempt to connect to Pulsar; it will sometimes fail.
  7. Run brew services restart apache-pulsar and attempt to connect to Pulsar; it will sometimes fail.
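The "connectible on localhost:6650" check in steps 3, 6, and 7 can be scripted. The following is a minimal sketch, not part of the original report: the function name `is_connectible` and the 2-second timeout are illustrative assumptions, and it only tests whether a TCP connection can be opened, which is weaker than confirming that the broker answers Pulsar protocol requests.

```python
import socket


def is_connectible(host: str = "localhost", port: int = 6650,
                   timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    Hypothetical helper: this is a TCP-level probe only. A port that
    accepts connections does not prove the broker is fully initialized.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and resolution failures.
        return False
```

Calling `is_connectible()` after step 5 (sleep) and again after step 7 (restart) gives a repeatable pass/fail signal for each attempt.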

Expected behavior

  1. Sleep/resume should not break Pulsar. It should either be available after resumption, or the service/JVM process should fully crash so that it reports not-running if it cannot handle connection requests after resumption.
  2. Restarting the service should result in Pulsar either being connectible or not having a JVM process. If Pulsar Standalone can't come up in a connectible state, its process shouldn't hang around (if the process stays around, service managers think it's still running and thus won't report a failure or auto-restart it).
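The second expectation (either connectible or no process) can be approximated today with an external wrapper. The sketch below is an illustration under stated assumptions, not Pulsar's own behavior: `start_or_fail`, `port_open`, and the polling interval are hypothetical, the 300-second default simply mirrors the 5 minutes mentioned in this report, and the start command is whatever launches the standalone broker on your install. It also assumes nothing else is already listening on the port, since that would look like a successful start.

```python
import socket
import subprocess
import time


def port_open(host, port, timeout=1.0):
    """TCP-level probe: True if host:port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_or_fail(cmd, host="localhost", port=6650,
                  deadline_s=300.0, poll_s=2.0):
    """Start `cmd` and wait for the port to open.

    Returns the running Popen object if the port opens before the
    deadline. Otherwise stops the process and returns None, so a
    service manager sees "not running" instead of a hung process.
    """
    proc = subprocess.Popen(cmd)
    give_up_at = time.monotonic() + deadline_s
    while time.monotonic() < give_up_at:
        if proc.poll() is not None:
            return None  # process exited on its own
        if port_open(host, port):
            return proc  # accepting connections
        time.sleep(poll_s)
    # Deadline passed without the port opening: stop the process.
    proc.terminate()
    try:
        proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if it ignores the terminate request
        proc.wait()
    return None
```

A wrapper like this trades a slower failure report (up to the deadline) for the guarantee that a non-connectible broker does not linger as a live process.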

Desktop (please complete the following information):

  • OS: macOS Monterey, Intel.
  • Pulsar standalone 2.9.2 and 2.9.1 both exhibit this issue.

Additional context
brew services is likely not the problem. It doesn't do anything special, and Pulsar does sometimes work when restarted via brew services.

Attached is a copy of my logs from a broker start attempt that did not become connectible after 5 minutes.
nostart.log

@sijie sijie added the type/bug label Apr 30, 2022
@sijie sijie added the Stale label May 31, 2022