2/3 PG units stuck in waiting/idle state, not moving to active/idle #668

@ethanmye-rs

Description

Steps to reproduce

a. I do not have a firm reproducer, but I ran into this issue while upgrading from rev 429 to rev 468 in a Charmed Landscape deployment. I originally encountered the issue on rev 429 and, based on a prior bug, expected that refreshing to rev 468 would fix it. However, my PG units still do not start and remain in an "awaiting for member to start" state.
b. I did not encounter this issue on another cluster in an identical environment, so it seems somewhat random. The machines in the juju model are manual machines in Azure.

  1. Essentially, 2/3 postgres units stay stuck in an "awaiting for member to start" state. They cycle through different waiting and executing states, but the PG units never actually start.
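
For anyone trying to confirm the same symptom, here is a minimal sketch that lists units which are not active/idle. It assumes the JSON emitted by `juju status --format=json` exposes `workload-status` and `juju-status` entries per unit; the helper name `stuck_units` is hypothetical and not part of the charm or of Juju.

```python
import json
import subprocess


def stuck_units(status: dict) -> list:
    """Return (unit, workload state, agent state, message) for units not active/idle.

    Assumes the layout of `juju status --format=json`:
    applications -> <app> -> units -> <unit> -> workload-status / juju-status.
    """
    stuck = []
    for app in status.get("applications", {}).values():
        for name, unit in app.get("units", {}).items():
            workload = unit.get("workload-status", {})
            agent = unit.get("juju-status", {})
            state = (workload.get("current"), agent.get("current"))
            if state != ("active", "idle"):
                stuck.append((name, state[0], state[1], workload.get("message", "")))
    return stuck


if __name__ == "__main__":
    # Query the current model; requires the juju CLI and a reachable controller.
    raw = subprocess.run(
        ["juju", "status", "--format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for row in stuck_units(json.loads(raw)):
        print(*row, sep="\t")
```

Running it periodically (for example from `watch`) would show whether the same two units keep reappearing while they cycle through waiting and executing states.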

Expected behavior

I expect the other 2 units to start and enter an active/idle state. Instead, they have been stuck for more than 48 hours.

Actual behavior

See the logs below: the units cycle through waiting/executing states but never enter active/idle as expected.

Versions

Operating system: Ubuntu 22.04.4

Juju CLI: 3.5.4

Juju agent: 3.5.4

Charm revision: 468

LXD: n/a

Log output

juju debug-log: https://paste.ubuntu.com/p/FzXnjMpNYz/
snap logs from one unit failing to start: https://paste.ubuntu.com/p/St8WZNn4GT/ (restart at the end of the log file)
snap logs from other unit failing to start: https://paste.ubuntu.com/p/BH3RXfZrTW/
snap logs from healthy unit: https://paste.ubuntu.com/p/b6bgSVZKYm/
pg snap services config: https://paste.ubuntu.com/p/xJJq6ktXm9/

Happy to provide more logs, details or access to the environment. Thanks.

Metadata

    Labels

    bug (Something isn't working as expected)
