@yiida-ha yiida-ha commented Sep 4, 2025

Problem Details

Starting a pgsql resource with rep_mode="slave" fails with the following error:

Failed Resource Actions:
  * remote-site-pgsql_start_0 on dr-standby1 'error' (1): call=34, status='complete', exitreason='Can't create recovery.conf.', last-rc-change='Thu Sep  4 11:59:06 2025', queued=0ms, exec=309ms

Environment

  • PostgreSQL 17
  • Resource setting: rep_mode="slave"

Solution

In the current code, the tmp directory is not created when rep_mode="slave".
The existing code creates the tmp directory only if the result of is_replication() is true within the validate_ocf_check_level_10 function.
To resolve this issue, I modified the validate_ocf_check_level_10 function to create the tmp directory even when rep_mode="slave" is set.
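
The difference between the old and new conditions can be sketched roughly as follows. This is only an illustration: `is_replication_like` mirrors my understanding of what is_replication() checks, `rep_mode_needs_tmpdir` is a hypothetical name, and the set of rep_mode values (none, async, sync, slave) is assumed from the agent's documented parameters.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the code in heartbeat/pgsql.

# Assumed equivalent of the old gate: true only for async/sync,
# so rep_mode="slave" never reached the tmpdir creation.
is_replication_like() {
    [ "$1" != "none" ] && [ "$1" != "slave" ]
}

# Gate after the fix: every mode that writes a recovery configuration,
# including "slave", needs the temporary directory.
rep_mode_needs_tmpdir() {
    case "$1" in
        async|sync|slave) return 0 ;;
        *) return 1 ;;
    esac
}
```

With these helpers, `is_replication_like slave` is false while `rep_mode_needs_tmpdir slave` is true, which is the gap this change is meant to close.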

knet-jenkins bot commented Sep 4, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/1/input

heartbeat/pgsql Outdated
rc=$?
if [ $rc -eq 1 ]||[ $rc -eq 2 ]; then # PostgreSQL 12 or later.
    if ! mkdir -p $OCF_RESKEY_tmpdir || ! chown $OCF_RESKEY_pgdba $OCF_RESKEY_tmpdir || ! chmod 700 $OCF_RESKEY_tmpdir; then
        ocf_exit_reason "Can't create directory $OCF_RESKEY_tmpdir or it is not readable by $OCF_RESKEY_pgdba"
Contributor

It should say "writable", and should only be run during the start-action.

Author

Thank you for your comment.

I modified the code based on the following reasoning:

Contributor

You should move it to the start action, as we don't need it for regular actions (if it's needed for monitor or similar, we might want to change the logic in those actions).

You'll also have to cover "promoted" or whatever term they are using in the latest PostgreSQL releases.

Author

Thank you for your comment.
As requested, I've moved the tmpdir creation process to the start action.
cd8e58f
Based on my research, all write operations to tmpdir within the start action were consolidated in make_recovery_conf(), so I've grouped the processing there.
How does this approach look?
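
For readers following the thread, the placement described in this comment could look roughly like the sketch below. It is not the code from the commit: `prepare_tmpdir` and its arguments are hypothetical names, and the owning user is assumed to exist on the node.

```shell
#!/bin/sh
# Sketch only: hypothetical names, not the functions in heartbeat/pgsql.

# Create the private temporary directory, but only for the start action.
# $1 = action name, $2 = directory, $3 = owning user (assumed to exist).
prepare_tmpdir() {
    action=$1
    dir=$2
    owner=$3

    # Monitor, stop, and other regular actions leave the filesystem alone.
    [ "$action" = "start" ] || return 0

    if ! mkdir -p "$dir" || ! chown "$owner" "$dir" || ! chmod 700 "$dir"; then
        echo "Can't create directory $dir or it is not writable by $owner" >&2
        return 1
    fi
    return 0
}
```

Calling `prepare_tmpdir monitor /some/dir pgdba` does nothing, while `prepare_tmpdir start /some/dir pgdba` creates the directory with mode 700, so the directory exists before anything in the start path tries to write into it.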

@yiida-ha yiida-ha force-pushed the develop/fix_rep_mode_slave branch from c86b25d to 0c3f2cf Compare September 12, 2025 02:07
knet-jenkins bot commented Sep 12, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/2/input

knet-jenkins bot commented Sep 17, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/3/input

@yiida-ha yiida-ha force-pushed the develop/fix_rep_mode_slave branch from 9829572 to cd8e58f Compare September 18, 2025 01:28
knet-jenkins bot commented Sep 18, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/4/input

if [ $rc -eq 1 ]||[ $rc -eq 2 ]; then # PostgreSQL 12 or later.
    if ! mkdir -p $OCF_RESKEY_tmpdir || ! chown $OCF_RESKEY_pgdba $OCF_RESKEY_tmpdir || ! chmod 700 $OCF_RESKEY_tmpdir; then
        ocf_exit_reason "Can't create directory $OCF_RESKEY_tmpdir or it is not readable by $OCF_RESKEY_pgdba"
        return $OCF_ERR_PERM
Contributor

I think you can remove the outer if [] here and use [ "$OCF_RESKEY_rep_mode" = "slave" ] && return $OCF_ERR_PERM || return $OCF_ERR_GENERIC instead.
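
One possible reading of this suggestion, written as a standalone sketch: the failure branch picks its return code from rep_mode alone. `tmpdir_failure_rc` is a hypothetical wrapper used only for illustration. The two OCF return codes are normally provided by ocf-shellfuncs and are defined locally here, with their standard values, so the snippet runs on its own.

```shell
#!/bin/sh
# Sketch only. In the real agent these come from ocf-shellfuncs.
OCF_ERR_GENERIC=1
OCF_ERR_PERM=4

# Hypothetical helper: choose the return code for a failed tmpdir setup.
# $1 = value of rep_mode.
tmpdir_failure_rc() {
    # "return" leaves the function immediately, so the || branch only
    # runs when the test itself is false.
    [ "$1" = "slave" ] && return $OCF_ERR_PERM || return $OCF_ERR_GENERIC
}
```

Under this reading, a failure with rep_mode="slave" is reported as a permission error and any other mode as a generic error.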
