pgBackRest: Fix timeline divergence after PITR #820

Merged: 1 commit merged on Nov 28, 2024

SDV109 commented on Nov 27, 2024

This PR fixes a problem with PITR (point-in-time recovery): at the end of the playbook, the replicas ended up on a different timeline from the master node.
The first workaround that fixed the timeline mismatch was restarting Patroni on the replica nodes. This made pg_rewind run between the replicas and the master, and the replicas received the WAL files from the master that they needed to align with its timeline. However, this did not look like the right solution.

In addition, while the replicas were still on timeline 1 after recovery, the master node's logs showed that replication for the replica nodes could not be started from their replication slots:

STATEMENT:  START_REPLICATION SLOT "pgnode03" 0/76000000 TIMELINE 1
ERROR:  requested starting point 0/76000000 on timeline 1 is not in this server's history
DETAIL:  This server's history forked from timeline 1 at 0/73000498.
STATEMENT:  START_REPLICATION SLOT "pgnode02" 0/76000000 TIMELINE 1
ERROR:  requested starting point 0/76000000 on timeline 1 is not in this server's history
DETAIL:  This server's history forked from timeline 1 at 0/73000498.
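The error can be read numerically. An LSN such as 0/76000000 is a 64-bit WAL position written as two hexadecimal halves (high/low), and the DETAIL line says timeline 1 on the master ends at 0/73000498. The minimal Python sketch below uses hypothetical helper names (not part of PostgreSQL, Patroni, or this project) and a simplified model of the check: the replicas ask for WAL on timeline 1 from a point past where the master left that timeline, so the request cannot be served.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN string like '0/73000498' to an integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def start_point_in_history(requested: str, fork_point: str) -> bool:
    """Hypothetical, simplified check: on the old timeline the primary only
    has WAL up to the fork point, so a later start point is not in its history."""
    return parse_lsn(requested) <= parse_lsn(fork_point)


requested = "0/76000000"   # from: START_REPLICATION SLOT ... TIMELINE 1
fork_point = "0/73000498"  # from: history forked from timeline 1 at ...
print(start_point_in_history(requested, fork_point))
```

Since 0x76000000 is greater than 0x73000498, the check is false for both pgnode02 and pgnode03, which matches the two ERROR lines in the log.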

The official documentation states that with target-action=shutdown the recovery.signal file is not removed. This blocks subsequent PostgreSQL starts in the cluster: the server keeps waiting for further recovery from the WAL repository, where the required files are missing because a new backup has to be taken after recovery. Simply deleting recovery.signal before starting Patroni on the replicas does not fix the missing timeline.

The solution is to change target-action for the replicas from shutdown to pause. The replica then starts as a running PostgreSQL instance, immediately connects to the master node, and pulls the WAL it needs for the correct timeline.
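As a rough illustration of the change, the sketch below builds a pgBackRest PITR restore command per node role. The options `--stanza`, `--type=time`, `--target`, and `--target-action` are real pgBackRest restore options. The helper name, the stanza name, and the assumption that the master uses `promote` are hypothetical and not taken from the playbook; only the replica value (`pause` instead of `shutdown`) reflects this PR.

```python
import shlex


def pgbackrest_restore_cmd(stanza: str, target_time: str, is_primary: bool) -> list[str]:
    """Hypothetical helper: build a point-in-time restore command for one node."""
    # Assumption: the master promotes at the recovery target.
    # Per this PR, replicas pause instead of shutting down, so recovery.signal
    # handling no longer blocks them and they can stream WAL from the master.
    target_action = "promote" if is_primary else "pause"
    return [
        "pgbackrest",
        f"--stanza={stanza}",
        "--type=time",
        f"--target={target_time}",
        f"--target-action={target_action}",
        "restore",
    ]


# Example values are placeholders, not taken from the PR.
print(shlex.join(pgbackrest_restore_cmd("main", "2024-11-27 12:00:00+00", is_primary=False)))
```

One caveat from the PostgreSQL documentation for `recovery_target_action`: if `hot_standby` is not enabled, `pause` behaves the same as `shutdown`, so the replicas need hot standby enabled for this change to take effect.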

Changing the target-action parameter from shutdown to pause
vitabaks changed the title "Fix PITR" to "Fix timeline divergence after PITR with shutdown target action" on Nov 27, 2024
vitabaks changed the title "Fix timeline divergence after PITR with shutdown target action" to "Fix timeline divergence after PITR" on Nov 27, 2024
vitabaks changed the title "Fix timeline divergence after PITR" to "pgBackRest: Fix timeline divergence after PITR" on Nov 27, 2024
vitabaks merged commit 90948bc into vitabaks:master on Nov 28, 2024
15 checks passed