
ERROR: could not open file "/var/lib/postgresql/13/main/postgresql.conf": No such file or directory #216

Closed
katayama2224 opened this issue Jan 20, 2022 · 20 comments

Comments

@katayama2224

katayama2224 commented Jan 20, 2022

<Label: Question>
On Ubuntu 18.04.6, pg_rman 1.3.8 worked well with PostgreSQL 11.5. However, with pg_rman 1.3.14 and PostgreSQL 13.5 on Ubuntu 18.04.6, the following error occurs after running "pg_rman restore". Is pg_rman (1.3.12 and above) supported on Ubuntu 18.04?
I use "pg_rman-1.3.14-pg13.tar.gz".

Log of "$ pg_rman restore --recovery-target-time '2022-01-19 11:02:16' --debug"
..
INFO: backup "2022-01-19 11:02:06" is valid
INFO: restoring WAL files from backup "2022-01-19 11:02:06"
INFO: restoring online WAL files and server log files
INFO: create pg_rman_recovery.conf for recovery-related parameters.
INFO: remove an 'include' directive added by pg_rman in postgresql.conf if exists
DEBUG: make temporary file "/var/lib/postgresql/13/main/postgresql.conf.pg_rman.tmp"
ERROR: could not open file "/var/lib/postgresql/13/main/postgresql.conf": No such file or directory

Actually "/var/lib/postgresql/13/main/postgresql.conf" doesn't exist, but
"/var/lib/postgresql/13/main/pg_rman_recovery.conf" exists, and
"/etc/postgresql/13/main/postgresql.conf" exists and contains the following:

data_directory = '/var/lib/postgresql/13/main'
archive_mode = on
archive_command = 'cp %p /mnt/backup/arclog/%f'
restore_command = 'cp /mnt/backup/arclog/%f %p'

Will "/var/lib/postgresql/13/main/postgresql.conf" be generated by pg_rman automatically?
And will an "include = 'pg_rman_recovery.conf'" directive be added to $PGDATA/postgresql.conf by pg_rman?

I'd like to know the reason for the error and the correct procedure for pg_rman 1.3.14, because I'm confused about the behavior for PostgreSQL 12 and above and for pg_rman 1.3.12 and above.
Am I missing something?

mikecaat added a commit to mikecaat/pg_rman that referenced this issue Jan 22, 2022
When restoring, pg_rman removes the 'include'
directive which it previously added to the
restored postgresql.conf.

But the logic didn't consider the case where the
restored data directory has no postgresql.conf.
This happens when a user keeps it in a directory
other than the data directory by using PostgreSQL's
GUC parameter `data_directory`.
@mikecaat
Contributor

Hi, thanks for reporting the issue, and sorry for the late reply.

Is pg_rman (1.3.12 and above) supported on Ubuntu 18.04?

No, we officially support only RHEL.
But the issue seems to happen on RHEL as well.

Will "/var/lib/postgresql/13/main/postgresql.conf" be generated by pg_rman automatically ?

No (see below). pg_rman assumes that postgresql.conf exists in the data directory and does not create it.

And will an "include = 'pg_rman_recovery.conf'" directive be added to $PGDATA/postgresql.conf by pg_rman?

Yes. To be precise, it adds the directive to postgresql.conf in the restored data directory. So if postgresql.conf is kept in a directory other than the data directory, pg_rman does not add it.

The error occurred because the restored data directory has no postgresql.conf: pg_rman does not expect postgresql.conf to be kept outside the data directory via the GUC parameter data_directory.

I made a patch to fix the issue, but it has one limitation: --recovery-target-time doesn't work, because postgresql.conf is not in the restored data directory and pg_rman adds recovery-related parameters only to the postgresql.conf in that directory.
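The guard the fix needs can be illustrated in shell. This is a minimal sketch, not pg_rman's code (pg_rman is written in C): `remove_pg_rman_include` is a hypothetical name, and the exact text of the include line pg_rman writes is an assumption (here, any line starting with `include = 'pg_rman_recovery.conf'`). It rewrites the file through a temporary copy, as the `postgresql.conf.pg_rman.tmp` debug line in the report suggests, and simply skips when the data directory has no postgresql.conf.

```shell
# Hypothetical illustration of the fixed logic; not pg_rman's actual code.
# Usage: remove_pg_rman_include <data_directory>
remove_pg_rman_include() {
    datadir=$1
    conf="$datadir/postgresql.conf"
    tmp="$conf.pg_rman.tmp"

    # Before the fix, this case ended in: could not open file ".../postgresql.conf"
    if [ ! -f "$conf" ]; then
        echo "skip: $conf not found"
        return 0
    fi

    # Assumed format of the directive pg_rman added earlier.
    # grep -v exits 1 when every line matched, which is not an error here.
    grep -v "^include = 'pg_rman_recovery.conf'" "$conf" > "$tmp" || true
    mv "$tmp" "$conf"
}
```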

In the first place, I think it's better to manage the configuration files in the data directory without using the data_directory parameter. You can also back up the configuration files.
What do you think?

@katayama2224
Author

katayama2224 commented Jan 22, 2022

Thank you for your answer.

In the first place, I think it's better to manage the configuration files in the data directory without using the data_directory parameter. What do you think?

I tried the following, and the error disappeared.

  1. Execute "pg_rman init" as the postgres user.
    $ export BACKUP_PATH=/mnt/backup
    $ export PGDATA=/var/lib/postgresql/13/main
    $ pg_rman init -B /mnt/backup -A /mnt/backup/arclog -D /var/lib/postgresql/13/main

  2. Manually copy the configuration file into the data directory.
    $ cp /etc/postgresql/13/main/postgresql.conf /var/lib/postgresql/13/main/postgresql.conf
    i.e. /etc/postgresql/13/main/postgresql.conf manages the PostgreSQL configuration, and
    /var/lib/postgresql/13/main/postgresql.conf manages the pg_rman-related configuration.

  3. Execute full backup and restore by pg_rman (same as before)

I will do further tests, but the first problem seems to be solved. So please go ahead.

I made a patch to solve the issue. But, there is one limitation. --recovery-target-time doesn't work

I experienced a similar limitation: --recovery-target-time does not seem to work in our tests on pg_rman 1.3.14.
(It is under investigation.)

I would appreciate it if these problems were solved and pg_rman worked on Ubuntu 18.04 LTS and 20.04 LTS.
If you and other people agree, I'd like to request an enhancement to support Ubuntu, since many users are moving from CentOS to Ubuntu.

@mikecaat
Contributor

mikecaat commented Jan 24, 2022

Thanks for trying.

I experienced a similar limitation: --recovery-target-time does not seem to work in our tests on pg_rman 1.3.14.
(It is under investigation.)

The reason is that pg_rman can't know the path of your real postgresql.conf (/etc/postgresql/13/main/postgresql.conf), so it just adds the recovery-related parameters to the dummy postgresql.conf (/var/lib/postgresql/13/main/postgresql.conf). Since PostgreSQL doesn't read the dummy configuration file, the recovery-related parameters have no effect.

There are currently two workarounds.

  1. Add recovery-related parameters to the real postgresql.conf manually
    pg_rman is just a wrapper, so you can control the recovery target time yourself by adding recovery_target_time to the real postgresql.conf before starting the restored database cluster.

Ref. 19.5.5. Recovery Target
https://www.postgresql.org/docs/13/runtime-config-wal.html

  2. Create symbolic links to the real path.
    Before taking a backup, change your step 2 as follows. Then the configuration files are also backed up, and pg_rman can add recovery-related parameters to the real file. It's a bit of a hassle, though, and there is then no point in using the data_directory parameter...
(before)
cp /etc/postgresql/13/main/postgresql.conf /var/lib/postgresql/13/main/postgresql.conf

(after)
mv /etc/postgresql/13/main/postgresql.conf /var/lib/postgresql/13/main/postgresql.conf
ln -s /var/lib/postgresql/13/main/postgresql.conf  /etc/postgresql/13/main/postgresql.conf
ln -s /var/lib/postgresql/13/main/pg_rman_recovery.conf /etc/postgresql/13/main/pg_rman_recovery.conf  # the file pg_rman will create
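The symlink workaround can be wrapped in a small shell function so it is applied the same way every time. This is a sketch: `link_conf_into_datadir` is a hypothetical name, running it only while PostgreSQL is stopped is an assumption on our side, and the directories are parameters so it can be tried on scratch directories first.

```shell
# Hypothetical helper for workaround 2. Run as the postgres user while the
# server is stopped. Usage: link_conf_into_datadir <conf_dir> <data_dir>
link_conf_into_datadir() {
    confdir=$1
    datadir=$2

    # Already applied: the real file already lives in the data directory.
    if [ -L "$confdir/postgresql.conf" ]; then
        echo "already linked"
        return 0
    fi

    # Move the real postgresql.conf into the data directory so pg_rman backs it
    # up and can add its include directive to it; leave a link at the old path.
    mv "$confdir/postgresql.conf" "$datadir/postgresql.conf" || return 1
    ln -s "$datadir/postgresql.conf" "$confdir/postgresql.conf" || return 1

    # pg_rman_recovery.conf is created later by pg_rman restore; until then
    # this link dangles, which is harmless while nothing includes it.
    ln -s "$datadir/pg_rman_recovery.conf" "$confdir/pg_rman_recovery.conf" || return 1
}
```

On the setup in this thread it would be called as `link_conf_into_datadir /etc/postgresql/13/main /var/lib/postgresql/13/main`.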

I would appreciate it if these problems were solved and pg_rman worked on Ubuntu 18.04 LTS and 20.04 LTS.
If you and other people agree, I'd like to request an enhancement to support Ubuntu, since many users are moving from CentOS to Ubuntu.

OK. I want to know other people's comments.
(We don't have many development resources, so I'm sorry if we can't implement the feature you want.)

At the least, it's reasonable to avoid the error that occurs when postgresql.conf doesn't exist, although the limitation remains.

@mikecaat
Contributor

Sorry, I have updated the comments above.

@katayama2224
Author

katayama2224 commented Jan 26, 2022

Thank you for suggesting the two workarounds. I will try them, and I look forward to your patch.

During our test we saw a curious phenomenon.
I'd like to know whether it is expected behavior, a bug, or an operational mistake.

  1. Before the backup, create the symbolic link.
mv /etc/postgresql/13/main/postgresql.conf /var/lib/postgresql/13/main/postgresql.conf
ln -s /var/lib/postgresql/13/main/postgresql.conf  /etc/postgresql/13/main/postgresql.conf
  2. Take a full backup (time1)
    At this stage 'recovery_target_time' is unknown, so it is not specified.
    pg_rman backup --port=5432 --username=postgres --dbname=test --compress-data --backup-mode=full --progress --debug
  3. Delete part of a table in database 'test' manually (time2)
  4. Stop PostgreSQL without taking an incremental backup
  5. Run pg_rman show to find the recovery target time of (time1), and add
    recovery_target_time = '2022-01-25 14:56:58'
    to 'postgresql.conf' manually.
  6. Restore the backup
    pg_rman restore --debug --progress
  7. Create a symbolic link for 'pg_rman_recovery.conf'
    ln -s /var/lib/postgresql/13/main/pg_rman_recovery.conf /etc/postgresql/13/main/pg_rman_recovery.conf
  8. Start PostgreSQL

Does the restored database recover to the (time1) state, or to the (time2) state (i.e. with part of the table deleted)?
I expected the (time1) state, but the result was the (time2) state.

The following is a part of log of 'pg_rman restore --debug'.

pg_rman restore --debug --progress
DEBUG: the current timeline ID of database cluster is 1
DEBUG: the timeline ID of latest full backup is 1
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
DEBUG: the calculated branch history is as below;
DEBUG: stage 1: timeline ID = 1
INFO: searching latest full backup which can be used as restore start point
DEBUG: backup "2022-01-25 14:56:58" has the timeline ID 1
INFO: found the full backup can be used as base in recovery: "2022-01-25 14:56:58"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2022-01-25 14:56:58" backup and archive log files by SIZE
DEBUG: checking database files
DEBUG: checking archive WAL files
INFO: backup "2022-01-25 14:56:58" is valid
INFO: restoring database files from the full mode backup "2022-01-25 14:56:58"
Processed 1625 of 1625 files, skipped 26
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
DEBUG: backup "2022-01-25 14:56:58" has the timeline ID 1
DEBUG: checking database files
DEBUG: checking archive WAL files
INFO: backup "2022-01-25 14:56:58" is valid
INFO: restoring WAL files from backup "2022-01-25 14:56:58"
Processed 4 of 4 files, skipped 0
INFO: restoring online WAL files and server log files
Processed 6 of 6 files, skipped 0

I noticed that after the full backup (time1) is restored, online WAL files are also restored (the last two lines).
Is this why the recovery result is the (time2) state?

INFO: restoring online WAL files and server log files
Processed 6 of 6 files, skipped 0

Is this expected behavior, a bug, or an operational mistake?

@katayama2224
Author

katayama2224 commented Jan 26, 2022

Sorry, I have updated the comment above (modified steps 1 - 8).
The generated 'pg_rman_recovery.conf' is as follows.

# added by pg_rman 1.3.14
restore_command = 'cp /mnt/vdb/backup/wal/%f %p'
recovery_target_timeline = '1'

If the --recovery-target-time option is not specified in the pg_rman restore command, will the cluster be recovered to the latest state? Or did I make a mistake in the operations or their order?

I will investigate above further.

@mikecaat
Contributor

mikecaat commented Jan 26, 2022

Thanks for testing.

First, you need to use only one of the two workarounds.

I expected (time1) stage, but the result was (time2) stage.

I think the reason is that you applied both workarounds at the same time, though I haven't tested this in my environment.
pg_rman overwrote the value you added manually in step 5, since it also restored postgresql.conf.
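For workaround 1 the ordering matters: edit the configuration file the server actually reads after `pg_rman restore` has finished and before starting PostgreSQL, so the restore cannot overwrite the edit. A minimal sketch; `set_recovery_target_time` is a hypothetical helper, and it relies on PostgreSQL using the last occurrence when a parameter appears more than once in the configuration.

```shell
# Hypothetical helper for workaround 1: append recovery_target_time to the
# configuration file the server really reads. Run after pg_rman restore,
# before starting PostgreSQL.
# Usage: set_recovery_target_time <postgresql.conf> '<timestamp>'
set_recovery_target_time() {
    conf=$1
    target=$2
    [ -f "$conf" ] || { echo "not found: $conf" >&2; return 1; }
    # PostgreSQL uses the last occurrence of a parameter, so appending
    # overrides any earlier recovery_target_time line in the same file.
    printf "recovery_target_time = '%s'\n" "$target" >> "$conf"
}
```

For example, `set_recovery_target_time /etc/postgresql/13/main/postgresql.conf '2022-01-25 14:56:58'`. Consider removing the line once recovery has finished, so that a later restore does not pick it up unintentionally.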

If the --recovery-target-time option is not specified in the pg_rman restore command, will the cluster be recovered to the latest state?

Yes. If you run pg_rman restore without the --recovery-target-time option, pg_rman doesn't set recovery_target_time in postgresql.conf, so PostgreSQL recovers to the latest point.
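To see which behavior a restored cluster will get, you can inspect the generated pg_rman_recovery.conf before starting the server. A small sketch; `show_recovery_target` is a hypothetical helper that only parses the file format shown in this thread, printing the target timestamp or "latest" when no recovery_target_time line is present.

```shell
# Hypothetical helper: report the recovery target written to
# pg_rman_recovery.conf. Prints the timestamp, or "latest" when there is no
# recovery_target_time line (PostgreSQL then recovers to the latest point).
# Usage: show_recovery_target <data_dir>/pg_rman_recovery.conf
show_recovery_target() {
    conf=$1
    # "time *=" keeps recovery_target_timeline from matching; take the last
    # occurrence and strip the key and the surrounding quotes.
    value=$(grep "^recovery_target_time *=" "$conf" | tail -n 1 |
        sed "s/^recovery_target_time *= *'\(.*\)'.*/\1/")
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "latest"
    fi
}
```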

Could you try again, testing only one workaround?

  1. Add recovery-related parameters to the real postgresql.conf manually
  2. Create symbolic links to the real path

@katayama2224
Author

katayama2224 commented Jan 26, 2022

Hello, I tested again using workaround 2 (symbolic links).
I found an operational mistake (steps 9 and 10) and resolved it, but the restored state is still (time2).
I'd like to know why.

  1. Before the backup, create the symbolic link.
mv /etc/postgresql/13/main/postgresql.conf /var/lib/postgresql/13/main/postgresql.conf
ln -s /var/lib/postgresql/13/main/postgresql.conf  /etc/postgresql/13/main/postgresql.conf
  2. Take a full backup (time1)
    At this stage 'recovery_target_time' is unknown, so it is not specified.
    pg_rman backup --port=5432 --username=postgres --dbname=test --compress-data --backup-mode=full --progress --debug
  3. Delete part of a table in database 'test' manually (time2)
  4. Stop PostgreSQL without taking an incremental backup
  5. Run pg_rman show 2022-01-26 14:09:42 to find the recovery target time.
    I get RECOVERY_TIME='2022-01-26 14:09:45'
  6. Restore the backup
    pg_rman restore --recovery-target-time='2022-01-26 14:09:45' --debug --progress
  7. Create a symbolic link for 'pg_rman_recovery.conf'
    ln -s /var/lib/postgresql/13/main/pg_rman_recovery.conf /etc/postgresql/13/main/pg_rman_recovery.conf
  8. Check the generated files
$cat backup_label
START WAL LOCATION: 0/3000028 (file 000000010000000000000003)
CHECKPOINT LOCATION: 0/3000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2022-01-26 05:09:42 UTC
LABEL: 2022-01-26 14:09:42 with pg_rman
START TIMELINE: 1

pg_rman show 2022-01-26 14:09:42
...
# result
...
START_TIME='2022-01-26 14:09:42'
END_TIME='2022-01-26 14:09:45'
RECOVERY_XID=577
RECOVERY_TIME='2022-01-26 14:09:45'
...
STATUS=OK

$ cat pg_rman_recovery.conf
# added by pg_rman 1.3.14
restore_command = 'cp /mnt/vdb/backup/wal/%f %p'
recovery_target_time = '2022-01-26 14:09:45'
recovery_target_timeline = '1'

$ cat recovery.signal
# recovery.signal generated by pg_rman 1.3.14
  9. Start PostgreSQL, but it cannot start.
  10. Comment out 'restore_command' in the 'postgresql.conf' file
    (because it is duplicated in 'pg_rman_recovery.conf', which pg_rman includes via an include directive).
    #restore_command = 'cp /mnt/vdb/backup/wal/%f %p'
  11. Start PostgreSQL again; now it starts.
  12. I checked the database state, but the restored state is still (time2).
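Step 10 can be done with sed while keeping a copy of the original file. A sketch with a hypothetical helper name; note that PostgreSQL normally uses the last occurrence of a duplicated parameter, so this step is a precaution rather than a known requirement for startup.

```shell
# Hypothetical helper for step 10: comment out restore_command in
# postgresql.conf so only the copy in pg_rman_recovery.conf stays active.
# The original file is kept as postgresql.conf.bak.
# Usage: comment_out_restore_command <postgresql.conf>
comment_out_restore_command() {
    conf=$1
    cp "$conf" "$conf.bak" || return 1
    # Write through "$conf" (which may be a symlink) instead of using sed -i.
    sed "s/^restore_command/#restore_command/" "$conf.bak" > "$conf"
}
```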

Part of the log of the 'pg_rman restore' command:

pg_rman restore --recovery-target-time='2022-01-26 14:09:45' --debug --progress
DEBUG: the current timeline ID of database cluster is 1
DEBUG: backup "2022-01-26 14:09:42" satisfies the condition of recovery target time
DETAIL: the recovery target time is "2022-01-26 14:09:45", the recovery time of the backup is "2022-01-26 14:09:45"
DEBUG: the timeline ID of latest full backup is 1
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of current database cluster as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
DEBUG: the calculated branch history is as below;
DEBUG: stage 1: timeline ID = 1
INFO: searching latest full backup which can be used as restore start point
DEBUG: backup "2022-01-26 14:09:42" has the timeline ID 1
DEBUG: backup "2022-01-26 14:09:42" satisfies the condition of recovery target time
DETAIL: the recovery target time is "2022-01-26 14:09:45", the recovery time of the backup is "2022-01-26 14:09:45"
INFO: found the full backup can be used as base in recovery: "2022-01-26 14:09:42"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2022-01-26 14:09:42" backup and archive log files by SIZE
DEBUG: checking database files
DEBUG: checking archive WAL files
INFO: backup "2022-01-26 14:09:42" is valid
INFO: restoring database files from the full mode backup "2022-01-26 14:09:42"
Processed 1624 of 1624 files, skipped 26
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
DEBUG: backup "2022-01-26 14:09:42" has the timeline ID 1
DEBUG: checking database files
DEBUG: checking archive WAL files
INFO: backup "2022-01-26 14:09:42" is valid
INFO: restoring WAL files from backup "2022-01-26 14:09:42"
Processed 4 of 4 files, skipped 0
INFO: restoring online WAL files and server log files
Processed 5 of 5 files, skipped 0
INFO: create pg_rman_recovery.conf for recovery-related parameters.
INFO: remove an 'include' directive added by pg_rman in postgresql.conf if exists

@mikecaat
Contributor

Hmm, I might be missing something. Sorry.
I will test again in my environment.

If you have PostgreSQL's error log from startup, please share it.

@mikecaat
Contributor

I may have misunderstood something, but pg_rman seems to work well in my environment even when using the data_directory parameter. I've attached the test script and the execution log (test_scripts.zip).

Could you share the postgresql.conf, the error log and test scripts?

@tatsuo-ishii

OK. I want to know other people's comments.

It would be nice if pg_rman officially supports Ubuntu. We have been pushing pg_rman to our customers since it's a great tool. If needed, we can help in testing it.

@katayama2224
Author

katayama2224 commented Jan 27, 2022

OK. I want to know other people's comments.

It would be nice if pg_rman officially supports Ubuntu. We have been pushing pg_rman to our customers since it's a great tool. If needed, we can help in testing it.

Ishii-san,
Thank you for your comment. It is encouraging for us and ubuntu users.

@katayama2224
Author

katayama2224 commented Jan 27, 2022

Thank you for sharing the test script.

I checked it and noticed our operational mistake:
I forgot to remove the data in the data directory before restoring.

<line 49 in test.sh>

# restore to point (1)
pg_ctl stop -D data.conf
rm -r data

Between our steps 5 and 6, I added the following command and confirmed that recovery succeeded, i.e. the (time1) state was recovered.

5.a Remove the data directory
rm -r /var/lib/postgresql/13/main
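Since the missing step was clearing the data directory, a small pre-restore check can catch it before `pg_rman restore` runs. This is a hypothetical helper mirroring step 5.a, not part of pg_rman; treating `postmaster.pid` as a sign that the server may still be running is an assumption of this sketch.

```shell
# Hypothetical pre-restore check mirroring step 5.a.
# Usage: ready_for_restore <data_dir>
ready_for_restore() {
    datadir=$1
    # A postmaster.pid file suggests the server may still be running.
    if [ -f "$datadir/postmaster.pid" ]; then
        echo "server still running? found $datadir/postmaster.pid"
        return 1
    fi
    # A missing or empty data directory counts as ready.
    if [ -d "$datadir" ] && [ -n "$(ls -A "$datadir")" ]; then
        echo "data directory not empty: $datadir"
        return 1
    fi
    echo "ready"
}
```

Removing the data directory is destructive, so confirm that the backup has been validated before running `rm -r`.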

Attached is our postgresql.conf and log file.
testconf_log.zip

Sorry, the time zone of the log file is UTC, which may be confusing, but you won't need to look at it any more.

Thank you very much for your help.

@mikecaat
Contributor

mikecaat commented Jan 27, 2022

Hi,

Thanks for sharing your test results. I understood the workaround works for your environments.

OK. I want to know other people's comments.

It would be nice if pg_rman officially supports Ubuntu. We have been pushing pg_rman to our customers since it's a great tool. If needed, we can help in testing it.

Ishii-san, Thank you for your comment. It is encouraging for us and ubuntu users.

OK. There are other backup tools for PostgreSQL now. Out of curiosity, is there any reason you use pg_rman or recommend it to your customers?

BTW, I made a PoC patch. If you have time, please test it in your environment. Please build the branch (#217) and add the --pgconf-path option when restoring. You won't need the workarounds any more.

Example: pg_rman restore --recovery-target-time="${recovery_date}" --pgconf-path=/etc/postgresql/13/main
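On Debian and Ubuntu the configuration of a cluster lives under /etc/postgresql/<version>/<cluster>, so the --pgconf-path value can be derived from the version and cluster name. A sketch assuming the option name from the PoC branch (#217); `restore_cmd` is a hypothetical helper that only prints the command for review.

```shell
# Hypothetical helper: print the pg_rman restore command for a Debian/Ubuntu
# cluster whose configuration is in /etc/postgresql/<version>/<cluster>.
# It only prints the command; review it before running.
# Usage: restore_cmd <version> <cluster> '<recovery target time>'
restore_cmd() {
    version=$1
    cluster=$2
    target=$3
    printf "pg_rman restore --recovery-target-time='%s' --pgconf-path=/etc/postgresql/%s/%s\n" \
        "$target" "$version" "$cluster"
}
```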

If you have any comments or suggestions about the patch, please let me know. Comments are welcome.

@katayama2224
Author

Thank you very much for creating a PoC patch. I will test it in our environment.

Reasons we recommend pg_rman to customers:

  • Excellent tool for PITR usage.
  • Multi generation of backup management is possible.
  • Archive log and server log are also supported for backup.
  • Incremental backup is possible.
  • It has been supported from PostgreSQL 8.2 through PostgreSQL 14 (latest version).

@tatsuo-ishii

OK. Out of curiosity, now there are other backup tools for postgresql. Is there any reason that you use pg_rman or recommend it to your customer?

Most of our customers are Japanese. Comparing with other tools, there's more information in Japanese on it. Also pg_rman is easier to configure for small systems.

@mikecaat
Contributor

Thanks for sharing why you use pg_rman. I agree with what you said.

@katayama2224
Author

katayama2224 commented Jan 27, 2022

I tested the PoC patch without the workarounds, adding the '--pgconf-path' option when restoring. It works well as far as I tested.
Attached are our postgresql.conf and our log, with the steps annotated in comments.
testlog2.zip

In postgresql.conf, I tested both modes (restore_command active and commented out),
and both worked.
<line 251 of postgresql.conf>
<test 1> restore_command = 'cp /mnt/vdb/backup/wal/%f %p'
<test 2> #restore_command = 'cp /mnt/vdb/backup/wal/%f %p'

There is a warning. Is this a problem?

postgres@miadmin:~/13$ pg_rman restore --recovery-target-time='2022-01-27 15:30:27' --pgconf-path=/etc/postgresql/13/main --progress
WARNING: pg_controldata file "/var/lib/postgresql/13/main/global/pg_control" does not exist

@mikecaat
Contributor

mikecaat commented Jan 27, 2022

Thanks for quick testing!

There is a warning. Is this a problem?

It's not a problem. This is the expected behavior when the data directory has been removed.

ref: https://ossc-db.github.io/pg_rman/index.html

If --recovery-target-timeline is not specified, the last checkpoint’s TimeLineID in control file ($PGDATA/global/pg_control) will be a restore target. If pg_control is not present, TimeLineID in the full backup used by the restore will be a restore target.
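The documented rule can be written out as a small decision function. This is only an illustration of the documentation quoted above, not pg_rman's implementation: `pick_target_tli` is hypothetical, and the pg_control timeline is passed in as an argument (in practice it would come from pg_controldata) so the logic stays testable.

```shell
# Hypothetical illustration of the documented timeline rule; pg_rman
# implements this internally.
# Usage: pick_target_tli <explicit_tli_or_empty> <data_dir> <control_tli> <backup_tli>
#   control_tli: timeline of the last checkpoint recorded in pg_control
#   backup_tli:  timeline of the full backup used for the restore
pick_target_tli() {
    explicit=$1
    datadir=$2
    control_tli=$3
    backup_tli=$4
    if [ -n "$explicit" ]; then
        echo "$explicit"      # --recovery-target-timeline was given
    elif [ -f "$datadir/global/pg_control" ]; then
        echo "$control_tli"   # pg_control present: use its checkpoint timeline
    else
        echo "$backup_tli"    # pg_control missing (the WARNING case): use the backup's
    fi
}
```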

@mikecaat
Contributor

merged #217
