Make cronjobs to vacuum full PuppetDB tables #12

Merged
merged 4 commits into from
Oct 18, 2017
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
+## Minor Release 0.12.0
+
+- Improve maintenance cron jobs [#12](https://github.com/npwalker/pe_databases/pull/12)
+  - Change from reindexing all tables to VACUUM FULL on just the smaller tables
+
 ## Z Release 0.11.2

 - Fix metadata.json version
15 changes: 9 additions & 6 deletions README.md
@@ -31,7 +31,7 @@ By default you get the following:
   - PuppetDB is backed up once a week
   - The other PE databases are backed up every night
   - The node_check_ins table is TRUNCATED from the pe-classifier database to keep the size down
-2. A weekly reindex and vacuum analyze run on all databases
+2. Maintenance cron jobs to keep your PuppetDB tables lean and fast
 3. Slightly better default settings for PE PostgreSQL

 ## Items you may want to configure
@@ -48,7 +48,7 @@ By default the script will only hold two backups for each database. When the sc

 ### Disable the maintenance cron job

-If you run into your maintenance cron job having DEADLOCK errors as described in the [reindexing section](#reindexing) you may want to disable it. You can do so by setting `pe_databases::maintenance::disable_maintenace: true` in your hieradata.
+The maintenance cron jobs perform a VACUUM FULL on various PuppetDB tables to keep them small and improve PuppetDB performance. A VACUUM FULL is a blocking operation, so you will see the PuppetDB command queue grow while the cron jobs run. The blocking should be short-lived and the PuppetDB command queue should work itself down afterwards. However, if you experience issues you can disable the maintenance cron jobs by setting `pe_databases::maintenance::disable_maintenace: true` in your hieradata.

 # General PostgreSQL Recommendations

@@ -78,19 +78,22 @@ This module provides a script for backing up the Puppet Enterprise databases and

 Note: You may be able to improve the performance ( reduce time to completion ) of maintenance tasks by increasing the [maintenance_work_mem](#maintenance_work_mem) setting.

-This module provides a monthly cron job that performs a reindex and then a VACUUM ANALYZE. This cron job should be monitored to make sure the reindex is not failing on a DEADLOCK error as discussed in the [reindexing](#reindexing) section.
+This module provides cron jobs to VACUUM FULL various tables in the PuppetDB database:
+- facts tables are VACUUMed Tuesdays and Saturdays at 4:30AM
+- catalogs tables are VACUUMed Sundays and Thursdays at 4:30AM
+- other tables are VACUUMed on the 20th of the month at 5:30AM

 ### Vacuuming

 Generally speaking PostgreSQL keeps itself in good shape with a process called [auto vacuuming](https://www.postgresql.org/docs/9.4/static/runtime-config-autovacuum.html). This is on by default and tuned for Puppet Enterprise out of the box.

-There are a few things that autovacuum does not touch though and it is prudent to run an infrequent VACUUM ANALYZE to prevent any issues that could manifest.
+Note that there is a difference between VACUUM and VACUUM FULL. VACUUM FULL rewrites a table on disk, while VACUUM simply marks deleted rows so the space those rows occupied can be reused for new data.

-Please note that you should never need to run VACUUM FULL. VACUUM FULL rewrites the entire database and is a blocking operation that will take down your Puppet Enterprise installation while it runs. It is rarely necessary and causes a lot of I/O and downtime for Puppet Enterprise. If you think you need to run VACUUM FULL then be prepared to have your Puppet Enterprise installation down for a while.
+VACUUM FULL is generally not necessary and, if run too frequently, can cause excessive disk I/O. However, in the case of PuppetDB, the way we constantly receive and update data causes bloat in the database, and it is beneficial to VACUUM FULL the facts and catalogs tables every few days. We do not, however, recommend a VACUUM FULL on the reports or resource_events tables because they are too big and may cause extended downtime if VACUUM FULL is performed on them.

 ### Reindexing

-Reindexing is also a prudent exercise. It may not be necessary very often but doing it every month or so can definitely prevent performance issues in the long run.
+Reindexing is also a prudent exercise. It may not be necessary very often but doing it every month or so can definitely prevent performance issues in the long run. In the scope of what this module provides, a VACUUM FULL rewrites the table and all of its indexes, so tables are reindexed during the VACUUM FULL maintenance cron jobs. That leaves only the reports and resource_events tables not getting reindexed. Unfortunately, the most common place to get the DEADLOCK error mentioned below is when reindexing the reports table.

 Reindexing is a blocking operation. While an index is rebuilt the data in the table cannot change and operations have to wait for the index rebuild to complete. If you don’t have a large installation or you have a lot of memory / a fast disk you may be able to complete a reindex while your Puppet Enterprise installation is up. PuppetDB will back up commands in its command queue and the console may throw some errors about not being able to load data. After the reindex is complete the PuppetDB command queue will work through and the console UI will work as expected.

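The README text above says VACUUM FULL rewrites a table and its indexes to remove bloat. One way to see the effect is to compare on-disk sizes before and after a maintenance run. The following is a minimal sketch, not part of the PR: the helper name `size_query` is hypothetical, while `pg_total_relation_size` and `pg_size_pretty` are standard PostgreSQL functions and the `psql` path and database name are taken from the script in this PR.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): build a query that reports the
# total on-disk size (table, indexes and TOAST) of the named tables.
size_query() {
  local in_list=""
  local t
  for t in "$@"; do
    # Append each table name as a quoted SQL literal, comma separated.
    in_list="${in_list:+$in_list, }'$t'"
  done
  echo "SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) FROM pg_class WHERE relkind = 'r' AND relname IN ( $in_list )"
}

# Example invocation (commented out because it needs a PE PostgreSQL host):
# su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -c \"$(size_query facts factsets)\""
```

Running the query once before and once after the cron job gives a rough measure of how much space the rewrite reclaimed.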
35 changes: 35 additions & 0 deletions files/vacuum_full_tables.sh
@@ -0,0 +1,35 @@
+if [ "$1" = "" ]; then
+  echo "Usage: $0 <(facts, catalogs, or other) tables to VACUUM FULL> "
+  exit 1
+fi
+
+if [ "$2" = "" ]; then
+  SLEEP=300
+else
+  SLEEP=$2
+fi
+
+if [ "$1" = 'facts' ]; then
+  WHERE="'facts', 'factsets', 'fact_paths', 'fact_values'"
+elif [ "$1" = 'catalogs' ]; then
+  WHERE="'catalogs', 'catalog_resources', 'edges', 'certnames'"
+elif [ "$1" = 'other' ]; then
+  WHERE="'producers', 'resource_params', 'resource_params_cache'"
+else
+  echo "Must pass facts, catalogs, or other as first argument"
+  exit 1
+fi
+
+SQL="SELECT t.relname::varchar AS table_name
+  FROM pg_class t
+  JOIN pg_namespace n
+    ON n.oid = t.relnamespace
+  WHERE t.relkind = 'r'
+    AND t.relname IN ( $WHERE )"
+
+for TABLE in $(su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -c \"$SQL\" --tuples-only")
+do
+  #echo $TABLE
+  su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/vacuumdb -d pe-puppetdb -t $TABLE --full"
+  sleep "$SLEEP"
+done
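The group-to-table mapping in the script above can be exercised without a database. The sketch below restates it as a `case` statement and adds a dry run that prints the `vacuumdb` commands instead of executing them. The function names `tables_for_group` and `dry_run` are hypothetical and not part of the PR; the table lists are copied from the script with whitespace normalised.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): map a group name to the quoted,
# comma-separated table list that the script interpolates into its SQL.
tables_for_group() {
  case "$1" in
    facts)    echo "'facts', 'factsets', 'fact_paths', 'fact_values'" ;;
    catalogs) echo "'catalogs', 'catalog_resources', 'edges', 'certnames'" ;;
    other)    echo "'producers', 'resource_params', 'resource_params_cache'" ;;
    *)        echo "Must pass facts, catalogs, or other as first argument" >&2
              return 1 ;;
  esac
}

# Dry run: print the vacuumdb command for each table rather than running it.
dry_run() {
  local list
  list=$(tables_for_group "$1") || return 1
  # Strip quotes and spaces, then split the list into one table per line.
  echo "$list" | tr -d "' " | tr ',' '\n' | while read -r table; do
    echo "vacuumdb -d pe-puppetdb -t $table --full"
  done
}
```

Unlike the real script, the dry run does not ask PostgreSQL which of the tables exist, so it lists every table in the group even on a PuppetDB schema that lacks some of them.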
6 changes: 1 addition & 5 deletions manifests/backup.pp
@@ -20,15 +20,11 @@
   ],
   String $psql_version = $pe_databases::psql_version,
   String $backup_directory = "/opt/puppetlabs/server/data/postgresql/${psql_version}/backups",
-  String $backup_script_path = '/opt/puppetlabs/pe_databases/scripts/puppet_enterprise_database_backup.sh',
+  String $backup_script_path = "${pe_databases::scripts_dir}/puppet_enterprise_database_backup.sh",
   String $backup_logging_directory = '/var/log/puppetlabs/pe_databases_backup',
   Integer $retention_policy = 2,
 ) {

-  file { ['/opt/puppetlabs/pe_databases', '/opt/puppetlabs/pe_databases/scripts', $backup_directory ] :
-    ensure => directory,
-  }
-
   file { $backup_logging_directory :
     ensure => 'directory',
     owner => 'pe-postgres',
6 changes: 6 additions & 0 deletions manifests/init.pp
@@ -2,6 +2,8 @@
   Boolean $manage_database_backups = true,
   Boolean $manage_database_maintenance = true,
   Boolean $manage_postgresql_settings = true,
+  String $install_dir = '/opt/puppetlabs/pe_databases',
+  String $scripts_dir = "${install_dir}/scripts"
 ) {

   if ( versioncmp('2017.3.0', $facts['pe_server_version']) <= 0 ) {
@@ -10,6 +12,10 @@
     $psql_version = '9.4'
   }

+  file { [$install_dir, $scripts_dir] :
+    ensure => directory,
+  }
+
   if $manage_database_maintenance {
     include pe_databases::maintenance
   }
63 changes: 57 additions & 6 deletions manifests/maintenance.pp
@@ -1,9 +1,10 @@
 class pe_databases::maintenance (
   Boolean $disable_maintenace = false,
-  Integer $maint_cron_weekday = 6,
-  Integer $maint_cron_hour = 1,
-  Integer $maint_cron_minute = 0,
-  String $logging_directory = '/var/log/puppetlabs/pe_databases_cron'
+  Optional[Integer] $maint_cron_weekday = undef, #DEPRECATED
+  Optional[Integer] $maint_cron_hour = undef, #DEPRECATED
+  Optional[Integer] $maint_cron_minute = undef, #DEPRECATED
+  String $logging_directory = '/var/log/puppetlabs/pe_databases_cron',
+  String $script_directory = $pe_databases::scripts_dir,
 ){

   $ensure_cron = $disable_maintenace ? {
@@ -15,14 +16,64 @@
     ensure => directory,
   }

+  $vacuum_script_path = "${script_directory}/vacuum_full_tables.sh"
+
+  file { $vacuum_script_path:
+    ensure => file,
+    source => 'puppet:///modules/pe_databases/vacuum_full_tables.sh',
+    owner => 'pe-postgres',
+    group => 'pe-postgres',
+    mode => '744',
+  }
+
+  cron { 'VACUUM FULL facts tables' :
+    ensure => $ensure_cron,
+    user => 'root',
+    weekday => [2,6],
+    hour => 4,
+    minute => 30,
+    command => "${vacuum_script_path} facts",
+    require => File[$logging_directory, $script_directory],
+  }
+
+  cron { 'VACUUM FULL catalogs tables' :
+    ensure => $ensure_cron,
+    user => 'root',
+    weekday => [0,4],
+    hour => 4,
+    minute => 30,
+    command => "${vacuum_script_path} catalogs",
+    require => File[$logging_directory, $script_directory],
+  }
+
+  cron { 'VACUUM FULL other tables' :
+    ensure => $ensure_cron,
+    user => 'root',
+    monthday => 20,
+    hour => 5,
+    minute => 30,
+    command => "${vacuum_script_path} other",
+    require => File[$logging_directory, $script_directory],
+  }
+
+  #Remove old versions of maintenance cron jobs
   cron { 'Maintain PE databases' :
-    ensure => $ensure_cron,
+    ensure => absent,
     user => 'root',
     weekday => $maint_cron_weekday,
     hour => $maint_cron_hour,
     minute => $maint_cron_minute,
     command => "su - pe-postgres -s /bin/bash -c '/opt/puppetlabs/server/bin/reindexdb --all; /opt/puppetlabs/server/bin/vacuumdb --analyze --verbose --all' > ${logging_directory}/output.log 2> ${logging_directory}/output_error.log",
-    require => File[$logging_directory],
+    require => File[$logging_directory, $script_directory],
   }

+  if empty($maint_cron_weekday) == false {
+    warning('pe_databases::maintenance::maint_cron_weekday is deprecated and will be removed in a future release')
+  }
+  if empty($maint_cron_hour) == false {
+    warning('pe_databases::maintenance::maint_cron_hour is deprecated and will be removed in a future release')
+  }
+  if empty($maint_cron_minute) == false {
+    warning('pe_databases::maintenance::maint_cron_minute is deprecated and will be removed in a future release')
+  }
 }
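For reference, the three `cron` resources above correspond roughly to the crontab schedule sketched below. This is an illustration under assumptions, not the literal content Puppet writes to root's crontab (Puppet adds its own header comments), and the script path assumes the default `scripts_dir` from `init.pp`. The function name `vacuum_schedule` is hypothetical.

```shell
#!/bin/bash
# Assumed default location; differs if pe_databases::scripts_dir is overridden.
SCRIPT="/opt/puppetlabs/pe_databases/scripts/vacuum_full_tables.sh"

# Print the schedule implied by the cron resources in crontab field order:
# minute hour day-of-month month day-of-week command
vacuum_schedule() {
  printf '%s\n' \
    "30 4 * * 2,6 $SCRIPT facts" \
    "30 4 * * 0,4 $SCRIPT catalogs" \
    "30 5 20 * * $SCRIPT other"
}

vacuum_schedule
```

Weekdays 2 and 6 are Tuesday and Saturday, and 0 and 4 are Sunday and Thursday, which matches the schedule described in the README changes.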
2 changes: 1 addition & 1 deletion metadata.json
@@ -1,6 +1,6 @@
 {
   "name": "npwalker/pe_databases",
-  "version": "0.11.2",
+  "version": "0.12.0",
   "author": "npwalker",
   "summary": "A Puppet Module for Backing Up / Maintaining / Tuning Your Puppet Enterprise Databases",
   "license": "Apache-2.0",