Skip to content

PostgresqlAndCollation

Dmitry Litvintsev edited this page Aug 5, 2024 · 13 revisions

Postgresql and Collation version

With drop of support for RHEL7 we had to upgrade our DB hosts to Alma9. Here I share notes on issues we (Fermilab) have run into.

Fermilab runs 3 separate dCache instances (2 CMS and 1 "public" - multu-VO instance. Public being the largest). Prior to June 30, 2024 all databases were running on SL7 host, postgresql version 11. The database hosts had to be upgraded to Alma9 with simultaneous upgrade of postgresql to version 15. In some cases switching to new hardware.

Upgrade procedure used on Public.

The current host running chimera DB is slated to be replaced with new hardware. To minimize downtime the following plan was executed:

  • short downtime to run pg_upgrade from 11 to 15
  • setup stream replication to new Alma9 host
  • one day just promote replica on new Alma9 host to be the master

This worked seemingly well, but on attempt to connect to replica db the following warning was seen:

# psql -U postgres chimera
WARNING:  database "chimera" has a collation version mismatch
DETAIL:  The database was created using collation version 2.17, but the operating system provides version 2.34.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE chimera REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
psql (15.7)
Type "help" for help.

chimera=# 

The replication seem to be working (as evidenced by comparing master and replica and observing data catching up). The issue is due to change in glibc collation between EL7 and EL8 (and possibly 9). The problem is summarized here:

https://wiki.postgresql.org/wiki/Locale_data_changes

Apparently it affects indexed string columns.

The same issue will be encountered if you just upgrade your DB host from SL7 to EL{8,9}.

So, there are three ways of handling postgresql when upgrading OS:

  • do dump before OS upgrade / restore after OS upgrade (may take a long time).
  • REINDEX the indexes after running OS upgrade. This is faster than dump/restore.
  • use logical replication

We used the following script to reindex indexes:

#!/bin/bash

QUERY="SELECT DISTINCT \
       indrelid::regclass::text, \
       indexrelid::regclass::text, \
       collname, pg_get_indexdef(indexrelid) \
       FROM (SELECT indexrelid, indrelid, indcollation[i] coll \
       FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s \
       JOIN pg_collation c ON coll=c.oid \
       WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX')"

psql -t -F ' ' -A -U postgres chimera -c "${QUERY}" 2>/dev/null | while read table index collname ref
do
    t0=`date +"%s"`
    psql -t -F ' ' -A -U postgres chimera -c "REINDEX INDEX ${index}" 2>/dev/null
    t1=`date +"%s"`
    dt=$((t1-t0))
    echo "${db} ${index} Took ${dt} seconds"
    psql -t -F ' ' -A -U postgres chimera -c "ALTER DATABASE $db REFRESH COLLATION VERSION" 
done

The indexes on chimera DB that need to be reindexed:

chimera=#  SELECT DISTINCT indexrelid::regclass::text as name, collname, pg_get_indexdef(indexrelid) 
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s 
  JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');
                 name                 | collname |                                                   pg_get_indexdef                                                    
--------------------------------------+----------+----------------------------------------------------------------------------------------------------------------------
 i_locationinfo_trash_itype_ilocation | default  | CREATE INDEX i_locationinfo_trash_itype_ilocation ON public.t_locationinfo_trash USING btree (itype, ilocation)
 i_xattr_pkey                         | default  | CREATE UNIQUE INDEX i_xattr_pkey ON public.t_xattr USING btree (inumber, ikey)
 pk_databasechangelog                 | default  | CREATE UNIQUE INDEX pk_databasechangelog ON public.databasechangelog USING btree (id, author, filename)
 t_dirs_pkey                          | default  | CREATE UNIQUE INDEX t_dirs_pkey ON public.t_dirs USING btree (iparent, iname)
 t_inodes_pkey                        | default  | CREATE UNIQUE INDEX t_inodes_pkey ON public.t_inodes USING btree (ipnfsid)
 t_labels_labelname_key               | default  | CREATE UNIQUE INDEX t_labels_labelname_key ON public.t_labels USING btree (labelname)
 t_locationinfo_pkey                  | default  | CREATE UNIQUE INDEX t_locationinfo_pkey ON public.t_locationinfo USING btree (inumber, itype, ilocation)
 t_locationinfo_trash_pkey            | default  | CREATE UNIQUE INDEX t_locationinfo_trash_pkey ON public.t_locationinfo_trash USING btree (ipnfsid, itype, ilocation)
 t_tags_pkey                          | default  | CREATE UNIQUE INDEX t_tags_pkey ON public.t_tags USING btree (inumber, itagname)
(9 rows)

These indexes can be reindexed in parallel. In our case (1.2B entries in t_inodes table running REINDEX in parallel took about 2 hours.