
reindex process slows after some hours when Postgres vacuuming falls behind #1822

Closed
punktilious opened this issue Dec 14, 2020 · 6 comments
Labels: bug (Something isn't working) · P1 (Priority 1 - Must Have) · schema-change (a schema change)


@punktilious (Collaborator) commented Dec 14, 2020

After some time, queries against LOGICAL_RESOURCES slow down. After a couple of hours, the reindex status update on this table ends up dominating the overall processing time. This turns out to be a byproduct of the sheer number of updates against this table and the way Postgres handles them: each update leaves behind a dead tuple which must be cleaned up by the vacuum process. Vacuuming is supposed to run automatically in the background, but it looks like autovacuum can't keep up with the update rate.
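One way to confirm this (a diagnostic sketch using the standard pg_stat_user_tables view and the same schema/table names as below) is to compare dead and live tuple counts and the last autovacuum time:

-- dead-tuple buildup and vacuum history for the affected table
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE schemaname = 'fhirdata' AND relname = 'logical_resources';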

After stopping the reindex and running a manual vacuum, the performance recovers (from 30s per call back to 10s):

VACUUM (ANALYZE,VERBOSE) fhirdata.logical_resources;

After restarting reindex, the time slowly creeps up again, so this has to be repeated.

Investigate other solutions which avoid the need to update an indexed column. One possible solution is to copy all the rows that need to be processed into a working-queue table and delete each row from that table as it is processed. This table probably does not even need an index, because processing can happen in row order (somewhat arbitrary, but that is OK); a sketch of this idea is below.
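A minimal sketch of that working-queue idea (the table and column names here are illustrative only, not the actual schema):

-- hypothetical work queue, populated once and then drained as resources are processed
CREATE TABLE fhirdata.reindex_queue (logical_resource_id BIGINT NOT NULL);

INSERT INTO fhirdata.reindex_queue (logical_resource_id)
SELECT logical_resource_id FROM fhirdata.logical_resources;

-- each worker claims a batch in (arbitrary) row order and deletes what it has finished,
-- so LOGICAL_RESOURCES itself never needs a bookkeeping update
DELETE FROM fhirdata.reindex_queue
 WHERE logical_resource_id IN (
       SELECT logical_resource_id FROM fhirdata.reindex_queue
        LIMIT 1000 FOR UPDATE SKIP LOCKED)
RETURNING logical_resource_id;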

@punktilious punktilious added the bug Something isn't working label Dec 14, 2020
@punktilious punktilious changed the title reindex reindex process slows after some hours when Postgres vacuuming falls behind Dec 14, 2020
@lmsurpre lmsurpre added the P2 Priority 2 - Should Have label Feb 22, 2021
@lmsurpre (Member)

Options:

  • make vacuuming more aggressive
  • manual (code-driven vs. db-driven) vacuuming
  • other?

Needs investigation as to which is best.

@lmsurpre lmsurpre added this to the Sprint 2021-08 milestone Jun 1, 2021
@lmsurpre (Member)

The LOGICAL_RESOURCES table will still be updated as part of reindex after #2524, but we don't expect it to cause the same contention we were seeing from updating the tstamp.

@lmsurpre (Member)

After trying to reindex a large Postgres db using the new client-driven reindex behavior, I don't think this one is fully addressed.

It ran for over a day, but after some amount of time we ran out of disk space. I don't know whether that was vacuuming-related or just that the space was needed for the new tables/indices in 4.9.0.

However, after increasing disk capacity, the client-driven reindex repeatedly failed with errors like:

2021-07-11 22:37:13.879 fhir-test-server-86b656757d-9drdx fhir-test-server SEVERE inserting parameters
org.postgresql.util.PSQLException: ERROR: multixact "members" limit exceeded
  Detail: This command would create a multixact with 2 members, but the remaining space is only enough for 1 member.
  Hint: Execute a database-wide VACUUM in database with OID 16478 with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.
  Where: SQL statement "SELECT 1 FROM ONLY "fhirdata"."parameter_names" x WHERE "parameter_name_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"

I found the following query on the PostgreSQL forums for finding the tables with the oldest multixact IDs:

SELECT oid::regclass, relminmxid, mxid_age(relminmxid) FROM pg_class WHERE relminmxid <> '0' ORDER BY mxid_age(relminmxid) DESC;

Our observation_resource_token_refs table was at the top of this list, so I performed a manual vacuum of it:

vacuum fhirdata.observation_resource_token_refs;

The next morning I noticed that the reindex response times were much faster.

@lmsurpre lmsurpre removed this from the Sprint 2021-09 milestone Jul 12, 2021
@lmsurpre lmsurpre added this to the Sprint 2021-10 milestone Jul 12, 2021
@lmsurpre lmsurpre added P1 Priority 1 - Must Have and removed P2 Priority 2 - Should Have labels Jul 12, 2021
@lmsurpre (Member) commented Jul 15, 2021

Discussed with Robin and we think that we can further mitigate this issue by setting the properties mentioned at https://ibm.github.io/FHIR/guides/FHIRPerformanceGuide#412-tuning-auto-vacuum on the PostgreSQL tables that see a lot of updates/deletes. Specifically:

  • logical_resources
  • all search parameter tables

So basically everything other than the xx_RESOURCES and COMMON_TOKEN_VALUE tables :-)
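A hedged sketch of what those per-table settings could look like (the values here are illustrative; the actual numbers should come from the performance guide linked above):

-- make autovacuum trigger sooner and work harder on heavily updated tables
ALTER TABLE fhirdata.logical_resources SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_cost_limit   = 2000);

-- repeat for each search parameter table, e.g.
ALTER TABLE fhirdata.observation_resource_token_refs SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_cost_limit   = 2000);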

@prb112 (Contributor) commented Jul 20, 2021

PR #2628

@d0roppe (Collaborator) commented Jul 28, 2021

Ran the latest reindex of 79 million resources and it finished without error in 39 hours. This was with the latest fixes for the vacuum settings. I am closing this issue based on this last run.

@d0roppe d0roppe closed this as completed Jul 28, 2021
@lmsurpre lmsurpre added the schema-change a schema change label Aug 4, 2021