GPKG status performance regression fix #398

rcoup · 2021-04-06T15:41:52Z

Description

Workaround for a GPKG performance regression introduced in b20eec4.

For a 29K-feature dataset with 21K changes in the working copy, `sno status` went from ~7s to ~3m20s.

The key driver is this query, which previously took ~200ms:

SELECT gpkg_sno_track.pk AS ".__track_pk", nz_mainland_powerline_centrelines_topo_150k.t50_fid, nz_mainland_powerline_centrelines_topo_150k."GEOMETRY", nz_mainland_powerline_centrelines_topo_150k.support_ty
FROM gpkg_sno_track
  LEFT OUTER JOIN nz_mainland_powerline_centrelines_topo_150k ON gpkg_sno_track.pk = nz_mainland_powerline_centrelines_topo_150k.t50_fid
WHERE gpkg_sno_track.table_name = 'nz_mainland_powerline_centrelines_topo_150k';

With the cast added in b20eec4, it takes ~200s:

SELECT gpkg_sno_track.pk AS ".__track_pk", nz_mainland_powerline_centrelines_topo_150k.t50_fid, nz_mainland_powerline_centrelines_topo_150k."GEOMETRY", nz_mainland_powerline_centrelines_topo_150k.support_ty
FROM gpkg_sno_track
  LEFT OUTER JOIN nz_mainland_powerline_centrelines_topo_150k ON gpkg_sno_track.pk = CAST(nz_mainland_powerline_centrelines_topo_150k.t50_fid AS TEXT)
WHERE gpkg_sno_track.table_name = 'nz_mainland_powerline_centrelines_topo_150k';

I think the speedup is only possible via this method because SQLite is basically untyped. As expected, removing the cast altogether fails tests on MSSQL and PostGIS, but I suspect they're impacted too. We really need a better solution so we can at least always make use of the dataset PK index. A minimum might be casting the other way (i.e. tracking table → dataset PK type), on the logic that there are usually fewer changes in the tracking table than there are rows in the dataset table?
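To illustrate why the cast matters, here is a minimal sketch using Python's built-in `sqlite3` module. It asks SQLite for the query plan of three join variants: no cast, a cast on the dataset PK (the regression), and a cast on the tracking-table key (the "other way" suggested above). The `features` table, its columns, and the helper name are assumptions made up for illustration, as are the integer dataset PK and text tracking key; this is not the code from this PR.

```python
import sqlite3


def features_plan(conn, join_condition):
    """Return the EXPLAIN QUERY PLAN detail lines that mention the features table."""
    sql = f"""
        EXPLAIN QUERY PLAN
        SELECT gpkg_sno_track.pk, features.fid, features.name
        FROM gpkg_sno_track
          LEFT OUTER JOIN features ON {join_condition}
        WHERE gpkg_sno_track.table_name = 'features'
    """
    # Each plan row is (id, parent, notused, detail); keep only the detail text.
    return [row[3] for row in conn.execute(sql) if "features" in row[3]]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for the tracking table and a dataset table.
    CREATE TABLE gpkg_sno_track (
        table_name TEXT NOT NULL,
        pk TEXT,
        PRIMARY KEY (table_name, pk)
    );
    CREATE TABLE features (fid INTEGER PRIMARY KEY, name TEXT);
    """
)

plans = {
    # Bare column comparison: SQLite can look rows up by the integer PK.
    "no cast": features_plan(conn, "gpkg_sno_track.pk = features.fid"),
    # Cast applied to the dataset PK: the PK lookup is no longer usable.
    "cast dataset pk": features_plan(
        conn, "gpkg_sno_track.pk = CAST(features.fid AS TEXT)"
    ),
    # Cast applied to the tracking key instead: the PK column stays bare.
    "cast tracking pk": features_plan(
        conn, "CAST(gpkg_sno_track.pk AS INTEGER) = features.fid"
    ),
}

for label, details in plans.items():
    print(f"{label}: {details}")
```

The exact plan wording differs between SQLite versions, but the variants that leave `features.fid` bare should report a search using the integer primary key, while the variant that wraps it in `CAST` should fall back to a scan of `features` for every tracked row.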

While I was profiling, I also disabled an extra hash verification that ran on each object lookup.

Checklist:

Have you reviewed your own change?
~~Have you included test(s)?~~ covered by existing tests
Have you updated the changelog?

sno/working_copy/base.py

make formatting more verbose at -vv or higher

This fixes a performance regression from b20eec4: `sno status` for a 21K/29K changeset is on the order of 1000x slower with the cast in place. The same issue is likely present in PostGIS/MSSQL, but let's start here.

Disables strict verification of object hashes when reading objects from disk, which eliminates an additional checksum calculation on each object. See https://github.com/libgit2/libgit2/search?q=GIT_OPT_ENABLE_STRICT_HASH_VERIFICATION&type=issues

rcoup requested a review from olsen232 April 6, 2021 20:42

hamishcampbell reviewed Apr 6, 2021

sno/working_copy/base.py

olsen232 approved these changes Apr 6, 2021


rcoup force-pushed the rc-fix-gpkg-status-perf-regression branch from 06c0b92 to fa2c150 April 7, 2021 10:56

rcoup added 4 commits April 7, 2021 12:00

logging: improve log formatting

e405268

make formatting more verbose at -vv or higher

log sqlalchemy queries when at -vvv or higher

2f23c28

gpkg: avoid cast during tracking table comparison

c4f25c5

This fixes a performance regression from b20eec4: `sno status` for a 21K/29K changeset is on the order of 1000x slower with the cast in place. The same issue is likely present in PostGIS/MSSQL, but let's start here.
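The two logging commits in this list (more verbose formatting at -vv, SQLAlchemy query logging at -vvv) could be sketched roughly as follows. This is an illustration under assumptions, not the code in this branch: `configure_logging` and the format strings are hypothetical. The only external detail relied on is that SQLAlchemy emits SQL statements on the `sqlalchemy.engine` logger at INFO level.

```python
import logging


def configure_logging(verbosity: int) -> None:
    """Hypothetical helper: map a count of -v flags to log level and format."""
    if verbosity >= 2:
        # More verbose format at -vv or higher: include source location.
        fmt = (
            "%(asctime)s %(levelname)s %(name)s "
            "[%(filename)s:%(lineno)d] %(message)s"
        )
        level = logging.DEBUG
    elif verbosity == 1:
        fmt = "%(asctime)s %(levelname)s %(name)s: %(message)s"
        level = logging.INFO
    else:
        fmt = "%(levelname)s: %(message)s"
        level = logging.WARNING

    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, format=fmt, force=True)

    # Only surface SQL statements at -vvv or higher.
    sa_level = logging.INFO if verbosity >= 3 else logging.WARNING
    logging.getLogger("sqlalchemy.engine").setLevel(sa_level)


configure_logging(3)
logging.getLogger("example").debug("verbose logging enabled")
```

Keeping the SQLAlchemy logger on its own threshold means query text only appears when explicitly requested, which is useful when profiling a slow statement like the one in this PR.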

rcoup force-pushed the rc-fix-gpkg-status-perf-regression branch from fa2c150 to d1f5dbd April 7, 2021 11:01

rcoup merged commit 1ba5dfb into master Apr 7, 2021

rcoup deleted the rc-fix-gpkg-status-perf-regression branch April 7, 2021 15:26

rcoup mentioned this pull request Apr 8, 2021

Basic but working implementation of MySQL working copy #399

Merged
