Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPKG status performance regression fix #398

Merged
merged 4 commits into from
Apr 7, 2021

Conversation

rcoup
Copy link
Member

@rcoup rcoup commented Apr 6, 2021

Description

Workaround for a GPKG performance regression introduced in b20eec4.

For a 29K feature dataset with 21K changes in the working copy, sno status went from ~7s to ~3m20s.

Key driver is this query (~200ms):

SELECT gpkg_sno_track.pk AS ".__track_pk", nz_mainland_powerline_centrelines_topo_150k.t50_fid, nz_mainland_powerline_centrelines_topo_150k."GEOMETRY", nz_mainland_powerline_centrelines_topo_150k.support_ty
FROM gpkg_sno_track
  LEFT OUTER JOIN nz_mainland_powerline_centrelines_topo_150k ON gpkg_sno_track.pk = nz_mainland_powerline_centrelines_topo_150k.t50_fid
WHERE gpkg_sno_track.table_name = 'nz_mainland_powerline_centrelines_topo_150k';

Becoming ~200s:

SELECT gpkg_sno_track.pk AS ".__track_pk", nz_mainland_powerline_centrelines_topo_150k.t50_fid, nz_mainland_powerline_centrelines_topo_150k."GEOMETRY", nz_mainland_powerline_centrelines_topo_150k.support_ty
FROM gpkg_sno_track
  LEFT OUTER JOIN nz_mainland_powerline_centrelines_topo_150k ON gpkg_sno_track.pk = CAST(nz_mainland_powerline_centrelines_topo_150k.t50_fid AS TEXT)
WHERE gpkg_sno_track.table_name = 'nz_mainland_powerline_centrelines_topo_150k';

I think the speedup is only possible via this method because Sqlite is basically untyped. Expectedly, removing the cast altogether fails tests on MSSQL & PostGIS, but I suspect they're impacted too. Need a better solution really so we can at least always make use of the dataset PK index. Minimum might be casting the other way (ie. tracking table → dataset pk type) with the logic there's often likely to be less changes in the tracking table than there are rows in the dataset table?

While I was profiling, disabled an extra hash verification per-object lookup.

Related links:

Checklist:

  • Have you reviewed your own change?
  • Have you included test(s)? covered by existing tests
  • Have you updated the changelog?

@rcoup rcoup requested a review from olsen232 April 6, 2021 20:42
@rcoup rcoup force-pushed the rc-fix-gpkg-status-perf-regression branch from 06c0b92 to fa2c150 Compare April 7, 2021 10:56
rcoup added 4 commits April 7, 2021 12:00
make formatting more verbose at -vv or higher
This fixes a performance regression from b20eec4, `sno status` for a
21K/29K changeset is on the order of 1000x slower with the cast in place.

Likely the same issue is present in PostGIS/MSSQL, but lets start here.
Disables strict verification of object hashsums when reading objects
from disk. Eliminates an additional checksum calculation on each object.

https://github.com/libgit2/libgit2/search?q=GIT_OPT_ENABLE_STRICT_HASH_VERIFICATION&type=issues
@rcoup rcoup force-pushed the rc-fix-gpkg-status-perf-regression branch from fa2c150 to d1f5dbd Compare April 7, 2021 11:01
@rcoup rcoup merged commit 1ba5dfb into master Apr 7, 2021
@rcoup rcoup deleted the rc-fix-gpkg-status-perf-regression branch April 7, 2021 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants