Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow VersionQuerySet.get_deleted #885

Closed
dbartenstein opened this issue Aug 16, 2021 · 1 comment
Closed

Very slow VersionQuerySet.get_deleted #885

dbartenstein opened this issue Aug 16, 2021 · 1 comment

Comments

@dbartenstein
Copy link

dbartenstein commented Aug 16, 2021

This issue seems is closely related to: #748 (comment)

Situation

We have been using the great django-reversion tool for almost 10 years now. Thanks for the work! 👍
When trying to recover deleted objects it often happens that the request times out (limited to 30 seconds): thus the recover feature cannot be used.

Database size

select "content_type_id", count(*) as count from "reversion_version" group by "content_type_id" order by "count";

content_type_id |  count  
-----------------+---------
             219 |       1
              79 |       1
             113 |       2
              38 |       3
             112 |       6
               1 |       6
             148 |       7
             220 |       8
             143 |      11
             167 |      13
              33 |      13
              74 |      16
             165 |      16
              95 |      19
             155 |      26
             114 |      26
              86 |      27
             109 |      38
             168 |      38
             125 |      38
             157 |      56
             128 |      60
              24 |      69
              77 |      91
             156 |      98
             134 |     106
             152 |     140
              12 |     176
              20 |     182
             147 |     191
             130 |     199
             150 |     241
              22 |     262
              45 |     272
              21 |     279
             149 |     285
             115 |     312
              31 |     329
             124 |     329
             121 |     411
             122 |     443
             103 |     492
              87 |     563
             120 |     576
             126 |     740
              23 |     848
             173 |    1022
             105 |    1158
             135 |    1572
              19 |    1698
             111 |    2042
             154 |    2192
             127 |    3007
             161 |    4443
              28 |    5004
             118 |    6741
              40 |    6795
             108 |    9100
              83 |   11452
             107 |   15239
             119 |   20518
             170 |   23007
              99 |   28197
              30 |   33924
             131 |   38922
              29 |   48250
             132 |   69230
             106 |   92339
             129 |  130372
              97 |  633001
              32 | 2024564
             133 | 2255535
              90 | 3463918
              85 | 3488188
(74 rows)

Reversion "out of the box"

SQL query for recovering objects of content type 124 (only 329 entries):

explain analyze SELECT "reversion_version"."id", "reversion_version"."revision_id", "reversion_version"."object_id", "reversion_version"."content_type_id", "reversion_version"."db", "reversion_version"."format", "reversion_version"."serialized_data", "reversion_version"."object_repr" FROM "reversion_version" WHERE "reversion_version"."id" IN (SELECT DISTINCT ON (V0."object_id") V0."id" FROM "reversion_version" V0 WHERE (V0."content_type_id" = 124 AND V0."db" = 'default' AND NOT EXISTS(SELECT (1) AS "a" FROM "main_siteconfiguration" U0 WHERE U0."id" = CAST(V0."object_id" AS integer) LIMIT 1)) ORDER BY V0."object_id" ASC, V0."id" DESC) ORDER BY "reversion_version"."id" LIMIT 21

Query plan

                                                                                                              QUERY PLAN                                                                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6888.97..6891.84 rows=21 width=339) (actual time=103293.671..137101.367 rows=8 loops=1)
   ->  Merge Join  (cost=6888.97..856662.55 rows=6214714 width=339) (actual time=103293.670..137101.358 rows=8 loops=1)
         Merge Cond: (reversion_version.id = "ANY_subquery".id)
         ->  Index Scan using reversion_version_pkey on reversion_version  (cost=0.43..818697.45 rows=12429427 width=339) (actual time=19.761..136263.631 rows=9592229 loops=1)
         ->  Sort  (cost=6888.53..6889.03 rows=200 width=4) (actual time=4.810..4.933 rows=8 loops=1)
               Sort Key: "ANY_subquery".id
               Sort Method: quicksort  Memory: 25kB
               ->  HashAggregate  (cost=6878.89..6880.89 rows=200 width=4) (actual time=4.790..4.905 rows=8 loops=1)
                     Group Key: "ANY_subquery".id
                     ->  Subquery Scan on "ANY_subquery"  (cost=6773.64..6863.85 rows=6014 width=4) (actual time=4.772..4.892 rows=8 loops=1)
                           ->  Unique  (cost=6773.64..6803.71 rows=6014 width=10) (actual time=4.770..4.888 rows=8 loops=1)
                                 ->  Sort  (cost=6773.64..6788.68 rows=6014 width=10) (actual time=4.769..4.882 rows=50 loops=1)
                                       Sort Key: v0.object_id, v0.id DESC
                                       Sort Method: quicksort  Memory: 27kB
                                       ->  Gather  (cost=4.03..6396.14 rows=6014 width=10) (actual time=0.583..4.851 rows=50 loops=1)
                                             Workers Planned: 1
                                             Workers Launched: 1
                                             ->  Hash Anti Join  (cost=4.03..6396.14 rows=3538 width=10) (actual time=0.100..0.294 rows=25 loops=2)
                                                   Hash Cond: ((v0.object_id)::integer = u0.id)
                                                   ->  Parallel Index Scan using reversion_version_db_46b2c1d7ad097ac4_uniq on reversion_version v0  (cost=0.56..6325.02 rows=7075 width=10) (actual time=0.015..0.236 rows=164 loops=2)
                                                         Index Cond: (((db)::text = 'default'::text) AND (content_type_id = 124))
                                                   ->  Hash  (cost=3.21..3.21 rows=21 width=4) (actual time=0.020..0.021 rows=21 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                         ->  Seq Scan on main_siteconfiguration u0  (cost=0.00..3.21 rows=21 width=4) (actual time=0.007..0.016 rows=21 loops=1)
 Planning time: 0.351 ms
 Execution time: 137101.603 ms
(26 rows)

The issue is caused by the ORDER BY "reversion_version"."id" as already explored in #748 (comment). Postgres first sorts the whole reversion_version table by id before filtering by content type and db.

Reversion with optimized query

explain analyze SELECT "reversion_version"."id", "reversion_version"."revision_id", "reversion_version"."object_id", "reversion_version"."content_type_id", "reversion_version"."db", "reversion_version"."format", "reversion_version"."serialized_data", "reversion_version"."object_repr" FROM "reversion_version" WHERE "reversion_version"."content_type_id" = 124 AND "reversion_version"."db" = 'default' AND "reversion_version"."id" IN (SELECT DISTINCT ON (V0."object_id") V0."id" FROM "reversion_version" V0 WHERE (V0."content_type_id" = 124 AND V0."db" = 'default' AND NOT EXISTS(SELECT (1) AS "a" FROM "main_siteconfiguration" U0 WHERE U0."id" = CAST(V0."object_id" AS integer) LIMIT 1)) ORDER BY V0."object_id" ASC, V0."id" DESC) ORDER BY "reversion_version"."id" LIMIT 21

Query plan

                                                                                                          QUERY PLAN                                                                                                             
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7575.54..7575.59 rows=21 width=339) (actual time=5.935..7.615 rows=8 loops=1)
   ->  Sort  (cost=7575.54..7590.57 rows=6014 width=339) (actual time=5.934..7.606 rows=8 loops=1)
         Sort Key: reversion_version.id
         Sort Method: quicksort  Memory: 36kB
         ->  Nested Loop  (cost=6879.32..7413.39 rows=6014 width=339) (actual time=5.857..7.584 rows=8 loops=1)
               ->  HashAggregate  (cost=6878.89..6880.89 rows=200 width=4) (actual time=5.835..7.509 rows=8 loops=1)
                     Group Key: v0.id
                     ->  Unique  (cost=6773.64..6803.71 rows=6014 width=10) (actual time=5.801..7.494 rows=8 loops=1)
                           ->  Sort  (cost=6773.64..6788.68 rows=6014 width=10) (actual time=5.800..7.474 rows=50 loops=1)
                                 Sort Key: v0.object_id, v0.id DESC
                                 Sort Method: quicksort  Memory: 27kB
                                 ->  Gather  (cost=4.03..6396.14 rows=6014 width=10) (actual time=0.655..7.427 rows=50 loops=1)
                                       Workers Planned: 1
                                       Workers Launched: 1
                                       ->  Hash Anti Join  (cost=4.03..6396.14 rows=3538 width=10) (actual time=0.139..0.383 rows=25 loops=2)
                                             Hash Cond: ((v0.object_id)::integer = u0.id)
                                             ->  Parallel Index Scan using reversion_version_db_46b2c1d7ad097ac4_uniq on reversion_version v0  (cost=0.56..6325.02 rows=7075 width=10) (actual time=0.022..0.291 rows=164 loops=2)
                                                   Index Cond: (((db)::text = 'default'::text) AND (content_type_id = 124))
                                             ->  Hash  (cost=3.21..3.21 rows=21 width=4) (actual time=0.027..0.027 rows=21 loops=1)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                   ->  Seq Scan on main_siteconfiguration u0  (cost=0.00..3.21 rows=21 width=4) (actual time=0.008..0.020 rows=21 loops=1)
               ->  Index Scan using reversion_version_pkey on reversion_version  (cost=0.43..2.66 rows=1 width=339) (actual time=0.008..0.008 rows=1 loops=8)
                     Index Cond: (id = v0.id)
                     Filter: ((content_type_id = 124) AND ((db)::text = 'default'::text))
 Planning time: 0.592 ms
 Execution time: 7.739 ms
(26 rows)

To overcome the performance issue, simply filter reversion_version on content_type and db to limit the records to be filtered by id.

How to do it in Django-reversion

# Perform the subquery.
q = self.get_for_model(model, model_db).filter(pk__in=subquery)

See

return self.filter(pk__in=subquery)

Additional index

For large tables an additional index greatly helps!

CREATE INDEX "reversion_version_SPEED"
  ON public.reversion_version
  USING btree
  (db COLLATE pg_catalog."default", content_type_id);

Other tables

I compared the execution times of the queries with our largest table as well: content_type_id 85.

  • With the optimized query execution time was cut in half: ~30s instead of ~60s (using the reversion_version_SPEED index mentioned below).
  • Without this index it’s ~200s vs. ~120s. I.e. the current query is even faster.

@etianen: what do you think of extending the query and adding the index as demonstrated above?

dbartenstein pushed a commit to dbartenstein/django-reversion that referenced this issue Aug 16, 2021
Version model: add index on (content_type, db).
VersionQuerySet.get_deleted: filter by content_type and db before subquery to increase performance.
etianen added a commit that referenced this issue Aug 18, 2021
@etianen
Copy link
Owner

etianen commented Oct 4, 2021

Closed by #886

@etianen etianen closed this as completed Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants