fix: add index on requested_at for refresh tokens and use it in janitor #3516

sgal · 2023-05-12T19:09:48Z

Related issue(s)

Inspired by #3115

This change addresses a performance issue in the Hydra janitor: cleanups are extremely slow because an inefficient query leads to a full table scan.

The approach is taken from the issue above. Adding an index on the requested_at column and ordering by it in the janitor avoids the full table scan and improves cleanup performance.
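Here is a minimal sketch of a batched cleanup loop built around the reordered query. It is not Hydra's janitor code. It makes these assumptions:

- Python's built-in sqlite3 stands in for PostgreSQL.
- hydra_oauth2_refresh is trimmed down to three columns.
- There is an index on (nid, requested_at).
- cleanup_refresh_tokens is a hypothetical helper.

The batch size of 100 mirrors the LIMIT used in the queries in this PR.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified stand-in for hydra_oauth2_refresh: only the columns
# the cleanup query touches. Timestamps are ISO-8601 text in one fixed format,
# so lexicographic order equals chronological order.
SCHEMA = """
CREATE TABLE hydra_oauth2_refresh (
    signature    TEXT PRIMARY KEY,
    nid          TEXT NOT NULL,
    requested_at TEXT NOT NULL
);
CREATE INDEX hydra_oauth2_refresh_requested_at_idx
    ON hydra_oauth2_refresh (nid, requested_at);
"""

# One batch: select the oldest expired tokens of one network (ORDER BY
# requested_at, so the index returns them in order) and delete them by key.
DELETE_BATCH = """
DELETE FROM hydra_oauth2_refresh WHERE signature IN (
    SELECT signature FROM (
        SELECT signature FROM hydra_oauth2_refresh
        WHERE requested_at < ? AND nid = ?
        ORDER BY requested_at
        LIMIT ?
    ) AS s
)
"""


def cleanup_refresh_tokens(conn, nid, cutoff, batch_size=100):
    """Hypothetical helper: delete expired tokens in batches, return the total."""
    total = 0
    while True:
        deleted = conn.execute(DELETE_BATCH, (cutoff, nid, batch_size)).rowcount
        conn.commit()
        total += deleted
        if deleted < batch_size:  # last (partial or empty) batch
            return total


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = datetime(2023, 5, 12, tzinfo=timezone.utc)
# 1,000 tokens for nid-a, one every 3 hours going back, plus one old nid-b token.
rows = [(f"sig-{i}", "nid-a", (now - timedelta(hours=3 * i)).isoformat()) for i in range(1000)]
rows.append(("sig-other", "nid-b", (now - timedelta(days=365)).isoformat()))
conn.executemany("INSERT INTO hydra_oauth2_refresh VALUES (?, ?, ?)", rows)

cutoff = (now - timedelta(hours=720)).isoformat()
deleted = cleanup_refresh_tokens(conn, "nid-a", cutoff)
remaining = conn.execute("SELECT COUNT(*) FROM hydra_oauth2_refresh").fetchone()[0]
print(deleted, remaining)
```

Each batch takes the oldest expired tokens first. Every iteration therefore starts at the beginning of the matching index range instead of skipping over recent tokens. The nid-b token survives because the delete is scoped to one network.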

Checklist

I have read the contributing guidelines.
I have referenced an issue containing the design document if my change introduces a new feature.
I am following the contributing code guidelines.
I have read the security policy.
I confirm that this pull request does not address a security vulnerability. If this pull request addresses a security vulnerability, I confirm that I got the approval (please contact security@ory.sh) from the maintainers to push the changes.
I have added tests that prove my fix is effective or that my feature works.
I have added or changed the documentation.

Further Comments

When we run the following query:

EXPLAIN ANALYSE DELETE FROM hydra_oauth2_refresh WHERE signature IN (
    SELECT signature FROM (
        SELECT signature FROM hydra_oauth2_refresh hoa
        WHERE requested_at < now() - interval '720 hours' AND nid = '<UUID>'
        ORDER BY signature
        LIMIT 100
    ) AS s
)

We get the following plan, with an execution time of 2.6 seconds. That is 2.6 seconds to delete 100 refresh tokens. Note the very high cost of the index scan on hydra_oauth2_refresh_pkey and the large number of rows removed by the filter (1,778,803).

"Delete on hydra_oauth2_refresh  (cost=97.31..955.25 rows=100 width=74) (actual time=2635.862..2635.864 rows=0 loops=1)"
"  ->  Nested Loop  (cost=97.31..955.25 rows=100 width=74) (actual time=2584.547..2586.199 rows=100 loops=1)"
"        ->  HashAggregate  (cost=96.75..97.75 rows=100 width=112) (actual time=2584.517..2584.619 rows=100 loops=1)"
"              Group Key: (s.signature)::text"
"              Batches: 1  Memory Usage: 48kB"
"              ->  Subquery Scan on s  (cost=0.56..96.50 rows=100 width=112) (actual time=277.987..2584.186 rows=100 loops=1)"
"                    ->  Limit  (cost=0.56..95.50 rows=100 width=44) (actual time=277.981..2583.939 rows=100 loops=1)"
"                          ->  Index Scan using hydra_oauth2_refresh_pkey on hydra_oauth2_refresh hoa  (cost=0.56..2184604.46 rows=2300987 width=44) (actual time=277.979..2583.852 rows=100 loops=1)"
"                                Filter: ((nid = '<UUID>'::uuid) AND (requested_at < (now() - '720:00:00'::interval)))"
"                                Rows Removed by Filter: 1778803"
"        ->  Index Scan using hydra_oauth2_refresh_pkey on hydra_oauth2_refresh  (cost=0.56..8.57 rows=1 width=50) (actual time=0.014..0.014 rows=1 loops=100)"
"              Index Cond: ((signature)::text = (s.signature)::text)"
"Planning Time: 1.830 ms"
"Execution Time: 2635.921 ms"

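The Limit cost explains why the planner chose this path. Assume PostgreSQL's linear interpolation for Limit nodes: startup cost plus the fraction limit / estimated rows of the remaining cost. Under that assumption, the figures from the primary-key scan reproduce the 95.50 shown in the plan. In other words, the planner expected the 2,300,987 matching rows to be spread evenly over the primary-key index, so 100 of them would turn up almost immediately. In reality, 1,778,803 rows were filtered out first.

```python
def limit_cost(startup, total, est_rows, limit):
    """Estimated total cost of a Limit node, assuming linear interpolation:
    startup cost plus the share of the run cost needed to emit `limit`
    of the `est_rows` rows the input is expected to return."""
    return startup + (total - startup) * limit / est_rows


# Figures copied from the primary-key Index Scan under the Limit node above.
est = limit_cost(startup=0.56, total=2184604.46, est_rows=2300987, limit=100)
print(f"estimated Limit cost: {est:.2f}")  # the plan shows cost=0.56..95.50

# What actually happened: rows read before the 100th match was found.
rows_read = 100 + 1778803  # rows returned + "Rows Removed by Filter"
print(f"rows actually read: {rows_read:,}")
```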
Just to confirm, we also checked the distribution of refresh tokens over the dates, and it looks normal:

"2023-04-15"	114043
"2023-04-14"	162217
"2023-04-13"	176243
"2023-04-12"	151112
"2023-04-11"	104758
"2023-04-10"	66714

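The query used for this check isn't shown. Below is a minimal sketch of an equivalent per-day count. It assumes SQLite's date() as a stand-in for casting requested_at to a date in PostgreSQL, and uses a few made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hydra_oauth2_refresh (signature TEXT PRIMARY KEY, nid TEXT, requested_at TEXT)"
)
# Made-up sample rows; real data would come from the production table.
conn.executemany(
    "INSERT INTO hydra_oauth2_refresh VALUES (?, ?, ?)",
    [
        ("sig-1", "nid-a", "2023-04-15 09:30:00"),
        ("sig-2", "nid-a", "2023-04-15 17:05:00"),
        ("sig-3", "nid-a", "2023-04-14 23:59:59"),
    ],
)

# Tokens per calendar day, newest day first (same layout as the table above).
per_day = conn.execute(
    """
    SELECT date(requested_at) AS day, COUNT(*) AS tokens
    FROM hydra_oauth2_refresh
    GROUP BY day
    ORDER BY day DESC
    """
).fetchall()
for day, tokens in per_day:
    print(day, tokens)
```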
To resolve the issue, we applied an index to the refresh token table:

CREATE INDEX hydra_oauth2_refresh_requested_at_idx ON hydra_oauth2_refresh (requested_at);
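EXPLAIN is a quick way to check whether a planner will walk the new index for this query shape. The sketch below is limited in scope:

- It uses SQLite via Python's sqlite3, not PostgreSQL, so costs and plan wording differ.
- It uses a trimmed-down table and a placeholder nid.
- It only checks which index is chosen for ORDER BY requested_at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hydra_oauth2_refresh (
    signature    TEXT PRIMARY KEY,
    nid          TEXT NOT NULL,
    requested_at TEXT NOT NULL
);
CREATE INDEX hydra_oauth2_refresh_requested_at_idx
    ON hydra_oauth2_refresh (requested_at);
""")

QUERY = """
SELECT signature FROM hydra_oauth2_refresh
WHERE requested_at < ? AND nid = ?
ORDER BY requested_at
LIMIT 100
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("2023-04-12", "nid-a"))]
print("\n".join(plan))
```

With only requested_at indexed, nid is not part of the index condition, so it is still checked row by row on the rows the index returns.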

When we now run the cleanup query again, but change the ORDER BY from signature to requested_at:

EXPLAIN ANALYSE DELETE FROM hydra_oauth2_refresh WHERE signature IN (
    SELECT signature FROM (
        SELECT signature FROM hydra_oauth2_refresh hoa
        WHERE requested_at < now() - interval '720 hours' AND nid = '<UUID>'
        ORDER BY requested_at
        LIMIT 100
    ) AS s
)

We get a much healthier execution with the same 720-hour interval:

"Delete on hydra_oauth2_refresh  (cost=44.65..902.59 rows=100 width=74) (actual time=201.503..201.506 rows=0 loops=1)"
"  ->  Nested Loop  (cost=44.65..902.59 rows=100 width=74) (actual time=1.548..109.779 rows=100 loops=1)"
"        ->  HashAggregate  (cost=44.09..45.09 rows=100 width=112) (actual time=0.389..0.540 rows=100 loops=1)"
"              Group Key: (s.signature)::text"
"              Batches: 1  Memory Usage: 48kB"
"              ->  Subquery Scan on s  (cost=0.44..43.84 rows=100 width=112) (actual time=0.146..0.354 rows=100 loops=1)"
"                    ->  Limit  (cost=0.44..42.84 rows=100 width=52) (actual time=0.143..0.326 rows=100 loops=1)"
"                          ->  Index Scan using hydra_oauth2_refresh_requested_at_idx on hydra_oauth2_refresh hoa  (cost=0.44..976522.26 rows=2302988 width=52) (actual time=0.141..0.316 rows=100 loops=1)"
"                                Index Cond: (requested_at < (now() - '720:00:00'::interval))"
"                                Filter: (nid = '<UUID>'::uuid)"
"        ->  Index Scan using hydra_oauth2_refresh_pkey on hydra_oauth2_refresh  (cost=0.56..8.57 rows=1 width=50) (actual time=1.089..1.089 rows=1 loops=100)"
"              Index Cond: ((signature)::text = (s.signature)::text)"
"Planning Time: 1.876 ms"
"Execution Time: 201.566 ms"

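Execution time drops from about 2.6 s to about 0.2 s (2635.921 ms / 201.566 ms ≈ 13x). The Limit subtree alone drops from about 2.58 s to about 0.33 ms.

The toy simulation below illustrates why ordering by the indexed column matters:

- Walking rows in signature order, the scan must skip every non-expired row until it has found 100 expired ones.
- Walking them in requested_at order, the first 100 rows are already the oldest tokens.

The data is made up (100,000 tokens, about 1% older than the cutoff, random signatures). It is not meant to reproduce the numbers above.

```python
import random

NOW = 1440.0          # hypothetical "current time", in hours
CUTOFF = NOW - 720.0  # tokens requested before this are eligible for cleanup


def rows_visited(rows, key, is_match, limit=100):
    """Walk rows in index order (sorted by `key`) and count how many are read
    before `limit` of them pass the filter, like an Index Scan under a Limit."""
    found = 0
    for visited, row in enumerate(sorted(rows, key=key), start=1):
        if is_match(row):
            found += 1
            if found == limit:
                return visited
    return len(rows)


def is_expired(row):
    return row[1] < CUTOFF


rng = random.Random(42)
rows = []
for _ in range(100_000):
    signature = f"{rng.getrandbits(64):016x}"  # random, so unrelated to age
    # Assumption: ~1% of tokens are older than the cutoff, the rest are recent.
    if rng.random() < 0.01:
        requested_at = rng.uniform(0.0, CUTOFF - 1.0)
    else:
        requested_at = rng.uniform(CUTOFF + 1.0, NOW)
    rows.append((signature, requested_at))

by_signature = rows_visited(rows, key=lambda row: row[0], is_match=is_expired)
by_requested_at = rows_visited(rows, key=lambda row: row[1], is_match=is_expired)
print(f"ORDER BY signature:    {by_signature:,} rows read")
print(f"ORDER BY requested_at: {by_requested_at:,} rows read")
```

The fewer expired tokens there are relative to live ones, the more rows the signature-ordered scan has to discard. The requested_at-ordered scan stays at exactly 100.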
codecov · 2023-05-12T19:36:50Z

Codecov Report

Merging #3516 (cfb9b01) into master (31b9e66) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head cfb9b01 differs from pull request most recent head eda1501. Consider uploading reports for the commit eda1501 to get more accurate results

@@            Coverage Diff             @@
##           master    #3516      +/-   ##
==========================================
- Coverage   76.89%   76.85%   -0.05%     
==========================================
  Files         124      124              
  Lines        9102     9175      +73     
==========================================
+ Hits         6999     7051      +52     
- Misses       1660     1673      +13     
- Partials      443      451       +8

Impacted Files	Coverage Δ
persistence/sql/persister_oauth2.go	`82.19% <100.00%> (-0.76%)`	⬇️

... and 1 file with indirect coverage changes

aeneasr · 2023-05-12T20:11:00Z

May I suggest adding nid to the index as well, given that it's part of the query in question?

sgal · 2023-05-12T20:25:02Z

@aeneasr Do you mean as a separate index, or as part of the requested_at one, like below?

CREATE INDEX hydra_oauth2_refresh_requested_at_idx ON hydra_oauth2_refresh (nid, requested_at);

The composite one was suggested by @arnolf here #3115 (comment)
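With the composite index, nid can become part of the index condition too. Here is a minimal sketch using SQLite through Python's sqlite3 as a stand-in for PostgreSQL (plan wording and costs differ), with a trimmed-down table and a placeholder nid value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hydra_oauth2_refresh (
    signature    TEXT PRIMARY KEY,
    nid          TEXT NOT NULL,
    requested_at TEXT NOT NULL
);
-- Composite index: equality column (nid) first, range/sort column second.
CREATE INDEX hydra_oauth2_refresh_requested_at_idx
    ON hydra_oauth2_refresh (nid, requested_at);
""")

QUERY = """
SELECT signature FROM hydra_oauth2_refresh
WHERE requested_at < ? AND nid = ?
ORDER BY requested_at
LIMIT 100
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("2023-04-12", "nid-a"))]
print("\n".join(plan))
```

The equality column comes first and the range/sort column second. One index range then covers both conditions and returns rows already ordered by requested_at, so no row-by-row nid filter or sort step is needed.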

aeneasr · 2023-05-12T20:27:57Z

Like below :)

sgal · 2023-05-16T07:28:47Z

@aeneasr Please have a look, I fixed the index.

kmherrmann · 2023-05-23T14:49:22Z

@hperl @alnr can either of you review this please?

hperl

LGTM! 🎉

Thanks for the contribution!

sgal requested a review from aeneasr as a code owner May 12, 2023 19:09

fix: add index on requested_at for refresh tokens and use it in janitor

eda1501

hperl approved these changes May 24, 2023


Merge branch 'master' into fix-refresh-janitororder-by-requested-at

bbba492

aeneasr approved these changes May 24, 2023


aeneasr merged commit 5b8e712 into ory:master May 24, 2023

aeneasr mentioned this pull request May 24, 2023

Janitor optimization use requested at index #3115

Closed

7 tasks

sgal deleted the fix-refresh-janitororder-by-requested-at branch September 1, 2023 10:48
