Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Janitor optimization use requested at index #3115

Closed

Conversation

arnolf
Copy link

@arnolf arnolf commented May 18, 2022

Hi,

on large tables, janitor queries could take minutes.

Table like refresh_token can grow before there is some expired tokens.

By adding missing indexes on requested_at columns, and by ordering on this field we can avoid full table scan, and/or in memory sorting.

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change
    introduces a new feature.
  • I am following the
    contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security. vulnerability, I
    confirm that I got green light (please contact
    security@ory.sh) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

@arnolf arnolf requested a review from aeneasr as a code owner May 18, 2022 10:02
@CLAassistant
Copy link

CLAassistant commented May 18, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov
Copy link

codecov bot commented May 18, 2022

Codecov Report

Merging #3115 (e30b4f2) into master (0a77a06) will decrease coverage by 0.18%.
The diff coverage is 83.78%.

❗ Current head e30b4f2 differs from pull request most recent head 13d762b. Consider uploading reports for the commit 13d762b to get more accurate results

@@            Coverage Diff             @@
##           master    #3115      +/-   ##
==========================================
- Coverage   79.67%   79.48%   -0.19%     
==========================================
  Files         112      112              
  Lines        7932     7864      -68     
==========================================
- Hits         6320     6251      -69     
  Misses       1214     1214              
- Partials      398      399       +1     
Impacted Files Coverage Δ
driver/registry.go 80.00% <42.85%> (ø)
persistence/sql/persister_consent.go 78.47% <84.61%> (+0.07%) ⬆️
driver/registry_sql.go 80.32% <100.00%> (ø)
persistence/sql/persister.go 78.57% <100.00%> (ø)
persistence/sql/persister_oauth2.go 81.01% <100.00%> (ø)
jwk/helper.go 100.00% <0.00%> (+6.74%) ⬆️
jwk/cast.go 60.00% <0.00%> (+60.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e586fd7...13d762b. Read the comment docs.

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! We are in the process of refactoring the storage layer. Could you please:

  1. Confirm this issue exists on the v2.x branch
  2. If it still exists, make your changes against that branch

Thank you very much!

@aeneasr aeneasr self-assigned this May 18, 2022
@arnolf
Copy link
Author

arnolf commented May 27, 2022

Hello,

I had a look on v2.x branch. It seems the code changed a lot.
The authentication and request tables has been merged in an unique flow tables, so I can't give you production feedback on this table as the v2 has not yet been released :(

However, the main issue still arise on refresh_token table even if the query changed (https://github.com/ory/hydra/blob/v2.x/persistence/sql/persister_oauth2.go#L430) :

  • it is still missing an hydra_oauth2_refresh_requested_at_idx index on (nid, requested_at)
  • and ORDER BY still using signature instead of requested_at

By the way, I think it would be better if the existing index on access_token table would be on (nid, requested_at), rather than on (requested_at, nid). With composite index, it's always better to put more selective index at first, that is, equality first, and then comparison.

I've made tests we a small amount of data, and forcing usage of indexes on PostgreSQL:

Here the actual query plan :
Limit (cost=12.27..12.28 rows=1 width=44) (actual time=0.048..0.049 rows=0 loops=1)
-> Sort (cost=12.27..12.28 rows=1 width=44) (actual time=0.048..0.048 rows=0 loops=1)
Sort Key: signature
Sort Method: quicksort Memory: 25kB
-> Index Scan using hydra_oauth2_refresh_client_id_idx on hydra_oauth2_refresh (cost=0.14..12.26 rows=1 width=44) (actual time=0.043..0.044 rows=0 loops=1)
Index Cond: (nid = '4511a21c-67e1-4a80-8999-444c1a418ce6'::uuid)
Filter: (requested_at < '2021-12-12 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 1
Planning time: 0.105 ms
Execution time: 0.065 ms

With index on (nid, requested_at)
Limit (cost=8.16..8.17 rows=1 width=44) (actual time=0.015..0.015 rows=0 loops=1)
-> Sort (cost=8.16..8.17 rows=1 width=44) (actual time=0.014..0.015 rows=0 loops=1)
Sort Key: signature
Sort Method: quicksort Memory: 25kB
-> Index Scan using hydra_oauth2_refresh_requested_at_idx on hydra_oauth2_refresh (cost=0.14..8.15 rows=1 width=44) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: ((nid = '4511a21c-67e1-4a80-8999-444c1a418ce6'::uuid) AND (requested_at < '2021-12-12 00:00:00'::timestamp without time zone))
Planning time: 0.101 ms
Execution time: 0.034 ms

With index on (nid, requested_at) and ORDER BY requested_at, we have no sorting, no full scan, and we have full usage of index. :
Limit (cost=0.14..8.15 rows=1 width=52) (actual time=0.011..0.011 rows=0 loops=1)
-> Index Scan using hydra_oauth2_refresh_requested_at_idx on hydra_oauth2_refresh (cost=0.14..8.15 rows=1 width=52) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: ((nid = '4511a21c-67e1-4a80-8999-444c1a418ce6'::uuid) AND (requested_at < '2021-12-12 00:00:00'::timestamp without time zone))
Planning time: 0.124 ms
Execution time: 0.036 ms

Since the code change a lot, I think it would be better to create a new pull request, or an issue (I don't know yet if I would have time to create a new pull request).

@arnolf arnolf force-pushed the janitor-optimization-use-requested-at-index branch 2 times, most recently from 9129408 to 32421b9 Compare June 3, 2022 11:36
…R BY clauses to improve query plan performance
@arnolf arnolf force-pushed the janitor-optimization-use-requested-at-index branch from 32421b9 to 85962db Compare June 3, 2022 11:40
@arnolf arnolf force-pushed the janitor-optimization-use-requested-at-index branch from 85962db to 13d762b Compare June 3, 2022 11:59
@aeneasr
Copy link
Member

aeneasr commented May 24, 2023

Superseded by #3516

@aeneasr aeneasr closed this May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants