Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove expired data from the database, ISSUE-952 #2406

Merged
merged 10 commits into from
Jun 20, 2022

Conversation

abador
Copy link
Contributor

@abador abador commented Apr 19, 2022

Cleaning up the database from old data and flows

Related issue(s)

#952

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change
    introduces a new feature.
  • I am following the
    contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security. vulnerability, I
    confirm that I got green light (please contact
    security@ory.sh) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Further Comments

@abador
Copy link
Contributor Author

abador commented Apr 19, 2022

Hope It's enough :)

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, I have some ideas on how to further improve it 😉

func (p *Persister) DeleteExpiredContinuitySessions(ctx context.Context, expiresAt time.Time, limit int) error {
// #nosec G201
err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
"DELETE FROM %s WHERE expires_at <= ? LIMIT ?",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be fun. Removing limits in our case will be tricky (or we'll just cleanup with our version where we use limits Wikia#56 and then just run this versionlater on ). As long as we run it more frequently this shouldn't be an issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be fun.

Is limit-less deletion for an established system a big performance impact? If so, we should probably add some kind of limit, e.g. using a sub-query, that works on all DBs.

Copy link
Contributor Author

@abador abador Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is tricky since we didn't have this ability from the start and we can't remove everything all at once, even with using the dates like in the pr. When switching from one system to another we created a lot of sessions/flows etc at the same time. Dropping those records can cause a big replication lag when the delete query hits that moment in time. Using limit helps us mitigate the problem a little and drops them in batches. This way we don't have to run this job as often and there is no need to monitor this as much

persistence/sql/persister.go Outdated Show resolved Hide resolved
cmd/cleanup/sql.go Outdated Show resolved Hide resolved
cmd/cleanup/sql.go Show resolved Hide resolved
Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks pretty good already 👍
I have some small suggestions how to improve it further 😉

fmt.Println(cmd.UsageString())
fmt.Println("")
fmt.Println("When using flag -e, environment variable DSN must be set")
os.Exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As tests will always fail if you call os.Exit we instead use cobra.Command.RunE and propagate errors up. As in this case you already printed the error message, you can just return cmdx.FailSilently(cmd). It will silence the standard cobra output and results in an error exit code being set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did what you had in mind

cmd/cliclient/cleanup.go Outdated Show resolved Hide resolved
cmd/cliclient/cleanup.go Outdated Show resolved Hide resolved
persistence/sql/persister.go Show resolved Hide resolved
Comment on lines 48 to 53
err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
"DELETE FROM %s WHERE expires_at <= ?",
new(continuity.Container).TableName(ctx),
),
expiresAt,
).Exec()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
"DELETE FROM %s WHERE expires_at <= ?",
new(continuity.Container).TableName(ctx),
),
expiresAt,
).Exec()
err := p.GetConnection(ctx)
.Where("expires_at <= ?", expiresAt)
.Delete(new(continuity.Container)),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the raw query since it's done the same way everywhere else. Shouldn't this be done in a follow-up with all other cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this query is that it will be extremely slow as it is unbound. Check out this PR for batch-based processing: https://github.com/grantzvolsky/hydra/pull/2/files#diff-6034803b09ef5017e3aa7d3082827dadb4503e6ee3a56853d49edd419a75f864

abador and others added 2 commits April 27, 2022 09:56
Co-authored-by: Patrik <zepatrik@users.noreply.github.com>
@abador
Copy link
Contributor Author

abador commented Apr 28, 2022

@zepatrik I left a few comments. If it's ok to leave it this way please resolve the conversations

@aeneasr aeneasr requested a review from zepatrik April 28, 2022 21:16
Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abador looks like there are a couple of compile errors

@abador
Copy link
Contributor Author

abador commented Apr 29, 2022

@abador looks like there are a couple of compile errors

I've seen them on the master branch. I'll merge master and check if they're gone

@abador
Copy link
Contributor Author

abador commented Apr 29, 2022

Ok it didn't help:/ I see similar errors in this pr: https://github.com/ory/kratos/runs/6133235832?check_suite_focus=true

@codecov
Copy link

codecov bot commented Apr 29, 2022

Codecov Report

Merging #2406 (c011208) into master (93bf1e2) will increase coverage by 0.05%.
The diff coverage is 84.28%.

@@            Coverage Diff             @@
##           master    #2406      +/-   ##
==========================================
+ Coverage   76.56%   76.61%   +0.05%     
==========================================
  Files         316      319       +3     
  Lines       17603    17813     +210     
==========================================
+ Hits        13477    13647     +170     
- Misses       3192     3225      +33     
- Partials      934      941       +7     
Impacted Files Coverage Δ
session/persistence.go 0.00% <ø> (ø)
cmd/cliclient/cleanup.go 31.03% <31.03%> (ø)
persistence/sql/persister.go 73.28% <64.70%> (-3.01%) ⬇️
cmd/cleanup/sql.go 96.00% <96.00%> (ø)
cmd/cleanup/root.go 100.00% <100.00%> (ø)
cmd/root.go 75.00% <100.00%> (+1.08%) ⬆️
continuity/test/persistence.go 100.00% <100.00%> (ø)
driver/config/config.go 82.17% <100.00%> (+0.11%) ⬆️
persistence/sql/persister_continuity.go 91.42% <100.00%> (+5.06%) ⬆️
persistence/sql/persister_login.go 100.00% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93bf1e2...c011208. Read the comment docs.

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks very good now 👍 I just have some ideas how to improve it further, and a few questions.

cmd/cleanup/sql.go Outdated Show resolved Hide resolved
continuity/test/persistence.go Outdated Show resolved Hide resolved
continuity/persistence.go Outdated Show resolved Hide resolved
Comment on lines 49 to 50
config.ViperKeyDatabaseCleanupSleepTables: 1 * time.Minute,
config.ViperKeyDatabaseCleanupBatchSize: 100,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those defaults will be loaded from the config schema, so there should be no need to add them here.

driver/config/config.go Outdated Show resolved Hide resolved
driver/config/config.go Outdated Show resolved Hide resolved
cmd/cliclient/cleanup.go Outdated Show resolved Hide resolved
}
time.Sleep(wait)

p.r.Logger().Println("Successfully cleaned up the SQL database!")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point we did not fully clean the database but rather purged batchSize from each of the tables. If there are more expired rows, they would not have been deleted. The output should reflect that as we will have to re-run the cleanup multiple times until all data are truly cleaned.

persistence/sql/persister_continuity.go Show resolved Hide resolved
@abador abador force-pushed the issue-952-cleanup-db branch 3 times, most recently from 1909ac8 to d615d80 Compare May 25, 2022 09:38
@abador
Copy link
Contributor Author

abador commented May 27, 2022

@aeneasr and @zepatrik please take a look when you'll have a chance

@aeneasr aeneasr self-requested a review June 4, 2022 08:53
@@ -55,7 +55,7 @@ func (p *Persister) DeleteContinuitySession(ctx context.Context, id uuid.UUID) e
func (p *Persister) DeleteExpiredContinuitySessions(ctx context.Context, expiresAt time.Time, limit int) error {
// #nosec G201
err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
"DELETE FROM %s WHERE id in (SELECT id FROM (SELECT id FROM %s c WHERE expires_at <= ? and nid = ? ORDER BY expires_at ASC LIMIT %d ) AS s )",
"DELETE FROM %s WHERE id IN (SELECT id FROM %s WHERE expires_at <= ? and nid = ? ORDER BY expires_at ASC LIMIT %d )",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can revert this now :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@abador abador requested review from aeneasr and zepatrik June 14, 2022 14:14
@aeneasr aeneasr merged commit 29d6376 into ory:master Jun 20, 2022
@aeneasr
Copy link
Member

aeneasr commented Jun 20, 2022

Thank you so much for this great contribution!

@harnash harnash deleted the issue-952-cleanup-db branch July 8, 2022 13:29
@frederikhors
Copy link
Contributor

Is there any docs about this tool? @abador, @aeneasr?

peturgeorgievv pushed a commit to senteca/kratos-fork that referenced this pull request Jun 30, 2023
Closes ory#952

Co-authored-by: Patrik <zepatrik@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants