Remove expired data from the database, ISSUE-952 #2406
Hope it's enough :)
Thanks for this PR, I have some ideas on how to further improve it 😉
func (p *Persister) DeleteExpiredContinuitySessions(ctx context.Context, expiresAt time.Time, limit int) error {
	// #nosec G201
	err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
		"DELETE FROM %s WHERE expires_at <= ? LIMIT ?",
This will be fun. Removing limits will be tricky in our case (or we'll just clean up with our own version, which uses limits (Wikia#56), and then run this version later on). As long as we run it more frequently, this shouldn't be an issue.
This will be fun.
Is limit-less deletion for an established system a big performance impact? If so, we should probably add some kind of limit, e.g. using a sub-query, that works on all DBs.
Yeah, this is tricky since we didn't have this ability from the start, and we can't remove everything at once, even when using the dates as in the PR. When switching from one system to another, we created a lot of sessions, flows, etc. at the same time. Dropping those records can cause a big replication lag when the delete query hits that moment in time. Using a limit helps us mitigate the problem a little by dropping the records in batches. This way we don't have to run this job as often, and there is no need to monitor it as much.
0b815c7 to df0f175
Nice, this looks pretty good already 👍
I have some small suggestions how to improve it further 😉
cmd/cliclient/cleanup.go
Outdated
fmt.Println(cmd.UsageString())
fmt.Println("")
fmt.Println("When using flag -e, environment variable DSN must be set")
os.Exit(1)
As tests will always fail if you call os.Exit, we instead use cobra.Command.RunE and propagate errors up. As in this case you already printed the error message, you can just return cmdx.FailSilently(cmd). It will silence the standard cobra output and result in an error exit code being set.
I think I did what you had in mind
err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
	"DELETE FROM %s WHERE expires_at <= ?",
	new(continuity.Container).TableName(ctx),
),
	expiresAt,
).Exec()
Suggested change:
- err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
- 	"DELETE FROM %s WHERE expires_at <= ?",
- 	new(continuity.Container).TableName(ctx),
- ),
- 	expiresAt,
- ).Exec()
+ err := p.GetConnection(ctx).
+ 	Where("expires_at <= ?", expiresAt).
+ 	Delete(new(continuity.Container))
I left the raw query since it's done the same way everywhere else. Shouldn't this be done in a follow-up with all other cases?
The problem with this query is that it will be extremely slow, as it is unbounded. Check out this PR for batch-based processing: https://github.com/grantzvolsky/hydra/pull/2/files#diff-6034803b09ef5017e3aa7d3082827dadb4503e6ee3a56853d49edd419a75f864
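One way to bound the query, sketched under assumptions: the helper name `batchDeleteQuery` and the table name in `main` are illustrative, and the table name must come from trusted code because it is interpolated into the SQL. The inner SELECT is wrapped in a derived table because MySQL, for one, rejects LIMIT placed directly inside an IN sub-query.

```go
package main

import "fmt"

// batchDeleteQuery builds a DELETE that removes at most `limit` expired rows.
// The row limit lives in a sub-query wrapped in a derived table (AS s), a form
// intended to work across databases that lack DELETE ... LIMIT.
// The expiry timestamp stays a bind parameter (?).
func batchDeleteQuery(table string, limit int) string {
	return fmt.Sprintf(
		"DELETE FROM %s WHERE id IN (SELECT id FROM (SELECT id FROM %s WHERE expires_at <= ? ORDER BY expires_at ASC LIMIT %d) AS s)",
		table, table, limit,
	)
}

func main() {
	// The table name is an illustrative placeholder.
	fmt.Println(batchDeleteQuery("continuity_containers", 100))
}
```

Ordering by expires_at means the oldest rows go first, so repeated runs make steady progress through the backlog.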
Co-authored-by: Patrik <zepatrik@users.noreply.github.com>
@zepatrik I left a few comments. If it's OK to leave it this way, please resolve the conversations.
@abador looks like there are a couple of compile errors
I've seen them on the master branch. I'll merge master and check if they're gone.
OK, it didn't help :/ I see similar errors in this PR: https://github.com/ory/kratos/runs/6133235832?check_suite_focus=true
Codecov Report
@@            Coverage Diff             @@
##           master    #2406      +/-   ##
==========================================
+ Coverage   76.56%   76.61%   +0.05%
==========================================
  Files         316      319       +3
  Lines       17603    17813     +210
==========================================
+ Hits        13477    13647     +170
- Misses       3192     3225      +33
- Partials      934      941       +7
6558e19 to 6448602
Nice, this looks very good now 👍 I just have some ideas how to improve it further, and a few questions.
internal/driver.go
Outdated
config.ViperKeyDatabaseCleanupSleepTables: 1 * time.Minute,
config.ViperKeyDatabaseCleanupBatchSize: 100,
Those defaults will be loaded from the config schema, so there should be no need to add them here.
persistence/sql/persister.go
Outdated
	}
	time.Sleep(wait)

	p.r.Logger().Println("Successfully cleaned up the SQL database!")
At this point we did not fully clean the database but rather purged up to batchSize rows from each of the tables. If there are more expired rows, they will not have been deleted. The output should reflect that, as we will have to re-run the cleanup multiple times until all data is truly cleaned.
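A possible shape for a more accurate message, as a sketch only: `cleanupSummary` and its wording are hypothetical and not part of this PR. It assumes the caller tracks how many rows each table lost in the pass; a table that lost a full batch may still hold expired rows.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cleanupSummary describes one cleanup pass. A table whose deleted-row count
// reached batchSize may still contain expired rows, so the message names those
// tables instead of claiming the database is fully clean.
func cleanupSummary(deleted map[string]int64, batchSize int64) string {
	var pending []string
	for table, n := range deleted {
		if n >= batchSize {
			pending = append(pending, table)
		}
	}
	if len(pending) == 0 {
		return "cleanup pass done: no table filled a whole batch"
	}
	sort.Strings(pending) // map iteration order is random, so sort for stable output
	return fmt.Sprintf("cleanup pass done: re-run for tables that filled a whole batch: %s",
		strings.Join(pending, ", "))
}

func main() {
	// Table names and counts are illustrative placeholders.
	fmt.Println(cleanupSummary(map[string]int64{"sessions": 100, "continuity_containers": 12}, 100))
}
```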
1909ac8 to d615d80
db4c862 to 0580b63
@@ -55,7 +55,7 @@ func (p *Persister) DeleteContinuitySession(ctx context.Context, id uuid.UUID) e
 func (p *Persister) DeleteExpiredContinuitySessions(ctx context.Context, expiresAt time.Time, limit int) error {
 	// #nosec G201
 	err := p.GetConnection(ctx).RawQuery(fmt.Sprintf(
-		"DELETE FROM %s WHERE id in (SELECT id FROM (SELECT id FROM %s c WHERE expires_at <= ? and nid = ? ORDER BY expires_at ASC LIMIT %d ) AS s )",
+		"DELETE FROM %s WHERE id IN (SELECT id FROM %s WHERE expires_at <= ? and nid = ? ORDER BY expires_at ASC LIMIT %d )",
We can revert this now :)
done
d910a1e to c011208
Thank you so much for this great contribution!
Closes ory#952 Co-authored-by: Patrik <zepatrik@users.noreply.github.com>
Cleaning up the database from old data and flows
Related issue(s)
#952