Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some RedisLocks auto-releasing too fast #16276

Merged
merged 2 commits into from
May 19, 2021

Conversation

ClearlyClaire
Copy link
Contributor

@ClearlyClaire ClearlyClaire commented May 18, 2021

Fixes #16238 #15645

By default, RedisLock expires after 10 seconds, which may not be enough to process statuses, especially when those have attached media files.

This commit extends those 10 seconds to 15 minutes, which should be plenty enough to handle any status, while being short enough to not waste many sidekiq job retries in the exceedingly rare case in which a sidekiq process would crash when processing a Create or Delete.

Fixes mastodon#16238

By default, RedisLock expires after 10 seconds, which may not be enough to
process statuses, especially when those have attached media files.

This commit extends those 10 seconds to 15 minutes, which should be plenty
enough to handle any status, while being short enough to not waste many
sidekiq job retries in the exceedingly rare case in which a sidekiq process
would crash when processing a `Create` or `Delete`.
@Gargron
Copy link
Member

Gargron commented May 18, 2021

I think this may also close #15645

@ClearlyClaire
Copy link
Contributor Author

I think this may also close #15645

Only partially, there are many other uses of RedisLock, I will look into them shortly.

@ClearlyClaire ClearlyClaire changed the title Fix Delete and Create-related locks expiring too fast Fix some RedisLocks auto-releasing too fast May 18, 2021
@ClearlyClaire
Copy link
Contributor Author

ClearlyClaire commented May 18, 2021

Pushed a commit changing other RedisLock timeouts, basically the numbers I chose are:

  • default of 10 seconds for anything doing only a handful simple database queries
  • 5 minutes for anything performing significantly more involved queries (e.g. going through lists of followers and pushing redis messages)
  • 15 minutes for anything making HTTP requests to third-party servers

Note than in most cases having a long timeout isn't an issue. It's only an issue if the sidekiq process crashes abruptly (normal exception handling will make sure the locks are released) or the redis process crashes. Still, we should probably avoid having very long timeouts to make sure recovering from such a crash doesn't end up being a big issue.

Fixes mastodon#15645

- things that only perform a few simple database queries (e.g. finding and
  saving a record) have been left unchanged, so they'll still use the default
  10s duration
- things that perform significantly more complex database queries have been
  changed to a 5 minutes timeout
- things that perform multiple HTTP queries have been changed to a 15 minutes
  timeout
@ClearlyClaire ClearlyClaire linked an issue May 18, 2021 that may be closed by this pull request
@Gargron Gargron merged commit 9a19227 into mastodon:main May 19, 2021
ClearlyClaire added a commit to ClearlyClaire/mastodon that referenced this pull request May 22, 2021
Gargron pushed a commit that referenced this pull request May 22, 2021
tribela added a commit to tribela/mastodon that referenced this pull request May 24, 2021
@tribela
Copy link
Contributor

tribela commented Oct 7, 2021

irb(main):002:0> s = Status.where(uri: Tombstone.select(:uri))
=> #<ActiveRecord::Relation [#<Status id: 106712126356282706, uri: "https://gearlandia.haus/objects/aa8ba14b-07dd-4af4...", account_id: 106377726398314840, text: "", created_at: "2021-08-07 00:57:06.8415100...
irb(main):003:0> s.first
=> #<Status id: 106712126356282706, uri: "https://gearlandia.haus/objects/aa8ba14b-07dd-4af4...", account_id: 106377726398314840, text: "", created_at: "2021-08-07 00:57:06.841510000 +0000", updated_at: "2021-08-07 00:58:32.153301000 +0000", in_reply_to_id: 106712116033267570, reblog_of_id: nil, url: "https://gearlandia.haus/objects/aa8ba14b-07dd-4af4...", sensitive: false, visibility: "public", in_reply_to_account_id: 106377726398314840, application_id: nil, spoiler_text: "", reply: true, language: nil, conversation_id: 3325515, local: false, poll_id: nil, content_type: nil, deleted_at: nil, local_only: nil>
irb(main):004:0> s.first.created_at
=> Sat, 07 Aug 2021 00:57:06.841510000 UTC +00:00
irb(main):005:0> s.last.created_at
=> Fri, 18 Jun 2021 21:34:02.292221000 UTC +00:00
irb(main):006:0> s.count
=> 4

It seems to be happened again.

ClearlyClaire added a commit to ClearlyClaire/mastodon that referenced this pull request Jan 28, 2022
* Fix Delete and Create-related locks expiring too fast

Fixes mastodon#16238

By default, RedisLock expires after 10 seconds, which may not be enough to
process statuses, especially when those have attached media files.

This commit extends those 10 seconds to 15 minutes, which should be plenty
enough to handle any status, while being short enough to not waste many
sidekiq job retries in the exceedingly rare case in which a sidekiq process
would crash when processing a `Create` or `Delete`.

* Fix other RedisLock autorelease durations

Fixes mastodon#15645

- things that only perform a few simple database queries (e.g. finding and
  saving a record) have been left unchanged, so they'll still use the default
  10s duration
- things that perform significantly more complex database queries have been
  changed to a 5 minutes timeout
- things that perform multiple HTTP queries have been changed to a 15 minutes
  timeout
ClearlyClaire added a commit to ClearlyClaire/mastodon that referenced this pull request Jan 28, 2022
chrisguida pushed a commit to Start9Labs/mastodon that referenced this pull request Feb 26, 2022
* Fix Delete and Create-related locks expiring too fast

Fixes mastodon#16238

By default, RedisLock expires after 10 seconds, which may not be enough to
process statuses, especially when those have attached media files.

This commit extends those 10 seconds to 15 minutes, which should be plenty
enough to handle any status, while being short enough to not waste many
sidekiq job retries in the exceedingly rare case in which a sidekiq process
would crash when processing a `Create` or `Delete`.

* Fix other RedisLock autorelease durations

Fixes mastodon#15645

- things that only perform a few simple database queries (e.g. finding and
  saving a record) have been left unchanged, so they'll still use the default
  10s duration
- things that perform significantly more complex database queries have been
  changed to a 5 minutes timeout
- things that perform multiple HTTP queries have been changed to a 15 minutes
  timeout
chrisguida pushed a commit to Start9Labs/mastodon that referenced this pull request Feb 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition with deleted post that has media attachment(s) Redis lock's TTL may leads to potential bugs
3 participants