ref: Increase reprocessing TTL (#44883)
This increases (and refreshes) the TTL of the main reprocessing
counter/info keys in Redis. It also introduces a separate TTL for
tombstones.

Reprocessing has been getting "stuck" quite a lot recently. Part of the
problem is this counter, which is responsible for calling
`finish_reprocessing` at the end; a missing counter value also
manifests as a progress bar stuck at 100%, which is confusing to
customers.

We have seen reprocessing jobs that are unreasonably slow and can
legitimately take more than 24 hours to complete. Increasing and
refreshing the TTL reduces the likelihood of an expiring key breaking
reprocessing, and gives the team more time to investigate potential
issues with reprocessing.
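
Why an expired counter wedges a job: Redis DECRBY treats a missing key
as 0, so once the counter expires mid-job the decrements go negative
and never hit the `== 0` check that triggers `finish_reprocessing`. A
minimal illustration (redis-py, hypothetical key name):

    import redis

    r = redis.Redis()
    r.set("re2:count:123", 5)   # hypothetical counter: 5 events pending
    r.delete("re2:count:123")   # simulate the TTL firing mid-job
    print(r.decrby("re2:count:123", 1))  # -1, not 0: finish_reprocessing
                                         # would never be triggered
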
Swatinem authored and Andrii Soldatenko committed Feb 21, 2023
1 parent e31c639 commit f7536e5
Showing 2 changed files with 17 additions and 12 deletions.
7 changes: 5 additions & 2 deletions src/sentry/conf/server.py
@@ -2815,8 +2815,11 @@ def build_cdc_postgres_init_db_volume(settings):
 # for synchronization/progress report.
 SENTRY_REPROCESSING_SYNC_REDIS_CLUSTER = "default"
 
-# How long can reprocessing take before we start deleting its Redis keys?
-SENTRY_REPROCESSING_SYNC_TTL = 3600 * 24
+# How long tombstones from reprocessing will live.
+SENTRY_REPROCESSING_TOMBSTONES_TTL = 24 * 3600
+
+# How long reprocessing counters are kept in Redis before they expire.
+SENTRY_REPROCESSING_SYNC_TTL = 30 * 24 * 3600  # 30 days
 
 # How many events to query for at once while paginating through an entire
 # issue. Note that this needs to be kept in sync with the time-limits on
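
For scale: the old shared TTL was 24 * 3600 = 86,400 seconds (one day).
The counter TTL is now 30 * 24 * 3600 = 2,592,000 seconds (30 days),
while tombstones keep the one-day lifetime via the new
SENTRY_REPROCESSING_TOMBSTONES_TTL setting.
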
22 changes: 12 additions & 10 deletions src/sentry/reprocessing2.py
@@ -20,9 +20,11 @@
    preprocess_event. The event payload is taken from a backup that was made on
    first ingestion in preprocess_event.
 
-3. wait_group_reprocessed in sentry.tasks.reprocessing2 polls a counter in
-   Redis to see if reprocessing is done. When it reaches zero, all associated
-   models like assignee and activity are moved into the new group.
+3. `mark_event_reprocessed` will decrement the pending event counter in Redis
+   to see if reprocessing is done.
+
+   When the counter reaches zero, it will trigger the `finish_reprocessing` task,
+   which will move all associated models like assignee and activity into the new group.
 
 A group redirect is installed. The old group is deleted, while the new group
 is unresolved. This effectively unsets the REPROCESSING status.
@@ -397,12 +399,12 @@ def buffered_delete_old_primary_hash(
     if old_primary_hash is not None and old_primary_hash != current_primary_hash:
         event_key = _get_old_primary_hash_subset_key(project_id, group_id, old_primary_hash)
         client.lpush(event_key, f"{to_timestamp(datetime)};{event_id}")
-        client.expire(event_key, settings.SENTRY_REPROCESSING_SYNC_TTL)
+        client.expire(event_key, settings.SENTRY_REPROCESSING_TOMBSTONES_TTL)
 
         if old_primary_hash not in old_primary_hashes:
             old_primary_hashes.add(old_primary_hash)
             client.sadd(primary_hash_set_key, old_primary_hash)
-            client.expire(primary_hash_set_key, settings.SENTRY_REPROCESSING_SYNC_TTL)
+            client.expire(primary_hash_set_key, settings.SENTRY_REPROCESSING_TOMBSTONES_TTL)
 
     with sentry_sdk.configure_scope() as scope:
         scope.set_tag("project_id", project_id)
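
The tombstone writes above stamp a fresh TTL on every touch, so these
keys live for a day past the most recent write rather than a day past
creation. A minimal sketch of that write-then-refresh pattern, with a
hypothetical key name (redis-py):

    import redis

    r = redis.Redis()
    TOMBSTONES_TTL = 24 * 3600

    def record_tombstone(event_key: str, payload: str) -> None:
        r.lpush(event_key, payload)          # append the tombstone entry
        r.expire(event_key, TOMBSTONES_TTL)  # reset the one-day countdown

    record_tombstone("re2:tombstone-events:{42:1337}", "1676937600.0;deadbeef")
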
@@ -497,10 +499,6 @@ def buffered_handle_remaining_events(
     client = _get_sync_redis_client()
     # We explicitly cluster by only project_id and group_id here such that our
     # RENAME command later succeeds.
-    #
-    # We also use legacy string formatting here because new-style Python
-    # formatting is quite confusing when the output string is supposed to
-    # contain {}.
     key = f"re2:remaining:{{{project_id}:{old_group_id}}}"
 
     if datetime_to_event:
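
The triple braces in that f-string deserve a note: `{{` and `}}` are
literal braces, so the key comes out wrapped in a {project_id:group_id}
hash tag. Redis Cluster hashes only the tag portion when routing, which
pins all keys for one group to the same slot, so the multi-key RENAME
mentioned in the comment can succeed. A quick check with hypothetical
IDs:

    project_id, old_group_id = 42, 1337
    key = f"re2:remaining:{{{project_id}:{old_group_id}}}"
    assert key == "re2:remaining:{42:1337}"
    # Only "{42:1337}" is hashed for slot routing, so any two keys
    # sharing that tag can be RENAMEd into each other.
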
@@ -579,8 +577,12 @@ def mark_event_reprocessed(data=None, group_id=None, project_id=None, num_events
 
     project_id = data["project"]
 
+    client = _get_sync_redis_client()
+    # refresh the TTL of the metadata:
+    client.expire(_get_info_reprocessed_key(group_id), settings.SENTRY_REPROCESSING_SYNC_TTL)
     key = _get_sync_counter_key(group_id)
-    if _get_sync_redis_client().decrby(key, num_events) == 0:
+    client.expire(key, settings.SENTRY_REPROCESSING_SYNC_TTL)
+    if client.decrby(key, num_events) == 0:
         from sentry.tasks.reprocessing2 import finish_reprocessing
 
         finish_reprocessing.delay(project_id=project_id, group_id=group_id)
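
Note the ordering in the new code: the TTL is refreshed before the
decrement, so every processed event pushes the counter's expiry another
30 days out, and only a job that stops making progress entirely can
still lose its counter. A compressed sketch of the same idea, assuming
redis-py and a hypothetical counter key:

    import redis

    SYNC_TTL = 30 * 24 * 3600

    def mark_done(r: redis.Redis, counter_key: str, n: int = 1) -> bool:
        r.expire(counter_key, SYNC_TTL)       # keep the counter alive while events flow
        return r.decrby(counter_key, n) == 0  # zero pending -> time to finish up

    r = redis.Redis()
    if mark_done(r, "re2:count:123"):
        print("all events reprocessed; finish_reprocessing would run")
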