Rewrite GC to handle Redis failures #1
Merged
The most important change here is that Redis failures are now expected, and the entire GC/destroy-client process has been updated to handle them. Broadly, failures are tolerated by making GC re-runnable for a failed client ID.
When GC starts up, we fan out and call `destroyClient(clientId)` for each candidate expired client ID. Each of these calls can fail individually, and processing simply stops for that client ID. Since removing the client ID from the `/clients` sorted set is the last step, GC will simply pick up failed client IDs and re-run them in the future. As such, the lock around GC has also been removed: failures are expected, and multiple deletion runs for a single client ID are tolerable.

Other notable changes:
- Fixed the `ZADD` call in `createClient`. Previously, the score was set to zero and then updated in the callback with `ping()`. This is susceptible to a race, however, where GC can run and pluck the client ID before its score is updated.
- Removed the call to `destroyClient` when an expired client is detected. This could lead to races, so it's now the sole responsibility of GC to clean up old clients.
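The re-runnable fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the promise-based `redis` parameter, its method names, and the `ttlMs` cutoff logic are all assumptions, and `destroyClient`'s real cleanup work is elided.

```javascript
// Hypothetical sketch of the re-runnable GC fan-out. `redis` stands in
// for a promise-based Redis client; key and method names are assumptions.
async function runGC(redis, now, ttlMs) {
  const cutoff = now - ttlMs;
  // Candidate expired client IDs: everything scored at or below the cutoff.
  const expired = await redis.zrangebyscore('/clients', 0, cutoff);

  // Fan out; each destroyClient call may fail independently, and a
  // failure stops processing for that client ID only.
  const results = await Promise.allSettled(
    expired.map((clientId) => destroyClient(redis, clientId))
  );
  // Failed IDs remain in /clients and are retried on a future GC run.
  return results.filter((r) => r.status === 'rejected').length;
}

async function destroyClient(redis, clientId) {
  // ...delete the client's data here; any throw aborts just this ID...
  // Removing the ID from /clients is the LAST step, so a failure above
  // leaves the ID in place for the next run.
  await redis.zrem('/clients', clientId);
}
```

Because nothing here holds a lock, two concurrent runs that both reach `zrem` for the same ID are harmless: the second removal is a no-op.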
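The `ZADD` race in `createClient` can be illustrated like this. Again a hedged sketch: the single-`ZADD` fix shown is inferred from the description of the race, and the key name and signatures are assumptions.

```javascript
// Racy (old) shape: a placeholder score of zero, updated later.
// Between the two writes, GC could see score 0 — always past the
// cutoff — and destroy the brand-new client.
async function createClientRacy(redis, clientId) {
  await redis.zadd('/clients', 0, clientId);
  await redis.zadd('/clients', Date.now(), clientId); // ping()-style update
}

// Fixed shape: one ZADD with the real timestamp score, so the client
// is never visible to GC with a placeholder score.
async function createClient(redis, clientId) {
  await redis.zadd('/clients', Date.now(), clientId);
}
```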