
Conversation


@daveyeu daveyeu commented Oct 17, 2014

The most important change here is that Redis failures are now expected, and the entire GC/destroy client process has been updated to handle them. Broadly, failures are tolerated by allowing the GC to be re-runnable for a failed client ID.

When GC starts up, we fan out and call destroyClient(clientId) for each candidate expired client ID. Each of these calls can fail individually, in which case processing simply stops for that client ID. Since removal of the client ID from the /clients sorted set is the last step, GC will pick up failed client IDs and re-run them in the future. As such, the lock around GC has also been removed: failures are expected, and multiple deletion runs for a single client ID are tolerable.
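The flow above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the Redis client is an in-memory stub, and `failOnce` is a test-only failure injector standing in for real Redis errors.

```javascript
// In-memory stand-in for the Redis sorted-set commands the GC uses.
function makeStubRedis() {
  const clients = new Map(); // member -> score, stands in for /clients
  return {
    zrangebyscore(key, min, max, cb) {
      const ids = [...clients].filter(([, s]) => s >= min && s <= max)
                              .map(([id]) => id);
      cb(null, ids);
    },
    zadd(key, score, member, cb) { clients.set(member, score); cb(null); },
    zrem(key, member, cb) { clients.delete(member); cb(null); },
    clients
  };
}

function destroyClient(redis, clientId, failOnce, cb) {
  // ...delete the client's message queue, subscriptions, etc. If any of
  // those Redis calls errors, processing stops: the ID stays in /clients
  // and a later GC pass simply re-runs the whole destroy for it.
  if (failOnce.has(clientId)) {
    failOnce.delete(clientId);
    return cb(new Error('simulated redis failure'));
  }
  // Removing the ID from the /clients sorted set is the final step.
  redis.zrem('/clients', clientId, cb);
}

function gc(redis, cutoff, failOnce) {
  redis.zrangebyscore('/clients', 0, cutoff, (err, ids) => {
    if (err) return;
    // Fan out, fire-and-forget: no lock, and a failure for one client ID
    // never affects the others.
    ids.forEach(id => destroyClient(redis, id, failOnce, () => {}));
  });
}
```

Because a failed ID is left in the sorted set, running `gc` a second time retries it, which is exactly why the GC lock is no longer needed.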

Other notable changes:

  • The score is now assigned to a client ID in the initial ZADD call in createClient. Previously, the score was set to zero and updated in the callback with ping(). That was susceptible to a race, however, where GC could run and pluck the client ID before its score was updated.
  • Publishing no longer destroys clients. Previously, publish would check the expiry of every client ID and call destroyClient if it detected an expired client. This could lead to races, so it's now the sole responsibility of GC to clean up old clients.
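The first bullet's race can be shown in a tiny sketch, again against an in-memory stand-in for Redis (the `zadd` helper and key layout here are illustrative, not the engine's exact client API).

```javascript
const clients = new Map(); // stub for the /clients sorted set

function zadd(score, member, cb) { clients.set(member, score); cb(null); }

// Before: zadd(0, clientId) followed by a ping() that re-scored it in the
// callback -- a window in which GC could see score 0 (long expired) and
// destroy the brand-new client.
// After: the real timestamp goes into the initial ZADD, so the client is
// never visible with a stale score.
function createClient(clientId, now, cb) {
  zadd(now, clientId, cb);
}
```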

Publishing was stepping on GC's toes by calling `destroyClient` whenever
it detected a client ID was past its prime. I *think* there may have
been some contention between the two (publisher and gc), so this moves
the cleanup responsibility wholly to the latter.

Well, not completely. The `disconnect` function inside of Faye itself
calls `destroyClient` as well, but let's leave that be.
A lot of changes here. Primarily, the destroyClient function now
tolerates Redis failures all the way downstream: if any error is
detected, the destroy process stops. Instead of trying to recover, we
rely on GC to pick up the client and re-run the calls in the future,
since the removal from the /clients sorted set is the final Redis call.

The GC lock has also been removed. After grabbing a list of expired
client IDs, we iterate over them with destroyClient, which returns
immediately. Since we're simply launching destroyClient into flight, and
don't care if it succeeds or not, we allow GC to re-run in the future
without any restrictions.

Because destroyClient doesn't block, we have an additional, expected
test failure -- "Redis engine destroyClient when the client has
subscriptions stops the client receiving messages:". This fails because
it's written to expect the destroy to occur immediately. If the test is
rewritten with the expectation in a callback, it succeeds.
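The suggested test rewrite might look something like this. Everything here is illustrative (a stub engine, not Faye's actual test harness): the point is only that the assertion moves into destroyClient's callback instead of running synchronously.

```javascript
// Stub engine whose destroyClient completes asynchronously, like the
// real non-blocking, multi-step Redis destroy.
function makeStubEngine() {
  const subscribers = new Set(['abc']);
  return {
    destroyClient(clientId, cb) {
      setImmediate(() => { subscribers.delete(clientId); if (cb) cb(); });
    },
    receives(clientId) { return subscribers.has(clientId); }
  };
}

function runTest(done) {
  const engine = makeStubEngine();
  engine.destroyClient('abc', function() {
    // Asserting here, inside the callback, is what makes the test pass:
    // a synchronous assertion would still see the client subscribed,
    // because the destroy hasn't completed yet.
    done(!engine.receives('abc'));
  });
}
```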
Previously, it would set the initial score to zero and then allow
ping() to update the score to now in the callback. That seemed to leave
a window of opportunity for GC to sneak in and kill the newly created
client, however, so now the score is simply set in the initial call.
@jpignata

lgtm

daveyeu added a commit that referenced this pull request Oct 17, 2014
Rewrite GC to handle Redis failures
@daveyeu daveyeu merged commit 2d43d61 into master Oct 17, 2014


3 participants