
Implementing a Twitter™ Clone with SimpleFeed

Konstantin Gredeskoul edited this page Aug 12, 2017 · 3 revisions

To demonstrate this library's features with a well-understood example, consider the Use Case of posting stories on a social network similar to Twitter™.

  • Events are pushed to the followers' activity streams as soon as their authors publish them.

  • Followers who access their feeds later will eventually see the new story in their feed.

  • We'll assume that, while we do want events to appear in followers' feeds as quickly as possible, we place a higher value on serving individual activity streams to users very quickly: in near-constant time, regardless of the size of the user base. This is exactly the Use Case this library serves well.

What is a User Anyway?

Note: managing the relationship between authors and their followers is left to the developer using this library. You are expected to decide what "user" means in your Use Case, and pass that user's ID as the key to a unique activity stream.

Wait, what? Does it mean that...

As a side effect of this approach, you are free to create additional activity streams by creating "fake" users, such as a "global" stream of events from ALL users, or geo-centered streams (you can use the author's zip code as a "user ID", and then "subscribe" users based on proximity).

So, in a way, the term "user" (as used by SimpleFeed) simply denotes a unique incarnation of an activity stream, one that aggregates events from a unique set of sources.
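To make this concrete, here is a minimal Ruby sketch that uses a plain Hash as a hypothetical stand-in for the feed backend. It is not SimpleFeed's API, and stream keys such as `"global"` and `"zip:94107"` are illustrative names. The point is that a "fake user" is just one more key that an event gets stored under.

```ruby
# Hypothetical in-memory stand-in for a feed backend: each key is a
# "user" in SimpleFeed's sense, i.e. simply the ID of one activity stream.
streams = Hash.new { |hash, key| hash[key] = [] }

# Append one event to every stream whose ID is listed.
def publish(streams, stream_ids, event)
  stream_ids.each { |id| streams[id] << event }
end

author_zip   = "94107"              # assumed: the author's zip code
follower_ids = ["user:1", "user:2"] # assumed: real users following the author

# Real followers, the "global" stream, and the author's zip-code stream
# all receive the same event; only the keys differ.
publish(streams, follower_ids + ["global", "zip:#{author_zip}"], "Hello from SF!")

streams["global"]    # => ["Hello from SF!"]
streams["zip:94107"] # => ["Hello from SF!"]
```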

Translating to the API

Whenever a user on your site performs an interesting action, one that the product must display in other users' activity feeds, a developer using this library must first get the list of users interested in the event, create a multi-user instance of Activity, and then call #store on that instance.
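The three steps can be sketched as follows. `FOLLOWERS` and `MultiUserActivity` are hypothetical stand-ins written for illustration: the follower graph lives in your own application, and SimpleFeed's real Activity class and the exact `#store` signature should be taken from the library's own documentation, not from this sketch.

```ruby
# Hypothetical follower graph: author ID => IDs of followers.
# In a real app this lives in your own database, not in SimpleFeed.
FOLLOWERS = {
  "taylor" => ["alice", "bob", "carol"]
}.freeze

# Hypothetical stand-in for a multi-user Activity: it holds a list of
# user IDs and writes the same event into each of their feeds.
class MultiUserActivity
  def initialize(feeds, user_ids)
    @feeds    = feeds
    @user_ids = user_ids
  end

  # Prepend the event so that each feed stays newest-first.
  def store(value:, at: Time.now)
    @user_ids.each { |id| @feeds[id].unshift([at, value]) }
    true
  end
end

feeds = Hash.new { |hash, key| hash[key] = [] }

# 1. Get the users interested in the event.
interested = FOLLOWERS.fetch("taylor", [])
# 2. Create a multi-user activity for them.
activity = MultiUserActivity.new(feeds, interested)
# 3. Store the event once; it fans out to every follower's feed.
activity.store(value: "New album out!")

feeds["bob"].map(&:last) # => ["New album out!"]
```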

Publishing Events and Performance

The time required to store an activity for all users in SimpleFeed depends largely on whether your backend is sharded; for example, you can shard Redis into hundreds of shards by using the twemproxy sharding proxy. But you may not need to do this until you are serving activity feeds to millions of unique users, because the storage requirements are rather small (assuming you keep the values you store in SimpleFeed small).

However, because the SimpleFeed backend uses Redis pipelining, the hope is that the time it takes to post an event to all recipients grows sub-linearly with the number of users. If you have concrete benchmarks for SimpleFeed, please do share them.
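The following Ruby sketch illustrates why sharding and pipelining help. It assumes a modulo-over-CRC32 shard mapping, 4 shards, and at most 500 commands per pipeline flush; all of these are hypothetical values (twemproxy supports several hash functions and distribution modes, and SimpleFeed's internal batching may differ). It groups recipients by shard and counts network round trips: one per batch, instead of one per follower.

```ruby
require "zlib"

SHARD_COUNT = 4   # assumed number of Redis shards
BATCH_SIZE  = 500 # assumed max commands per pipeline flush

# Map a user ID to a shard. Modulo over CRC32 is only one possible scheme;
# twemproxy supports several hash functions and distribution modes.
def shard_for(user_id)
  Zlib.crc32(user_id.to_s) % SHARD_COUNT
end

# Group recipients by shard, then split each group into pipeline-sized
# batches. Each batch would be sent to its shard in one pipelined round trip.
def plan_batches(user_ids)
  user_ids.group_by { |id| shard_for(id) }
          .transform_values { |ids| ids.each_slice(BATCH_SIZE).to_a }
end

followers = (1..10_000).map { |n| "user:#{n}" }
plan = plan_batches(followers)

round_trips = plan.values.sum(&:size)
puts "naive round trips:     #{followers.size}"
puts "pipelined round trips: #{round_trips}"
```

Pipelining reduces network round trips, not work: Redis still executes one write per recipient, so that part of the cost continues to grow linearly with the number of followers.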

Where to Place the Publishing Code?

If you are writing a web application, you are hopefully already using a background-processing framework such as Sidekiq. Either way, you will want to publish events to multiple users in a background job, because posting to hundreds of thousands (or millions) of users takes time. One way or another, it is much slower than reading an individual user's feed, which is the operation this library is specifically designed to optimize.
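Here is a minimal sketch of that split, using Ruby's thread-safe `Queue` and a single worker `Thread` as a stand-in for a real framework such as Sidekiq. The `follower_ids_for` helper, the job format, and the `:shutdown` sentinel are hypothetical.

```ruby
# Minimal stand-in for a background-job framework such as Sidekiq, built on
# Ruby's thread-safe Queue. A real app would enqueue a Sidekiq job instead.
jobs  = Queue.new
feeds = Hash.new { |hash, key| hash[key] = [] }

# Hypothetical follower lookup; replace with your own data access.
def follower_ids_for(author_id)
  (1..1_000).map { |n| "#{author_id}-follower-#{n}" }
end

# A single worker thread does all writes, so no extra locking is needed here.
worker = Thread.new do
  while (job = jobs.pop) != :shutdown
    author_id, value = job
    # The slow fan-out happens here, off the request path.
    follower_ids_for(author_id).each { |id| feeds[id] << value }
  end
end

# The web request only enqueues the job and returns immediately.
jobs << ["taylor", "New album out!"]

# Drain and stop the worker (only needed for this self-contained demo).
jobs << :shutdown
worker.join
```

With Sidekiq, you would instead define a job class whose `perform` method does the fan-out, and enqueue it from the request; the request path stays just as short.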

Taylor Swift Posting on Twitter, and the SimpleFeed API

One could say that, if you were to implement Twitter with SimpleFeed, you would use the Multi-User API to write to the feeds whenever users post events. Conversely, to render each user's activity in real time (and in near-constant time), you would use the Single-User API to read from that user's feed.

Let's say we are on Twitter, and we follow Taylor Swift. Let's also say she sends a tweet. I expect my feed to be updated with her post. As the developer implementing this backend, you must first fetch all of her follower IDs, then create a multi-user Activity instance, passing the array of IDs as an argument. You then call #store on that instance to publish the event, serialized in some form as a string value.
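Both halves of the Taylor Swift example can be sketched in Ruby as follows, assuming JSON as the serialization format and a hypothetical cap of 1,000 entries per feed. `publish` and `read_page` are illustrative helpers over an in-memory Hash, not SimpleFeed methods; in SimpleFeed the write would go through a multi-user Activity and the read through a single-user one.

```ruby
require "json"

# Hypothetical capped, newest-first feeds keyed by user ID. This is an
# in-memory illustration, not SimpleFeed's provider or API.
MAX_FEED_SIZE = 1_000 # assumed cap per user

feeds = Hash.new { |hash, key| hash[key] = [] }

# Write path (multi-user): the same serialized event goes to every follower.
def publish(feeds, follower_ids, event_hash)
  value = JSON.generate(event_hash) # events are stored as strings
  follower_ids.each do |id|
    feed = feeds[id]
    feed.unshift(value)
    feed.pop while feed.size > MAX_FEED_SIZE # trim the oldest entries
  end
end

# Read path (single-user): one lookup plus a slice, independent of how
# many users exist in total.
def read_page(feeds, user_id, page: 1, per_page: 20)
  entries = feeds.fetch(user_id, [])
  (entries[(page - 1) * per_page, per_page] || []).map { |v| JSON.parse(v) }
end

taylor_followers = ["me", "you"] # assumed result of your follower query
publish(feeds, taylor_followers, { "author" => "taylor", "text" => "Hi!" })

read_page(feeds, "me").first["text"] # => "Hi!"
```

Storing a serialized string keeps the feed backend agnostic to the event format; the reading side decides how to decode it.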

And yet, the Twitter example is just one particular Use Case. You can just as easily use this software to implement one-to-one or many-to-one event publishing.