Skip to content
This repository has been archived by the owner on Nov 26, 2019. It is now read-only.

Open Problem: PubSub at Scale (1M, 10M, 100M, 1B.. nodes) #5

Merged
merged 16 commits into from
Nov 3, 2019
128 changes: 128 additions & 0 deletions OPEN_PROBLEMS/PUBSUB_AT_SCALE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# PubSub at Scale

## Short Description
> In one sentence or paragraph.

Publish/Subscribe (pubsub) messaging systems have been proposed and traditionally used to disintermediate senders and receivers of messages. That is, in pubsub systems, publishers of content do not send the published messages directly to one or a group of receivers, but instead *publishers are sending messages to a topic.* This enhances asynchronous communication and reduces tremendously network traffic and bandwidth-requirements.

## Long Description

Pubsub systems are a subgroup of broadcast trees, according to which, once messages are published, they are broadcast to all nodes in the tree. Although broadcast can ensure timely delivery of messages to all receipients, it causes severe stress to the system in terms of bandwidth needed to deliver the messages.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are not necessarily trees!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I mentioned they are a "subgroup" :) but I rephrased in the new version. Thanks!


One of the main benefits of pubsub systems is that receivers of information (or subscribers) can individually pull information when they need to. This reduces a lot the strain put in the system to deliver messages as they are published. However, in case of large-scale systems, the overhead of publishing information to a topic can still overload the system.

yiannisbot marked this conversation as resolved.
Show resolved Hide resolved
## State of the Art

> This survey on the State of the Art is not by any means complete, however, it should provide a good entry point to learn what are the existing work. If you have something that is fundamentally missing, please consider submitting a PR to augment this survey.

### Within the libp2p Ecosystem
> Existing attempts and strategies

##### libp2p pubsub (floodsub, gossipsub & episub)

- https://github.com/libp2p/specs/blob/master/pubsub/README.md
- https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md
- https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/episub.md

yiannisbot marked this conversation as resolved.
Show resolved Hide resolved
<strong> Gossip Protocols </strong>

Gossipsub is a gossip-based pubsub system developed within libp2p. Gossipsub borrows concepts from related literature (see list below) and blends them together to produce an efficient pubsub protocol. Traditionally, the concept behind gossiping is to address the issue of load-balancing between all nodes forwarding messages in the system. According to gossiping, once a node receives a message, it does not broadcast the message directly to all nodes subscribed to some topic, but instead it is choosing a fraction of nodes (we can use *t* to denote the number of nodes chosen) to distribute the message. In turn, those *t* nodes are choosing a further *t* nodes to distribute the message further. Clearly, receiving the message more than once is perfectly possible in gossiping systems. If a node receives a message twice, it discards the second (and any subsequent) message it receives.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation needs to have a greater focus on what libp2p-gossipsub today is rather than the general case of gossiping in the literature

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, done.

yiannisbot marked this conversation as resolved.
Show resolved Hide resolved

The intrinsic redundancy inserted in the system through gossiping is improving the resilience of the system. At the same time, in order to maintain redundancy, every node in the system has to keep membership information for the entire system. Although gossip-based protocols reduce the stress put in the system in terms of bandwidth and connectivity requirements, it clearly poses scalability concerns due to the state that all nodes need to keep.
yiannisbot marked this conversation as resolved.
Show resolved Hide resolved

In order to overcome those issues, several gossip systems implement what is called *partial views*, according to which the following strategies can be used to propagate messages throughout the network:
yiannisbot marked this conversation as resolved.
Show resolved Hide resolved

- <strong> Eager push: </strong> This is the traditional approach, where once nodes receive a message they forward the message (together with its payload) to a random set of *t* peers.
- <strong> Pull: </strong> Nodes interested in a set of topics periodically send request messages to random nodes to inquire about newly received messages. If queried nodes have updates in the topic specified by the subscriber node, they forward the message to this nodee.
- <strong> Lazy push: </strong> When a node (from within the *t* group of another node) receives a message, it forwards a message identifier (i.e., *not the payload of the message*) to a number of random peers. If those nodes have not already received the message, they send a subsequent pull request to get the full payload of the message.

*Tradeoffs*

As usual with most communication systems, there are tradeoffs in the design choices, which affect the performance and resilience of the system. Inserting more traffic in the system (e.g., *eager push*) increases bandwidth requirements, but achieve higher reliability and lower latency. On the other hand, more relaxed approaches (e.g., *lazy push*) reduce both bandwidth requirements, but also load on end nodes.

Related Literature

- Epidemic Broadcast Trees, 2007, DOI: 10.1109/SRDS.2007.27, http://www.gsd.inesc-id.pt/~ler/docencia/rcs1617/papers/srds07.pdf
- HyParView: a membership protocol for reliable gossip-based broadcast, 2007, DOI: 10.1109/DSN.2007.56, http://asc.di.fct.unl.pt/~jleitao/pdf/dsn07-leitao.pdf
- GoCast: Gossip-enhanced Overlay Multicast for Fast and Dependable Group Communication, 2005, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.4811&rep=rep1&type=pdf
- Exposing and eliminating vulnerabilities to denial of service attacks in secure gossip-based multicast, DSN 2004
- Emergent structure in unstructured epidemic multicast, DSN 2007


##### peer-base collaboration messaging protocol (Dias Peer Set)

- https://github.com/peer-base/peer-base/blob/master/docs/PROTOCOL.md#peer-base---protocol-explanation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs quick explanation (1 paragraph)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a well-known protocol? Or is it/has it been used in libp2p? I can't see anything special about it and the RFP is getting long..


### Within the broad Research Ecosystem
> How do people try to solve this problem?

- We've collected a vast amount of Research Material around PubSub. It can be found at https://github.com/libp2p/research-pubsub

Related literature in the broader research community has worked towards two incarnations of pubsub systems: i) content-based pubsub and ii) topic-based pubsub.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more, for example type, concept and also the topic base can be expanded to tag base.

Here is a quick list of many of the systems I compiled earlier:

Subject/Topic/Tag based - Lists of subscribers per subject/topic/tag

  • IP Multicast
  • BMTP
  • SRM Framework
  • Scribe (on top of Pastry)
  • Bayeux
  • Rappel
  • Tera
  • SpiderCast

Content Based Subscriptions (apply filters/matching to forward messages)

  • Pragmatic General Multicast
  • Elvin
  • SIENA
  • Provides a Scalable and efficient pubsub - G. Banavar, T. Chandra, B. Mukherjee, J. Nagarajarao, R. E. Strom, and D. C. Sturman, “An efficient multicast protocol for content-based publish-subscribe systems,” presented at the Proceedings. 19th IEEE International Conference on Distributed Computing Systems, 1999, pp. 262–272.
  • Gryphon
  • JEDI
  • Hermes
  • Rebecca
  • Meghdoot
  • PHT
  • O-P DHT
  • PastryStrings
  • Sub-2-Sub

Type based- middle ground approach between the two above

Concept/Semantics based - XML and semantic stuff everywhere and on the fly, more flexible but also more complex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included mention of most of these. Can add some survey papers.


<strong> Content-Based Pubsub </strong>

In content-based pubsub, newly published content is tagged with a set of attribute/value pairs. In turn, subscriptions are expressed as predicates of the attributes. When predicates match attributes, the subscriber receives the information.

Related literature

- Pietzuch, P., Bacon, J.: Hermes: a distributed event-based middleware architecture. In: ICDCS Workshops. pp. 611–618 (2002)
- Rosenblum, D.S., Wolf, A.L.: A design framework for internet-scale event observation and notification. In: ESEC. pp. 344–360 (1997)
- Ahmed, N., Linderman, M., Bryant, J.: Papas: Peer assisted publish and subscribe. In: Workshop on Middleware for Next Generation Internet Computing (MW4NG). pp. 7:1–7:6 (2012)
- Li, M., Ye, F., Kim, M., Chen, H., Lei, H.: A scalable and elastic publish/subscribe service. In: IPDPS. pp. 1254–1265 (2011)
- Zhang, B., Jin, B., Chen, H., Qin, Z.: Empirical evaluation of contentbased pub/sub systems over cloud infrastructure. In: Intl. Conference on Embedded and Ubiquitous Computing (EUC). pp. 81–88 (2010)

<strong> Topic-Based Pubsub </strong>

In contrast, in topic-based pubsub systems, subscribers declare interest in some topic. Publishers add/tag topics to their publication and those topics that match subscribers' interests are broadcast to them. Depending on the nature of the application running on top, topic-based pubsub can reduce the complexity of the system.

Related Literature

- Eugster, P.T., Felber, P.A., Guerraoui, R., Kermarrec, A.M.: The many faces of publish/subscribe. ACM Comput. Surv. 35(2), 114–131 (2003)
- Zhao, Y., Kim, K., Venkatasubramanian, N.: Dynatops: A dynamic topicbased publish/subscribe architecture. In: DEBS. pp. 75–86 (2013)
- Julien Gascon-Samson, Franz-Philippe Garcia, Bettina Kemme, Jörg Kienzle, Dynamoth: A Scalable Pub/Sub Middleware for Latency-Constrained Applications in the Cloud, IEEE ICDCS 2015

yiannisbot marked this conversation as resolved.
Show resolved Hide resolved

### Known shortcommins of existing solutions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er, you probably mean _shortcomings`.

> What are the limitations on those solutions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The known shortcomings is currently empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One that should be highlighted is that almost no PubSub literature covers Tree Forming, they all mostly assume that the number of peers is consistent (no churn) and that they can dictate the spanning tree to their own design at any moment in time (without contemplating the challenges with having an accurate perspective of the network).

It would be valuable to highlight some papers that focus on efficient routing at the underlay level (i.e. OSPF, RIP, etc) to contrast and take inspiration from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One that should be highlighted is that almost no PubSub literature covers Tree Forming, they all mostly assume that the number of peers is consistent (no churn) and that they can dictate the spanning tree to their own design at any moment in time (without contemplating the challenges with having an accurate perspective of the network).

To my knowledge, this is mostly true for structured, managed and mostly cloud-based pub/sub systems. Some of this has now been discussed together with the differences to unstructured systems.

It would be valuable to highlight some papers that focus on efficient routing at the underlay level (i.e. OSPF, RIP, etc) to contrast and take inspiration from.

I didn't do that at this point. I don't have a clear view of how to link between routing in the underlay and overlay. E.g., run dijkstra on the overlay? Or take routing input from underlay protocols and optimise overlay routes?

## Solving this Open Problem

As mentioned earlier, there are several tradeoffs at play in the design of the system. Those tradeoffs are made more serious as scalability requirements come into the picture, that is, as the protocols is requested to serve orders of magnitude more users and more pubsub topics. Below, we provide a very brief description of the main issues that a sophisticated pubsub protocol needs to be able to deal with.

- *Load-balancing:* Keeping membership state and forwarding pubsub messages is loading both the memory and communication/networking requirements of a node. This is especially so for p2p systems, where end-nodes are not necessarily powerful servers. Furthermore, as some content is becoming popular, more load is put on the nodes that are relaying those messages. *A sophisticated (gossiping) pubsub protocol needs to be able to balance load among nodes.*

- *Latency:* Some applications require that messages are delivered to all nodes subscribed to a topic with the least possible delay. As pubsub systems are built as overlays on top of the physical Internet infrastructure, the underlying hop-count does not necessarily correspond to the overlay picture. Furthermore, approaches such as "eager-push" or "flooding" can reduce the delivery latency, but increase bandwidth requirements.

- *Authentication:* Whether a pubsub system is open to the public or not, there needs to be some authentication to those that publish to specific topics/channels. As such, there has been discussion (e.g., in https://github.com/ipfs/notes/issues/236) about a pubsub authentication API. According to this, every topic is signed by a public key. Anyone can subscribe to this key, but those that want to publish information to this key/topic need to sign the content with the corresponding private key. In case of a private pubsub system, content can be encrypted and the corresponding keys to decrypt the content should be shared with those that are allowed access to the topics. *Content published in pubsub systems need to be authenticated and in case of a private pubsub system the content itself needs to be encrypted using authenticated encryption.*

- *Scalability:* The ultimate issue that comprises a challenge of its own is to be able to scale up and support orders of magnitude more users/nodes. As more nodes join the system, both bandwidth and networking resources increase accordingly. That said, the scalability challenge encompasses all of the issues discussed above. *A sophisticated pubsub system should be able to support orders of magnitude more nodes, but at the same time take care of load-balancing between nodes and latency requirements of corresponding applications.*

### What is the impact

As the IPFS network grows and dependency on underlying libp2p (and supporting protocols) intensifies, we need to make sure that the design of the protocols is able to scale up and maintain performance.

- Where is gossipsub used within libp2p/IPFS?
- Which external systems use gossipsub?
yiannisbot marked this conversation as resolved.
Show resolved Hide resolved

### What defines a complete solution?
> What hard constraints should it obey? Are there additional soft constraints that a solution would ideally obey?

Support for multiple times of usage:
- Blog
- Social Network
- Chat
- Newschannel
- Video Broadcast
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be expanded to inspire the reader to think about the scales that we are thinking about.


## Other

### Existing Conversations/Threads

- [Authenticated PubSub](https://github.com/ipfs/notes/issues/236)
- [pub/sub - publish / subscribe](https://github.com/ipfs/notes/issues/64)
- [Messaging on IPFS](https://github.com/ipfs/notes/issues/33)
- [PubSub notes from the IPFS Workshop - recursion and corecursion, PubSub API and Self Certified Streams](https://github.com/ipfs/notes/issues/154)
- [PulsarCast M.Sc Thesis - Scaling libp2p PubSub](https://github.com/ipfs/notes/issues/266)

### Extra notes