[DISCUSSION] : Consideration of Using Kafka for Streaming Data to Clients #4

Closed
zakhaev26 opened this issue Dec 18, 2023 · 17 comments
Labels: discussion, help wanted, P-high, question, server

Comments

@zakhaev26
Member

zakhaev26 commented Dec 18, 2023

The goal of the GCSB project is to develop a robust system of multiple independent, decoupled APIs for sports. Currently, Server-Sent Events (SSE) have been identified as a suitable choice for real-time communication from server to client due to their lightweight nature and ease of setup, but there are certain concerns that need to be addressed first.
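For reference, a minimal SSE endpoint needs little more than the standard library; the sketch below is a rough Go example (the route and payload are illustrative, not our actual API):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// scoreEvents streams updates to one client over SSE.
// The route and payload here are illustrative placeholders.
func scoreEvents(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-r.Context().Done(): // client disconnected
			return
		case t := <-ticker.C:
			// Each SSE message is "data: <payload>" followed by a blank line.
			fmt.Fprintf(w, "data: {\"ts\": %q}\n\n", t.Format(time.RFC3339))
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/events/scores", scoreEvents)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```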

Key Concerns:

  1. API Design Uniformity:
    How should we design the APIs to ensure uniformity across the project? Should we opt for a single SSE stream or an individual SSE stream for each API?
    We are aiming for a uniform approach that enhances consistency and eases maintenance.
  • Single vs. Individual SSE: trade-offs between having a single SSE stream for all APIs or an individual stream per API.
  • Design Principles: we need to brainstorm the design principles to follow for API endpoints, naming conventions, and response formats.
  2. Need for Kafka or a Similar Queue/Pub-Sub System:

    Question: Considering a worst case of 1,000 concurrent users, do we really need a queuing or pub-sub architecture like Kafka/RabbitMQ? What are the pros and cons? Can SSE alone handle the expected load, or is a more scalable solution necessary?

Please share your thoughts, concerns, and suggestions regarding the API design uniformity and the need for a queuing/pub-sub architecture in this issue thread.

Consider this a high-priority issue.

@zakhaev26 added the server, P-high, discussion, help wanted, and question labels on Dec 18, 2023
@majorbruteforce
Member

majorbruteforce commented Dec 18, 2023

Kafka and other pub-sub services facilitate high-throughput reads and writes between systems. They don't inherently help the client-facing server deal with a large number of requests (they are probably not made for that purpose). Keeping the project dynamics in mind, we in fact don't have a large throughput to deal with; rather, we need to serve data to a large volume of clients concurrently, reliably, and in real time. In my opinion, we should look into load balancing the servers and scaling them when required, along with using Redis to cache the data. We can build a system and test it with something like Apache JMeter. We should also test first to determine the degree of resilience our system requires, so we don't overengineer it.

@punitkr03
Collaborator

@majorbruteforce I was having the same thought. We can work with caching, as we do not need high data throughput. We will keep the cache in sync with the database at set intervals, thus reducing complexity.
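A rough sketch of what that interval sync could look like with go-redis (fetchScoresFromDB, the key name, and the interval are hypothetical placeholders):

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// fetchScoresFromDB is a hypothetical stand-in for the real database query.
func fetchScoresFromDB(ctx context.Context) (string, error) {
	return `{"matchId": 1, "score": "2-1"}`, nil
}

// syncCache refreshes the cached scoreboard at a fixed interval so that
// reads can be served from Redis instead of hitting the database.
func syncCache(ctx context.Context, rdb *redis.Client, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			scores, err := fetchScoresFromDB(ctx)
			if err != nil {
				continue // keep serving the stale value on a failed refresh
			}
			rdb.Set(ctx, "scores:latest", scores, 0) // 0 = no expiry
		}
	}
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	go syncCache(context.Background(), rdb, 5*time.Second)
	select {} // block; the real server would serve HTTP here
}
```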

@zakhaev26
Member Author

zakhaev26 commented Dec 19, 2023

We have two choices as of now that cater to this need:
M1. Using database change streams to track changes and emit them via the SSE server (see the sketch at the end of this comment).
M2. Having a centralized Kafka cluster with multiple Kafka servers: producers (admins) publish messages to the partitions, and consumers pick them up, process them, and emit them via SSE.

Pros of M1: Simple to set up.
Cons of M1: Scalability.

Pros of M2: Reliable, fault-tolerant, scalable; can help in building a unified system.
Cons of M2: Hard learning curve, maintenance overheads.
E.g., a two-sport pub-sub system: (diagram attached)
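Here is a minimal sketch of M1 using the official mongo-driver; the database and collection names are illustrative, and the real server would fan each event out to SSE clients instead of logging it:

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Watch a scores collection for changes (names are illustrative).
	coll := client.Database("gcsb").Collection("scores")
	stream, err := coll.Watch(ctx, mongo.Pipeline{})
	if err != nil {
		log.Fatal(err) // note: change streams require a replica set
	}
	defer stream.Close(ctx)

	for stream.Next(ctx) {
		var event bson.M
		if err := stream.Decode(&event); err != nil {
			log.Println("decode:", err)
			continue
		}
		// In the real server, this event would be pushed to SSE clients.
		log.Printf("change event: %v", event)
	}
}
```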

@zakhaev26
Member Author

zakhaev26 commented Dec 19, 2023

I believe ~1k connections can be managed by both methods.
In my opinion, we should prioritize building the system with MongoDB change streams for now, as scalability is not really our need.
We might not want to overengineer and complicate things, although if there is a requirement, please say so.
@majorbruteforce @punitkr03 Your thoughts?

@punitkr03
Collaborator

Seems complicated. Let me do some research.

@punitkr03
Collaborator

@majorbruteforce @zakhaev26 I did some digging, and in my view using Kafka will ensure stability in the long run if the project scales up, so it won't be redundant. Also, we have plenty of time to implement it. I am all in on the Kafka implementation.

@zakhaev26
Member Author

zakhaev26 commented Dec 20, 2023

I am also interested in using Kafka..
@majorbruteforce @Brijendra-Singh2003 ?

@zakhaev26
Member Author

zakhaev26 commented Dec 20, 2023

Ran some benchmarks to test the reliability of change streams vs. Kafka.
These runs were performed on:

  • 4-core i5-6200U CPU @ 2.30GHz, Ubuntu 22.04.3 LTS
  • Go v1.21.5
  • Kafka 3.6.1
  • Zookeeper 3.9.1

Test Scenario:

  • Concurrent writes to a MongoDB collection monitored by change streams.
  • Concurrent writes to MongoDB plus concurrent publishing of messages to Apache Kafka.
    NOTE: Responses (the inserted output) were serialized as JSON and sent to the client. I used Postman Runner to perform 100 iterations with a 0ms delay in both cases.

Outcome:

Functional Benchmarks:

Apache Kafka avg response time: ~364ms, with all iterations succeeding. (screenshot attached)

Mongo change streams avg response time:
Trial 1: ~741ms [freezes and fails after 8 iterations] (screenshot attached)
Trial 2: ~693ms [freezes and fails after 16 iterations] (screenshot attached)
Performance Benchmarks:

10 VUs for 1 min fixed load

  • Kafka: (screenshot attached)
  • Change streams: (screenshot attached)

100 VUs for 1 min fixed load

  • Kafka:
    (My laptop shuts down every time I run the test. XD)
    But a screenshot towards the end of the test shows:
    Avg time: 1755ms (screenshot attached)
  • Mongo change streams:
    Avg time: 2946ms (screenshot attached)

Does this mean we shouldn't choose change streams over Kafka? Nope.
These are admin updates; I don't think there would be 100 admins, or even 2 admins, uploading scores in a single session. In that case, both are doable.

But after these tests, Kafka looks good.

@majorbruteforce @punitkr03 @Brijendra-Singh2003

I definitely didn't feel like primeagen after performing these tests :p

Source : https://github.com/zakhaev26/microservices-go

@zakhaev26
Member Author

P.S.: I did try out the most popular Kafka library for JS, but it was slower to interact with Kafka IMO, whereas the Confluent Kafka library for Golang felt way faster, even on a single thread.

So even if we are planning to use Kafka, we need to make sure it works fine and is manageable with Node so that devs can work with JS as well.
Performance metrics alone don't tell the full story; practical integration and developer experience matter too.

@majorbruteforce
Member

majorbruteforce commented Dec 21, 2023

> I am also interested in using Kafka.. @majorbruteforce @Brijendra-Singh2003 ?

I am looking to create a load balanced system that caches the data using Redis. I will try to test how many SSE connections a server with standard specifications can handle.
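A crude way to run that test is to open many plain HTTP connections to the SSE endpoint and count how many stay open; a sketch, with the URL and connection count as placeholders:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	const n = 1000                               // target number of concurrent SSE clients (placeholder)
	url := "http://localhost:8080/events/scores" // hypothetical SSE endpoint

	var open int64
	for i := 0; i < n; i++ {
		go func() {
			resp, err := http.Get(url)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			atomic.AddInt64(&open, 1)
			defer atomic.AddInt64(&open, -1)

			// Hold the connection open, consuming events as they arrive.
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
			}
		}()
	}

	// Let connections establish, then report how many are still open.
	time.Sleep(10 * time.Second)
	fmt.Printf("open SSE connections: %d/%d\n", atomic.LoadInt64(&open), n)
}
```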

@zakhaev26
Member Author

What's the progress Jesse? @majorbruteforce 🕺

@majorbruteforce
Member

While building a two-layer system with a cache layer, I realized it makes no sense to use a cache for an application that has to update data constantly. Revalidating the cache so frequently is no better than broadcasting changes directly from change streams. I am trying to test a few more ways, like polling, to see how they compare. I will run some benchmarks and start working on building the main APIs soon.
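One way to make that direct broadcast concrete is a small fan-out hub: the change-stream (or polling) loop publishes into it, and each SSE handler subscribes on connect. A minimal sketch (names and buffer size are arbitrary):

```go
package main

import (
	"fmt"
	"sync"
)

// Hub fans one stream of update messages out to many SSE subscribers.
type Hub struct {
	mu   sync.Mutex
	subs map[chan []byte]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[chan []byte]struct{})}
}

func (h *Hub) Subscribe() chan []byte {
	ch := make(chan []byte, 16) // buffered so one slow client doesn't block
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

func (h *Hub) Unsubscribe(ch chan []byte) {
	h.mu.Lock()
	delete(h.subs, ch)
	h.mu.Unlock()
	close(ch)
}

func (h *Hub) Broadcast(msg []byte) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- msg:
		default: // drop for clients that can't keep up
		}
	}
}

func main() {
	h := NewHub()
	ch := h.Subscribe()
	go h.Broadcast([]byte("score update"))
	fmt.Println(string(<-ch))
	h.Unsubscribe(ch)
}
```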

@majorbruteforce
Member

Also, @zakhaev26 try running the benchmarks for read operations once. I will do the same. The system is going to have bulk reads rather than writes.

@zakhaev26
Member Author

> The system is going to have bulk reads rather than writes.

Genuine

@punitkr03
Collaborator

A basic implementation of the chess API architecture: (diagram attached)

@zakhaev26
Member Author

zakhaev26 commented Dec 29, 2023

I used the Sarama library instead of the Confluent one for interacting with Kafka. I felt it is a more reliable way to use a producer + subscriber, and it is extremely fast.
Look into it if writing in Go: Sarama
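A minimal Sarama sync-producer sketch for anyone picking it up (broker address, topic, and payload are illustrative):

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	config := sarama.NewConfig()
	config.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Topic name and payload are placeholders.
	msg := &sarama.ProducerMessage{
		Topic: "score-updates",
		Value: sarama.StringEncoder(`{"matchId": 1, "score": "2-1"}`),
	}
	partition, offset, err := producer.SendMessage(msg)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("stored at partition %d, offset %d", partition, offset)
}
```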

@majorbruteforce
Member

Kafka has been tried and is the best option to proceed with for event streaming, as tested by @zakhaev26. Discussions regarding its implementation will continue on #38 from here on.
