[DISCUSSION] : Consideration of Using Kafka for Streaming Data to Clients #4
Kafka and other pub-sub services facilitate high-throughput reads and writes between systems. They don't inherently help the client-serving server deal with a large number of requests (they are probably not made for that purpose). Keeping the project dynamics in mind, we in fact don't have a large throughput to deal with; rather, we need to efficiently serve data to a large volume of clients concurrently, reliably, and in real time. In my opinion, we should look into load balancing the servers and scaling them when required, along with using Redis to cache the data. We can build a system and test it using something like Apache JMeter. We should also test first to determine the degree of resilience our system requires so we don't overengineer it.
@majorbruteforce I was having the same thought. We can work with caching since we do not need high data throughput. We will keep the cache in sync with the database at set intervals, thus reducing complexity.
I believe ~1k connections can be managed by both methods.
Seems complicated. Let me do some research. |
@majorbruteforce @zakhaev26 I did some digging, and in my opinion using Kafka will ensure stability in the long run if the project scales up, so it shouldn't be redundant. Also, we have plenty of time to implement it. I am all in on a Kafka implementation.
I am also interested in using Kafka.
Did some benchmarks to test the reliability of Changestreams vs. Kafka.
Test Scenario:
Outcome:
- Apache Kafka avg response time: ~364 ms, with all iterations being a success
- Mongo Changestreams avg response time:
  - 10 VUs for 1 min fixed load
  - 100 VUs for 1 min fixed load
Does this mean we shouldn't choose Changestreams over Kafka? Nope. But after these tests, Kafka seems good. @majorbruteforce @punitkr03 @Brijendra-Singh2003 I definitely didn't feel like primeagen after performing these tests :p Source: https://github.com/zakhaev26/microservices-go
P.S.: I did try out the most popular Kafka library for JS, but it was slower to interact with Kafka IMO. So even if we are planning to use Kafka, we need to make sure it works fine and is manageable with Node so that devs can work with JS as well.
I am looking to create a load balanced system that caches the data using Redis. I will try to test how many SSE connections a server with standard specifications can handle. |
What's the progress Jesse? @majorbruteforce 🕺 |
While building a two-layer system with a cache layer, I realized it makes no sense to use a cache for an application that has to update data constantly. Revalidating the cache so frequently is no better than broadcasting changes directly from change streams. I am trying to test a few more ways, like polling, to see how they compare. I will run some benchmarks and start working on building the main APIs soon.
Also, @zakhaev26 try running the benchmarks for read operations once. I will do the same. The system is going to have bulk reads rather than writes. |
Genuine |
I used the Sarama library instead of the Confluent one for interacting with Kafka. I felt it is a more reliable way to use a producer + subscriber, and it is extremely fast.
Kafka has been tried and is the best option to proceed with for event streaming, as tested by @zakhaev26. Discussions regarding its implementation will continue on #38 from here on.
The goal of the GCSB project is to develop a robust system with multiple independent and decoupled APIs for sports. Currently, Server-Sent Events (SSE) have been identified as a suitable choice for achieving real-time communication from server to client due to their lightweight nature and ease of setup, but there are certain concerns that need to be resolved.
Key Concerns:
How should we design the APIs to ensure uniformity across the project? Should we opt for a single SSE stream or an individual SSE stream for each API?
We are aiming for a uniform approach that can enhance consistency and ease maintenance.
Need for Kafka/Similar Queue or Pub-Sub Messaging:
Question: Considering a maximum of 1000 concurrent users in the worst case, do we really need a queuing or pub-sub architecture like Kafka/RabbitMQ? What are the pros and cons? Can SSE alone handle the expected load, or is a more scalable solution necessary?
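On the single-vs-individual SSE question: the SSE wire format already supports named events, so one endpoint could multiplex all sports and clients could filter with `addEventListener`. A formatting-only sketch, assuming hypothetical event names like `football`/`cricket` (not project identifiers):

```go
package main

import (
	"fmt"
	"strings"
)

// sseFrame formats one server-sent event with an event name, so a single
// endpoint can carry updates for every sport. Names here are illustrative.
func sseFrame(event, data string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "event: %s\n", event)
	// Per the SSE format, each payload line carries its own "data:" prefix.
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n") // a blank line terminates the event
	return b.String()
}

func main() {
	fmt.Print(sseFrame("football", `{"home":1,"away":0}`))
	fmt.Print(sseFrame("cricket", `{"runs":120,"wickets":3}`))
}
```

The trade-off: a single multiplexed stream keeps the client to one connection (browsers cap concurrent SSE connections per origin over HTTP/1.1), while per-API streams keep each service independently deployable, which matters for the decoupled-APIs goal stated above.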
Please share your thoughts, concerns, and suggestions regarding the API design uniformity and the need for a queuing/pub-sub architecture in this issue thread.
Consider this as a High Priority Issue.