Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a README explaining the backpressure #84

Merged
merged 2 commits into from
Feb 16, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 33 additions & 0 deletions docs/src/main/mdoc/technical-details.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
---
id: technical-details
title: Technical Details
---

In the following sections, technical aspects of the library are detailed.

## Implementation Notes

Following are some general library implementation notes.

- The library relies on the Java Kafka client, and does not re-implement the Kafka client. In particular, the library implements a consumer threading model similar to the 'decouple consumption and processing' model described in the [documentation](https://kafka.apache.org/@KAFKA_DOCS_VERSION@/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html).

- Since the Java Kafka client is allowed to block (up to `pollTimeout` for `poll`), all Kafka client calls are run on a dedicated `ExecutionContext`. Unless explicitly provided when creating `ConsumerSettings`, a default single-threaded `ExecutionContext` will be created and used for this purpose.

## Consumer Streaming

To enable backpressured streaming of Kafka records, there are three pieces at work.

1. The 'consumer stream' continually issues fetch requests as long as there is demand. There is some room to issue fetch requests even when there isn't demand, to allow prefetching of records. This is controlled using the `maxPrefetchBatches` setting. Fetch requests will only be issued as long as processing is fast enough, so that less than `maxPrefetchBatches` batches are prefetched.
2. The 'consumer loop' handles fetch requests issued by the stream, as well as poll requests, and other requests (subscribe, seek, ...). Records are only fetched for topic-partitions where there is demand (pull-based). The consumer loop is mostly implemented by the internal `KafkaConsumerActor`.
3. The 'poll scheduler' which schedules poll requests via a poll queue (bounded to 1 element).

These pieces are assembled in `KafkaConsumer` and run concurrently in independent `Fiber`s.

### Backpressure

There are a few important points to understanding how the backpressure works.

- FS2 streams are pull-based, meaning there isn't a messaging system instructing up-stream to slow down. Instead, up-stream is asked to produce elements whenever down-stream is pulling. It's possible to run the consumer and producer independently, and use an asynchronous non-blocking queue for communication. The producer can then slow down when the consumer isn't fast enough.
- For performance reasons, we should not issue a request to Kafka for every record requested by down-stream. Instead, we fetch records in batches (size controlled by `max.poll.records`) and keep the records in memory until processed (either in `KafkaConsumerActor` or on a queue).
- Since the Java Kafka client `poll` blocks, we throttle poll requests to every `pollInterval`.
- The `KafkaConsumerActor` only requests records for partitions where there is a fetch request.
3 changes: 3 additions & 0 deletions website/i18n/en.json
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@
},
"quick-example": {
"title": "Quick Example"
},
"technical-details": {
"title": "Technical Details"
}
},
"links": {
Expand Down
5 changes: 1 addition & 4 deletions website/sidebars.json
Original file line number Diff line number Diff line change
@@ -1,8 +1,5 @@
{
"docs": {
"Documentation": [
"overview",
"quick-example"
]
"Documentation": ["overview", "quick-example", "technical-details"]
}
}