### Does Pulsar have, or plan to have, a concept of log compaction where only the latest message with the same key will be kept?

Yes, see [PIP-14](https://github.com/apache/pulsar/wiki/PIP-14:-Topic-compaction) for more details.
### When I use an exclusive subscription to a partitioned topic, is the subscription attached to the "whole topic" or to a "topic partition"?
On a partitioned topic, you can use all three supported subscription types (exclusive, failover, shared), the same as with non-partitioned topics.

The “subscription” concept is roughly similar to a “consumer-group” in Kafka. You can have multiple of them on the same topic, with different names.

If you use “exclusive”, a consumer will try to consume from all partitions, or fail if any partition is already being consumed.
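To make the semantics concrete, here is a minimal sketch using the Java client (the service URL, topic, and subscription names are hypothetical):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650") // assumed local broker
        .build();

// The subscription is attached to the whole partitioned topic: with
// Exclusive, this consumer claims all partitions, and a second consumer
// subscribing under the same subscription name will fail.
Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-partitioned-topic")
        .subscriptionName("my-sub")
        .subscriptionType(SubscriptionType.Exclusive)
        .subscribe();
```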
### What is the proxy component?
It’s a component that was introduced recently. Essentially, it’s a stateless proxy that speaks the Pulsar binary protocol. The motivation is to avoid (or overcome the impossibility of) a direct connection between clients and brokers.
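From the client’s point of view nothing changes except the endpoint; for illustration (the proxy hostname here is hypothetical):

```java
import org.apache.pulsar.client.api.PulsarClient;

// The client speaks the same binary protocol to the proxy as it would to a
// broker; it simply points its service URL at the proxy address.
PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://pulsar-proxy.example.com:6650")
        .build();
```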
---
## Usage and Configuration
### Can I manually change the number of bundles after creating namespaces?
### How are subscription modes set? Can I create new subscriptions over the WebSocket API?
Yes, you can set most of the producer/consumer configuration options over the WebSocket API by passing them as HTTP query parameters; see [the doc](http://pulsar.apache.org/docs/latest/clients/WebSocket/#RunningtheWebSocketservice-1fhsvp).
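For example, a sketch with the JDK 11+ `java.net.http` WebSocket client (host, topic, and subscription names are hypothetical; `subscriptionType` and `receiverQueueSize` are among the documented query parameters):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;

// Consumer options are passed as query parameters on the endpoint URL.
String url = "ws://localhost:8080/ws/v2/consumer/persistent/public/default"
        + "/my-topic/my-sub?subscriptionType=Shared&receiverQueueSize=500";

WebSocket ws = HttpClient.newHttpClient()
        .newWebSocketBuilder()
        .buildAsync(URI.create(url), new WebSocket.Listener() {})
        .join();
```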
The key is that you should use a different subscription for each consumer. Each subscription is completely independent of the others.
### When creating a consumer, is the default to "tail" from "now" on the topic, from the "last acknowledged", or something else?

When you spin up a consumer, it will try to subscribe to the topic; if the subscription doesn't exist, a new one is created and positioned at the end of the topic ("now").

Once you reconnect, the subscription will still be there, positioned at the last acknowledged message from the previous session.
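A sketch of that lifecycle (assuming an existing `PulsarClient client`; topic and subscription names are hypothetical):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;

// First run: "my-sub" does not exist yet, so it is created and positioned
// at the end of the topic; only messages published from now on are seen.
Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/my-topic")
        .subscriptionName("my-sub")
        .subscribe();

Message<byte[]> msg = consumer.receive();
consumer.acknowledge(msg); // moves the subscription cursor forward
consumer.close();

// On a later run, subscribing with the same name resumes from the last
// acknowledged position instead of the tail.
```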
### Why do we choose to use BookKeeper to store consumer offsets instead of ZooKeeper? What are the benefits?

ZooKeeper is a “consensus” system that, while it exposes a key/value interface, is not meant to support a large volume of writes per second.
ZK is not a “horizontally scalable” system: every node receives every transaction and keeps the whole data set. Effectively, ZK is based on a single “log” that is replicated consistently across the participants.

The max throughput we have observed on a well-configured ZK on good hardware is around ~10K writes/s. If you want to do more than that, you would have to shard it.
To store consumer cursor positions, we potentially need to write a large number of updates per second. Typically we persist the cursor every 1 second, though the rate is configurable; if you want to reduce the number of potential duplicates, you can increase the persistence frequency.

With BookKeeper it’s very efficient to sustain a large throughput across a huge number of different “logs”. In our case, we use one log per cursor, and it becomes feasible to persist every single cursor update.
### I'm facing an issue using `.receiveAsync` that seems to be related to `UnAckedMessageTracker` and `PartitionedConsumerImpl`. We are consuming messages with `receiveAsync`, doing an instant `acknowledgeAsync` when a message is received; after that, the process delays its next execution. In this scenario we are consuming many more (repeated) messages than the number of messages produced. We are using partitioned topics with `setAckTimeout` 30 seconds, and I believe the issue could be related to `PartitionedConsumerImpl`, because the same test on a non-partitioned topic does not generate any repeated messages.

PartitionedConsumer is composed of a set of regular consumers, one per partition. To have a single `receive()` abstraction, messages from all partitions are then pushed into a shared queue.
The thing is that the unacked message tracker works at the partition level. So when the timeout happens, it is able to request redelivery for the messages and clear them from the queue, but if the messages were already pushed into the shared queue, the “clearing” part will not happen.
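For reference, a minimal sketch of the pattern described in the question (assuming an existing `Consumer<byte[]> consumer`):

```java
import java.util.concurrent.CompletableFuture;

// Receive asynchronously and ack as soon as the message arrives. If the ack
// timeout fires anyway, redelivery is requested per partition, but copies
// already sitting in the shared queue are not cleared, which is what
// produces the duplicates discussed above.
CompletableFuture<Void> pipeline = consumer.receiveAsync()
        .thenCompose(msg -> consumer.acknowledgeAsync(msg));
```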
There’s currently no option for “infinite” retention (though that sounds like a good idea! Maybe we could use `-1` for that). The only option now is to use INT_MAX for `retentionTimeInMinutes` and LONG_MAX for `retentionSizeInMB`. It’s not “infinite”, but 4085 years of retention should probably be enough!
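As a sketch with the Java admin client (the admin URL and namespace are hypothetical; this uses the int-based `RetentionPolicies` constructor, so both values are capped at `Integer.MAX_VALUE` here):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

PulsarAdmin admin = PulsarAdmin.builder()
        .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
        .build();

// "Effectively infinite" retention: max out both the time and size limits.
admin.namespaces().setRetention("public/default",
        new RetentionPolicies(Integer.MAX_VALUE, Integer.MAX_VALUE));
```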
### Is there a profiling option in Pulsar, so that we can break down the time spent in every stage? For instance: message A stays in the queue for 1 ms, the BK write takes 2 ms (the interval between sending to BK and receiving the ack from BK), and so on.

There are latency stats at different stages: in the client (e.g., reported every 1 min in the info logs), in the broker (accessible through the broker metrics), and finally in the bookies, where there are several different latency metrics.

In the broker, there’s just the write latency on BK, because there is no other queuing involved in the write path.
Yes, the broker performs authentication and authorization when creating a producer/consumer, and this information is kept under the namespace policies. So, if auth is enabled, the broker does the validation.
### From what I’ve seen so far, it seems that I’d want to use a partitioned topic when I want a firehose/mix of data, and shuffle that firehose into specific topics per entity when I have more discrete consumers. Is that accurate?

Precisely. You can use either approach, and even combine them, depending on what is more convenient for the use case. The general traits for choosing one or the other are:

- Partitions -> Maintain a single “logical” topic but scale throughput across multiple machines. Also, the ability to consume in order within a “partition” of the keys. In general, consumers are assigned a partition (and thus a subset of keys) without specifying anything.
### Hey, question on routing mode for partitioned topics. What is the default configuration and what is used in the Kafka adaptor?
The default is to use the hash of the key on a message. If the message has no key, the producer will use a “default” partition (it picks one random partition and uses it for all the messages it publishes).

This is to maintain the same ordering guarantee when no partitions are there: per-producer ordering.
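A conceptual sketch of that routing decision (not Pulsar’s actual implementation; the method and parameter names are made up):

```java
import org.apache.pulsar.client.api.Message;

// Keyed messages are hashed to a partition; keyless messages all go to one
// partition chosen at random when the producer starts, which preserves
// per-producer ordering.
static int choosePartition(Message<?> msg, int numPartitions, int producerDefaultPartition) {
    if (msg.hasKey()) {
        return Math.abs(msg.getKey().hashCode() % numPartitions);
    }
    return producerDefaultPartition;
}
```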