| 1 | +--- |
| 2 | +id: cookbooks-deduplication |
| 3 | +title: Message deduplication |
| 4 | +sidebar_label: Message deduplication |
| 5 | +--- |
| 6 | + |
| 7 | +import Tabs from '@theme/Tabs'; |
| 8 | +import TabItem from '@theme/TabItem'; |
| 9 | + |
| 10 | + |
| 11 | +When **message deduplication** is enabled, Pulsar ensures that each message produced on a topic is persisted to disk *only once*, even if the message is produced more than once. Deduplication is handled automatically on the server side. |
| 12 | + |
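To make the "persisted only once" guarantee concrete, the following is a minimal conceptual sketch, not Pulsar's broker code. It assumes the broker remembers the highest sequence ID it has persisted for each producer name and drops any message whose sequence ID is not higher. The `DedupLog` class and its method names are hypothetical and exist only for this illustration.

```python
class DedupLog:
    """Toy in-memory log that drops re-sent messages (hypothetical, for illustration)."""

    def __init__(self):
        self.entries = []    # payloads "persisted" so far
        self.last_seq = {}   # producer name -> highest sequence ID persisted

    def publish(self, producer_name, seq_id, payload):
        """Persist the payload unless this producer already sent seq_id or a later one."""
        if seq_id <= self.last_seq.get(producer_name, -1):
            return False     # duplicate: not stored again
        self.entries.append(payload)
        self.last_seq[producer_name] = seq_id
        return True


log = DedupLog()
log.publish("producer-1", 0, b"a")
log.publish("producer-1", 1, b"b")
log.publish("producer-1", 1, b"b")  # retry of the same message is dropped
print(len(log.entries))
```

Under these assumptions the state is keyed by producer name, which suggests why a producer needs a stable name for deduplication to recognize its retries.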
| 13 | +To use message deduplication in Pulsar, you need to configure your Pulsar brokers and clients. |
| 14 | + |
| 15 | +## How it works |
| 16 | + |
| 17 | +You can enable or disable message deduplication at the namespace level or the topic level. By default, it is disabled on all namespaces and topics. You can enable it in the following ways: |
| 18 | + |
| 19 | +* Enable deduplication for all namespaces/topics at the broker level. |
| 20 | +* Enable deduplication for a specific namespace with the `pulsar-admin namespaces` interface. |
| 21 | +* Enable deduplication for a specific topic with the `pulsar-admin topics` interface. |
| 22 | + |
| 23 | +## Configure message deduplication |
| 24 | + |
| 25 | +You can configure message deduplication in Pulsar using the [`broker.conf`](reference-configuration.md#broker) configuration file. The following deduplication-related parameters are available. |
| 26 | + |
| 27 | +Parameter | Description | Default |
| 28 | +:---------|:------------|:------- |
| 29 | +`brokerDeduplicationEnabled` | Sets the default behavior for message deduplication in the Pulsar broker. If it is set to `true`, message deduplication is enabled on all namespaces/topics. If it is set to `false`, you have to enable or disable deduplication at the namespace level or the topic level. | `false` |
| 30 | +`brokerDeduplicationMaxNumberOfProducers` | The maximum number of producers for which information is stored for deduplication purposes. | `10000` |
| 31 | +`brokerDeduplicationEntriesInterval` | The number of entries after which a snapshot of the deduplication information is taken. A larger interval leads to fewer snapshots being taken, though this lengthens the topic recovery time (the time required for entries published after the snapshot to be replayed). | `1000` |
| 32 | +`brokerDeduplicationProducerInactivityTimeoutMinutes` | The time of inactivity (in minutes) after which the broker discards deduplication information related to a disconnected producer. | `360` (6 hours) |
| 33 | + |
| 34 | +### Set the default value at the broker level |
| 35 | + |
| 36 | +By default, message deduplication is *disabled* on all Pulsar namespaces/topics. To enable it on all namespaces/topics, set the `brokerDeduplicationEnabled` parameter to `true` and restart the broker. |
| 37 | + |
| 38 | +Whatever value you set for `brokerDeduplicationEnabled`, enabling or disabling deduplication through the Pulsar admin CLI overrides the broker-level default. |
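The override rule can be sketched as a small lookup. This is an illustration only: the function name `effective_dedup` is hypothetical, and the ordering of a topic-level setting over a namespace-level setting is an assumption that the text above does not state.

```python
from typing import Optional


def effective_dedup(broker_default: bool,
                    namespace_setting: Optional[bool] = None,
                    topic_setting: Optional[bool] = None) -> bool:
    """Return whether deduplication applies; None means 'not set at this level'."""
    if topic_setting is not None:      # assumed to be the most specific level
        return topic_setting
    if namespace_setting is not None:  # overrides the broker-level default
        return namespace_setting
    return broker_default              # value of brokerDeduplicationEnabled


# Broker default is off, but the namespace has deduplication enabled.
print(effective_dedup(False, namespace_setting=True))
```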
| 39 | + |
| 40 | +### Enable message deduplication |
| 41 | + |
| 42 | +Though message deduplication is disabled by default at the broker level, you can enable it for a specific namespace or topic using the [`pulsar-admin namespaces set-deduplication`](reference-pulsar-admin.md#namespace-set-deduplication) or the [`pulsar-admin topics set-deduplication`](reference-pulsar-admin.md#topic-set-deduplication) command. Use the `--enable`/`-e` flag and specify the namespace/topic. |
| 43 | + |
| 44 | +The following example shows how to enable message deduplication at the namespace level. |
| 45 | + |
| 46 | +```bash |
| 47 | + |
| 48 | +$ bin/pulsar-admin namespaces set-deduplication \ |
| 49 | + public/default \ |
| 50 | + --enable # or just -e |
| 51 | + |
| 52 | +``` |
| 53 | + |
| 54 | +### Disable message deduplication |
| 55 | + |
| 56 | +Even if you enable message deduplication at the broker level, you can disable it for a specific namespace or topic using the [`pulsar-admin namespaces set-deduplication`](reference-pulsar-admin.md#namespace-set-deduplication) or the [`pulsar-admin topics set-deduplication`](reference-pulsar-admin.md#topic-set-deduplication) command. Use the `--disable`/`-d` flag and specify the namespace/topic. |
| 57 | + |
| 58 | +The following example shows how to disable message deduplication at the namespace level. |
| 59 | + |
| 60 | +```bash |
| 61 | + |
| 62 | +$ bin/pulsar-admin namespaces set-deduplication \ |
| 63 | + public/default \ |
| 64 | + --disable # or just -d |
| 65 | + |
| 66 | +``` |
| 67 | + |
| 68 | +## Pulsar clients |
| 69 | + |
| 70 | +If you enable message deduplication in Pulsar brokers, you need to complete the following tasks for your client producers: |
| 71 | + |
| 72 | +1. Specify a name for the producer. |
| 73 | +1. Set the message timeout to `0` (that is, no timeout). |
| 74 | + |
| 75 | +The instructions for Java, Python, and C++ clients are different. |
| 76 | + |
| 77 | +<Tabs |
| 78 | + defaultValue="Java clients" |
| 79 | + values={[ |
| 80 | + { |
| 81 | + "label": "Java clients", |
| 82 | + "value": "Java clients" |
| 83 | + }, |
| 84 | + { |
| 85 | + "label": "Python clients", |
| 86 | + "value": "Python clients" |
| 87 | + }, |
| 88 | + { |
| 89 | + "label": "C++ clients", |
| 90 | + "value": "C++ clients" |
| 91 | + } |
| 92 | +]}> |
| 93 | +<TabItem value="Java clients"> |
| 94 | + |
| 95 | +To enable message deduplication on a [Java producer](client-libraries-java.md#producers), set the producer name using the `producerName` setter, and set the timeout to `0` using the `sendTimeout` setter. |
| 96 | + |
| 97 | +```java |
| 98 | + |
| 99 | +import org.apache.pulsar.client.api.Producer; |
| 100 | +import org.apache.pulsar.client.api.PulsarClient; |
| 101 | +import java.util.concurrent.TimeUnit; |
| 102 | + |
| 103 | +PulsarClient pulsarClient = PulsarClient.builder() |
| 104 | + .serviceUrl("pulsar://localhost:6650") |
| 105 | + .build(); |
| 106 | +Producer<byte[]> producer = pulsarClient.newProducer() |
| 107 | + .producerName("producer-1") |
| 108 | + .topic("persistent://public/default/topic-1") |
| 109 | + .sendTimeout(0, TimeUnit.SECONDS) |
| 110 | + .create(); |
| 111 | + |
| 112 | +``` |
| 113 | + |
| 114 | +</TabItem> |
| 115 | +<TabItem value="Python clients"> |
| 116 | + |
| 117 | +To enable message deduplication on a [Python producer](client-libraries-python.md#producers), set the producer name using `producer_name`, and set the timeout to `0` using `send_timeout_millis`. |
| 118 | + |
| 119 | +```python |
| 120 | + |
| 121 | +import pulsar |
| 122 | + |
| 123 | +client = pulsar.Client("pulsar://localhost:6650") |
| 124 | +producer = client.create_producer( |
| 125 | + "persistent://public/default/topic-1", |
| 126 | + producer_name="producer-1", |
| 127 | + send_timeout_millis=0) |
| 128 | + |
| 129 | +``` |
| 130 | +</TabItem> |
| 131 | +<TabItem value="C++ clients"> |
| 132 | + |
| 133 | +To enable message deduplication on a [C++ producer](client-libraries-cpp.md#producer), set the producer name using `setProducerName`, and set the timeout to `0` using `setSendTimeout`. |
| 134 | + |
| 135 | +```cpp |
| 136 | + |
| 137 | +#include <pulsar/Client.h> |
| 138 | +using namespace pulsar; |
| 139 | +std::string serviceUrl = "pulsar://localhost:6650"; |
| 140 | +std::string topic = "persistent://some-tenant/ns1/topic-1"; |
| 141 | +std::string producerName = "producer-1"; |
| 142 | + |
| 143 | +Client client(serviceUrl); |
| 144 | + |
| 145 | +ProducerConfiguration producerConfig; |
| 146 | +producerConfig.setSendTimeout(0); |
| 147 | +producerConfig.setProducerName(producerName); |
| 148 | + |
| 149 | +Producer producer; |
| 150 | + |
| 151 | +Result result = client.createProducer(topic, producerConfig, producer); |
| 152 | + |
| 153 | +``` |
| 154 | +</TabItem> |
| 155 | + |
| 156 | +</Tabs> |