
Shared subscription + Batch messages: consumerFlow trigger reading the same messages from storage #16421

Open
eolivelli opened this issue Jul 6, 2022 · 6 comments
Labels
Stale · triage/lhotari/important · type/bug

Comments

@eolivelli
Contributor

I am doing some testing on Shared subscription and Batch messages with the current Pulsar master.

The behaviour that I am observing is that when you have batch messages the Consumer sends flow control requests for more messages than it can handle.

This is how to reproduce the problem:

  • write 100,000 messages using batching
  • start a Consumer with a Shared subscription (from the beginning of the topic)
  • you will see that the PersistentDispatcherMultipleConsumers consumerFlow triggers reads of many messages

This is happening because consumerFlow calls readMoreEntries(), and readMoreEntries() sees that there are messages to be re-delivered, because the consumer hasn't acknowledged them yet.

This in turn requests the ManagedCursor to read the data from storage.

I have observed this behaviour while working on offloader performance, but it also happens with regular BookKeeper-based ledgers.

This simple test case reproduces the problem; I appended it to this test class:

public class CreateSubscriptionTest extends ProducerConsumerBase {

    @Test
    public void testConsumerFlowOnSharedSubscription() throws Exception {
        String topic = "persistent://my-property/my-ns/topic" + UUID.randomUUID();
        admin.topics().createNonPartitionedTopic(topic);
        String subName = "my-sub";
        int numMessages = 20_000;
        final CountDownLatch count = new CountDownLatch(numMessages);
        try (Consumer<byte[]> consumer = pulsarClient.newConsumer()
                .subscriptionMode(SubscriptionMode.Durable)
                .subscriptionType(SubscriptionType.Shared)
                .topic(topic)
                .subscriptionName(subName)
                .messageListener(new MessageListener<byte[]>() {
                    @Override
                    public void received(Consumer<byte[]> consumer, Message<byte[]> msg) {
                        //log.info("received {} - {}", msg, count.getCount());
                        consumer.acknowledgeAsync(msg);
                        count.countDown();
                    }
                })
                .subscribe();
             Producer<byte[]> producer = pulsarClient
                .newProducer()
                .blockIfQueueFull(true)
                .enableBatching(true)
                .topic(topic)
                .create()) {
            // pause the consumer so the whole backlog is written (with batching)
            // before dispatch starts
            consumer.pause();
            byte[] message = "foo".getBytes(StandardCharsets.UTF_8);
            List<CompletableFuture<?>> futures = new ArrayList<>();
            for (int i = 0; i < numMessages; i++) {
                futures.add(producer.sendAsync(message).whenComplete((id, e) -> {
                    if (e != null) {
                        log.error("error", e);
                    }
                }));
                if (futures.size() == 1000) {
                    FutureUtil.waitForAll(futures).get();
                    futures.clear();
                }
            }
            producer.flush();
            consumer.resume();
            // the consumer should be able to drain the backlog within the timeout
            assertTrue(count.await(20, TimeUnit.SECONDS));
        }
    }
}
@eolivelli added the type/bug label on Jul 6, 2022
@eolivelli
Contributor Author

The problem is here: when you use a MessageListener we increase the available permits by 1, even for messages that are part of a batch.

protected void callMessageListener(Message<T> msg) {

So the problem mostly affects the MessageListener API, which is used by pulsar-perf (and also by the OpenMessaging benchmark).
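
Below is a minimal, self-contained sketch of that accounting (a toy model, not the actual ConsumerImpl code; the class, method names and thresholds are illustrative): every listener callback returns one permit, so an entry that carried a 100-message batch ends up granting 100 permits back to the broker.

import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: illustrates per-message permit accounting,
// it is not the real Pulsar client code.
public class ListenerPermitModel {
    private final AtomicInteger availablePermits = new AtomicInteger();
    private final int receiverQueueSize = 1000;

    // Hypothetical stand-in for the per-message listener path mentioned above.
    void callMessageListener(String msg) {
        // listener.received(consumer, msg) would run here.
        // One permit per message, even when 100 messages came from a single entry.
        if (availablePermits.incrementAndGet() >= receiverQueueSize / 2) {
            System.out.println("FLOW " + availablePermits.getAndSet(0));
        }
    }

    public static void main(String[] args) {
        ListenerPermitModel model = new ListenerPermitModel();
        // 10 entries, each a batch of 100 messages: the consumer flows back
        // 1000 message permits after having received only 10 entries.
        for (int entry = 0; entry < 10; entry++) {
            for (int msg = 0; msg < 100; msg++) {
                model.callMessageListener("foo");
            }
        }
    }
}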

@eolivelli
Contributor Author

eolivelli commented Jul 7, 2022

With batch messages we fall into this case:

basically the dispatcher reads N entries from storage (BookKeeper or the offloader) and the consumer does not have enough permits (because the entries we sent contain multiple messages each), so we throw away part of the entries that we read.

Those entries are added to the list of "messages to replay", and so they will be read again from storage next time.
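
A minimal simulation of that loop (a toy model, not the actual PersistentDispatcherMultipleConsumers logic; the numbers mirror the reproducer above): permits are counted in messages, reads return entries, and every entry that cannot be delivered goes back to the replay list and is fetched from storage again.

import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: it is not the real dispatcher code, just the accounting
// described in the comment above.
public class ReplayReadModel {
    public static void main(String[] args) {
        final int batchSize = 100;        // messages per entry (producer batching)
        final int permits = 1000;         // consumer permits, counted in messages
        final int entriesPerRead = 1000;  // entries requested from storage per read
        final int totalEntries = 1000;    // 100,000 messages / batchSize

        Deque<Integer> messagesToReplay = new ArrayDeque<>();
        int storageReads = 0;
        int nextEntryId = 0;
        long deliveredMessages = 0;

        while (deliveredMessages < (long) totalEntries * batchSize) {
            storageReads++;
            // A read serves pending replay entries first, then new entries.
            Deque<Integer> read = new ArrayDeque<>();
            while (read.size() < entriesPerRead && !messagesToReplay.isEmpty()) {
                read.add(messagesToReplay.poll());
            }
            while (read.size() < entriesPerRead && nextEntryId < totalEntries) {
                read.add(nextEntryId++);
            }
            // Only permits / batchSize entries fit into the consumer; the rest
            // are discarded and queued for replay, to be read from storage again.
            int deliverableEntries = permits / batchSize;
            int delivered = 0;
            for (Integer entryId : read) {
                if (delivered < deliverableEntries) {
                    delivered++;
                    deliveredMessages += batchSize;
                } else {
                    messagesToReplay.add(entryId);
                }
            }
        }
        // Prints 100 storage reads for a backlog that would fit in a single read.
        System.out.println("storage reads: " + storageReads);
    }
}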

@eolivelli
Contributor Author

After doing some testing I have found that the problem is that the ManagedLedger EntryCache is evicted by taking into account only the "readPosition" of the cursor.
But the "messages to replay" are behind the "readPosition", so those entries are evicted from the cache.

@eolivelli
Contributor Author

In order to solve the problem we should not evict from the cache the entries that are still in the "messages to replay" list, otherwise we will keep reading from BK (or from tiered storage).

@eolivelli
Contributor Author

I believe that the cacheEvictionByMarkDeletedPosition implementation will also help with this problem, because it won't let the "messages to replay" be evicted from the cache.

#14985
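
For completeness, this is roughly how that behaviour would be enabled once the change lands, assuming the broker flag keeps the name used above (check broker.conf for your Pulsar version):

# broker.conf (hedged example; verify the flag exists in your release)
# Evict cached entries based on the slowest mark-delete position instead of
# the read position, so unacknowledged / replay entries stay in the cache.
cacheEvictionByMarkDeletedPosition=true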

@github-actions

github-actions bot commented Aug 7, 2022

The issue had no activity for 30 days, mark with Stale label.
