[fix][broker] Fix schema deletion error when deleting a partitioned topic with many partitions and schema #21977

Merged 2 commits on Jan 30, 2024

heesung-sn
Contributor

@heesung-sn heesung-sn commented Jan 27, 2024

Potentially related to
#15267
#14592

Motivation

The broker returns the error 'No such ledger exists on Metadata Server' to the client when deleting a partitioned topic that has many partitions and a schema.

The root causes are:

  • The current code tries to delete the same schema once for each topic partition, while that topic partition's reference is not ready.
  • The delete-schema operation is not idempotent enough: when the schema is not found, it passes the BookKeeper error 'No such ledger exists on Metadata Server' to the client.
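
The failure mode described above can be illustrated with a toy model. This is only a sketch and not Pulsar code: StrictSchemaStore and countFailedDeletes are hypothetical names, and the deletes run sequentially for determinism. It assumes every partition deletion also tries to delete the one shared schema, and that deleting a missing schema raises an error instead of being a no-op.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (not Pulsar code): each partition delete also deletes the shared schema. */
public class DuplicateSchemaDeleteDemo {

    /** Hypothetical schema store whose delete is NOT idempotent. */
    static final class StrictSchemaStore {
        private final Set<String> schemas = new HashSet<>();

        void put(String schemaId) {
            schemas.add(schemaId);
        }

        void delete(String schemaId) {
            // A missing schema surfaces as an error instead of being ignored.
            if (!schemas.remove(schemaId)) {
                throw new IllegalStateException("No such ledger exists on Metadata Server");
            }
        }
    }

    /** Deletes the same schema once per partition and counts the failed attempts. */
    public static int countFailedDeletes(int partitions) {
        String schemaId = "test/test-traffic/test-topic";
        StrictSchemaStore store = new StrictSchemaStore();
        store.put(schemaId);

        int failures = 0;
        for (int i = 0; i < partitions; i++) {
            try {
                store.delete(schemaId); // every partition targets the same schema
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Only the first delete finds the schema; the other 499 fail.
        System.out.println(countFailedDeletes(500)); // prints 499
    }
}
```

In this model a single-partition topic never hits the error, which is consistent with the problem showing up on topics with many partitions.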

Steps to reproduce



bin/pulsar-admin namespaces create test/test-traffic --bundles 1
bin/pulsar-admin topics create-partitioned-topic persistent://test/test-traffic/test-topic -p 500
bin/pulsar-admin schemas upload --filename schema.json persistent://test/test-traffic/test-topic
bin/pulsar-admin topics delete-partitioned-topic persistent://test/test-traffic/test-topic



Message: No such ledger exists on Metadata Server -  ledger=xxxxx - operation=Failed to open ledger

Stacktrace:

org.apache.pulsar.broker.service.schema.exceptions.SchemaException: No such ledger exists on Metadata Server -  ledger=xxxx - operation=Failed to open ledger
        at org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorage.bkException(BookkeeperSchemaStorage.java:711)
        at org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorage.lambda$openLedger$41(BookkeeperSchemaStorage.java:605)
        at org.apache.bookkeeper.client.LedgerOpenOp.openComplete(LedgerOpenOp.java:260)
        at org.apache.bookkeeper.client.LedgerOpenOp.lambda$initiate$0(LedgerOpenOp.java:116)
        at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
        at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at org.apache.pulsar.metadata.bookkeeper.PulsarLedgerManager.lambda$readLedgerMetadata$2(PulsarLedgerManager.java:208)
        at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:269)
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$5(ZKMetadataStore.java:219)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:840)


Reason:
 --- An unexpected error occurred in the server ---

Modifications

  • Return a completed future from BrokerService.deleteSchema when deleting a topic partition, since schema deletion should happen only at the upper level, when the base (partitioned) topic is finally deleted.
  • Remove the redundant getSchema check in BrokerService.deleteSchema, as BookkeeperSchemaStorage.delete(String key) already checks getSchema internally by default.
  • Add the BK error NoSuchLedgerExistsOnMetadataServerException (-25) to the unrecoverable conditions when throwing SchemaException, for better idempotency.
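
A minimal sketch of the shape of these changes, under stated assumptions. It is not the code in this PR: the class, isPartition, the Set-backed store, and isRecoverable are hypothetical stand-ins. The "-partition-" suffix is taken from the topic names in this thread, and -25 is the return code quoted above.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Sketch only: hypothetical stand-ins, not BrokerService or BookkeeperSchemaStorage. */
public class SchemaDeleteFixSketch {

    // Return code quoted in this PR for NoSuchLedgerExistsOnMetadataServerException.
    static final int NO_SUCH_LEDGER_EXISTS_ON_METADATA_SERVER = -25;

    static final String PARTITION_MARKER = "-partition-";

    /** Hypothetical helper: true if the name ends with "-partition-<digits>". */
    public static boolean isPartition(String topic) {
        int idx = topic.lastIndexOf(PARTITION_MARKER);
        if (idx < 0) {
            return false;
        }
        String suffix = topic.substring(idx + PARTITION_MARKER.length());
        return !suffix.isEmpty() && suffix.chars().allMatch(Character::isDigit);
    }

    /**
     * Skips schema deletion for individual partitions; only the base topic
     * removes the schema. Completes with true if a schema was removed.
     */
    public static CompletableFuture<Boolean> deleteSchema(String topic, Set<String> schemaStore) {
        if (isPartition(topic)) {
            // Nothing to do at the partition level: return a completed future.
            return CompletableFuture.completedFuture(false);
        }
        // Set.remove is a no-op when the entry is already gone, so repeats are safe.
        return CompletableFuture.completedFuture(schemaStore.remove(topic));
    }

    /** Treats the missing-ledger code as unrecoverable; all other codes stay recoverable here. */
    public static boolean isRecoverable(int rc) {
        return rc != NO_SUCH_LEDGER_EXISTS_ON_METADATA_SERVER;
    }
}
```

With this shape, deleting any of the 500 partitions leaves the schema untouched, and only the final delete of the base topic removes it.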

Verifying this change

  • Make sure that the change passes the CI checks.

This change adds tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 27, 2024
… topic and when the topic's reference is not ready
@heesung-sn heesung-sn self-assigned this Jan 27, 2024
@heesung-sn heesung-sn changed the title [fix][broker] fixed schema deletion erorr when deleting a partitioned topic and when the topic's reference is not ready [fix][broker] Fix schema deletion error when deleting a partitioned topic with schema Jan 27, 2024
@heesung-sn heesung-sn changed the title [fix][broker] Fix schema deletion error when deleting a partitioned topic with schema [fix][broker] Fix schema deletion error when deleting a partitioned topic with many partitions and schema Jan 27, 2024
@heesung-sn
Contributor Author

related to : #19882

Contributor

@codelipenghui codelipenghui left a comment

LGTM!

It would be better to also update the comments here.

@heesung-sn
Contributor Author

It would be better to also update the comments here.

Updated the comment.

@heesung-sn heesung-sn merged commit 1b4127a into apache:master Jan 30, 2024
46 of 47 checks passed
@Technoboy-
Contributor

Related fix #21574

Technoboy- pushed a commit that referenced this pull request Jan 31, 2024
@KannarFr
Contributor

KannarFr commented Jan 31, 2024

Are __change_events topics involved? I have partitioning enforced on the cluster, so the __change_events topics are partitioned too.

Jan 31 14:24:55 yo-pulsar-broker-c3-n6 pulsar[86483]: 2024-01-31T14:24:55,666+0000 [pulsar-io-3-11] WARN  org.apache.pulsar.client.impl.ConnectionHandler - [persistent://orga_15611b30-6e4e-44a5-a80f-309d277e27fb/logs/__change_events-partition-0] [multiTopicsReader-6a2bfce189] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"No such ledger exists on Metadata Server -  ledger=20202 - operation=Failed to open ledger","reqId":3644475206336716231, "remote":"yo-pulsar-broker-c3-n2/192.168.2.2:6650", "local":"/192.168.2.6:44728"}
Jan 31 14:24:55 yo-pulsar-broker-c3-n6 pulsar[86483]: 2024-01-31T14:24:55,666+0000 [pulsar-io-3-11] WARN  org.apache.pulsar.client.impl.ConnectionHandler - [persistent://orga_15611b30-6e4e-44a5-a80f-309d277e27fb/logs/__change_events-partition-0] [multiTopicsReader-6a2bfce189] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"No such ledger exists on Metadata Server -  ledger=20202 - operation=Failed to open ledger","reqId":3644475206336716231, "remote":"yo-pulsar-broker-c3-n2/192.168.2.2:6650", "local":"/192.168.2.6:44728"} -

I'm running the broker with this patch.

I'm also encountering:

Jan 31 14:47:56 yo-pulsar-broker-c3-n1 pulsar[3975173]: 2024-01-31T14:47:56,905+0000 [broker-client-shared-internal-executor-5-1] ERROR org.apache.pulsar.broker.service.BrokerService - Topic creation encountered an exception by initialize topic policies service. topic_name=persistent://orga_19647583-2185-4e67-96e5-202846a9522b/logs/app_e208acde-73a0-4248-8b55-5ae453b13581-partition-0 error_message={"errorMsg":"No such ledger exists on Metadata Server -  ledger=15336 - operation=Failed to open ledger","reqId":836394464329855562, "remote":"yo-pulsar-broker-c3-n1/192.168.2.1:6650", "local":"/192.168.2.1:58448"}    
Jan 31 14:47:56 yo-pulsar-broker-c3-n1 pulsar[3975173]: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"No such ledger exists on Metadata Server -  ledger=15336 - operation=Failed to open ledger","reqId":836394464329855562, "remote":"yo-pulsar-broker-c3-n1/192.168.2.1:6650", "local":"/192.168.2.1:58448"}

@heesung-sn
Contributor Author

heesung-sn commented Jan 31, 2024

@KannarFr
Can you share a full stack trace from when you hit "errorMsg":"No such ledger exists on Metadata Server - ledger=15336", or any steps to reproduce?

This PR specifically tries to fix the ledger error, No such ledger exists on Metadata Server, raised when deleting a schema as part of deleting a topic. This ledger error comes specifically from schema deletion (schema registry): SchemaException: No such ledger exists on Metadata Server - ledger=xxxx - operation=Failed to open ledger.

Also, did your errors start appearing with this fix, as a regression?

@KannarFr
Contributor

OK, it looks like this is not related to this issue, my bad. I'll open another issue about my problem.

@heesung-sn
Contributor Author

heesung-sn commented Jan 31, 2024

OK, it looks like this is not related to this issue, my bad. I'll open another issue about my problem.

I see. Thanks.

Technoboy- pushed a commit that referenced this pull request Feb 5, 2024
Technoboy- pushed a commit that referenced this pull request Feb 20, 2024
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 1, 2024
…opic with many partitions and schema (apache#21977)

(cherry picked from commit 75e2142)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 6, 2024
…opic with many partitions and schema (apache#21977)

(cherry picked from commit 75e2142)
@heesung-sn heesung-sn deleted the schema-delete branch April 2, 2024 17:44
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Aug 23, 2024
…opic with many partitions and schema (apache#21977)

(cherry picked from commit 1b4127a)
Signed-off-by: Zixuan Liu <nodeces@gmail.com>