SQS/SNS Pub-Sub: Message Processing Canceled During Health Check Failures or Graceful Shutdown #3618

Open
Runnable1996 opened this issue Nov 29, 2024 · 3 comments
Labels
kind/bug Something isn't working

Comments


Runnable1996 commented Nov 29, 2024

Expected Behavior

  • When the Dapr health check fails during high load, the message currently being handled should complete processing without cancellation.

  • During a graceful shutdown (upon receiving a SIGTERM signal), Dapr should stop pulling new messages but allow the processing of the current message to complete before terminating.

Actual Behavior

  • It seems that the context (ctx) used in message processing is tied to the subscription lifecycle. This causes messages to be prematurely canceled during health check failures or graceful shutdowns, even if they are in the middle of processing.

  • When the Dapr health check fails under load, the message being processed gets canceled unexpectedly.

  • During a graceful shutdown, the current message being handled is canceled instead of completing, while no new messages are received as expected.

Steps to Reproduce the Problem

  1. Set up Dapr with Amazon SQS and SNS for pub/sub.

  2. Subscribe to an SQS queue with Dapr.

  3. Trigger either of the following while a message is being handled:

    • Simulate a high load to cause a Dapr health check failure.
    • Send a SIGTERM signal to initiate a graceful shutdown.
  4. Observe that the current message being processed is canceled rather than completing, contrary to expectations.

Additional Context

I suspect the issue might be related to the ctx passed in the processMessage function. It appears that the context is tied to the subscription itself, and replacing it with a context independent of the subscription lifecycle might resolve this issue.

Runnable1996 added the kind/bug label Nov 29, 2024
Author

Runnable1996 commented Nov 29, 2024

Hi, @ItalyPaleAle, @yaron2,
I would greatly appreciate your insights on this issue.
Thank you in advance!

Member

yaron2 commented Dec 1, 2024

@JoshVanL can you kindly verify if the wrong context is being used here?

Author

Runnable1996 commented Dec 4, 2024

Hey @JoshVanL, @yaron2,
I would appreciate it if you could verify this. I don't mind opening a PR as well.

