Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PendingOrchestrationMessages count does not update when a lease is lost and reclaimed rapidly. #1274

Closed
ConnorMcMahon opened this issue Mar 16, 2020 · 4 comments
Assignees
Labels
Milestone

Comments

@ConnorMcMahon
Copy link
Contributor

Starting in this PR, we now abandon prefetched orchestration messages when we lose our lease. However, we fail to decrease pendingOrchestrationMessages when we do this, meaning that all of those abandoned messages still count against our limit, and we will not reclaim the "space" those lost messages are taking up until the application restarts.

@ghost ghost added the Needs: Triage 🔍 label Mar 16, 2020
@ConnorMcMahon ConnorMcMahon changed the title PendingOrchestrationMessages count does not update when we lose a lease PendingOrchestrationMessages count does not update when a lease is lost. Mar 16, 2020
@cgillum
Copy link
Member

cgillum commented Mar 16, 2020

The abandon message code path should be decrementing the pending message count here. Any idea why that's not happening?

@ConnorMcMahon
Copy link
Contributor Author

Ah, so we do handle it in that case. My case was subtly different, and I tried to make the issue describe the more general case, but as you pointed out, it should work there.

The case I was looking at it looked like we managed to reclaim the partition lease very quickly, and never actually abandoned the prefetched messages. However, when we reacquire the lease, it looks like we actually create a new ControlQueue. Since when we actually process the prefetched messages, when it comes time to delete the message, it is no longer in the pendingMessageIds field, as this is an instance variable, and the new ControlQueue starts with a fresh HashSet. Since we don't find a record of the message id in pendingMessageIds, we don't decrement.

@cgillum
Copy link
Member

cgillum commented Mar 16, 2020

Oh wow, that's very subtle. Great find!

@ConnorMcMahon ConnorMcMahon changed the title PendingOrchestrationMessages count does not update when a lease is lost. PendingOrchestrationMessages count does not update when a lease is lost and reclaimed rapidly. Mar 16, 2020
@ConnorMcMahon ConnorMcMahon self-assigned this Apr 9, 2020
@ConnorMcMahon ConnorMcMahon added this to the v2.2.1 milestone Apr 9, 2020
@ConnorMcMahon
Copy link
Contributor Author

Fixed when updated DurableTask.AzureStorage to 1.7.5 in #1294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants