sub orchestrator retries #84

Open
wants to merge 3 commits into
base: main

famarting
Contributor

Second part of dapr/go-sdk#541.

Adds support for calling sub-orchestrators with retries.

Based on the existing retry implementation for activities.

Signed-off-by: Fabian Martinez <46371672+famarting@users.noreply.github.com>
@famarting
Contributor Author

ping @cgillum

@famarting
Contributor Author

ping

@cgillum
Member

cgillum commented Nov 20, 2024

@famarting looks like your new tests are not passing. The CI is reporting timeouts.

@famarting
Contributor Author

@cgillum the tests are failing because of this error:

ERROR: 2024/11/20 19:09:57 orchestration-processor: failed to complete work item: orchestration instance already exists

This makes me think that the strategy I followed to implement sub-orchestrator retries may not be valid. Do you have any ideas or suggestions for how to address this?

I first implemented activity retries using a recursive algorithm that schedules the activity with timers, introducing delays between attempts. However, this doesn't seem to work out of the box for sub-orchestrations because of the "already exists" error. Additionally, looking at the Python SDK, retries there don't seem to be implemented with a recursive algorithm; instead, extra logic has been added to process event to account for delays and retries. Am I forced to follow the same strategy as Python, or what other options do we have?
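For readers following along, the recursive timer-based approach described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: `RetryPolicy`, `nextDelay`, `callWithRetries`, and `delayLog` are hypothetical names, and the recorded delay stands in for a durable timer.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RetryPolicy is a hypothetical policy type for this sketch, not the durabletask-go API.
type RetryPolicy struct {
	MaxAttempts          int
	InitialRetryInterval time.Duration
	BackoffCoefficient   float64
	MaxRetryInterval     time.Duration
}

// errTransient is a sample failure used by the demo in main.
var errTransient = errors.New("transient failure")

// delayLog records requested delays; it stands in for a durable timer.
type delayLog struct{ delays []time.Duration }

func (l *delayLog) wait(d time.Duration) { l.delays = append(l.delays, d) }

// nextDelay returns the wait before the retry that follows the given
// zero-based attempt, capped at MaxRetryInterval when that is set.
func nextDelay(p RetryPolicy, attempt int) time.Duration {
	d := float64(p.InitialRetryInterval)
	for i := 0; i < attempt; i++ {
		d *= p.BackoffCoefficient
	}
	if p.MaxRetryInterval > 0 && d > float64(p.MaxRetryInterval) {
		return p.MaxRetryInterval
	}
	return time.Duration(d)
}

// callWithRetries runs call, and on failure waits and recurses until
// MaxAttempts is reached.
func callWithRetries(p RetryPolicy, attempt int, wait func(time.Duration), call func(attempt int) error) error {
	err := call(attempt)
	if err == nil {
		return nil
	}
	if attempt+1 >= p.MaxAttempts {
		return fmt.Errorf("giving up after %d attempts: %w", attempt+1, err)
	}
	wait(nextDelay(p, attempt))
	return callWithRetries(p, attempt+1, wait, call)
}

func main() {
	p := RetryPolicy{MaxAttempts: 3, InitialRetryInterval: time.Second, BackoffCoefficient: 2}
	log := &delayLog{}
	err := callWithRetries(p, 0, log.wait, func(attempt int) error {
		if attempt < 2 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println(err, log.delays)
}
```

Note that every attempt in this sketch reuses the same call, which is the pattern that runs into the "already exists" error when the call creates a sub-orchestration with a fixed instance ID.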

@cgillum
Member

cgillum commented Nov 22, 2024

My first guess is that the sqlite backend implementation doesn't allow reusing the existing sub-orchestration ID, so it's returning ErrDuplicateInstance from the call to CompleteOrchestrationWorkItem.

Your test might pass if you forgo providing a specific instance ID and allow the sub-orchestration to use a random instance ID. However, if you want the scenario of reusing a user-specified instance ID to work, then we'll need to change sqlite to handle the case where an instance with that ID already exists, deleting the old (failed) instance and creating the new one. Note that the Dapr Actors backend might require similar changes.
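The replace-on-failure behavior suggested above could look roughly like the sketch below. It uses a hypothetical in-memory `store` rather than the actual sqlite backend; `errDuplicateInstance` is a local stand-in for the `ErrDuplicateInstance` error mentioned above, and the status values are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a simplified, hypothetical orchestration status.
type status int

const (
	statusRunning status = iota
	statusCompleted
	statusFailed
)

// errDuplicateInstance is a local stand-in for the backend's duplicate-instance error.
var errDuplicateInstance = errors.New("orchestration instance already exists")

// store is a hypothetical in-memory instance table keyed by instance ID.
type store struct {
	instances map[string]status
}

// createInstance rejects an ID that is already in use, unless the existing
// instance has failed, in which case the old record is replaced.
func (s *store) createInstance(id string) error {
	if st, ok := s.instances[id]; ok && st != statusFailed {
		return errDuplicateInstance
	}
	s.instances[id] = statusRunning // overwrites any failed record
	return nil
}

func main() {
	s := &store{instances: map[string]status{}}
	fmt.Println(s.createInstance("sub-1")) // first creation succeeds
	fmt.Println(s.createInstance("sub-1")) // still running: rejected as duplicate
	s.instances["sub-1"] = statusFailed    // simulate a failed attempt
	fmt.Println(s.createInstance("sub-1")) // failed instance is replaced
}
```

Whether a completed instance should also be replaceable is a design question this sketch leaves open; here only failed instances are overwritten.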
