
Replay of Local Activities Executed by an Interceptor #828

Closed · grantfuhr opened this issue Jun 7, 2022 · 5 comments · Fixed by #839

Expected Behavior

Workflows replayed on the same version of a worker should not panic.

Actual Behavior

When replaying a workflow that contains a local activity called by an interceptor, the replay panics. This occurs whether the history is replayed on the same worker or on a copy of the worker on a different task queue.

Steps to Reproduce the Problem

Here is a gist with a test case that demonstrates the error.
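
The gist itself isn't inlined here; the sketch below shows the general shape of such a setup (not the gist's exact code): a WorkerInterceptor whose workflow inbound interceptor executes a local activity before delegating to the workflow function. The auditActivity name and timeout are illustrative, and the interceptor is assumed to be registered on the worker through worker.Options.Interceptors.

```go
package sample

import (
	"context"
	"time"

	"go.temporal.io/sdk/interceptor"
	"go.temporal.io/sdk/workflow"
)

// auditActivity stands in for whatever local activity the interceptor runs.
func auditActivity(ctx context.Context, workflowType string) error {
	return nil
}

// workerInterceptor wraps every workflow execution with workflowInbound.
type workerInterceptor struct {
	interceptor.WorkerInterceptorBase
}

func (w *workerInterceptor) InterceptWorkflow(
	ctx workflow.Context,
	next interceptor.WorkflowInboundInterceptor,
) interceptor.WorkflowInboundInterceptor {
	return &workflowInbound{
		WorkflowInboundInterceptorBase: interceptor.WorkflowInboundInterceptorBase{Next: next},
	}
}

type workflowInbound struct {
	interceptor.WorkflowInboundInterceptorBase
}

// ExecuteWorkflow runs a local activity before delegating to the workflow
// function, so every execution records extra commands in history. A replayer
// that does not run this interceptor never schedules that local activity, so
// replay can fail with the panic shown below.
func (w *workflowInbound) ExecuteWorkflow(
	ctx workflow.Context,
	in *interceptor.ExecuteWorkflowInput,
) (interface{}, error) {
	laCtx := workflow.WithLocalActivityOptions(ctx, workflow.LocalActivityOptions{
		StartToCloseTimeout: 5 * time.Second,
	})
	err := workflow.ExecuteLocalActivity(
		laCtx, auditActivity, workflow.GetInfo(ctx).WorkflowType.Name,
	).Get(laCtx, nil)
	if err != nil {
		return nil, err
	}
	return w.Next.ExecuteWorkflow(ctx, in)
}
```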

Sample error output from running the test:

2022/06/07 15:12:49 ERROR Workflow panic WorkflowType Greet WorkflowID ReplayId RunID 0dfd038d-a331-40e1-b0c2-a583785ab439 Attempt 1 Error lookup failed for scheduledEventID to activityID: scheduleEventID: 8, activityID: 8 StackTrace process event for hello_world [panic]:
go.temporal.io/sdk/internal.panicIllegalState(...)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_decision_state_machine.go:409
go.temporal.io/sdk/internal.(*commandsHelper).handleActivityTaskScheduled(0xc004304870, {0x7248e80, 0x1}, 0xc0035ccec0?)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_decision_state_machine.go:973 +0x109
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc003874780, 0xc0021815c0, 0xf0?, 0x0)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_event_handlers.go:824 +0x27e
go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc00370a380, 0xc0035ad470)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_task_handlers.go:878 +0xca8
go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc000e95760, 0xc0035ad470, 0x0)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_task_handlers.go:727 +0x485
go.temporal.io/sdk/internal.(*WorkflowReplayer).replayWorkflowHistory(0xc003874720, {0x62bba78, 0xc003874738}, {0x62cd4f8?, 0xc0040008c0}, {0x5e32369, 0xf}, 0xc002f13890)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_worker.go:1197 +0x64e
go.temporal.io/sdk/internal.(*WorkflowReplayer).ReplayWorkflowHistory(0xc00348fe60?, {0x0?, 0x0?}, 0x5e2160f?)
	/Users/grant.fuhr/go/pkg/mod/go.temporal.io/sdk@v1.14.0/internal/internal_worker.go:1080 +0x205
github.com/DataDog/temporalite/temporaltest_test.TestReplayInterceptor(0xc0033564e0)
	/Users/grant.fuhr/open-source/temporalite/temporaltest/server_test.go:220 +0x876
testing.tRunner(0xc0033564e0, 0x5ef5f58)
	/usr/local/Cellar/go/1.18.2/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
	/usr/local/Cellar/go/1.18.2/libexec/src/testing/testing.go:1486 +0x35f
    server_test.go:222: lookup failed for scheduledEventID to activityID: scheduleEventID: 8, activityID: 8

Specifications

  • Go-SDK Version: 1.14.0
  • Server Version: 1.16.2
@yiminc transferred this issue from temporalio/temporal on Jun 8, 2022
@cretz (Member) commented Jun 8, 2022

Thanks for the report! At first glance I agree, this should not error. I will investigate.

@cretz (Member) commented Jun 8, 2022

OK, I believe I see the problem here. The replayer does not run interceptors. I will add worker.WorkflowReplayer.SetWorkerInterceptors([]WorkerInterceptor), which should be fairly easy.
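
As a rough sketch of how registering the worker's interceptors with the replayer might look from the caller's side, assuming the option ends up on a replayer options struct (the constructor and field names here are an assumption, not the final API), and reusing the workerInterceptor type from the earlier sketch; Greet and greet_history.json stand in for the workflow and exported history from the reproduction:

```go
package sample

import (
	"log"

	"go.temporal.io/sdk/interceptor"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// Greet stands in for the workflow type named in the report's history.
func Greet(ctx workflow.Context) (string, error) {
	return "Hello", nil
}

func replayWithInterceptors() {
	// Hand the replayer the same interceptors the worker was started with, so
	// the local activity scheduled by the interceptor is also scheduled during
	// replay. The options struct used here is an assumption about how the
	// proposed change could be exposed.
	replayer, err := worker.NewWorkflowReplayerWithOptions(worker.WorkflowReplayerOptions{
		Interceptors: []interceptor.WorkerInterceptor{&workerInterceptor{}},
	})
	if err != nil {
		log.Fatal(err)
	}

	replayer.RegisterWorkflow(Greet)

	// greet_history.json is a placeholder for the exported workflow history.
	if err := replayer.ReplayWorkflowHistoryFromJSONFile(nil, "greet_history.json"); err != nil {
		log.Fatal(err)
	}
}
```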

@Lercher commented Jun 15, 2022

I'm apparently getting the same panic with Go SDK 1.14.1, though the surrounding situation is different from the original report. I can't reconstruct it properly from the logs, but I suspect some workflow instance was re-hydrated from scratch (i.e. from the recorded events) in a plain worker process. However, I can't guarantee that the (interpreted) workflow code is identical to the code that originally produced the event stream.

2022/06/14 16:44:36 ERROR Workflow panic 
Namespace default 
TaskQueue fse 
WorkerID 4644@some-server@ 
WorkflowType TRU-vt100-inmietsetzung@v100 
WorkflowID TRU-vt100-inmietsetzung.VTVertrag.7CVT01UDO4FGHU291MKNKNF5TKAS39 
RunID d7dd6ea4-b974-40f5-b192-e9eb634bb4cf 
Attempt 1 
Error lookup failed for scheduledEventID to activityID: scheduleEventID: 29, activityID: 29 StackTrace process event for fse [panic]:

go.temporal.io/sdk/internal.panicIllegalState(...)
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_decision_state_machine.go:409
go.temporal.io/sdk/internal.(*commandsHelper).handleActivityTaskScheduled(0xc00008e140, {0xc000719778, 0x2}, 0x0?)
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_decision_state_machine.go:973 +0x109
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc0003cc4b0, 0xc0002fb740, 0xd8?, 0x0)
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_event_handlers.go:824 +0x27e
go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc0002589a0, 0xc00018d380)
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_task_handlers.go:878 +0xca8
go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc0005b6000, 0xc00018d380, 0xc00018dcb0)
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_task_handlers.go:727 +0x485
go.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0005b00d0, 0xc00018d380)
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_task_pollers.go:284 +0x2cd
go.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0005b00d0, {0xddb860?, 0xc00018d380?})
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_task_pollers.go:255 +0x6c
go.temporal.io/sdk/internal.(*baseWorker).processTask(0xc0000d6140, {0xddb420?,0xc0000b9e90})
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_worker_base.go:398 +0x167
created by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher
        go.temporal.io/sdk@v1.14.1-0.20220322131744-55cf584de5c7/internal/internal_worker_base.go:302 +0xb5

2022/06/14 16:44:36 WARN  Failed to process workflow task. Namespace default TaskQueue fse WorkerID 4644@some-server@ WorkflowType TRU-vt100-inmietsetzung@v100 WorkflowID TRU-vt100-inmietsetzung.VTVertrag.7CVT01UDO4FGHU291MKNKNF5TKAS39 RunID d7dd6ea4-b974-40f5-b192-e9eb634bb4cf Attempt 1 Error lookup failed for scheduledEventID to activityID: scheduleEventID: 29, activityID: 29
2022/06/14 16:44:36 INFO  Task processing failed with error Namespace default TaskQueue fse WorkerID 4644@some-server@ WorkerType WorkflowWorker Error Workflow executionsRow not found.  WorkflowId: fse, RunId: 34ba3ca7-4886-4157-aabe-65e2f3366761

(slightly formatted for better readability)

Sorry @cretz, I don't understand your last comment: is this something I can investigate or change myself, or does it require a possible SDK or temporal-server change? Regarding the missing interceptors in the replayer: would such a change also affect re-hydration of ordinary workflow execution state, or only the replays used for tests and debugging?

Thanks, Martin.

PS: looking at the gist above, interceptors seem to be user extensibility points, since they are added to the options struct. I'm pretty sure that my worker code doesn't add any such interceptors.

@cretz (Member) commented Jun 15, 2022

The error you received commonly occurs if you have non-determinism. See https://community.temporal.io/t/go-sdk-troubleshooting. This issue just happens to share the same stack trace because the replayer bug makes the workflow appear non-deterministic.
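
For comparison, a minimal sketch (with invented OrderWorkflow, ReserveInventory, and ChargeCard names) of the kind of workflow code change that ordinarily produces this exact panic when no replayer bug is involved:

```go
package sample

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// ReserveInventory and ChargeCard are invented activities for illustration.
func ReserveInventory(ctx context.Context) error { return nil }
func ChargeCard(ctx context.Context) error       { return nil }

// OrderWorkflow shows the usual cause of this panic: history was recorded by
// code that scheduled an activity which the currently deployed code no longer
// schedules.
func OrderWorkflow(ctx workflow.Context) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	// Version that recorded the history:
	//   if err := workflow.ExecuteActivity(ctx, ReserveInventory).Get(ctx, nil); err != nil {
	//       return err
	//   }
	//   return workflow.ExecuteActivity(ctx, ChargeCard).Get(ctx, nil)

	// Currently deployed version, with ReserveInventory removed. Replaying an
	// old history against this code reaches an ActivityTaskScheduled event the
	// command state machine never produced, which surfaces as the
	// "lookup failed for scheduledEventID to activityID" panic above.
	return workflow.ExecuteActivity(ctx, ChargeCard).Get(ctx, nil)
}
```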

@Lercher commented Jun 15, 2022

The error you received commonly occurs if you have non-determinism.

Yes, I see: that is indeed the best explanation, as we are actively working on the implementations at the moment. I was just distracted by the nearly identical stack trace and the fact that no one had mentioned non-determinism in this ticket. Sorry, my bad, and thank you very much.

cretz added a commit to cretz/temporal-sdk-go that referenced this issue Jun 28, 2022
cretz added a commit that referenced this issue Jun 30, 2022