Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow replayer, docs, time-skipping test server, rename ActivityContext, core update, and more #48

Merged
merged 11 commits into from
Apr 19, 2023

Conversation

cretz
Copy link
Member

@cretz cretz commented Apr 17, 2023

What was changed

Sorry so much was included. It was originally small, but difficulties mounted during PR. Can ignore generated code of course.

@cretz cretz requested a review from a team April 17, 2023 20:23
@@ -2363,7 +2343,7 @@ public class PostPatchWorkflow : PatchWorkflowBase
}
}

[Fact]
[Fact(Skip = "TODO(cretz): This is currently not throwing expected non-determinism errors in core")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sushisource - this is the test that, on the code below that runs:

                var exc = await Assert.ThrowsAsync<WorkflowQueryFailedException>(
                    () => QueryWorkflowAsync(prePatchID));

This is not throwing the non-determinism error one might expect even with latest core. This is testing that using query to replay pre-patch code on a post-patch workflow will cause non-determinism (activity type change).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, weird. I'll check this out - what's the easiest way for me to run just this test? I haven't learned much of the tooling for this yet.

Copy link
Member Author

@cretz cretz Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Download the dotnet binary, clone this repo recursively, then in this dir run dotnet build (which runs Rust build, so you'll need protoc and such), remove the Skip message from the attribute, and then you can do something like:

dotnet run --project tests/Temporalio.Tests -- -verbose -method "*.ExecuteWorkflowAsync_Patched_ProperlyHandled"

That should be enough for you to see. Feel free to comment out many parts of this multi-step test. Feel free to hit me up with any snags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is because the version of server this SDK is using for tests doesn't support the sdk metadata and hence internal flags feature

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will change the skip message to be more clear. I have opened #50. Semi-relatedly, I have also opened temporalio/sdk-java#1746.

Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not do an in-depth review but overall LGTM.

if: ${{ matrix.os != 'ubuntu-arm' }}
- name: Install protoc (non-Linux)
# Does not work on Linux
if: ${{ matrix.os != 'ubuntu-latest' && matrix.os != 'ubuntu-arm' }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use contains(matrix.os, 'ubuntu').

Comment on lines +65 to +68
/// <param name="payloadConverterType">Payload converter type. This must be a type and not
/// an instance so it can be serialized across a sandbox boundary.</param>
/// <param name="failureConverterType">Failure converter type. This must be a type and not
/// an instance so it can be serialized across a sandbox boundary.</param>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to keep using types instead of instances?
Would it make sense to require a factory here so users can pass in config to their converters (given that it's mostly codecs that need to be configurable)?

Copy link
Member Author

@cretz cretz Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kinda annoying for users, but I like that even though I know this won't ever be sandboxed probably, it still forces them to think about how these are isolated. Because they can't state share in their payload or failure converters which is a good thing anyways (well, unless they use statics).

/// <param name="payloadCodec">Optional payload codec.</param>
public DataConverter(
Type payloadConverterType,
Type failureConverterType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the failure converter optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is effectively optional if they do DataConverter.Default with { PayloadConverterType = typeof(MyPayloadConverter) } which the README suggests.

/// <returns>Results of the replay runs.</returns>
public async IAsyncEnumerable<WorkflowReplayResult> ReplayWorkflowsAsync(
IAsyncEnumerable<WorkflowHistory> histories,
bool throwOnReplayFailure = false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this flag?
I removed it in TS since the user can decide to throw while iterating throw the results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sushisource advocated for it on the Python side. Same for single workflow replay. Are you suggesting remove it from both, or have single workflow replay throw but multi-workflow replay not throw? I don't mind either way, I just want to be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting that single replay throws and multiple (async iterator) doesn't.
We can give the option for the multi replay method that replays synchronously.

Copy link
Member Author

@cretz cretz Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting that single replay throws and multiple (async iterator) doesn't.

That's already the case by default. Are you saying both need to disallow changing the default? Or just one have the option of changing the default?

We can give the option for the multi replay method that replays synchronously.

They all replay synchronously in this sense. There is no concurrent replay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the option in the async iterator method.

They all replay synchronously in this sense. There is no concurrent replay.

I was talking about the multi replay method that returns task<iterator>.
I'm of course familiar with how the replayer works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the option in the async iterator method.

This argues for inconsistency. Why is a foreach ok to throw but not a await foreach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in the task<iterator> method you can't get the results one by one, and it's nice to be able to fail fast.
In the async iterator method, the user has full control and can abort whenever they like.

Copy link
Member Author

@cretz cretz Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in the task<iterator> method you can't get the results one by one

You can. The reason for the async iterator in this case is because I allow async iteration input. You can have either:

foreach (var result in await replayer.ReplayWorkflowsAsync(syncHistIter))
{
    // Process 1 by t1
}

or

await foreach (var result in replayer.ReplayWorkflowsAsync(asyncHyncHistIter))
{
    // Process 1 by 1
}

And technically the latter doesn't work on all .NET versions we support.

(await ReplayWorkflowsAsync(new[] { history }, throwOnReplayFailure).ConfigureAwait(false)).Single();

/// <summary>
/// Replay multiple workflows from the given histories.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to direct users to the async API if they're on a new enough version of .NET.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's clear enough when looking at API docs. And the iterable passed in is clear enough I think.

}
}

private static void CollectPathsToFix(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I should probably move it to compile time code generator. But I figured lazy on first use is ok to start with.

public class WorkflowEnvironmentTests : TestBase
{
// Time-skipping test server only runs on x86/x64
public sealed class OnlyIntelFactAttribute : FactAttribute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not only intel, amd has this arch too but 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Intel" is a common term for the instruction set, even though it's AMD's 64-bit extensions. I just used the word they are using to describe it at https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.architecture. This is test only code anyways. I could changed to OnlyIntelBasedFactAttrbiute or something.

@@ -34,6 +34,7 @@ public async Task InitializeAsync()
},
},
});
// env = new(await Temporalio.Client.TemporalClient.ConnectAsync(new("localhost:7233")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally so, because I come here and run against local server so I have a UI (I used it to download those JSONs for the replayer tests). But I can make a comment to that effect.

@cretz cretz merged commit 3408d01 into temporalio:main Apr 19, 2023
@cretz cretz deleted the workflows-more-5 branch April 19, 2023 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants