Suggested changes for LambdaBootstrap testability #540

martincostello · 2019-11-03T10:41:39Z

This PR is initially more for discussion/suggestion as the use case might be too niche to actually change the LambdaBootstrap class that ships to NuGet.org, or the change to enable the use case might be better achieved in other ways for changing the public API surface.

Scenario

I've recently started some work on porting a .NET Core 3.0 worker service which processes SQS messages to be an AWS Lambda function instead.

As the project is already targeting .NET Core 3.0, I decided to go with the custom runtime support rather than downgrade to .NET Core 2.1 to use the built-in support.

As part of the work I looked at creating some "end-to-end" or "integration" style tests I've used before with HTTP APIs to run the project as a "black box" in-memory and exercise the code from the outside so things like configuration, serialization etc. are covered by tests as well.

This has lead to the following open source project which I put together yesterday and have published as a NuGet package as a v0.1.0 which builds on top of ASP.NET Core's TestServer: https://github.com/martincostello/lambda-test-server

Essentially it emulates the Lambda runtime that Amazon.Lambda.RuntimeSupport calls internally over HTTP to deliver requests to the message loop and listens for responses and errors from the function for inspection by the test code.

Here's some example code of what you can achieve with it:

Problem

However, there's two things I've had "workaround" in the way LambdaBootstrap is designed to get things working.

The first is that there's no way to inject a custom HttpClient into LambdaBootstrap which provides the hook to redirect the HTTP calls in-memory. This could be achieved by hosting the Lambda test server on an actual HTTP port and setting the AWS_LAMBDA_RUNTIME_API environment variable to its host and port, but this increases the complexity to some degree rather than simple use of TestServer and makes it less lightweight.

For the time being this can be worked around in test code by using Reflection to change the runtime API client that's being used, but it's a bit of a hack. It would be cleaner if it was possible to specify a custom HttpClient as a constructor argument to LambdaBootstrap.

https://github.com/martincostello/lambda-test-server/blob/da1ac1800af1a10eff2a74a2aa192140a9ae26a6/tests/AwsLambdaTestServer.Tests/FunctionRunner.cs#L24-L32

The second is that the internal loop will invariably always be listening for at least one more message from the runtime using what's coded like long-polling after a single invocation. This can be minimised by getting the timing "just-so" so the CancellationToken passed to LambdaBootstrap.RunAsync() is cancelled before the loop starts again, but if the timing is off you need to deliver at least one more message to get the polling to stop, otherwise it hangs.

You also can't just throw an exception or return no content or an HTTP error, as it causes the loop in the bootstrapper to throw an exception either by receiving a non 2xx status code, throws the exception from the TestServer out to the caller or throws because one of the required headers isn't set.

This required me to deliver a "dummy" empty JSON response back to the Lambda runtime to get the loop to terminate:

https://github.com/martincostello/lambda-test-server/blob/da1ac1800af1a10eff2a74a2aa192140a9ae26a6/src/AwsLambdaTestServer/RuntimeHandler.cs#L125-L137

This then delivers a dud message to the function handler, which may or may not handle it correctly (though it can be worked around), but having to provide an "extra" fake message to get the loop to terminate is also a bit of a hack considering that the loop has an argument for a CancellationToken, but that it isn't flowed through to the HttpClient to cancel the request to get a new message.

One way of improving this might be to flow the CancellationToken from RunAsync() through to the HttpClient's requests instead of the None value that is used today, and then catching the OperationCancelled exception to stop the loop.

I realise that this isn't an issue with a production Lambda runtime because the process is frozen when there are no messages so as far as the function's concerned there's a continuous loop of messages that don't need to timeout/cancel/abort, but it seems like a small change to make testing/simulation outside of the Lambda environment itself easier to work with.

Possible Solution

This PR proposes one possible set of changes to Amazon.Lambda.RuntimeSupport that would improve the scenario described above.

The first is to provide two new constructors that accept an HttpClient parameter to allow for the in-process test server to be used. I considered making the Client property public instead, but I thought this approach neater as it removes the extra HttpClient being allocated that's then no longer used. The new constructors are used rather than another optional parameter as it's a non-breaking API surface area change.

The second is more contentious as it consists of two parts.

The first simpler part is catching OperationCanceled exception in the message loop and swallowing it, which then causes the loop to observed the CancellationToken has been signalled and return without having to provide "one more" message to process for the function before the token is observed.

The second part is trickier as due to the use of the public IRuntimeApiClient interface, it wasn't possible to pass through the CancellationToken without making a breaking change to the interface by introducing a new optional parameter to the GetNextInvocationAsync() method.

I'm open to suggestions on a better way to achieve this without having to change the public API surface area.

Next Steps

As I said at the top of the issue, this PR is for discussion and it might not be a correct solution or enough of a real-world problem to actually change the library code.

I could expect the outcode of this PR to be one of the following outcomes (there may be others):

Accept as-is for a v2.0.0 with the breaking change to IRuntimeApiClient
Accept in-principle, but change the approach to improve the scenario in another way that may or may not make breaking changes
Close the PR without merging as the changes do not fit with the overall architecture and goals of the Lambda runtime support package and/or Lambda runtime in production environments
Close the PR without merging as the use-case is too niche

Thanks for taking the time to consider the above and this PR!

Support a user-provided HttpClient with LambdaBootstrap.

Pass the CancellationToken from LambdaBootstrap.RunAsync() through to IRuntimeApiClient.GetNextInvocationAsync() and catch OperationCanceledException in LambdaBootstrap and return.

martincostello · 2019-11-03T10:44:19Z

And something I forgot to include, here's a commit that shows how the test server and function code could be tidied up with these changes in place: martincostello/lambda-test-server@d7dff32

normj

I'm good with the idea of the PR but need a few changes done first.

normj · 2019-11-03T18:56:56Z

Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/LambdaBootstrap.cs

+                {
+                    await InvokeOnceAsync(cancellationToken);
+                }
+                catch (OperationCanceledException)


What if we only swallow OperationCanceledException if cancellationToken.IsCancellationRequested is true?

That's fair - I did tinker with inspecting the OperationCanceledException but it wasn't 100% reliable in cases where different cancellation tokens where combined. I think that'd be a good compromise against accidentally swallowing some exception coming from inside the library that wasn't prompted by the caller cancelling.

normj · 2019-11-03T18:59:28Z

Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/LambdaBootstrap.cs

        {
+            _httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));


We would need to keep track if bootstrap created the httpclient or if it was given to bootstrap. If it was given to bootstrap then we should not dispose it as that is the responsibility of the caller.

normj · 2019-11-03T19:23:40Z

Libraries/src/Amazon.Lambda.RuntimeSupport/Client/RuntimeApiClient.cs

        /// <returns>A Task representing the asynchronous operation.</returns>
-        public async Task<InvocationRequest> GetNextInvocationAsync()
+        public async Task<InvocationRequest> GetNextInvocationAsync(CancellationToken cancellationToken)


I'm okay with this slight breaking change but we should make it cancellationToken = default so it won't break users that recompile their application.

Also we might as well add cancellationToken to all of the methods in IRuntimeApiClient to to make the one "breaking" change all at once.

I don't really anticipate users calling IRuntimeApiClient directly which is why I'm okay making the breaking change.

Only swallow OperationCanceledException if IsCancellationRequested is true on the CancellationToken passed into the method by the caller.

Do not dispose of HttpClient if a value is provided through the constructor by the user. Only dispose if created by the instance itself.

Add an optional CancellationToken argument to all IRuntimeApiClient methods.

normj · 2019-11-05T07:03:33Z

I have merged the PR into the dev branch to go out with the next release.

martincostello · 2019-11-05T07:04:22Z

Awesome - thanks!

martincostello · 2019-11-05T10:09:34Z

Are you able to share a rough estimate of when the next SDK release will ship?

normj · 2019-11-06T06:30:12Z

Version 1.1.0 of Amazon.Lambda.RuntimeSupport is out with this PR. Thanks for your contribution!

React to changes from aws/aws-lambda-dotnet#540 and remove workarounds required to plug in the Lambda test server that were needed with v1.0.0 of Amazon.Lambda.RuntimeSupport that aren't required with changes made in v1.1.0.

martincostello added 2 commits November 3, 2019 08:40

Support HttpClient injection

f2cf315

Support a user-provided HttpClient with LambdaBootstrap.

Pass CancellationToken through to NextAsync

3a115e0

Pass the CancellationToken from LambdaBootstrap.RunAsync() through to IRuntimeApiClient.GetNextInvocationAsync() and catch OperationCanceledException in LambdaBootstrap and return.

normj requested changes Nov 3, 2019

View reviewed changes

martincostello added 3 commits November 3, 2019 19:32

Only swallow cancelled exception if token signalled

3f70fdd

Only swallow OperationCanceledException if IsCancellationRequested is true on the CancellationToken passed into the method by the caller.

Do not dispose of HttpClient if passed in

6945418

Do not dispose of HttpClient if a value is provided through the constructor by the user. Only dispose if created by the instance itself.

Add cancellation token to all runtime client methods

f2f3b97

Add an optional CancellationToken argument to all IRuntimeApiClient methods.

martincostello requested a review from normj November 4, 2019 10:40

normj approved these changes Nov 5, 2019

View reviewed changes

normj merged commit f2f3b97 into aws:master Nov 6, 2019

martincostello deleted the Lambda-Bootstrap-Testability branch November 6, 2019 07:19

martincostello mentioned this pull request Nov 6, 2019

Remove workarounds for Amazon.Lambda.RuntimeSupport martincostello/lambda-test-server#17

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggested changes for LambdaBootstrap testability #540

Suggested changes for LambdaBootstrap testability #540

martincostello commented Nov 3, 2019

martincostello commented Nov 3, 2019

normj left a comment

normj Nov 3, 2019

martincostello Nov 3, 2019

normj Nov 3, 2019

normj Nov 3, 2019

normj commented Nov 5, 2019

martincostello commented Nov 5, 2019

martincostello commented Nov 5, 2019

normj commented Nov 6, 2019

		{
		_httpClient = httpClient ?? throw new ArgumentNullException(nameof(httpClient));

Suggested changes for LambdaBootstrap testability #540

Suggested changes for LambdaBootstrap testability #540

Conversation

martincostello commented Nov 3, 2019

Scenario

Problem

Possible Solution

Next Steps

martincostello commented Nov 3, 2019

normj left a comment

Choose a reason for hiding this comment

normj Nov 3, 2019

Choose a reason for hiding this comment

martincostello Nov 3, 2019

Choose a reason for hiding this comment

normj Nov 3, 2019

Choose a reason for hiding this comment

normj Nov 3, 2019

Choose a reason for hiding this comment

normj commented Nov 5, 2019

martincostello commented Nov 5, 2019

martincostello commented Nov 5, 2019

normj commented Nov 6, 2019