
AWS Lambda .NET 6 ARM Exceptions #1244

Closed
kubalunacek opened this issue Jul 14, 2022 · 21 comments
Labels
bug This issue is a bug. closed-for-staleness module/lambda-client-lib needs-reproduction This issue needs reproduction. p1 This is a high priority issue queued response-requested Waiting on additional info and feedback. Will move to close soon in 7 days.

Comments

@kubalunacek

kubalunacek commented Jul 14, 2022

Describe the bug

We have an application that runs on AWS Lambda. Until recently we were running it on .NET Core 3.1 and the x86 architecture. We have now ported it to .NET 6 on ARM64, and this has brought unexpected problems: we are randomly getting errors that we believe point to a problem in the AWS Lambda runtime for .NET 6. On the original application (.NET Core 3.1 + x86) these errors were not present. Does anyone have any experience with the migration to ARM and .NET 6, and whether it is more memory intensive, or could it be a bug in the Lambda runtime? Any ideas on how to solve the problem will be gratefully received.
Thanks.

Exception:

START RequestId: 79052b48-798c-4626-8cbb-7f9240196df6 Version: $LATEST  
Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.InvokeContinuation(Action`1 continuation, Object state, Boolean forceAsync, Boolean requiresExecutionContextFlow)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.OnCompleted(SocketAsyncEventArgs _)
at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
END RequestId: 79052b48-798c-4626-8cbb-7f9240196df6
REPORT RequestId: 79052b48-798c-4626-8cbb-7f9240196df6 Duration: 1051.21 ms Billed Duration: 1052 ms Memory Size: 512 MB Max Memory Used: 368 MB
RequestId: 79052b48-798c-4626-8cbb-7f9240196df6 Error: Runtime exited with error: signal: aborted Runtime.ExitError

Expected Behavior

Lambda works without errors.

Current Behavior

Lambda fails unexpectedly.

(Same log as shown in the description above.)

Reproduction Steps

We have no trivial reproduction for this, but we have a global try/catch block in our Lambda handler and it does not catch anything.

public class PingServer : AController
{
    public async Task<APIGatewayProxyResponse> HandleRequest(APIGatewayProxyRequest req, ILambdaContext context)
    {
        try
        {
            using (var scope = CreateScope())
            {
                return await HandleRequest(scope, scope.ServiceProvider.GetService<CoreController.PingServer>(), req, context);
            }
        }
        catch (Exception ex)
        {
            context.Logger.LogCritical($"Error from {nameof(PingServer)}_{nameof(HandleRequest)}\n{ex.ToString()}");
            throw;
        }
    }
}
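
A handler-level try/catch cannot observe this failure: the stack traces above show the exception being thrown on a thread-pool / socket-engine thread, outside the handler's call stack, so the process is torn down before any catch block runs. A minimal sketch of process-wide hooks that at least log such crashes (the Function class name is illustrative, and neither hook can prevent the abort):

using System;
using System.Threading.Tasks;

public class Function
{
    static Function()
    {
        // Fires for an exception on any thread that is about to terminate the process.
        // It cannot stop the abort, but stdout/stderr output still reaches CloudWatch Logs.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Console.Error.WriteLine($"FATAL unhandled exception: {e.ExceptionObject}");

        // Fires for faulted Tasks whose exceptions were never awaited/observed.
        TaskScheduler.UnobservedTaskException += (sender, e) =>
            Console.Error.WriteLine($"Unobserved task exception: {e.Exception}");
    }

    // ... existing HandleRequest handler ...
}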

Possible Solution

No response

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

Amazon.Lambda.APIGatewayEvents 2.4.1
Amazon.Lambda.Core 2.1.0
Amazon.Lambda.Serialization.SystemTextJson 2.3.0
AWSXRayRecorder.Handlers.AwsSdk 2.8.3

Targeted .NET Platform

.NET 6

Operating System and version

AWS Lambda

@kubalunacek kubalunacek added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 14, 2022
@normj
Member

normj commented Jul 15, 2022

Does this happen during a cold start or after some number of invocations? A way to check whether it happens during cold starts is to see if it is the first invoke in the CloudWatch Log Stream.
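
One way to make that visible directly in the logs (a minimal sketch; the field name and log message are just illustrative) is to track the first invocation of each execution environment with a static flag:

private static bool _coldStart = true;

public async Task<APIGatewayProxyResponse> HandleRequest(APIGatewayProxyRequest req, ILambdaContext context)
{
    // True only for the first invoke handled by this process (i.e. a cold start).
    context.Logger.LogInformation($"cold_start={_coldStart}");
    _coldStart = false;

    // ... existing handler body ...
}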

@kubalunacek
Author

Hey, it is definitely not a cold start; see the screenshot below. What is interesting is that we have two Lambdas with the same code base: the first has 256 MB of memory, the second 512 MB. The first always crashes with Size: 256 MB Max Memory Used: 256 MB, the second with Size: 512 MB Max Memory Used: 368 MB.

Lambda log:
[CloudWatch log screenshot]

@ashishdhingra ashishdhingra added module/lambda-client-lib and removed needs-triage This issue or PR still needs to be triaged. labels Jul 15, 2022
@bjorg
Contributor

bjorg commented Jul 18, 2022

I ran into the same issue as well. Also ARM64, but with 128 MB of configured memory. It's pretty rare: I've been benchmarking various configuration and deployment options and have only seen it once.

[CloudWatch log screenshot]

@bjorg
Contributor

bjorg commented Jul 18, 2022

One thing I noticed is that all invocations were short until the one that failed, and the failing invocation has about the same duration as in the original report.
[screenshot of invocation durations]

@ashishdhingra ashishdhingra added the needs-reproduction This issue needs reproduction. label Jul 22, 2022
@kubalunacek
Author

Hi @ashishdhingra ,
what do you need for reproduction? We have our production code deployed on AWS Lambda, and I am unable to share it. But we get this error one to four times a day, and from our observations it's not so much the code as the number of calls at a given moment, because the other 10+ Lambdas with the same code base and less frequent calls don't suffer from anything like this.

@dglozano

dglozano commented Aug 2, 2022

We are experiencing the exact same issue. Our try/catch in the handler doesn't catch anything, and it crashes with the same Runtime error.

It happens randomly, on average once a day I would say.

.NET 6, ARM64 and 1024MB of configured memory.

And we observe the same thing as @kubalunacek 👇🏻

But we get this error one to four times a day, and from our observations it's not so much the code as the number of calls at a given moment, because the other 10+ Lambdas with the same code base and less frequent calls don't suffer from anything like this.

@JirHan

JirHan commented Aug 8, 2022

Hey, we are having exactly the same issue. A few Lambda calls per day fail with the error above; it is always the endpoints with the highest invocation counts. We tried to debug and analyze this, but without any success.

@LarsAndreasEk

We have the exact same issue on ARM64/.NET 6 Lambdas.

@kubalunacek
Author

We switched the same solution (same code base, .NET 6) to the x86_64 architecture and all these errors disappeared, so the problem is in the .NET 6 ARM64 Lambda runtime. Do you have any progress, @normj?

@antonfirsov

antonfirsov commented Oct 6, 2022

This seems to be definitely related to dotnet/runtime#72365.

@bjorg @kubalunacek @LarsAndreasEk or others, any chance you can produce & share a dump?
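
For anyone trying to capture one, a possible approach (a sketch only, not verified on Lambda specifically) is to enable the .NET runtime's built-in crash-dump support through environment variables on the function, writing to /tmp, which is the only writable path in the Lambda sandbox:

DOTNET_DbgEnableMiniDump=1
DOTNET_DbgMiniDumpType=2          # 2 = heap dump (4 = full dump, much larger)
DOTNET_DbgMiniDumpName=/tmp/coredump

The dump would then still need to be copied somewhere durable (for example to S3, from an extension or a wrapper script) before the execution environment is recycled.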

@antonfirsov

Actually, this could be caused by ValueTask misuse in user code; see dotnet/runtime#70486 (comment).
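
For context, the misuse referred to there is consuming the same ValueTask more than once: the socket and stream APIs hand out ValueTasks backed by pooled IValueTaskSource instances, and a second await can race with the pool reusing that source, which then surfaces as seemingly unrelated socket/thread-pool exceptions. A hypothetical illustration (the stream and buffer are placeholders):

async Task IllustrateValueTaskMisuseAsync(Stream networkStream, Memory<byte> buffer)
{
    ValueTask<int> pending = networkStream.ReadAsync(buffer);
    int first = await pending;   // fine: consumed exactly once
    int second = await pending;  // WRONG: the pooled source behind this ValueTask may
                                 // already have been recycled for another operation

    // If multiple awaits are genuinely needed, materialize a Task once up front:
    Task<int> reusable = networkStream.ReadAsync(buffer).AsTask();
    await reusable;
    await reusable;              // safe: a Task can be awaited any number of times
}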

@holthe-tveit

Hi @antonfirsov, I'm working on the same team as @LarsAndreasEk. I do not think we captured any dump related to this issue, but it did go away for us when we switched to the AWS-supported dotnet6 runtime on x86_64.
Our code did not use any ValueTasks directly as far as I could find, though under the hood they are of course in use.

@ik130 ik130 added A queued and removed queued labels Oct 14, 2022
@ashishdhingra ashishdhingra added p1 This is a high priority issue and removed A labels Nov 3, 2022
@kknd4eva

Reporting the same issue, running an AWS Lambda function using ARM64 and .NET 6. The crash has only occurred a single time and appears to be (though it's difficult to say for sure) in the code that requests information from DynamoDB.

        /// <summary>
        /// Get the token from the cache. If it's expired, generate a new one first
        /// </summary>
        /// <param name="env">The targeted environment, e.g. prod or preprod</param>
        /// <returns></returns>
        public async Task<DbCacheItem> GetTokenAsync(string env)
        {
            DbCacheItem dbCacheItem;
            DateTime expiryDateTime;

            try
            {
                dbCacheItem = await _dynamoDbContext.LoadAsync<DbCacheItem>(env, _dBOperationConfig);

                if (dbCacheItem == null)
                {
                    expiryDateTime = DateTime.UtcNow.AddMinutes(-5);
                }
                else
                {
                    expiryDateTime = DateTime.Parse(dbCacheItem.DateTimeOfExpiry);
                }

                if (expiryDateTime < DateTime.UtcNow)
                {
                    Logger.LogInformation($"Existing token is null or has expired. Generating a new one");
                    var sapCloudService = new SapCloudPlatformIntegrationService().WithEnvironment(env);
                    var refreshedToken = await sapCloudService.GenerateTokenAsync();
                    dbCacheItem = ConvertToCacheItem(refreshedToken, env);
                    await SaveTokenAsync(dbCacheItem);
                }

                return dbCacheItem;
            }
            catch (Exception ex)
            {
                throw;
            }

        }

The exception appears to be unhandled:

2023-02-14T05:11:38.681Z c33d7e47-4d0b-40f2-9f28-804408778387 info
{
"cold_start": false,
"xray_trace_id": "1-63eb180a-79315a2f75da76e74d6273e2",
"client": "S2CProcessMessage",
"request_id": "No Request Id",
"function_name": "Service-SCPI-Authenticator",
"function_version": "$LATEST",
"function_memory_size": 2048,
"function_arn": "arn:aws:lambda:ap-southeast-2:xxxxxxx:function:Service-SCPI-Authenticator",
"function_request_id": "c33d7e47-4d0b-40f2-9f28-804408778387",
"timestamp": "2023-02-14T05:11:38.6812320Z",
"level": "Information",
"service": "service_undefined",
"name": "AWS.Lambda.Powertools.Logging.Logger",
"message": "Attempting to fetch access token"
}

2023-02-14T16:11:38.694+11:00	Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
2023-02-14T16:11:38.694+11:00	at System.Threading.ThreadPool.<>c.<.cctor>b__86_0(Object state)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.InvokeContinuation(Action`1 continuation, Object state, Boolean forceAsync, Boolean requiresExecutionContextFlow)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.OnCompleted(SocketAsyncEventArgs _)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncEventArgs.OnCompletedInternal()
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncEventArgs.FinishOperationAsyncSuccess(Int32 bytesTransferred, SocketFlags flags)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncEventArgs.CompletionCallback(Int32 bytesTransferred, SocketFlags flags, SocketError socketError)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncEventArgs.TransferCompletionCallbackCore(Int32 bytesTransferred, Byte[] socketAddress, Int32 socketAddressSize, SocketFlags receivedFlags, SocketError socketError)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.ProcessAsyncOperation(TOperation op)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncContext.ReadOperation.System.Threading.IThreadPoolWorkItem.Execute()
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncContext.HandleEvents(SocketEvents events)
2023-02-14T16:11:38.694+11:00	at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
2023-02-14T16:11:38.694+11:00	at System.Threading.ThreadPoolWorkQueue.Dispatch()
2023-02-14T16:11:38.694+11:00	at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
2023-02-14T16:11:38.694+11:00	at System.Threading.Thread.StartCallback()
2023-02-14T16:11:38.875+11:00
RequestId: c33d7e47-4d0b-40f2-9f28-804408778387 Error: Runtime exited with error: signal: aborted Runtime.ExitError

We have seen this error only once, but it is concerning. Should we look to change back to .NET 6 on x86 until this is understood?

@kknd4eva

Just an FYI, we have had more occurrences of this issue. Very infrequent, but still concerning.

@antonfirsov

Note that a bug has been identified in .NET that may potentially be causing this. It will be fixed in the upcoming 6.0.18 and 7.0.7 releases (in June, if I'm not mistaken):
dotnet/runtime#84432
dotnet/runtime#84641

@kknd4eva and others, please stay tuned for the update and let us know if it helped!

@kknd4eva

kknd4eva commented Jul 30, 2023

This is still occurring. Should we have expected this to be resolved by now? (Once again, it only impacts .NET 6 + ARM64.)

CloudWatch Logs Insights
region: ap-southeast-2
log-group-names: /aws/lambda/SOE-Order-File-Service
start-time: -43200s
end-time: 0s
query-string:
fields @timestamp, @logStream, @log, @message
| filter @message like 'Specified argument was out of the range of valid values. (Parameter'

Result:
@timestamp: 2023-07-30 06:49:39.317
@logStream: 2023/07/30/[$LATEST]2f985c3216d7442981641381a86663f8
@log: 298552055425:/aws/lambda/SOE-Order-File-Service
@message: Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')

@antonfirsov

Are you 100% sure your code is running on the updated (6.0.18+) runtime? What is your exact dotnet runtime version?
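
A quick way to check from inside the function (a minimal sketch; log it wherever is convenient, e.g. during initialization) is to emit the framework description and architecture:

using System.Runtime.InteropServices;

// Prints something like ".NET 6.0.21" together with Arm64/X64 and the exact runtime version.
Console.WriteLine($"Runtime: {RuntimeInformation.FrameworkDescription}, " +
                  $"Arch: {RuntimeInformation.ProcessArchitecture}, " +
                  $"Version: {Environment.Version}");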

@kknd4eva

Are you 100% sure your code is running on the updated (6.0.18+) runtime? What is your exact dotnet runtime version?

I believe so. The Lambda runtime update is set to 'AUTO' and the function runtime is currently:

INIT_START Runtime Version: dotnet.6v17 Runtime Version ARN: arn:aws:lambda:ap-southeast-2::runtime:4b286ddbfcfeaf773c960ecafbd89d987b45ebf710c123baf0640c1fd8798259


@kknd4eva

Ah, I see - the version says v17. I would have thought an AUTO configuration would have automatically updated. I'll investigate how I can force this update.

@philasmar
Contributor

@kknd4eva could you please check if the issue has been resolved in the latest Lambda .NET patches? I believe they are on dotnet.6v19 now.
I am trying to reproduce the issue locally, but I haven't been able to so far.
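
Based on the reports above that the failure tracks call volume rather than specific code, one possible stress setup (purely a sketch; the endpoint URL is a placeholder) is an ARM64 .NET 6 handler that fires many concurrent outbound requests per invocation and is then invoked at high frequency:

private static readonly HttpClient _http = new HttpClient();

public async Task<string> HandleRequest(string input, ILambdaContext context)
{
    // 50 concurrent socket operations per invoke to stress the async completion path.
    var calls = Enumerable.Range(0, 50)
        .Select(_ => _http.GetAsync("https://example.com/health"));
    await Task.WhenAll(calls);
    return "ok";
}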

@ashishdhingra ashishdhingra added the response-requested Waiting on additional info and feedback. Will move to close soon in 7 days. label Nov 16, 2023
github-actions bot

This issue has not received a response in 5 days. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Nov 17, 2023