Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dotnet Core 2.1 - AmazonSQSClient AmazonServiceException when call ReceiveMessageAsync #1004

Closed
actumn opened this issue Jul 17, 2018 · 25 comments
Assignees
Labels
guidance Question that needs advice or information.

Comments

@actumn
Copy link

actumn commented Jul 17, 2018

Hi.
When I updated dotnet core (2.0.6 -> 2.1) and deployed on server, we met some error about credentials in AmazonSQSClient.
I think it is from dotnet core 2.1 HttpClientHandler update.
So I rollback dotnet core version (2.1 -> 2.0.6), then there is no exception anymore.

Expected Behavior

Whenever we call Amazon.SQS.AmazonSQSClient.ReceiveMessageAsync(ReceiveMessageRequest request, CancellationToken cancellationToken = default(CancellationToken)), there should be no exception.

Current Behavior

The function throws exception very frequently(3 minutes in 6 hour average) like below.

Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeAsync[T](IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeAsync[T](IExecutionContext executionContext)

Context

I want to get the network performance provided by dotnet core 2.1, with AWS features.

Your Environment

  • AWSSDK.Core version used: 3.3.24.2
  • AWSSDK.SQS version used: 3.3.3.10
  • Operating System and version: Amazon Linux AMI 2017.09.k x86_64 ECS HVM GP2
  • Dotnet Build Docker Image: microsoft/dotnet:2.1-sdk
  • Dotnet Runtime Docker Image: microsoft/dotnet:2.1-runtime

.NET Core Info

  • .NET Core version used for development: 2.1
  • .NET Core version installed in the environment where application runs: microsoft/dotnet:2.1-runtime (docker image)
  • Output of dotnet --info:
Host (useful for support):
  Version: 2.1.2
  Commit:  811c3ce6c0

.NET Core SDKs installed:
  No SDKs were found.

.NET Core runtimes installed:
  Microsoft.NETCore.App 2.1.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
@actumn actumn closed this as completed Jul 17, 2018
@actumn actumn changed the title d Dotnet Core 2.1 - AmazonSQSClient Exception when call ReceiveMessageAsync Jul 17, 2018
@actumn actumn changed the title Dotnet Core 2.1 - AmazonSQSClient Exception when call ReceiveMessageAsync Dotnet Core 2.1 - AmazonSQSClient AmazonServiceException when call ReceiveMessageAsync Jul 17, 2018
@actumn actumn reopened this Jul 17, 2018
@normj normj added the Question label Jul 18, 2018
@normj
Copy link
Member

normj commented Jul 18, 2018

Interesting, could you try running on .NET Core 2.1 again and set the environment variable DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER equal to 0. This is a .NET Core environment variable that you can use to toggle back to the libcurl implementation of HttpClient. I'm really curious if you see a change in behavior.

@actumn
Copy link
Author

actumn commented Jul 19, 2018

These are what I found on AWS SDK Internal Log when AmazonSQSClient threw Exception without the env DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER setting

DefaultInstanceProfileAWSCredentials 5390|2018-07-19T03:33:08.174Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Http.HttpConnection.FillAsync()
at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)


AmazonSQSClient 5393|2018-07-19T03:33:10.306Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
DefaultInstanceProfileAWSCredentials 11507|2018-07-19T04:41:17.431Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.OperationCanceledException: The operation was canceled.
at System.Net.Http.HttpClient.HandleFinishSendAsyncError(Exception e, CancellationTokenSource cts)
at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)


AmazonSQSClient 11510|2018-07-19T04:41:18.111Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)

As we can see, the exception come from here:

public static string DownloadStringContent(Uri uri, TimeSpan timeout)
{
#if PCL || CORECLR
using (var client = new System.Net.Http.HttpClient())
{
if (timeout > TimeSpan.Zero)
client.Timeout = timeout;
var content = AsyncHelpers.RunSync<string>(() =>
{
return client.GetStringAsync(uri);
});
return content;
}

@actumn
Copy link
Author

actumn commented Jul 19, 2018

And after set the environment variable as you suggest, there is still exception (but rarely than previous).
I attached logs containing request information.

efaultInstanceProfileAWSCredentials 13559|2018-07-19T07:25:59.131Z|ERROR|Failed to retrieve credentials from EC2 Instance Metadata Service. --> System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.IO.Stream.CopyToAsyncInternal(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.GetStringAsyncCore(Task`1 getTask)
at Amazon.Runtime.Internal.Util.AsyncHelpers.<>c__DisplayClass1_1`1.<<RunSync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Amazon.Runtime.Internal.Util.AsyncHelpers.ExclusiveSynchronizationContext.BeginMessageLoop()
at Amazon.Runtime.Internal.Util.AsyncHelpers.RunSync[T](Func`1 task)
at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
at Amazon.Util.EC2InstanceMetadata.get_IAMSecurityCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)
AmazonSQSClient 13560|2018-07-19T07:26:14.118Z|DEBUG|Received response (truncated to 1024 bytes): [<?xml version="1.0"?><ReceiveMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><ReceiveMessageResult/><ResponseMetadata><RequestId>21d0c202-d6a1-5bd2-b89b-91c8e9e36a52</RequestId></ResponseMetadata></ReceiveMessageResponse>]
AmazonSQSClient 13561|2018-07-19T07:26:14.119Z|INFO|Request metrics:
{
    "properties": {
        "AsyncCall": "True",
        "CanonicalRequest": "{secret}",
        "StringToSign": "{secret}",
        "ServiceName": "Amazon.SQS",
        "ServiceEndpoint": "https://sqs.ap-northeast-1.amazonaws.com/",
        "MethodName": "ReceiveMessageRequest",
        "RequestSize": "102",
        "StatusCode": "OK",
        "BytesProcessed": "240",
        "AWSRequestID": "21d0c202-d6a1-5bd2-b89b-91c8e9e36a52"
    },
    "timings": {
        "CredentialsRequestTime": 0.0088,
        "RequestSigningTime": 0.5013000000000001,
        "HttpRequestTime": 20007.033900000002,
        "ResponseUnmarshallTime": 0.33140000000000003,
        "ResponseProcessingTime": 0.5935,
        "ClientExecuteTime": 20008.325500000003
    },
    "counters": {}
}
AmazonSQSClient 13562|2018-07-19T07:26:14.136Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
AmazonSQSClient 13563|2018-07-19T07:26:14.137Z|INFO|Request metrics:
{
    "properties": {
        "AsyncCall": "True"
    },
    "timings": {
        "CredentialsRequestTime": 0.7505000000000001,
        "ClientExecuteTime": 13.6751
    },
    "counters": {}
}
AmazonSQSClient 13564|2018-07-19T07:26:14.150Z|ERROR|AmazonServiceException making request ReceiveMessageRequest to https://sqs.ap-northeast-1.amazonaws.com/. Attempt 1. --> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()
at Amazon.Runtime.Internal.CredentialsRetriever.InvokeAsync[T](IExecutionContext executionContext)
at Amazon.Runtime.Internal.RetryHandler.InvokeAsync[T](IExecutionContext executionContext)
AmazonSQSClient 13565|2018-07-19T07:26:14.151Z|INFO|Request metrics:
{
    "properties": {
        "AsyncCall": "True"
    },
    "timings": {
        "CredentialsRequestTime": 0.0601,
        "ClientExecuteTime": 0.8111
    },
    "counters": {}
}

@arishlabroo
Copy link

also, looking at the implementation here

https://github.com/aws/aws-sdk-net/blob/master/sdk/src/Core/Amazon.Runtime/Credentials/_bcl+coreclr/DefaultInstanceProfileAWSCredentials.cs

Looks like even if there is a failure to get the credentials, null credentials are cached for 2 minutes. All subsequent calls fail with default FailedToGetCredentialsMessage. This is a major issue we are running into production without any seeming workaround.

Is there a workaround 🤔 you can recommend till you look into this more?

setting the DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER didnt get rid of it for us.

@actumn
Copy link
Author

actumn commented Jul 25, 2018

With @arishlabroo saying, I think it should not set null but remain previous Instance.lastRetrievedCredentials when thrown TaskCanceledException.
Like it :

catch (TaskCanceledException e)
{
	Instance?.logger.Error(e, "RenewCredentials task canceled");
}
catch (Exception e)
{
	// we want to suppress any exceptions from this timer task.
	Instance?.logger.Error(e, FailedToGetCredentialsMessage);

	Instance.refreshLock.TryEnterWriteLock(lockWaitTimeOut);
	{
		Instance.lastRetrievedCredentials = null;
	}
	Instance.refreshLock.ExitWriteLock();
}

@AnthonySteele
Copy link

AnthonySteele commented Jul 25, 2018

We have a web api on .NET core 2.1 and publishing messages to SNS. We have seen that same error call-stack in our logs:

Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentialsAsync()

We have the problem seen at the same time that sometimes a machine's CPU goes up to 100%, requests fail, and about half the time it recovers, and half the time the resulting healthcheck failure causes it to be terminated. This severe issue means that we are not rolling out .NET core into production any further.

Since something is going wrong in the class DefaultInstanceProfileAWSCredentials, we therefore raised this PR for discussion: #1010

My questions related to this class are:

  • Why do you use Instance?.logger.Error() on line 155? Instance?. - with a null check - is not done anywhere else. if Instance is null, it will fail in other places, including the line that immediately follows. I don't think that Instance can be null anywhere in RenewCredentials().

  • The Instance getter uses a lock to make sure that the _instance is not null. But the lock is acquired every time the property is accessed. This appears to be an ideal candidate for double-checked locking to reduce the amount of locking here drastically.

  • I would expect Instance.lastRetrievedCredentials = null on line 159 to come before the logging that preceeds it, since getting into the desired state is more important and faster than the logging. If logging takes a while or blocks, then other threads have more chance to access the state that you have decided is unwanted.

  • As pointed out by @arishlabroo, is it necessary to set the creds to null at all? Is keeping the previous creds not a better option?

  • Are you aware of the issues with spinlocks under load? This looks a bit like the CPU spikes that we see. 5 seconds (line 42) is a very long time to wait on a cpu-intensive spinlock.

  • Where the Spinlock is used, Is any lock necessary at all? It is around assignments and reads. .NET reference assignment is an atomic operation anyway. There is no halfway state that other threads need to be locked away from, inside a statement of the form a = b;. I think this is also true of statements like a.b.c = d;.

If you're worried about things changing during your slightly more complex read expression, then you can do

var mine = lastRetrievedCredentials; // atomic assign
 credentials = mine?.Copy();

But if the IL for that is the same as before, then there is no need for even a temp var.

  • Have you considered not using Instance in RenewCredentials() since the Instance getter can potentially call RenewCredentials() again. There may be no actual potential for recursion, but it gives me a bad feeling to call the property that called you, to essentially get this with repeated checks and locks. You can use _instance, or even just use this implicitly, which should be the same object as Instance. If they're not the same not, you have other issues.

The way that EC2InstanceMetadata.IAMSecurityCredentials is sync but actually also async inside AsyncHelpers is a deeper issue that requires a more comprehensive fix, out of scope today.

@jonstelly
Copy link

jonstelly commented Jul 31, 2018

Not to hijack this issue, but I'm getting a really similar exception though it sounds like in a different environment (.NET 4.7.1, ECS and SSM not SQS).

We're running a Windows IIS docker image in ECS, running our ASP.NET 4.7.1 web application. AWSSDK is latest available in nuget. We're getting this with the SimpleSystemsManagement Client and not SQS, but this error seems to be happening in AWSSDK.Core so I thought it may be related.

I've verified that we have all of the ECS Task IAM Role stuff configured correctly, I see the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable set, if I docker exec to the container I'm able to do an Invoke-WebRequest "http://169.254.170.2$($env:AWS_CONTAINER_CREDENTIALS_RELATIVE_URI)" -UseBasicParsing and see what looks like a valid metadata document. I've restarted the task a couple times to try different ECS hosts. Nothing seems to work here.

Sorry if this turns out to be a different/unrelated issue, but I thought it might be helpful for tracking this issue down since they seem so similar.

Stack Trace:

2018-07-31 03:32:18.8118 ERROR [R= C= I=] An exception occurred that was not caught by the controller
EXCEPTION: System.InvalidOperationException: Failed to retrieve credentials from EC2 Instance Metadata Service. (See inner exception for details.) ---> Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CallbackHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.MetricsHandler.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RuntimePipeline.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.AmazonServiceClient.Invoke[TRequest,TResponse](TRequest request, IMarshaller`2 marshaller, ResponseUnmarshaller unmarshaller)
   at Amazon.SimpleSystemsManagement.AmazonSimpleSystemsManagementClient.GetParameter(GetParameterRequest request)
   at Name.Changed.To.Protect.The.Innocent.AmazonParameterStoreConfigurationProvider.RetrieveValue(String name) in C:\w\Source\Path\Changed\Too\AmazonParameterStoreConfigurationProvider.cs:line 27
   at lambda_method(Closure , Object[] )
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   --- End of inner exception stack trace ---
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   at Autofac.Core.Lifetime.LifetimeScope.GetOrCreateAndShare(Guid id, Func`1 creator)
   at Autofac.Core.Resolving.InstanceLookup.Execute()
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, IComponentRegistration registration, IEnumerable`1 parameters)
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.InstanceLookup.Activate(IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Execute()
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, IComponentRegistration registration, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.ResolveOperation.Execute(IComponentRegistration registration, IEnumerable`1 parameters)
   at Autofac.ResolutionExtensions.TryResolveService(IComponentContext context, Service service, IEnumerable`1 parameters, Object& instance)
   at Autofac.ResolutionExtensions.ResolveOptionalService(IComponentContext context, Service service, IEnumerable`1 parameters)
   at Autofac.Integration.Mvc.AutofacDependencyResolver.GetService(Type serviceType)
   at System.Web.Mvc.DefaultControllerFactory.DefaultControllerActivator.Create(RequestContext requestContext, Type controllerType)
   --- End of inner exception stack trace ---
   at System.Web.Mvc.DefaultControllerFactory.DefaultControllerActivator.Create(RequestContext requestContext, Type controllerType)
   at System.Web.Mvc.MvcHandler.ProcessRequestInit(HttpContextBase httpContext, IController& controller, IControllerFactory& factory)
   at System.Web.Mvc.MvcHandler.BeginProcessRequest(HttpContextBase httpContext, AsyncCallback callback, Object state)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

AWS SDK Log:

Amazon Error: 0 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 1 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 2 : AmazonServiceException making request GetParameterRequest to https://ssm.us-east-1.amazonaws.com/. Attempt 1., Amazon.Runtime.AmazonServiceException: Failed to retrieve credentials from EC2 Instance Metadata Service.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.GetCredentials()
   at Amazon.Runtime.Internal.CredentialsRetriever.PreInvoke(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.CredentialsRetriever.InvokeSync(IExecutionContext executionContext)
   at Amazon.Runtime.Internal.RetryHandler.InvokeSync(IExecutionContext executionContext)
Amazon Error: 1 : Unable to contact EC2 Metadata service., System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at Amazon.Util.AWSSDKUtils.DownloadStringContent(Uri uri, TimeSpan timeout)
   at Amazon.Util.EC2InstanceMetadata.GetItems(String relativeOrAbsolutePath, Int32 tries, Boolean slurp)
Amazon Error: 1 : Failed to retrieve credentials from EC2 Instance Metadata Service., System.NullReferenceException: Object reference not set to an instance of an object.
   at Amazon.Runtime.DefaultInstanceProfileAWSCredentials.RenewCredentials(Object unused)

@AnthonySteele
Copy link

In order to characterise these GetCredentials() issues as related or not, I would ask:

  • Do they occur at startup or later?
  • Every time a certain circumstance happens, or intermittently?
  • Does it happen more at peak request volume, or under light load?
  • What does the machine CPU level do at the same time? Was it low before, did it increase?

@jonstelly
Copy link

  • We see it on all requests we make for a parameter. We've got some caching (5m) for values but we never cache the value since we get the exception
  • every time
  • The EC2 instances are t2.large, very light load, plenty of memory free
  • always
  • CPU usage hits 30-40% while our app initializes, then sits at ~3% usage

I think I'm going to spin up a sample web app that does nothing but get an SSM parameter and spit out some diagnostic info... I may even build against a one-off build of the AWS SDK with some additional logging thrown in there.

If you don't think the issues are related I can open a new issue for tracking them independently.

@AnthonySteele
Copy link

AnthonySteele commented Aug 1, 2018

That sounds quite different to what I am seeing. What i am seeing is

  • intermittent, after the machines have been running for hours / days
  • more likely to happen under load, but not predictable
  • CPU spikes from the baseline around 3%, up to 100%

@jonstelly
Copy link

jonstelly commented Aug 1, 2018

We found our issue, it's a quirky behavior of the IIS docker image:

  • Docker container environment variables are set as process variables for the entrypoint (not machine environment variables)
  • servicemonitor.exe sets/propagates the process-level environment variables to IIS but only for the default app pool
  • we were using custom app-pools so the docker-injected environment variables weren't set
  • AWS SDK looks for the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable, uses the ECS Task IAM Role/Metadata behavior if its set
  • Since it wasn't set, the AWS SDK was using the default 169.254.169.254 address to try and retrieve metadata

I'm clearing out of this thread but wanted to document this a bit here. Thanks for the help/discussion.

@sstevenkang
Copy link
Contributor

sstevenkang commented Aug 1, 2018

@actumn how are you resolving the credentials in your container? Have you tried setting AWS_EC2_METADATA_DISABLED to true if you are not relying on IMDS?

@Reasonably
Copy link

@sstevenkang @actumn is my colleague. We only retrieve credential from IMDS. So if I set AWS_EC2_METADATA_DISABLED true, sqs client can't find credential, then throwing exception.

@sstevenkang
Copy link
Contributor

I am a little confused because it looks according to the report, you are running this application in a container not in an EC2 instance. Is it hitting IMDS on the host EC2 instance that's running the docker daemon?

@sstevenkang sstevenkang self-assigned this Aug 2, 2018
@Reasonably
Copy link

@sstevenkang Yes. I didn't set ECS Task Role and AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.

@LOB-JimD
Copy link

Below code is working for me.
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false)

Please refer to below document:
https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-2-1

The .NetCore used is v2.1.2.

@Reasonably
Copy link

@LOB-JimD That code is same with setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER false.
After #1010 merged and aws core sdk version updated, the problem has been solved.

@LOB-JimD
Copy link

Thanks @Reasonably for the update. Look forward to use it.

@pnquest
Copy link

pnquest commented Sep 21, 2018

@Reasonably would this fix have specifically applied to the SQS Client? I am seeing the same issue when updating from core 2.0 to core 2.1 with the PutObjectAsync call. Setting DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER to 0 solves this issue.

@LOB-JimD
Copy link

LOB-JimD commented Sep 22, 2018

@Reasonably also tested #1010, it does not fix the issue. Using the IWebHostBuilder.UseLibuv() to work around this issue.
aspnet/Announcements#296

@Reasonably
Copy link

@LOB-JimD #1010 is for that the credential don't set to null when retrieving credential has failed.
Before #1010 , when retrieving credentials failed, the client set credentials to null and so many error occurred and cpu spiked during caching time.
I think it(http request fail) can happened sometimes. But it is not critical because of some retry logic.

P.S. I'm so sorry about my terrible english skill

@AnthonySteele
Copy link

AnthonySteele commented Sep 22, 2018

@Reasonably
Have you tried updating .NETCore to version 2.1.402 and Microsoft.AspNetCore.All to version 2.1.4 ?
It fixes this issue: aspnet/KestrelHttpServer#2694
And in our experience so far, we're not seeing CPU spikes afterwards.

@normj
Copy link
Member

normj commented Jan 3, 2019

The issue has gotten to confusing and I'm sure what is the state of things. I'm going to close this issue. If anybody is still experience the feel free to create a new issue.

@normj normj closed this as completed Jan 3, 2019
@diehlaws diehlaws added guidance Question that needs advice or information. and removed Question labels Jan 3, 2019
@kalebmccalllancom
Copy link

For anyone encountering this error on a windows EC2 instance, try navigate to http://169.254.169.254/latest/meta-data/ from the server. If the page is unreachable then the SSM agent has not been initialized, this can happen if you customize the base image

@gtrak
Copy link

gtrak commented Apr 12, 2019

I had this issue on asp.net core 2.1 targeted to framework net462 via the IIS hosting module, and it turned out to be some proxy configuration that I couldn't easily see, though presented itself as a nullreferenceexception initially.

After getting AWS SDK logging working, I was clearly able to see it was trying to access the EC2 Instance Metadata Service at 127.0.0.1:8888, which made no sense given IIS configuration.

The workaround was to disable proxies globally in code by setting this to null: https://docs.microsoft.com/en-us/dotnet/api/system.net.webrequest.defaultwebproxy?view=netframework-4.7.2#System_Net_WebRequest_DefaultWebProxy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guidance Question that needs advice or information.
Projects
None yet
Development

No branches or pull requests