Microsoft.Extensions.Configuration.AzureKeyVault package 3.1.1 fails to retrieve KV objects #50037

Closed
v-joolat opened this issue Jun 18, 2020 · 17 comments

Comments

@v-joolat

Describe the bug

We have an Azure App Service app that references a secret in Key Vault. Recently we noticed that the application is unable to reach Key Vault (KV). The error logs show:

Application '/LM/W3SVC/XXXXXXX/ROOT' with physical root 'D:\home\site\wwwroot' hit unexpected managed exception, exception code = '0xe0434352'. First 30KB characters of captured stdout and stderr logs:
Unhandled exception. System.Net.Http.HttpRequestException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

From the KV side we do not see any corresponding failed requests, only a bunch of 200s.
After several troubleshooting steps, we discovered that the issue is with the Microsoft.Extensions.Configuration.AzureKeyVault package. We are currently running version 3.1.1 and downgrading to 2.2.0 resolves this issue.

We suspect that the 3.1.1 package is consuming too many network connections with Key Vault.

To Reproduce

Steps to reproduce the behavior:

  1. Using version '...' of package '...'
  2. Run this code '....'
  3. With these arguments '....'
  4. See error

Expected behavior

We expect the application to be able to retrieve keys from the KV without returning HTTP 500 errors.

Screenshots

image

Additional context

@msfcolombo

The symptom matches ephemeral port exhaustion: https://docs.microsoft.com/en-us/azure/app-service/troubleshoot-intermittent-outbound-connection-errors.

The hypothesis is that the new version is consuming more TCP connections to Key Vault service than the previous one either because they stay longer before getting garbage-collected, or because they are being leaked.

@Pilchie
Member

Pilchie commented Jun 18, 2020

@pakrym - Do you have any ideas here?

@v-joolat - can you consider moving to the https://www.nuget.org/packages/Azure.Extensions.Configuration.Secrets/ package that will replace this one going forward?

@pakrym
Contributor

pakrym commented Jun 18, 2020

Doesn't seem like port exhaustion because the error happens during application startup. Maybe some networking problems?

@samuelbcollie

I'm having the same issue, but when I ran the diagnostic tool in Azure I got SNAT port exhaustion warnings. This only started happening after upgrading to .NET Core 3.1 (from 2.2). It looks like there was a recent change to the way secrets are pulled in at startup. They used to be pulled in serially, but now I think they're being pulled in parallel: dotnet/extensions#944
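
To make the serial-versus-parallel point concrete, here is a minimal sketch of capping how many secret fetches run at the same time, which in turn caps how many outbound connections can be open at once. It is an illustration only: `BoundedSecretLoader`, `fetchSecret`, and `maxParallel` are hypothetical names, and nothing in this thread confirms that either package exposes such a hook or loads secrets this way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSecretLoader
{
    // Loads every named secret, never running more than maxParallel fetches at once.
    // fetchSecret is a hypothetical stand-in for whatever call actually reaches Key Vault.
    public static async Task<IReadOnlyDictionary<string, string>> LoadAsync(
        IEnumerable<string> names,
        Func<string, Task<string>> fetchSecret,
        int maxParallel)
    {
        using var gate = new SemaphoreSlim(maxParallel);

        // ToList() starts all tasks now; each one waits at the gate before fetching.
        var tasks = names.Select(async name =>
        {
            await gate.WaitAsync();
            try
            {
                return (Name: name, Value: await fetchSecret(name));
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        var results = await Task.WhenAll(tasks);
        return results.ToDictionary(r => r.Name, r => r.Value);
    }
}
```

With `maxParallel` set to 1 this behaves like the old serial loading; larger values trade startup time against the number of simultaneous connections.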

@kirkone
Member

kirkone commented Jul 22, 2020

Hi,

I ran into the same issue here: codez-one/EasyConfig#2

I will give the new package mentioned by @Pilchie a try later today.
btw: the correct link for the new package is: https://www.nuget.org/packages/Azure.Extensions.AspNetCore.Configuration.Secrets/
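
For readers considering the same move, a minimal sketch of wiring up the new package might look like the following. It assumes the `Azure.Extensions.AspNetCore.Configuration.Secrets` and `Azure.Identity` NuGet packages are referenced and that the app can authenticate through `DefaultAzureCredential` (for example, a managed identity on App Service). The vault URI is a hypothetical placeholder, not a value from this thread.

```csharp
using System;
using Azure.Identity;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

public static class Program
{
    public static void Main(string[] args) =>
        CreateHostBuilder(args).Build().Run();

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureAppConfiguration((context, config) =>
            {
                // Hypothetical vault URI; replace with your own vault.
                var vaultUri = new Uri("https://my-vault.vault.azure.net/");

                // Adds Key Vault secrets as a configuration source.
                config.AddAzureKeyVault(vaultUri, new DefaultAzureCredential());
            });
}
```

This shows only the registration call; whether it avoids the connection failures reported above is something each app would need to verify.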

@jvirtala

jvirtala commented Aug 6, 2020

We also started seeing this suddenly after a boot on an Azure App Service. We're using Microsoft.Extensions.Configuration.AzureKeyVault 3.1.2

We've had some issues before with requesting too many secrets at startup, which hits KV rate limits, but this is something different. Oddly, it only affects one of our two instances. Locally I have no issues running against the same KV.

@kirkone did you have any luck with the new package?

@kirkone
Member

kirkone commented Aug 7, 2020

@janiilmari I tried the new package, migration was not so complicated.
My extension is working in my test environment, but I must check in more detail.

I will update as soon as possible.

@pradeepiyerust

@msfcolombo I'm still facing this issue with version 3.1.1. Has this been addressed in a newer version of this package, for example version 3.1.6? If not, can you suggest an alternative to this package (a stable version for production use)? Key Vault is a very important part of our infrastructure, and not being able to upgrade this package in the future could be a huge problem.

@jvirtala

@kirkone thanks for updating.

We downgraded Microsoft.Extensions.Configuration.AzureKeyVault to 2.2.0 and that helped for a while. We're now in talks with Azure Support and they said that it might be a network issue. Key Vault logs do not show failed requests even though some instances get SocketExceptions.

@SamuelCox

SamuelCox commented Aug 27, 2020

Any update on this? Also affected

@jvirtala

jvirtala commented Aug 27, 2020 via email

@OskarKlintrot

OskarKlintrot commented Sep 7, 2020

Any updates on this? We also had a few hours of downtime in production because of this issue. Since Azure App Service boots up new instances from time to time, this can happen even if the initial deploy went fine. In our case an instance just died after a few weeks of uptime. Kinda nasty.

@Pilchie
Member

Pilchie commented Sep 9, 2020

There hasn't been a lot of investigation here, because AFAIK, everyone who has updated to the new Microsoft.Azure packages has been successful, so trying those would be my first suggestion.

@jvirtala

jvirtala commented Sep 9, 2020

> There hasn't been a lot of investigation here, because AFAIK, everyone who has updated to the new Microsoft.Azure packages has been successful, so trying those would be my first suggestion.

I got this official statement from Product Group through Azure Support:

The Microsoft Azure App Service Team has identified an issue with the Key Vault references for App Service and Azure Functions feature related to intermittent failure to resolve references at runtime.

Engineers identified a regression in the system that reduced the performance and availability of our scale unit’s ability to retrieve key vault references at runtime. A patch has been written and deployed to our fleet of VMs to mitigate this issue.

We are continuously taking steps to improve the Azure Web App service and our processes to ensure such incidents do not occur in the future, and in this case, it includes (but is not limited to):

  - Improving detection and testing of performance and availability of the Key Vault App Setting References feature
  - Improvements to our platform to ensure high availability of this feature at runtime.

We apologize for any inconvenience.

I can’t verify if this has been resolved as we don’t experience the issue anymore (see our solution to it in the previous comment).

@Anipik Anipik transferred this issue from dotnet/extensions Mar 22, 2021
@Anipik Anipik transferred this issue from dotnet/aspnetcore Mar 22, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 22, 2021
@ghost

ghost commented Mar 23, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: v-joolat
Assignees: -
Labels: area-System.Net.Http, untriaged

Milestone: -

@karelz
Member

karelz commented Mar 30, 2021

Triage:
The original problem seems to be a timeout, perhaps from hitting the SNAT limit. Without an additional repro or logs we can't say much more, and the issue is not actionable.
From further discussion, the problem seems to be gone, and perhaps it was a problem on the server.
Closing for now, as there is not enough data for investigation.

If anyone has an isolated repro, we can reopen and help further. The Azure Key Vault team should probably be involved to expedite the investigation.

@karelz karelz closed this as completed Mar 30, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 29, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022