Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow /.well-known/openid-configuration endpoints #1565

Open
4865783a5d opened this issue Jan 28, 2025 · 7 comments
Open

Slow /.well-known/openid-configuration endpoints #1565

4865783a5d opened this issue Jan 28, 2025 · 7 comments

Comments

@4865783a5d
Copy link

4865783a5d commented Jan 28, 2025

Which version of Duende IdentityServer are you using?
7.1.0
Which version of .NET are you using?
NET8
Describe the bug
The endpoints
/.well-known/openid-configuration
/.well-known/openid-configuration/jwks

have a 99th percentile performance of ~1 sec with peaks up to 20 secs during ~ 5.45 AM GMT+1

App Insights Performance
Image

Trace
Image

During peaks to ~20 secs, other endpoints slow down considerably as well

Image

To Reproduce
Deploy to Azure App Service

Expected behavior
99th percentile performance of ~20 ms.

Log output/exception with stacktrace

09:23:09.119	Request	Name: GET /.well-known/openid-configuration/jwks, Successful request: true, Response time: 1.0 s, URL: https://123/.well-known/openid-configuration/jwks
09:23:09.119	Dependency	Name: CachingCorsPolicyService.IsOriginAllowed, Type: Other, Call status: true, Duration: 7.3 μs
09:23:09.121	Trace	Severity level: Warning, Message: CorsPolicyService did not allow origin: https://zzz
09:23:09.122	Dependency	Name: IdentityServerProtocolRequest, Type: Other, Call status: true, Duration: 594.3 μs
09:23:09.122	Dependency	Name: DiscoveryEndpoint, Type: Other, Call status: true, Duration: 355.9 μs
09:23:09.122	Dependency	Name: DiscoveryResponseGenerator.CreateJwkDocument, Type: Other, Call status: true, Duration: 347.4 μs
09:23:09.122	Dependency	Name: DefaultKeyMaterialService.GetValidationKeys, Type: Other, Call status: true, Duration: 333.9 μs
09:23:09.122	Dependency	Name: KeyManager.GetAllKeys, Type: Other, Call status: true, Duration: 307.1 μs
09:23:09.122	Trace	Message: Getting all the keys.
09:23:09.122	Trace	Message: Cache hit when loading all keys.
09:23:09.122	Trace	Message: Looking for active signing keys.
09:23:09.122	Trace	Message: Looking for an active signing key for alg RS256.
09:23:09.122	Trace	Message: Checking if key with kid xxxxx is active (respecting activation delay).
09:23:09.122	Trace	Message: Key with kid xxxxx is active.
09:23:09.122	Trace	Message: Active signing key found (respecting the activation delay) with kid: xxxxx.
09:23:09.122	Trace	Message: Found active signing key for alg RS256 with kid xxxxx.
09:23:09.122	Trace	Message: Checking if key with kid xxxxx is active (respecting activation delay).
09:23:09.122	Trace	Message: Key with kid xxxxx is active.
09:23:09.122	Trace	Message: Active signing key found (respecting the activation delay) with kid: xxxxx.
09:23:09.122	Trace	Message: Key rotation not required for alg RS256; New key expected to be created in 25.23:20:06.2274435
09:23:09.122	Dependency	Name: WriteJson, Type: Other, Call status: true, Duration: 175.9 μs

Additional context
Our infrastructure consists of Azure App Services in West- and North Europe, load balanced through Azure Front Door. Our Azure Sql Server is a Business Critical Gen5 / 8 vCores (40 GB Ram). The PaaS resource usage is less than 5%.

We found #1361 but no improvement occurred after updating Azure.Core to 1.44.
The data protection is configured with EF Core and protected with Azure Key Vault.

NuGet Versions with ~20ms 99th percentile (Pre 8th January)

> Azure.Core                                                                         1.37.0
> Azure.Data.AppConfiguration                                                        1.3.0
> Azure.Extensions.AspNetCore.Configuration.Secrets                                  1.3.1
> Azure.Extensions.AspNetCore.DataProtection.Keys                                    1.2.3
> Azure.Identity                                                                     1.10.4
> Azure.Messaging.EventGrid                                                          4.7.0
> Azure.Security.KeyVault.Keys                                                       4.2.0
> Azure.Security.KeyVault.Secrets                                                    4.3.0
> MdS.Azure.Identity                                                                 3.0.1
> Microsoft.Azure.AppConfiguration.AspNetCore                                        7.0.0
> Microsoft.Azure.KeyVault                                                           2.3.2
> Microsoft.Azure.KeyVault.WebKey                                                    2.0.7
> Microsoft.Azure.Services.AppAuthentication                                         1.0.3
> Microsoft.Extensions.Configuration.AzureAppConfiguration                           7.0.0
> Microsoft.Extensions.Configuration.AzureKeyVault                                   3.1.24
> Microsoft.Rest.ClientRuntime.Azure                                                 3.3.7
> Microsoft.SourceLink.AzureRepos.Git                                                8.0.0

> Microsoft.IdentityModel.Abstractions                                               7.1.2
> Microsoft.IdentityModel.Clients.ActiveDirectory                                    3.14.2
> Microsoft.IdentityModel.JsonWebTokens                                              7.1.2
> Microsoft.IdentityModel.Logging                                                    7.1.2
> Microsoft.IdentityModel.Protocols                                                  7.1.2
> Microsoft.IdentityModel.Protocols.OpenIdConnect                                    7.1.2
> Microsoft.IdentityModel.Tokens                                                     7.1.2
> System.IdentityModel.Tokens.Jwt                                                    7.1.2

> Duende.IdentityServer                                                              7.0.4
> Duende.IdentityServer.AspNetIdentity                                               7.0.4
> Duende.IdentityServer.EntityFramework                                              7.0.4
> Duende.IdentityServer.EntityFramework.Storage                                      7.0.4
> Duende.IdentityServer.Storage                                                      7.0.4

NuGet Versions with ~1sec 99th percentile (Post 8th January)

> Azure.Core                                                                         1.44.1
> Azure.Data.AppConfiguration                                                        1.3.0
> Azure.Extensions.AspNetCore.Configuration.Secrets                                  1.3.1
> Azure.Extensions.AspNetCore.DataProtection.Keys                                    1.3.0
> Azure.Identity                                                                     1.10.4
> Azure.Messaging.EventGrid                                                          4.7.0
> Azure.Security.KeyVault.Keys                                                       4.6.0
> Azure.Security.KeyVault.Secrets                                                    4.3.0
> MdS.Azure.Identity                                                                 3.0.1
> Microsoft.Azure.AppConfiguration.AspNetCore                                        7.0.0
> Microsoft.Azure.KeyVault                                                           2.3.2
> Microsoft.Azure.KeyVault.WebKey                                                    2.0.7
> Microsoft.Azure.Services.AppAuthentication                                         1.0.3
> Microsoft.Extensions.Configuration.AzureAppConfiguration                           7.0.0
> Microsoft.Extensions.Configuration.AzureKeyVault                                   3.1.24
> Microsoft.Rest.ClientRuntime.Azure                                                 3.3.7
> Microsoft.SourceLink.AzureRepos.Git                                                8.0.0

> Microsoft.IdentityModel.Abstractions                                               7.1.2
> Microsoft.IdentityModel.Clients.ActiveDirectory                                    3.14.2
> Microsoft.IdentityModel.JsonWebTokens                                              7.1.2
> Microsoft.IdentityModel.Logging                                                    7.1.2
> Microsoft.IdentityModel.Protocols                                                  7.1.2
> Microsoft.IdentityModel.Protocols.OpenIdConnect                                    7.1.2
> Microsoft.IdentityModel.Tokens                                                     7.1.2
> System.IdentityModel.Tokens.Jwt                                                    7.1.2

> Duende.IdentityServer                                                              7.1.0
> Duende.IdentityServer.AspNetIdentity                                               7.1.0
> Duende.IdentityModel                                                               7.0.0
> Duende.IdentityServer.EntityFramework                                              7.1.0
> Duende.IdentityServer.EntityFramework.Storage                                      7.1.0
> Duende.IdentityServer.Storage                                                      7.1.0

We have added aggressive caching with a custom DiscoveryResponseGenerator and are now seeing the following behavior:

Image

During the time between 14:30 - 14:40, I ran a dummy app to poll the discovery endpoint in an interval - latency was perfectly fine there. After stopping the dummy app, the regular traffic calling the endpoint started seeing latencies of ~1 sec again.

The relevant caching code for CreateDiscoveryDocumentAsync is here, CreateJwkDocumentAsync is similarly implemented.

  var key = $"{_memoryKeyPrefix}{baseUrl}-{issuerUri}";

  if (_memoryCache.TryGetValue(key, out Dictionary<string, object>? discoveryDocument) && discoveryDocument != null)
      return discoveryDocument;

  discoveryDocument = await base.CreateDiscoveryDocumentAsync(baseUrl, issuerUri);

  var expiration = _timeProvider.GetUtcNow().AddSeconds(Options.Discovery.ResponseCacheInterval ?? _defaultExpirationInSeconds);
  _memoryCache.Set(key, discoveryDocument, expiration);

  return discoveryDocument;
@AndersAbel
Copy link
Member

AndersAbel commented Jan 28, 2025

It indeed looks like something happened in that upgrade, yes.

Looking at the example trace for the GET request indicates that the problem is outside of the IdentityServer pipeline. The IdentityServerProtocolRequest activity wraps (nearly*) all the processing done by IdentityServer. In the screen shot we can see that the GET request took a total of 1 sec, but that the IdentityServerProtocolRequest was done in 565 micro seconds. That leaves 99.5 milliseconds to explain by something else.

*) To be 100% correct, there are a few infrastructure level things that are done outside of that block. The first is that the activity only fires if the path matches an IdentityServer endpoint => the endpoint resolution happens outside of the block. It's a simple for loop with only in memory dependencies and I cannot imagine how that would take close to a second.

Also if you are using the dynamic providers feature, handling of those is outside of the IdentityServerProtocolRequest activity.

My overall feeling here based on the diagnostics shared is that it's something happening before or after the actual IdentityServer code is invoked. Did you do any code changes as part of the upgrade? Any changes to infrastructure?

@4865783a5d
Copy link
Author

4865783a5d commented Jan 28, 2025

Thanks for your response. There is no pre / post processing there but I'll try to get more diagnostics from the NET Core pipeline. The strange thing is, that its only affecting one region in Azure - North Europe has about 20% traffic of West Europe but no issues at all. I'll try to reproduce locally as we see the problem in 3 different environments. I'll also further investigate transitive dependencies

Also note, all other protocol endpoints (/introspect, /token etc.) behave normal. Its just the well-known one.

@4865783a5d
Copy link
Author

4865783a5d commented Jan 29, 2025

Checking App Insights today, we've seen the following:

Image

/well-known Endpoints have very high latency while other Duende Protocol Endpoints (Introspect, Token) increase as well, just not as much. I'll investige further within the ASP.NET Core pipeline but if you could check on your end as well @AndersAbel to see if there is a difference between the endpoints.

We're using YARP as proxy in front of identity server, I'll check if there is a problem there.

@AndersAbel
Copy link
Member

Thanks for sharing those stats. The ./well-known endpoints are actually the ones I consider most simple in their implementation. There is less code to run and less storage/database access. The token endpoint in comparison is more complex, but also (as far as I remember) utilizes all storage/config that the ./well-known endpoints to.

I do not doubt that this is a problem, but to properly troubleshoot we would need full activity traces that shows timing all the way from the client's requests to how it is handled on the server side. The only tangible data point we have so far is the one I referenced above and that one shows that the execution of the discovery endpoint class only takes up a fraction of the total time.

Are you using the dynamic providers feature?

@4865783a5d
Copy link
Author

I've opened another issue for YARP and after collecting metrics and checking with the team there, it seems that YARP does not cause the latency. Metrics suggest that the request is immediatly forwarded to the network stack.

We do not use dynamic providers.

As I've mentioned above, we started service the ./well-known endpoints response from memory cache, as it is very static. Still, the 99P is 1.03 secs for both endpoints (App Insights from today morning):

Image

In comparison, the /connect/token is way more complex in terms of logic and PaaS access (DB calls to PersistedGrant table, which in our case is ~30 GB).

I'll enable ASP.NET core telemetry to see where the additional time comes from.

@AndersAbel
Copy link
Member

We do not use dynamic providers.
Thank you for that confirmation. The dynamic providers is a feature that runs outside of the IdentityServerProtocolRequest activity and that has database access and runs code that potentially be slow. I just wanted to make sure that we can exclude those from the possible causes.

I'll enable ASP.NET core telemetry to see where the additional time comes from.

I think that is the right next step. Right now we don't know where the time is spent and for any performance issue metrics is the only way to solve it. There are things in the discovery endpoint as well as in the IdentityServer endpoint selection/routing that could potentially cause timing issues (never say never in these cases until it is proved). The only thing I can say is that the numbers shared so far indicate that the issue is outside of the IdentityServer middleware. That doesn't mean IdentityServer is not to blame - we won't know until we have metrics that show where the issue is.

@4865783a5d
Copy link
Author

I'll set metrics logging up, that will require some code changes to use the new OpenTelemetry packages. We noticed that those long running requests come in pairs within a range of 10 ms from the same client

/.well-known/openid-configuration
/.well-known/openid-configuration/jwks

one of them finishes within the expected latency, the other is at 1 sec, as if there is a lock / resource contest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants