Slow /.well-known/openid-configuration endpoints #1565
Comments
It indeed looks like something happened in that upgrade, yes. Looking at the example trace for the GET request indicates that the problem is outside of the IdentityServer pipeline.*

*) To be 100% correct, there are a few infrastructure level things that are done outside of that block. The first is that the activity only fires if the path matches an IdentityServer endpoint, so the endpoint resolution happens outside of the block. It's a simple for loop with only in-memory dependencies and I cannot imagine how that would take close to a second. Also, if you are using the dynamic providers feature, handling of those is outside of that block as well.

My overall feeling here, based on the diagnostics shared, is that it's something happening before or after the actual IdentityServer code is invoked. Did you do any code changes as part of the upgrade? Any changes to infrastructure?
Thanks for your response. There is no pre/post processing there, but I'll try to get more diagnostics from the .NET Core pipeline. The strange thing is that it's only affecting one region in Azure - North Europe has about 20% of the traffic of West Europe but no issues at all. I'll try to reproduce locally, as we see the problem in 3 different environments. I'll also further investigate transitive dependencies. Also note that all other protocol endpoints (/introspect, /token etc.) behave normally. It's just the well-known one.
Checking App Insights today, we've seen the following: the /.well-known endpoints have very high latency, while other Duende protocol endpoints (introspect, token) increase as well, just not as much. I'll investigate further within the ASP.NET Core pipeline, but if you could check on your end as well @AndersAbel to see if there is a difference between the endpoints. We're using YARP as a proxy in front of IdentityServer; I'll check if there is a problem there.
Thanks for sharing those stats. The /.well-known endpoints are actually the ones I consider the simplest in their implementation. There is less code to run and less storage/database access. The token endpoint in comparison is more complex, but also (as far as I remember) utilizes all the storage/config that the /.well-known endpoints do. I do not doubt that this is a problem, but to properly troubleshoot we would need full activity traces that show timing all the way from the client's request to how it is handled on the server side. The only tangible data point we have so far is the one I referenced above, and it shows that the execution of the discovery endpoint class only takes up a fraction of the total time. Are you using the dynamic providers feature?
I've opened another issue for YARP and, after collecting metrics and checking with the team there, it seems that YARP does not cause the latency. Metrics suggest that the request is immediately forwarded to the network stack. We do not use dynamic providers. As I've mentioned above, we started seeing the slow /.well-known responses after the package upgrade; in comparison, the other protocol endpoints are far less affected. I'll enable ASP.NET Core telemetry to see where the additional time comes from.
I think that is the right next step. Right now we don't know where the time is spent, and for any performance issue, metrics are the only way to solve it. There are things in the discovery endpoint as well as in the IdentityServer endpoint selection/routing that could potentially cause timing issues (never say never in these cases until it is proven). The only thing I can say is that the numbers shared so far indicate that the issue is outside of the IdentityServer middleware. That doesn't mean IdentityServer is not to blame - we won't know until we have metrics that show where the issue is.
I'll set metrics logging up; that will require some code changes to use the new OpenTelemetry packages. We noticed that those long-running requests come in pairs within a range of 10 ms from the same client to /.well-known/openid-configuration: one of them finishes within the expected latency, the other takes ~1 sec, as if there is a lock / resource contention.
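As a rough sketch of what that wiring usually looks like with the current OpenTelemetry packages (the "Duende.IdentityServer" source/meter names and the OTLP exporter choice are assumptions to verify against the installed IdentityServer and exporter versions):

```csharp
// Program.cs - hedged sketch of OpenTelemetry tracing + metrics for an
// IdentityServer host. Source/meter names and exporter are assumptions.
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()      // spans for the whole ASP.NET Core pipeline
        .AddHttpClientInstrumentation()      // outgoing HTTP (e.g. Key Vault / token acquisition)
        .AddSource("Duende.IdentityServer")  // IdentityServer's ActivitySource (name assumed)
        .AddOtlpExporter())                  // or the Azure Monitor exporter, depending on setup
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()      // http.server.request.duration histograms
        .AddMeter("Duende.IdentityServer")   // IdentityServer's Meter (name assumed)
        .AddOtlpExporter());

// ... AddIdentityServer() and the rest of the pipeline as before
var app = builder.Build();
app.Run();
```

With request-level spans from the ASP.NET Core instrumentation next to the IdentityServer activity, it should be visible whether the extra second is spent before, inside, or after the IdentityServer block.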
Which version of Duende IdentityServer are you using?
7.1.0
Which version of .NET are you using?
.NET 8
Describe the bug
The endpoints
/.well-known/openid-configuration
/.well-known/openid-configuration/jwks
have a 99th percentile latency of ~1 sec, with peaks up to 20 secs around 5:45 AM GMT+1
App Insights Performance
Trace
During the ~20 sec peaks, other endpoints slow down considerably as well.
To Reproduce
Deploy to Azure App Service
Expected behavior
99th percentile latency of ~20 ms.
Log output/exception with stacktrace
Additional context
Our infrastructure consists of Azure App Services in West and North Europe, load balanced through Azure Front Door. Our Azure SQL Server is Business Critical Gen5 with 8 vCores (40 GB RAM). The PaaS resource usage is less than 5%.
We found #1361 but no improvement occurred after updating Azure.Core to 1.44.
The data protection is configured with EF Core and protected with Azure Key Vault.
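For context, that data protection setup typically boils down to something like the sketch below; the DbContext type and the Key Vault key URI are placeholders, not the actual configuration from this deployment:

```csharp
// Hedged sketch of the data protection setup described above; the DbContext
// and Key Vault key URI are placeholders. The context must implement
// IDataProtectionKeyContext for PersistKeysToDbContext to work.
using Azure.Identity;
using Microsoft.AspNetCore.DataProtection;

builder.Services.AddDataProtection()
    // Microsoft.AspNetCore.DataProtection.EntityFrameworkCore
    .PersistKeysToDbContext<DataProtectionKeysContext>()
    // Azure.Extensions.AspNetCore.DataProtection.Keys
    .ProtectKeysWithAzureKeyVault(
        new Uri("https://<vault-name>.vault.azure.net/keys/<key-name>"),
        new DefaultAzureCredential());
```

Key wrapping/unwrapping on this path goes through the Azure SDK (and therefore Azure.Core), which is why the Azure.Core version was worth checking.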
NuGet Versions with ~20ms 99th percentile (Pre 8th January)
NuGet Versions with ~1sec 99th percentile (Post 8th January)
We have added aggressive caching with a custom DiscoveryResponseGenerator and are now seeing the following behavior: between 14:30 and 14:40 I ran a dummy app to poll the discovery endpoint at an interval, and latency was perfectly fine there. After stopping the dummy app, the regular traffic calling the endpoint started seeing latencies of ~1 sec again.
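A dummy poller like the one described can be as simple as the following console-app sketch; the URL and polling interval are placeholders:

```csharp
// Hedged sketch of a minimal discovery-endpoint poller (console app).
// The URL and polling interval are placeholders.
using System.Diagnostics;
using System.Net.Http;

using var http = new HttpClient();
using var timer = new PeriodicTimer(TimeSpan.FromSeconds(5));

while (await timer.WaitForNextTickAsync())
{
    var sw = Stopwatch.StartNew();
    using var response = await http.GetAsync(
        "https://login.example.com/.well-known/openid-configuration");
    Console.WriteLine($"{DateTime.UtcNow:O} {(int)response.StatusCode} {sw.ElapsedMilliseconds} ms");
}
```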
The relevant caching code for CreateDiscoveryDocumentAsync is here; CreateJwkDocumentAsync is similarly implemented.
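The actual implementation is the one linked above. Purely as an illustration of the approach, a caching decorator might look like the sketch below; it assumes Duende's IDiscoveryResponseGenerator interface (CreateDiscoveryDocumentAsync(baseUrl, issuerUri) and CreateJwkDocumentAsync()) and a DI override registered after AddIdentityServer, both of which should be verified against the installed 7.1.0 packages:

```csharp
// Hedged sketch of a caching decorator around the built-in generator.
// Interface members, model types, and the registration are assumptions to
// check against the Duende.IdentityServer version in use.
using Duende.IdentityServer.Models;
using Duende.IdentityServer.ResponseHandling;
using Microsoft.Extensions.Caching.Memory;

public class CachingDiscoveryResponseGenerator : IDiscoveryResponseGenerator
{
    private static readonly TimeSpan CacheDuration = TimeSpan.FromMinutes(5);

    private readonly DiscoveryResponseGenerator _inner;
    private readonly IMemoryCache _cache;

    public CachingDiscoveryResponseGenerator(DiscoveryResponseGenerator inner, IMemoryCache cache)
    {
        _inner = inner;
        _cache = cache;
    }

    public async Task<Dictionary<string, object>> CreateDiscoveryDocumentAsync(string baseUrl, string issuerUri)
    {
        var document = await _cache.GetOrCreateAsync($"disco:doc:{issuerUri}:{baseUrl}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = CacheDuration;
            return _inner.CreateDiscoveryDocumentAsync(baseUrl, issuerUri);
        });
        return document!;
    }

    public async Task<IEnumerable<JsonWebKey>> CreateJwkDocumentAsync()
    {
        var keys = await _cache.GetOrCreateAsync("disco:jwks", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = CacheDuration;
            return _inner.CreateJwkDocumentAsync();
        });
        return keys!;
    }
}

// Registration sketch (after AddIdentityServer; requires AddMemoryCache if not already registered):
// builder.Services.AddTransient<DiscoveryResponseGenerator>();
// builder.Services.AddTransient<IDiscoveryResponseGenerator, CachingDiscoveryResponseGenerator>();
```

Note that even with such a cache in place, the observed ~1 sec latencies for regular traffic would still point at time spent outside the response generator, consistent with the trace discussed above.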