Large number of TimeoutExceptions caused by locking in OrchardCore.Environment.Shell.Scope.ShellScope+<ActivateShellInternalAsync> #14381
Shell activation (run only once for a given shell) should be run atomically because, for example, it may execute database migrations. The timeout and expiration values are configurable.
@jtkech The problem we're running into is that we see this exception occurring for 40 minutes across all tenants (including Default) after a new instance starts up. I'm continuing to investigate whether these exceptions are the cause of, or a symptom of, another problem.
Okay, let me know. Did you enable the Redis Lock feature? If yes, normally a lock auto-expires, with an expiration of 30s by default. If no, we fall back to a local lock.
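Worth spelling out the two values in play here, since they do different jobs: the timeout bounds how long a caller waits to acquire the lock, while the expiration bounds how long an acquired lock is held before Redis auto-releases it. A minimal sketch of that shape, assuming a simplified stand-in interface rather than OrchardCore's exact lock API:

```csharp
using System;
using System.Threading.Tasks;

// Simplified stand-in for a distributed lock abstraction; this is not
// OrchardCore's exact API, just the shape of the two values discussed.
public interface IDistributedLockSketch
{
    // timeout: how long to keep trying to acquire the lock before giving up
    //   (failing here is what surfaces as the TimeoutException in this issue).
    // expiration: how long Redis holds an acquired lock before auto-releasing
    //   it, so a crashed instance can't hold the lock forever (30s by default).
    Task<(IAsyncDisposable Locker, bool Locked)> TryAcquireLockAsync(
        string key, TimeSpan timeout, TimeSpan expiration);
}

public static class ShellActivationSketch
{
    public static async Task ActivateOnceAsync(IDistributedLockSketch @lock)
    {
        // The key is illustrative; the real lock is keyed per tenant.
        var (locker, locked) = await @lock.TryAcquireLockAsync(
            "SHELL_ACTIVATE_Default",
            timeout: TimeSpan.FromSeconds(30),
            expiration: TimeSpan.FromSeconds(30));

        if (!locked)
        {
            throw new TimeoutException("Failed to acquire the shell activation lock.");
        }

        await using (locker)
        {
            // Run-once activation work (e.g. database migrations) goes here.
        }
    }
}
```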
@jtkech Yes, we're using Redis. I've been able to recreate this locally with a single tenant just by hitting it with a bunch of parallel requests (a k6 test using 200 VUs / 400 iterations). It works perfectly fine without Redis, but as soon as I enabled Redis I only had a 67% pass rate. The weird thing is that the entire test completes within 32 seconds. I wouldn't expect the lock time to matter so much with a single tenant, so I'm looking into this now.
@jtkech Interestingly enough, I was able to get a 100% pass rate using this simple change in the ShellScope.UsingAsync method. I'm going to do some more testing with this, but I wonder if you can see any problems with it?
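The code snippet from this comment wasn't captured here. Based on the follow-up discussion, the change only throws when the shell still isn't activated after a failed lock acquisition. A rough sketch of that idea, with names like IsActivated and TryAcquireActivateLockAsync approximating rather than quoting OrchardCore's internals:

```csharp
// Rough sketch of the proposed change; member names approximate OrchardCore's
// ShellScope internals rather than quoting them exactly.
private async Task ActivateShellInternalAsync()
{
    if (_shellContext.IsActivated)
    {
        return;
    }

    var (locker, locked) = await _shellContext.TryAcquireActivateLockAsync();
    if (!locked)
    {
        // The change: another request may have finished activating the shell
        // while this one was waiting on the distributed lock, so re-check
        // before throwing instead of always failing on a lock timeout.
        if (!_shellContext.IsActivated)
        {
            throw new TimeoutException(
                $"Failed to acquire a lock to activate the tenant: {_shellContext.Settings.Name}");
        }

        return;
    }

    await using var _ = locker;

    if (!_shellContext.IsActivated)
    {
        await ActivateShellAsync();
    }
}
```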
Yes, we throw a TimeoutException if the activation lock can't be acquired in time.
So it looks like on startup the shell activation takes too long. The default expiration and timeout are 30s; can you try increasing these values, for example to 60s?
Otherwise, good idea to only throw if the shell is not activated, along the lines of the sketch above.
Or maybe just increase the default values.
Just tried it using the Blog recipe, a local Redis instance, and Redis.Lock enabled. The test sent 1000 requests with a concurrency level of 100, and I did not have any issue. But it's still a good idea to not throw if the shell is activated, and to only log an error or warning instead.
@jtkech Local Redis might be why you didn't see an issue; I used an Azure Redis instance when I was able to recreate it locally. The change is going into production tomorrow, so we'll be able to tell the impact next week, since we have to restart our instances twice weekly due to a memory leak. If it works I can PR it for you, since I'm already set up to do so and I could use the practice. Let me know if you want that, or if you'll just PR it yourself because it's so simple and might only take you 10 min compared to my hour+ :D
Okay cool, about the PR no problem, as you want, just let me know. Hmm, maybe also add a log message (maybe at the warning level) when the lock fails to be acquired. And a comment ;)
Okay, I retried it with the max concurrent requests I was able to generate and I could not repro, which is good news at least ;) We have retry logic for acquiring a lock in which we increase the delay before each retry, the max delay being 10s. What I think happens is that if there are at least 3 requests still waiting at this max retry delay, one may experience a timeout while waiting before a new retry. I did a PR to apply our suggested change.
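To make that failure mode concrete, here's a minimal sketch of a lock-acquisition retry loop with an increasing, capped delay; the delays and structure are illustrative, not OrchardCore's exact implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockRetrySketch
{
    // Illustrative retry loop: the delay grows each attempt, capped at 10s.
    public static async Task<bool> TryAcquireWithRetriesAsync(
        Func<Task<bool>> tryAcquireOnce, TimeSpan timeout)
    {
        var delay = TimeSpan.FromMilliseconds(100);
        var maxDelay = TimeSpan.FromSeconds(10);

        using var cts = new CancellationTokenSource(timeout);
        while (true)
        {
            if (await tryAcquireOnce())
            {
                return true;
            }

            try
            {
                // With a few waiters already sitting at the 10s max delay,
                // the overall timeout can elapse *between* retries, which is
                // how a waiter times out even as the lock is about to free up.
                await Task.Delay(delay, cts.Token);
            }
            catch (OperationCanceledException)
            {
                return false; // Timed out while waiting before the next retry.
            }

            var doubled = delay + delay;
            delay = doubled > maxDelay ? maxDelay : doubled;
        }
    }
}
```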
I'm mostly curious as to why a distributed lock is being used for activating a ShellContext, since this means that only one ShellContext can be activated at a time no matter how many instances/tenants you have.
Shouldn't this always be a local lock, so that each instance can start a shell context independently of the other instances?