Graceful Shutdown #6602
Comments
Interesting read on this topic: https://www.rudderstack.com/blog/implementing-graceful-shutdown-in-go/
We should probably first tackle this issue for the case when we're not using the ocis runtime, i.e. fix the case when running the services as separate processes. This should at least avoid the race condition between the signal handlers of the different reva services. We then just need to make the graceful shutdown behavior, which is currently triggered by SIGQUIT, the default behavior for SIGTERM and SIGINT (maybe based on a config setting), and get rid of those os.Exit() calls. In a second step we could then figure out how to disable the signal handlers in reva and trigger a proper reva shutdown from within the ocis runtime.
@rhafer I agree, we should focus on the single service runtime first. IMO we could even look only at the storage-users and storage-system services for now. Another idea might be to gather some information on how severe the problem is in real life. We could run the inspector on the load test instance on the VP. @wkloucek, is that system still there?
To scan all personal and project spaces this might be helpful:
Add STORAGE_USERS_GRACEFUL_SHUTDOWN_TIMEOUT setting to allow a graceful shutdown of the storage-users service. This is currently only applicable when running storage-users as a separate service. Setting STORAGE_USERS_GRACEFUL_SHUTDOWN_TIMEOUT to a non-zero value gives the storage-users service a chance to cleanly shut down and finish any in-progress tasks (e.g. metadata propagation) before exiting. Partial-Fix: #6602
The main problem here is the reva runtime running on top of the ocis one.
#6840 together with cs3org/reva#4072 should allow a more graceful shutdown of the storage-users service, when running it standalone. It might make sense to introduce the To do a proper shutdown when running everything in one process, we'll need to get rid of the
Describe the bug
When running in single binary mode, sending a SIGTERM (or SIGINT or SIGQUIT) to the ocis process triggers a signal handler for each reva service (https://github.com/cs3org/reva/blob/master/cmd/revad/internal/grace/grace.go#L265), where the first handler that finishes will just call os.Exit() and end the complete ocis process.
We need to find a way to cleanly shut down ocis, finishing all in-flight requests, in order to prevent things like https://github.com/owncloud/enterprise/issues/5783 from happening during regular operation.