Orderly router shutdown and resource cleanup #293
Comments
Needs fixing: #156
Goal: We want to avoid runtime leaks in the router.
Old way: All leak analysis happens at process exit (by ASAN or alloc pool leak detection).
New way: Shut down all listeners, terminate connections, and OTHER STUFF; and then,
I'm sorry but I still don't much understand this.
Even simpler trial case of a leak to be judged by the new rules, #156
I guess this is the main stumbling block for me. How do you do that? How do you tell that "this buffer was 'leaked' because we did not bother to free it (yet)"? How can you know that it is not truly leaked because of a bug in the code, so that it would never be cleaned up even if the connection it is related to were closed (while the router continues running)?
Won't fix. It has been decided to maintain the current resource cleanup architecture.
On shutdown the router stops all processes "mid-stream" and attempts to manually clean up resources that remain in use. This is a best-effort attempt and involves bespoke cleanup code for just about every system in the router. Here is an example where the router core attempts to release resources for all links that were active at the point of shutdown.
This is error-prone and triggers many leak reports in ASAN. That makes it nearly impossible to determine whether a leak occurred during runtime (probably a serious issue) or only at shutdown (sloppy, but not service-affecting).
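To make the ambiguity concrete, here is a minimal Python sketch. It is not router code: `TrackedPool`, `run`, and the buffer labels are hypothetical stand-ins for an allocation pool with leak detection. It only shows why a report taken after an abrupt stop mixes in-use buffers with genuinely leaked ones, while a report taken after the normal close path has run contains only the genuine leak.

```python
class TrackedPool:
    """Hypothetical stand-in for an allocation pool with leak detection."""

    def __init__(self):
        self._live = {}      # buffer id -> label of every buffer not yet freed
        self._next_id = 0

    def alloc(self, label):
        self._next_id += 1
        self._live[self._next_id] = label
        return self._next_id

    def free(self, buf_id):
        del self._live[buf_id]

    def outstanding(self):
        """Labels of everything still allocated, i.e. the 'leak report'."""
        return sorted(self._live.values())


def run(orderly):
    """Simulate a process exit and return the resulting leak report."""
    pool = TrackedPool()
    in_flight = pool.alloc("buffer held by an open connection")
    pool.alloc("buffer whose owner never frees it")  # a genuine bug

    if orderly:
        # Assumption: closing the connection runs the normal run-time
        # cleanup path, which frees the buffers the connection owns.
        pool.free(in_flight)

    return pool.outstanding()


if __name__ == "__main__":
    print("abrupt :", run(orderly=False))
    print("orderly:", run(orderly=True))
```

In the abrupt case the report lists both buffers and nothing in it says which one is the bug. In the orderly case only the buffer that no cleanup path owns remains.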
The router's shutdown needs to be refactored to instead perform an orderly shutdown controlled by management. This feature would cause management to:
This will trigger the existing run-time cleanup for all outstanding messages, dispositions, etc. Once all connections finish closing, all in-flight messages are released, and other provisioned entities and services have been released, the subsystems (core, proactor, etc.) can clean up and shut down.
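The ordering described above could be sketched roughly as follows. This is a toy Python model under stated assumptions, not the router's implementation: `Connection`, `Router`, `orderly_shutdown`, and the listener and subsystem names are all hypothetical. The step list is inferred from this thread (stop listeners, close connections, let in-flight messages be released, then shut down subsystems), and tearing subsystems down in reverse start order is purely an assumption.

```python
from collections import deque


class Connection:
    """Hypothetical connection that owns a queue of in-flight messages."""

    def __init__(self, name, in_flight):
        self.name = name
        self.in_flight = deque(in_flight)
        self.is_open = True

    def close(self, log):
        # Stand-in for the existing run-time cleanup: closing a
        # connection releases everything it still holds.
        while self.in_flight:
            log.append(f"release {self.in_flight.popleft()}")
        self.is_open = False
        log.append(f"closed {self.name}")


class Router:
    """Hypothetical router holding listeners, connections, subsystems."""

    def __init__(self, listeners, connections, subsystems):
        self.listeners = list(listeners)
        self.connections = list(connections)
        self.subsystems = list(subsystems)  # assumed to be in start order

    def orderly_shutdown(self):
        log = []
        # 1. Stop accepting new work.
        for listener in self.listeners:
            log.append(f"stop listener {listener}")
        self.listeners.clear()
        # 2. Close connections through the normal close path.
        for conn in self.connections:
            conn.close(log)
        # 3. Only continue once nothing is open or in flight.
        if any(c.is_open or c.in_flight for c in self.connections):
            raise RuntimeError("connections did not drain")
        # 4. Shut subsystems down (reverse start order is an assumption).
        for subsystem in reversed(self.subsystems):
            log.append(f"shutdown {subsystem}")
        return log


if __name__ == "__main__":
    router = Router(
        listeners=["listener-1"],
        connections=[Connection("conn-1", ["msg-1", "msg-2"])],
        subsystems=["proactor", "core"],
    )
    for step in router.orderly_shutdown():
        print(step)
```

The point of the sketch is the ordering: a leak check run after step 4 sees only what the normal cleanup paths failed to release, so it does not need bespoke shutdown-time cleanup code per subsystem.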
The devil is in the details for sure - this will be no small job.