Update/Reload without downtime #4622
daipom
added a commit
to daipom/serverengine
that referenced
this issue
Aug 30, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.take_over_another_server(path) This starts a new server that has all UDP/TCP sockets of the existing server. It receives the sockets from the existing server and stops it before starts a new server. This may not be the primary use case assumed by ServerEngine, but we need this feature to replace both the server and the workers with a new process without downtime. Currently, ServerEngine does not provide this feature for network servers. At the moment, I assume that the application side uses this feature ad hoc, but, in the future, this could be used to support live reload for entire network servers. ref: fluent/fluentd#4622 Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom
added a commit
to daipom/serverengine
that referenced
this issue
Aug 30, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.take_over_another_server(path) This starts a new server that has all UDP/TCP sockets of the existing server. It receives the sockets from the existing server and stops it after starts a new server. This may not be the primary use case assumed by ServerEngine, but we need this feature to replace both the server and the workers with a new process without downtime. Currently, ServerEngine does not provide this feature for network servers. At the moment, I assume that the application side uses this feature ad hoc, but, in the future, this could be used to support live reload for entire network servers. ref: fluent/fluentd#4622 Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
ashie
pushed a commit
to daipom/serverengine
that referenced
this issue
Sep 3, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.take_over_another_server(path) This starts a new server that has all UDP/TCP sockets of the existing server. It receives the sockets from the existing server and stops it after starts a new server. This may not be the primary use case assumed by ServerEngine, but we need this feature to replace both the server and the workers with a new process without downtime. Currently, ServerEngine does not provide this feature for network servers. At the moment, I assume that the application side uses this feature ad hoc, but, in the future, this could be used to support live reload for entire network servers. ref: fluent/fluentd#4622 Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
daipom
added a commit
to clear-code/serverengine
that referenced
this issue
Oct 21, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.take_over_another_server(path) This starts a new server that has all UDP/TCP sockets of the existing server. The old process should stop without removing the file for the socket after the new process starts. This may not be the primary use case assumed by ServerEngine, but we need this feature to replace both the server and the workers with a new process without downtime. Currently, ServerEngine does not provide this feature for network servers. At the moment, I assume that the application side uses this feature ad hoc, but, in the future, this could be used to support live reload for entire network servers. ref: fluent/fluentd#4622 Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Shizuo Fujita <fujita@clear-code.com>
daipom
added a commit
to clear-code/serverengine
that referenced
this issue
Oct 21, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.share_sockets_with_another_server(path) This starts a new server that shares all UDP/TCP sockets with the existing server. The old process should stop without removing the file for the socket after the new process starts. This may not be the primary use case assumed by ServerEngine, but we need this feature to replace both the server and the workers with a new process without downtime. Currently, ServerEngine does not provide this feature for network servers. At the moment, I assume that the application side uses this feature ad hoc, but, in the future, this could be used to support live reload for entire network servers. ref: fluent/fluentd#4622 Limitation: This feature would not work well if the process opens new TCP ports frequently. Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Shizuo Fujita <fujita@clear-code.com>
daipom
added a commit
to clear-code/serverengine
that referenced
this issue
Oct 21, 2024
This provides live restart feature for network servers. (The existing live restart feature does not support network servers.) Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.share_sockets_with_another_server(path) This starts a new server that shares all UDP/TCP sockets with the existing server. The old process should stop without removing the file for the socket after the new process starts. ref: fluent/fluentd#4622 Limitation: This feature would not work well if the process opens new TCP ports frequently. Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Shizuo Fujita <fujita@clear-code.com>
daipom
added a commit
to clear-code/serverengine
that referenced
this issue
Oct 21, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.share_sockets_with_another_server(path) This starts a new server that shares all UDP/TCP sockets with the existing server. The old process should stop without removing the file for the socket after the new process starts. This allows us to replace both the server and the workers with new processes without socket downtime. (The existing live restart feature does not support network servers. We can restart workers without downtime, but there is no way to restart the network server without downtime.) ref: fluent/fluentd#4622 Limitation: This feature would not work well if the process opens new TCP ports frequently. Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Shizuo Fujita <fujita@clear-code.com>
daipom
added a commit
to clear-code/serverengine
that referenced
this issue
Oct 22, 2024
Another process can take over UDP/TCP sockets without downtime. server = ServerEngine::SocketManager::Server.share_sockets_with_another_server(path) This starts a new server that shares all UDP/TCP sockets with the existing server. The old process should stop without removing the file for the socket after the new process starts. This allows us to replace both the server and the workers with new processes without socket downtime. (The existing live restart feature does not support network servers. We can restart workers without downtime, but there is no way to restart the network server without downtime.) ref: fluent/fluentd#4622 Limitation: This feature would not work well if the process opens new TCP ports frequently. Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com> Co-authored-by: Shizuo Fujita <fujita@clear-code.com>
github-project-automation
bot
moved this from Work-In-Progress
to Done
in Fluentd Kanban
Nov 28, 2024
Is your feature request related to a problem? Please describe.
Updating Fluentd or reloading a config causes downtime.
Plugins that receive data as a server, such as in_udp, in_tcp, and in_syslog, cannot receive data during this time.
This means that the data sent by a client is lost during this time unless the client has a re-sending feature.
This makes updating Fluentd or reloading a config difficult in some cases.
Describe the solution you'd like
Add a new feature: Update/Reload without downtime.
For example, implement a mechanism similar to nginx's feature for upgrading on the fly.
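One building block of such an upgrade is handing an already-listening socket to the new process, so the port never closes. The following is a minimal sketch using only Ruby's standard library. It is not nginx's or ServerEngine's actual implementation: the helper names hand_over and take_over are hypothetical, and both "processes" are simulated inside one process for brevity.

```ruby
require "socket"

# Hypothetical helper: the old process sends its listening socket
# over a UNIX-domain channel (file descriptor passing).
def hand_over(listener, channel)
  channel.send_io(listener)
end

# Hypothetical helper: the new process receives the descriptor and
# wraps it as a TCPServer again.
def take_over(channel)
  channel.recv_io(TCPServer)
end

# "Old" server listening on an ephemeral port.
old_listener = TCPServer.new("127.0.0.1", 0)
port = old_listener.addr[1]

# Channel between the old and the new side. In a real setup this would
# be a UNIX socket shared by two separate processes.
old_side, new_side = UNIXSocket.pair

hand_over(old_listener, old_side)
new_listener = take_over(new_side)

# The old side stops. The port stays open because the new side holds
# a descriptor for the same underlying socket.
old_listener.close

# A client connecting after the old side stopped is still served.
client = TCPSocket.new("127.0.0.1", port)
client.write("hello\n")
client.close

conn = new_listener.accept
received = conn.gets
conn.close
```

Because the listening socket itself is never closed, connections that arrive during the switch wait in the kernel backlog instead of being refused.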
The main problem is that two Fluentd instances can't run in parallel with the same config.
(Doing so causes conflicts, for example over buffer files.)
Because of this problem, it is very difficult to support all plugins.
However, it is possible to support only plugins that can run in parallel.
Based on the above, the following mechanism would be a good way to achieve this.
More specifically, it would be better to run only limited Input plugins in parallel, such as in_tcp, in_udp, and in_syslog.
Stop all plugins except those Input plugins, and prepare a dedicated file buffer for Output.
After the new workers start, they load the file buffer and route those events to the @ROOT label.
Describe alternatives you've considered
None.
Additional context
I have already started to create a PoC.
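For illustration, the dedicated file buffer from the proposed solution could work roughly as follows. This is a sketch under assumptions, not the PoC itself: the HandoffBuffer class, the JSON-lines file format, and the routed array standing in for the @ROOT label are all hypothetical and are not Fluentd APIs.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for the dedicated file buffer. The old process
# appends events to it while only its Input plugins are still running.
class HandoffBuffer
  def initialize(path)
    @path = path
  end

  # Old side: persist one event as a JSON line.
  def append(tag, time, record)
    File.open(@path, "a") do |f|
      f.flock(File::LOCK_EX)
      f.puts(JSON.generate({ "tag" => tag, "time" => time, "record" => record }))
    end
  end

  # New side: load all buffered events and remove the file so that
  # they are not replayed twice.
  def drain
    return [] unless File.exist?(@path)

    events = File.readlines(@path, chomp: true).map { |line| JSON.parse(line) }
    File.delete(@path)
    events
  end
end

path = File.join(Dir.mktmpdir, "handoff.buf")
buffer = HandoffBuffer.new(path)

# Events received by the old process during the switch.
buffer.append("syslog.auth", 1_724_976_000, { "message" => "login ok" })
buffer.append("tcp.app", 1_724_976_001, { "message" => "request served" })

# New workers replay the events. Here "routing to @ROOT" is simulated
# by collecting them in an array.
routed = []
buffer.drain.each { |event| routed << [event["tag"], event["record"]] }
```

Keeping this buffer separate from the regular Output buffers is what avoids the conflict between the old and the new instance: only the handoff file is shared, and it is written by one side and read by the other.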