Description
Related dev. PR: tarantool/tarantool#6813
Product: Tarantool
Since: 2.10
Audience/target: developers
Root document:
https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/
https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_ctl/
https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_events/ (when it appears, after #2407)
https://www.tarantool.io/en/doc/latest/dev_guide/internals/box_protocol/ (check after #2408 if anything is to be added)
SME: @locker
Details
In Tarantool 2.10.0-beta2-78-g2e9cbec3091e, a new system event, 'box.shutdown', was introduced. A server generates this event with the value true when it is asked to exit (os.exit() is called or a SIGTERM signal is received). (Essentially, the server calls box.broadcast('box.shutdown', true) from a box.ctl.on_shutdown() trigger callback.) Like any other event, 'box.shutdown' is broadcast to all remote watchers subscribed to it (see IPROTO_WATCH). The event is intended to be used by connectors to implement the graceful shutdown protocol:
- Server receives a shutdown request (os.exit() or SIGTERM).
- Server broadcasts the 'box.shutdown' event with the value set to true.
- Server stops accepting new connections.
- Server waits for all connections that subscribed to the event to close.
- Client receives the 'box.shutdown' event with the value true.
- Client does the housekeeping needed to gracefully close the connection. (It may send new requests.)
- Client closes the connection.
- Server exits once all connections that received the 'box.shutdown' event have been closed or a timeout occurs.
The timeout is configured with box.ctl.set_on_shutdown_timeout(). It's set to 3 seconds by default.
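The server side of the steps above could be sketched as follows. This is only an illustration for a Tarantool 2.10+ instance: the listen port and the 10-second timeout are arbitrary values, and observing the event locally with box.watch() is an assumption added for demonstration, not something the protocol requires.

```lua
-- Server-side sketch (run inside Tarantool 2.10+).
local log = require('log')

-- The listen port is an arbitrary choice for this example.
box.cfg{listen = 3301}

-- Give clients up to 10 seconds (instead of the default 3) to close
-- their connections after 'box.shutdown' is broadcast.
box.ctl.set_on_shutdown_timeout(10)

-- Optional: observe the event locally. A watcher callback also runs
-- once on registration with the current value, so check for true
-- explicitly before treating it as a shutdown request.
box.watch('box.shutdown', function(key, value)
    if value == true then
        log.info('%s: shutdown requested, waiting for clients', key)
    end
end)
```

A longer timeout gives slow clients more room for housekeeping at the cost of a slower exit; the right value depends on the application.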
The graceful shutdown protocol is implemented by the net.box connector as follows:
- Upon receiving a 'box.shutdown' event with the value set to true, a net.box connection invokes user-defined triggers installed with the new connection method, on_shutdown(). The on_shutdown() method has the same API as any other connection method used for installing triggers, for example, on_disconnect(). on_shutdown() triggers are invoked from a new fiber. While on_shutdown() triggers are running, the connection remains active. This means that it's allowed to send new requests from a trigger callback.
- After on_shutdown() triggers return, the net.box connection switches to the new graceful_shutdown state. In this state, no new requests are allowed.
- Once all in-progress requests have been completed, the net.box connection is closed. To be more precise, it's switched to the error or error_reconnect state, depending on whether the reconnect_after connection option is set, with the error message set to "Peer closed", just as it was before the new graceful_shutdown state existed, when the server immediately closed the connection on shutdown.
If the server doesn't support the new 'box.shutdown' event (or doesn't support IPROTO_WATCH), on_shutdown() triggers will never be executed and the connection will be abruptly closed by the server.
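A client using net.box might install an on_shutdown() trigger roughly like this. The sketch assumes a server listening on localhost:3301, and the stored procedure name app.flush_session is hypothetical; substitute whatever housekeeping the application actually needs.

```lua
-- Client-side sketch (run inside Tarantool 2.10+).
local net_box = require('net.box')
local log = require('log')

-- The URI is a placeholder. With reconnect_after set, the connection
-- ends up in 'error_reconnect' rather than 'error' after the shutdown.
local conn = net_box.connect('localhost:3301', {reconnect_after = 1})

conn:on_shutdown(function()
    -- The connection is still active while this trigger runs, so
    -- sending requests is allowed. 'app.flush_session' is a
    -- hypothetical stored procedure used only for illustration.
    local ok, err = pcall(conn.call, conn, 'app.flush_session')
    if not ok then
        log.warn('housekeeping before shutdown failed: %s', tostring(err))
    end
end)
```

Wrapping the request in pcall() keeps a failed housekeeping call from raising inside the trigger. Against a server without 'box.shutdown' support the trigger is simply never called.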
Don't forget to update the net.box state machine diagram on this page:
initial -> auth -> fetch_schema <-> active
fetch_schema, active -> graceful_shutdown
(any state, on error) -> error_reconnect -> auth -> ...
                      \
                       -> error
(any state, but 'error') -> closed
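For checking the updated diagram, the transitions can be written down as a small lookup table. This is a plain Lua sketch transcribed from the diagram above, not net.box's actual implementation; the function name can_transition is made up for the example, and treating closed as a terminal state is an assumption.

```lua
-- Explicit arrows from the diagram.
local transitions = {
    initial           = {auth = true},
    auth              = {fetch_schema = true},
    fetch_schema      = {active = true, graceful_shutdown = true},
    active            = {fetch_schema = true, graceful_shutdown = true},
    graceful_shutdown = {},
    error_reconnect   = {auth = true},
    error             = {},
    closed            = {},
}

-- Returns true if the diagram permits moving from state `from` to `to`.
local function can_transition(from, to)
    if transitions[from] == nil or transitions[to] == nil then
        return false -- unknown state name
    end
    if from == 'closed' then
        return false -- assumption: 'closed' is terminal
    end
    -- (any state, on error) -> error_reconnect or error
    if to == 'error_reconnect' or to == 'error' then
        return true
    end
    -- (any state, but 'error') -> closed
    if to == 'closed' then
        return from ~= 'error'
    end
    return transitions[from][to] == true
end

print(can_transition('active', 'graceful_shutdown')) --> true
print(can_transition('initial', 'active'))           --> false
```

The graceful_shutdown state has no explicit outgoing arrows; it leaves through the error rule, which matches the "Peer closed" behavior described earlier.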
Definition of done
Do this issue after the following:
- Document box.watch and box.broadcast
- Document IPROTO watchers
- Document the box.shutdown system event
- Document box.ctl.set_on_shutdown_timeout()
- Document conn:on_shutdown()
- Update the net.box state machine diagram
- Check the translation