Document graceful shutdown of net.box connections #2633

Closed
@TarantoolBot

Description

Related dev. PR: tarantool/tarantool#6813

Product: Tarantool
Since: 2.10
Audience/target: developers
Root document:
https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/
https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_ctl/
https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_events/ (when it appears, after #2407)
https://www.tarantool.io/en/doc/latest/dev_guide/internals/box_protocol/ (check after #2408 if anything is to be added)
SME: @locker

Details

In Tarantool 2.10.0-beta2-78-g2e9cbec3091e, a new system event, 'box.shutdown', was introduced. A server generates this event with the value true when it's asked to exit (os.exit() is called or a SIGTERM signal is received). (Essentially, the server simply calls box.broadcast('box.shutdown', true) from a box.ctl.on_shutdown() trigger callback.) Like any other event, 'box.shutdown' is broadcast to all remote watchers subscribed to it (see IPROTO_WATCH). Connectors are supposed to use the event to implement the graceful shutdown protocol:

  1. Server receives a shutdown request (os.exit() or SIGTERM).
  2. Server broadcasts 'box.shutdown' event with the value set to true.
  3. Server stops accepting new connections.
  4. Server waits for all connections that subscribed to the event to close.
  5. Client receives 'box.shutdown' event with the value true.
  6. Client does the housekeeping needed to gracefully close the connection. (It may still send new requests.)
  7. Client closes the connection.
  8. Server exits once all connections that received the 'box.shutdown' event have been closed or a timeout occurs.

The timeout is configured with box.ctl.set_on_shutdown_timeout(). It's set to 3 seconds by default.
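As a minimal sketch of the server side, the timeout could be raised as shown below. The helper name `configure_graceful_shutdown` and the 10-second value are illustrative assumptions, not part of Tarantool. The function must be called inside a Tarantool instance, where the global `box` module exists.

```lua
-- Hypothetical helper: extends the time the server waits for subscribed
-- connections to close before it exits (step 8 of the protocol).
-- Must be called inside a Tarantool instance, where `box` is a global.
local function configure_graceful_shutdown(timeout_sec)
    -- The default is 3 seconds; timeout_sec replaces it.
    box.ctl.set_on_shutdown_timeout(timeout_sec)
end
```

On a running instance this would be invoked as `configure_graceful_shutdown(10)`.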

The graceful shutdown protocol is implemented by the net.box connector as follows:

  1. Upon receiving a 'box.shutdown' event with the value set to true, a net.box connection invokes user-defined triggers installed with the new connection method, on_shutdown(). The on_shutdown() method has the same API as any other connection method used for installing triggers, for example, on_disconnect(). on_shutdown() triggers are invoked from a new fiber. While on_shutdown() triggers are running, the connection remains active, so a trigger callback may send new requests.
  2. After on_shutdown() triggers return, the net.box connection switches to the new graceful_shutdown state. In this state, no new requests are allowed.
  3. Once all in-progress requests have completed, the net.box connection is closed. To be more precise, it's switched to the error or error_reconnect state, depending on whether the reconnect_after connection option is set, with the error message set to "Peer closed". This is the same outcome as before the graceful_shutdown state existed, when the server closed the connection immediately on shutdown.

If the server doesn't support the new 'box.shutdown' event (or doesn't support IPROTO_WATCH), on_shutdown() triggers will never be executed and the connection will be abruptly closed by the server.
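The client side of the steps above might look like the following sketch. It assumes a Tarantool 2.10+ client. The stored function name 'save_tuple', the `pending` buffer, and both helper names are hypothetical and exist only for illustration. The trigger sends requests from inside the callback, which is allowed because the connection stays active until the triggers return.

```lua
-- Builds an on_shutdown trigger that flushes buffered tuples while the
-- connection is still active. 'save_tuple' is a made-up stored function.
local function make_shutdown_trigger(conn, pending)
    return function()
        for _, tuple in ipairs(pending) do
            conn:call('save_tuple', {tuple})
        end
    end
end

-- Wiring: needs the net.box module, so it only works inside Tarantool.
local function connect_with_graceful_shutdown(uri, pending)
    local net_box = require('net.box')
    local conn = net_box.connect(uri)
    -- Same installation API as on_disconnect() and other trigger methods.
    conn:on_shutdown(make_shutdown_trigger(conn, pending))
    return conn
end
```

If the server is older than 2.10, the trigger is installed but never runs, as described above, so housekeeping that must always happen should not rely on it alone.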

Don't forget to update the net.box state machine diagram on the net.box reference page:

initial -> auth -> fetch_schema <-> active

fetch_schema, active -> graceful_shutdown

(any state, on error) -> error_reconnect -> auth -> ...
                                         \
                                          -> error
(any state, but 'error') -> closed
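For checking the updated diagram, the transitions can be written out as a small table. This is only a reading of the diagram above, not net.box's actual implementation. Treating 'closed' and 'error' as terminal states is an assumption made here, and `can_transition` is a hypothetical helper.

```lua
-- Non-error transitions, copied from the diagram.
local normal = {
    initial         = {auth = true},
    auth            = {fetch_schema = true},
    fetch_schema    = {active = true, graceful_shutdown = true},
    active          = {fetch_schema = true, graceful_shutdown = true},
    error_reconnect = {auth = true},
}

-- Assumption: nothing leaves 'closed' or 'error'.
local terminal = {closed = true, error = true}

local function can_transition(from, to)
    if terminal[from] then
        return false
    end
    -- Any live state may be closed, or fail into error / error_reconnect.
    if to == 'closed' or to == 'error' or to == 'error_reconnect' then
        return true
    end
    return normal[from] ~= nil and normal[from][to] == true
end
```

Under this reading, graceful_shutdown is reachable only from fetch_schema and active, and it is left only through error, error_reconnect, or closed.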

Definition of done

Do this issue after the following:
  • Document box.watch and box.broadcast
  • Document IPROTO watchers

  • Document the box.shutdown system event
  • Document box.ctl.set_on_shutdown_timeout()
  • Document conn:on_shutdown()
  • Update the net.box state machine diagram
  • Check the translation

Labels

feature, iproto, reference, server