Commit 1409c55

Document graceful shutdown of net.box connections (#3100)
Fixes #2633 * Add ``box.shutdown`` event, ``box_ctl-on_shutdown_timeout`` function, and net.box method * Add diagram * Update .po files
1 parent a18e639 commit 1409c55

10 files changed

+161
-129
lines changed

doc/dev_guide/internals/box_protocol.rst

Lines changed: 44 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -967,6 +967,8 @@ See the :ref:`Watchers <box-protocol-watchers>` section below.
967967
Watchers
968968
--------
969969

970+
Since :doc:`2.10.0 </release/2.10.0>`.
971+
970972
The commands below support asynchronous server-client notifications signalled
971973
with :ref:`box.broadcast() <box-broadcast>`.
972974
Servers that support the new feature set the ``IPROTO_FEATURE_WATCHERS`` feature in reply to the ``IPROTO_ID`` command.
@@ -1073,6 +1075,48 @@ The body is a 2-item map:
10731075
``IPROTO_EVENT_DATA`` (code 0x57) contains data sent to a remote watcher.
10741076
The parameter is optional, the default value is ``nil``.
10751077

1078+
.. _box-protocol-shutdown:
1079+
1080+
Graceful shutdown protocol
1081+
--------------------------
1082+
1083+
Since :doc:`2.10.0 </release/2.10.0>`.
1084+
1085+
The graceful shutdown protocol helps prevent the loss of in-progress requests when the server is shut down.
1086+
According to the protocol, when a server receives an ``os.exit()`` command or a ``SIGTERM`` signal,
1087+
it does not exit immediately.
1088+
Instead, the server first stops listening for new connections.
1089+
Then, the server sends the shutdown packets to all connections that support the graceful shutdown protocol.
1090+
When a client is notified about the upcoming server exit, it stops serving any new requests and
1091+
waits for active requests to complete before closing the connections.
1092+
Once all connections are terminated, the server will be shut down.
1093+
1094+
The protocol uses the event subscription system.
1095+
That is, the feature is available if the server supports the :ref:`box.shutdown <system-events_box-shutdown>` event
1096+
and ``IPROTO_WATCH``.
1097+
For more information about it, see :ref:`reference for the event watchers <box-watchers>`
1098+
and the :ref:`corresponding section <box-protocol-watchers>` of this document.
1099+
1100+
The shutdown protocol works in the following way:
1101+
1102+
#. First, the server receives a shutdown request.
1103+
It can be either an ``os.exit()`` command or a :ref:`SIGTERM <admin-server_signals>` signal.
1104+
1105+
#. Then the :ref:`box.shutdown <system-events_box-shutdown>` event is generated.
1106+
The server broadcasts it to all subscribed remote watchers (see :ref:`IPROTO_WATCH <box_protocol-watch>`).
1107+
That is, the server calls :ref:`box.broadcast('box.shutdown', true) <box-broadcast>`
1108+
from the :ref:`box.ctl.on_shutdown() <box_ctl-on_shutdown>` trigger callback.
1109+
Once this is done, the server stops listening for new connections.
1110+
1111+
#. From now on, the server waits until all subscribed connections are terminated.
1112+
1113+
#. At the same time, the client gets the ``box.shutdown`` event and shuts the connection down gracefully.
1114+
1115+
#. After all connections are closed, the server will be stopped.
1116+
Otherwise, a timeout occurs, and Tarantool exits immediately.
1117+
You can set up the required timeout with the
1118+
:ref:`set_on_shutdown_timeout() <box_ctl-on_shutdown_timeout>` function.
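
The sequence above can be pictured with a small toy model. The sketch below is plain Python, not Tarantool code: the ``Connection`` class, the ``shutdown()`` function, and the tick-based timeout are hypothetical names chosen for illustration, and time is counted in abstract ticks rather than seconds.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical client connection with some requests still in flight."""
    name: str
    pending: int               # requests sent but not yet completed
    subscribed: bool = True    # True if the client watches the box.shutdown event
    is_open: bool = True
    notified: bool = False

    def on_event(self, key, value):
        # The client learns about the upcoming exit and stops sending new requests.
        if key == "box.shutdown" and value is True:
            self.notified = True

    def tick(self):
        # One unit of time: complete one pending request, close when none are left.
        if not self.is_open or not self.notified:
            return
        if self.pending > 0:
            self.pending -= 1
        if self.pending == 0:
            self.is_open = False


def shutdown(connections, timeout):
    """Return 'graceful' if every subscribed connection closes within
    `timeout` ticks, otherwise 'timeout' (the server exits immediately)."""
    watchers = [c for c in connections if c.subscribed]
    # The server broadcasts the event to all subscribed watchers.
    for c in watchers:
        c.on_event("box.shutdown", True)
    # The server waits while clients finish their requests and disconnect.
    for _ in range(timeout):
        if all(not c.is_open for c in watchers):
            return "graceful"
        for c in watchers:
            c.tick()
    return "graceful" if all(not c.is_open for c in watchers) else "timeout"
```

In this model, two subscribed connections with 2 and 1 pending requests finish within a timeout of 3 ticks, so ``shutdown()`` returns ``"graceful"``. A fresh connection with 2 pending requests and a timeout of 1 tick yields ``"timeout"``, which corresponds to the immediate exit described in the last step.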
1119+
10761120
.. _box_protocol-responses:
10771121

10781122
Responses if no error and no SQL

doc/reference/reference_lua/box_ctl.rst

Lines changed: 4 additions & 0 deletions
@@ -44,6 +44,9 @@ Below is a list of all ``box.ctl`` functions.
4444
* - :doc:`./box_ctl/on_shutdown`
4545
- Create a "shutdown trigger"
4646

47+
* - :doc:`./box_ctl/set_on_shutdown_timeout`
48+
- Set a timeout in seconds for the ``on_shutdown`` trigger
49+
4750
* - :doc:`./box_ctl/is_recovery_finished`
4851
- Check if recovery has finished
4952

@@ -57,5 +60,6 @@ Below is a list of all ``box.ctl`` functions.
5760
box_ctl/wait_rw
5861
box_ctl/on_schema_init
5962
box_ctl/on_shutdown
63+
box_ctl/set_on_shutdown_timeout
6064
box_ctl/is_recovery_finished
6165
box_ctl/promote

doc/reference/reference_lua/box_ctl/on_shutdown.rst

Lines changed: 6 additions & 4 deletions
@@ -1,16 +1,17 @@
1-
.. _box_ctl-on_shutdown:
1+
.. _box_ctl-on_shutdown:
22

33
===============================================================================
44
box.ctl.on_shutdown()
55
===============================================================================
66

7-
.. module:: box.ctl
7+
.. module:: box.ctl
88

99
The ``box.ctl`` submodule also contains two functions for the two
1010
:ref:`server trigger <triggers>` definitions: ``on_shutdown`` and ``on_schema_init``.
1111
Please, familiarize yourself with the mechanism of trigger functions before using them.
12+
Details about trigger characteristics are in the :ref:`triggers <triggers-box_triggers>` section.
1213

13-
.. function:: on_shutdown(trigger-function [, old-trigger-function])
14+
.. function:: on_shutdown(trigger-function [, old-trigger-function])
1415

1516
Create a "shutdown :ref:`trigger <triggers>`".
1617
The ``trigger-function`` will be executed
@@ -29,5 +30,6 @@ Please, familiarize yourself with the mechanism of trigger functions before usin
2930
If the parameters are (nil, old-trigger-function), then the old
3031
trigger is deleted.
3132

32-
Details about trigger characteristics are in the :ref:`triggers <triggers-box_triggers>` section.
33+
If you want to set a timeout for this trigger,
34+
use the :ref:`set_on_shutdown_timeout <box_ctl-on_shutdown_timeout>` function.
3335

doc/reference/reference_lua/box_ctl/set_on_shutdown_timeout.rst

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
.. _box_ctl-on_shutdown_timeout:
2+
3+
===============================================================================
4+
box.ctl.set_on_shutdown_timeout()
5+
===============================================================================
6+
7+
.. module:: box.ctl
8+
9+
.. function:: set_on_shutdown_timeout([timeout])
10+
11+
Set a timeout for the :ref:`on_shutdown <box_ctl-on_shutdown>` trigger.
12+
If the timeout expires, the server stops immediately
13+
regardless of whether any ``on_shutdown`` triggers are left unexecuted.
14+
15+
:param double timeout: time to wait for the trigger to be completed. The default value is 3 seconds.
16+
17+
:return: nil
18+

doc/reference/reference_lua/box_events.rst

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
33
Event watchers
44
==============
55

6+
Since :doc:`2.10.0 </release/2.10.0>`.
7+
68
The ``box`` module contains some features related to event subscriptions, also known as :term:`watchers <watcher>`.
79
The subscriptions are used to inform the client about server-side :term:`events <event>`.
810
Each event subscription is defined by a certain key.

doc/reference/reference_lua/box_events/system_events.rst

Lines changed: 23 additions & 4 deletions
@@ -3,6 +3,8 @@
33
System events
44
=============
55

6+
Since :doc:`2.10.0 </release/2.10.0>`.
7+
68
Predefined events have a special naming schema -- their names always start with the reserved ``box.`` prefix.
79
It means that you cannot create new events with it.
810

@@ -12,6 +14,7 @@ The system processes the following events:
1214
* ``box.status``
1315
* ``box.election``
1416
* ``box.schema``
17+
* ``box.shutdown``
1518

1619
In response to each event, the server sends back certain ``IPROTO`` fields.
1720

@@ -26,7 +29,7 @@ This triggers the ``box.info`` event, which states that the value of ``box.info.
2629
while ``box.info.uuid`` and ``box.info.cluster.uuid`` remain the same.
2730

2831
box.id
29-
~~~~~~
32+
------
3033

3134
Contains :ref:`identification <box_info_info>` of the instance.
3235
Value changes are rare.
@@ -50,7 +53,7 @@ Value changes are rare.
5053
}
5154
5255
box.status
53-
~~~~~~~~~~
56+
----------
5457

5558
Contains generic information about the instance status.
5659

@@ -67,7 +70,7 @@ Contains generic information about the instance status.
6770
}
6871
6972
box.election
70-
~~~~~~~~~~~~
73+
------------
7174

7275
Contains fields of :doc:`box.info.election </reference/reference_lua/box_info/election>`
7376
that are necessary to find out the most recent writable leader.
@@ -87,7 +90,7 @@ that are necessary to find out the most recent writable leader.
8790
}
8891
8992
box.schema
90-
~~~~~~~~~~
93+
----------
9194

9295
Contains schema-related data.
9396

@@ -99,6 +102,22 @@ Contains schema-related data.
99102
MP_STR “version”: MP_UINT schema_version,
100103
}
101104
105+
.. _system-events_box-shutdown:
106+
107+
box.shutdown
108+
------------
109+
110+
Contains a boolean value which indicates whether there is an active shutdown request.
111+
112+
The event is generated when the server receives a shutdown request (``os.exit()`` command or
113+
:ref:`SIGTERM <admin-server_signals>` signal).
114+
115+
The ``box.shutdown`` event is used by the graceful shutdown protocol.
116+
It is a feature which is available since :doc:`2.10.0 </release/2.10.0>`.
117+
This protocol is intended for connectors: it notifies a client of the upcoming server shutdown so that the client can
118+
close active connections without broken requests.
119+
For more information, refer to the :ref:`graceful shutdown protocol <box-protocol-shutdown>` section.
120+
102121
Usage example
103122
-------------
104123

doc/reference/reference_lua/net_box.rst

Lines changed: 53 additions & 13 deletions
@@ -56,29 +56,37 @@ Most ``net.box`` methods accept the last ``{options}`` argument, which can be:
5656
The default value is ``false``.
5757
For an example, see option description :ref:`below <net_box-return_raw>`.
5858

59+
.. _net_box-state_diagram:
60+
5961
The diagram below shows possible connection states and transitions:
6062

6163
.. ifconfig:: builder not in ('latex', )
6264

63-
.. image:: net_states.svg
65+
.. image:: net_states.png
6466
:align: center
65-
:alt: net_states.svg
67+
:alt: net_states.png
6668

6769
On this diagram:
6870

69-
* ``net_box.connect()`` method spawns a worker fiber, which will establish the connection and start the state machine.
71+
* ``net_box.connect()`` method spawns a worker fiber, which will establish the connection and start the state machine.
72+
73+
* The state machine goes to the ``initial`` state.
7074

71-
* The state machine goes to the ‘initial‘ state.
75+
* Authentication and schema upload.
76+
It is possible later on to re-enter the ``fetch_schema`` state from ``active`` to trigger schema reload.
7277

73-
* Authentication and schema upload.
74-
It is possible later on to re-enter the ‘fetch_schema’ state from ‘active’ to trigger schema reload.
78+
* The state changes to the ``graceful_shutdown`` state when the state machine
79+
receives a :ref:`box.shutdown <system-events_box-shutdown>` event from the remote host
80+
(see :ref:`conn:on_shutdown() <net_box-on_shutdown>`).
81+
Once all pending requests are completed, the state machine switches to the ``error`` (``error_reconnect``) state.
7582

76-
* The transport goes to the ‘error’ state in case of an error.
77-
It can happen, for example, if the server closed the connection.
78-
If the ``reconnect_after`` option is set, instead of the ‘error’ state, the transport goes to the ‘error_reconnect’ state.
83+
* The transport goes to the ``error`` state in case of an error.
84+
It can happen, for example, if the server closed the connection.
85+
If the ``reconnect_after`` option is set, instead of the ``error`` state,
86+
the transport goes to the ``error_reconnect`` state.
7987

80-
* ``conn.close()`` method sets the state to closed and kills the worker.
81-
If the transport is already in the error state, ``close()`` does nothing.
88+
* ``conn.close()`` method sets the state to ``closed`` and kills the worker.
89+
If the transport is already in the ``error`` state, ``close()`` does nothing.
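
The transitions listed above can be summarized as a small state machine. The sketch below is a plain Python toy model, not the ``net.box`` implementation: the ``ConnModel`` class and its method names are hypothetical, authentication and schema upload are collapsed into a single ``ready()`` step, and only the states named in the list are modeled. Rejecting requests outside the ``active`` state is a simplifying assumption.

```python
class ConnModel:
    """Toy model of the connection states described in the list above."""

    def __init__(self, reconnect_after=None):
        self.reconnect_after = reconnect_after
        self.pending = 0           # requests sent but not yet completed
        self.state = "initial"

    def _failed(self):
        # With reconnect_after set, the transport uses error_reconnect instead of error.
        return "error" if self.reconnect_after is None else "error_reconnect"

    def _drain(self):
        # Leave graceful_shutdown once no pending requests remain.
        if self.state == "graceful_shutdown" and self.pending == 0:
            self.state = self._failed()

    def ready(self):
        # Authentication and schema upload have finished.
        if self.state in ("initial", "fetch_schema"):
            self.state = "active"

    def reload_schema(self):
        # Re-enter fetch_schema from active to trigger a schema reload.
        if self.state == "active":
            self.state = "fetch_schema"

    def send(self):
        # Assumption of this model: requests are accepted only in the active state.
        if self.state != "active":
            raise RuntimeError("no new requests in state " + self.state)
        self.pending += 1

    def complete(self):
        if self.pending > 0:
            self.pending -= 1
        self._drain()

    def shutdown_event(self):
        # A box.shutdown event arrived from the remote host.
        if self.state == "active":
            self.state = "graceful_shutdown"
            self._drain()

    def fail(self):
        # Any error, for example the server closing the connection.
        if self.state != "closed":
            self.state = self._failed()

    def close(self):
        # close() does nothing if the transport is already in the error state.
        if self.state != "error":
            self.state = "closed"
```

For instance, a connection with one pending request stays in ``graceful_shutdown`` after ``shutdown_event()`` and moves to ``error`` only after ``complete()``. With ``reconnect_after`` set, the same sequence ends in ``error_reconnect``.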
8290

8391
===============================================================================
8492
Index
@@ -131,7 +139,9 @@ Below is a list of all ``net.box`` functions.
131139
* - :ref:`conn:on_connect() <net_box-on_connect>`
132140
- Define a connect trigger
133141
* - :ref:`conn:on_disconnect() <net_box-on_disconnect>`
134-
- Define a disconnect trigger
142+
- Define a disconnect trigger
143+
* - :ref:`conn:on_shutdown() <net_box-on_shutdown>`
144+
- Define a shutdown trigger
135145
* - :ref:`conn:on_schema_reload() <net_box-on_schema_reload>`
136146
- Define a trigger when schema is modified
137147
* - :ref:`conn:new_stream() <conn-new_stream>`
@@ -820,9 +830,39 @@ With the ``net.box`` module, you can use the following
820830
be replaced by trigger-function
821831
:return: nil or function pointer
822832

833+
.. _net_box-on_shutdown:
834+
835+
.. function:: conn:on_shutdown([trigger-function[, old-trigger-function]])
836+
837+
Define a trigger that is executed when a :ref:`box.shutdown <system-events_box-shutdown>` event is received.
838+
839+
The trigger starts in a new fiber.
840+
While the ``on_shutdown()`` trigger is running, the connection stays active.
841+
It means that the trigger callback is allowed to send new requests.
842+
843+
After the trigger returns, the ``net.box`` connection goes to the ``graceful_shutdown`` state
844+
(check :ref:`the state diagram <net_box-state_diagram>` for details).
845+
In this state, no new requests are allowed.
846+
The connection waits for all pending requests to be completed.
847+
848+
Once all in-progress requests have been processed, the connection is closed.
849+
The state changes to ``error`` or ``error_reconnect``
850+
(if the ``reconnect_after`` option is defined).
851+
852+
Servers that do not support the ``box.shutdown`` event or :ref:`IPROTO_WATCH <box_protocol-watch>`
853+
just close the connection abruptly.
854+
In this case, the ``on_shutdown()`` trigger is not executed.
855+
856+
:param function trigger-function: function which will become the trigger
857+
function. Takes the ``conn``
858+
object as the first argument
859+
:param function old-trigger-function: existing trigger function which will
860+
be replaced by trigger-function
861+
:return: nil or function pointer
862+
823863
.. _net_box-on_schema_reload:
824864

825-
.. function:: conn:on_schema_reload([trigger-function[, old-trigger-function]])
865+
.. function:: conn:on_schema_reload([trigger-function[, old-trigger-function]])
826866

827867
Define a trigger executed when some operation has been performed on the remote
828868
server after schema has been updated. So, if a server request fails due to a