Add TCM failover docs #4485

Merged
merged 4 commits into from
Sep 5, 2024
3 changes: 2 additions & 1 deletion doc/tooling/tcm/tcm_cluster_management/index.rst
@@ -17,4 +17,5 @@ to learn how to perform various management operations on Tarantool clusters from
tcm_cluster_state
tcm_cluster_config
tcm_cluster_users
- tcm_cluster_metrics
+ tcm_cluster_metrics
+ tcm_supervised_failover
72 changes: 72 additions & 0 deletions doc/tooling/tcm/tcm_cluster_management/tcm_supervised_failover.rst
@@ -0,0 +1,72 @@
.. _tcm_supervised_failover:

Using supervised failover
=========================

.. include:: ../index.rst
    :start-after: ee_note_tcm_start
    :end-before: ee_note_tcm_end

For Tarantool clusters that use :ref:`supervised failover <repl_supervised_failover>`,
|tcm_full_name| provides web interface tools for interacting with external failover coordinators.
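Supervised failover itself is enabled in the cluster configuration. The fragment below
is a minimal sketch, assuming the Tarantool 3.x YAML cluster configuration format, in which
the ``replication.failover`` option selects the failover mode. For the complete setup,
including how to run the coordinators, see :ref:`supervised failover <repl_supervised_failover>`.

.. code-block:: yaml

    # Sketch (assumes Tarantool 3.x cluster configuration):
    # switch the cluster to supervised failover mode.
    replication:
      failover: supervised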

The tools for using supervised failover are located on the **Failovers** page,
which you can open from the **Actions** menu on the cluster stateboard.

.. note::

    |tcm| can only interact with failover coordinators that are already running.
    It cannot start or stop coordinators.

.. _tcm_supervised_failover_view:

Viewing failover coordinators
-----------------------------

To view failover coordinators running on the cluster, go to the **Failovers** tab.
This tab shows information about all Tarantool instances that the cluster
uses as failover coordinators, including:

- Current coordinator status -- ``Active`` or ``Not active``
- ``PID`` -- process ID
- ``Hostname`` -- the host on which the coordinator is running
- ``UUID`` -- the coordinator ID
- ``Term`` -- a value that defines the order in which coordinators become active
  (take the lock) over time


.. _tcm_supervised_failover_commands:

Executing failover commands
---------------------------

To send a failover command to a coordinator, go to the **Commands** tab and click **Add**.
Then, provide the command description in YAML format. It can include the following
fields:

- ``command`` -- the command name. Possible value: ``switch`` (switch the master
  in a replica set).
- ``new_master`` -- the name of the instance to make the new master.
- ``timeout`` -- the command execution timeout.

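Example:

.. code-block:: yaml

    command: switch           # switch the master in a replica set
    new_master: instance-002  # the instance to make the new master
    timeout: 30               # the command execution timeout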

After entering the command, click **Save** to send it for execution.

Tarantool assigns an ID to the command and waits for the active coordinator to process it.

All failover commands executed on the cluster are shown on the **Commands** tab
with their IDs and statuses. A command can have one of the following statuses:

- ``taken`` -- a failover coordinator has started executing the command.
- ``success`` -- the command has completed successfully.
- ``failed`` -- an error occurred during the command execution.
  A short error description is shown in the **Reason** field.

To see the execution details of a command, click it in the list.