Copyright © 2020 Polymath Inc. All Rights Reserved.
No part of this manual, including the products and software described in it, may be reproduced, transmitted or transcribed to a third-party, or translated into any language in any form or by any means without the express written permission of Polymath Inc. (“Polymath”).
THIS MANUAL IS PROVIDED “AS-IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OR CONDITIONS OF COMPLETENESS, ACCURACY, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL POLYMATH, ITS AFFILIATES OR ANY OF THEIR DIRECTORS, OFFICERS, EMPLOYEES OR AGENTS BE LIABLE FOR ANY INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES (INCLUDING DAMAGES FOR LOSS OF PROFITS, LOSS OF BUSINESS, LOSS OF USE OR DATA, INTERRUPTION OF BUSINESS AND THE LIKE), EVEN IF POLYMATH HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES ARISING FROM ANY DEFECT OR ERROR IN THIS MANUAL. POLYMATH ACCEPTS NO LIABILITY AND SHALL NOT BE LIABLE FOR ANY DAMAGES, DIRECT OR INDIRECT, RESULTING FROM YOUR USE OF THIS MANUAL. IN THE EVENT POLYMATH, ITS AFFILIATES, LICENSORS OR SUPPLIERS IS FOUND LIABLE, SUCH LIABILITY SHALL BE LIMITED TO 10 BARBADOS DOLLARS AND THE PAYMENT OF SUCH AMOUNT TO YOU SHALL BE YOUR EXCLUSIVE REMEDY.
Specifications and information contained in this manual are furnished for informational use only, and are subject to change without notice, and should not be construed as advice by Polymath. Recipient must obtain their own professional or specialist advice before taking, or refraining from, any action on the basis of the information contained in this manual.
Polymath assumes no responsibility or liability for any errors or inaccuracies that may appear in this manual and gives no undertaking, and is under no obligation, to update this document if any errors or inaccuracies become apparent. The information in this document has not been independently verified.
- Polymesh Operator Guide
- About
- Table of Contents
- Introduction
- Governance
- Becoming an Operator
- Key Management
- Network Architecture
- High Availability
- Getting the Polymesh Node Software
- Node Resource Requirements
- Securing the Instances
- Upgrading or Replacing a Node
- Backing Up a Node
- Auto Restarting Nodes
- Common Parameters for Running a Polymesh Node
- Running an Operator Node
- Getting the Identity of a Node
- Metrics and Monitoring
- Bonding POLYX
- Setting Session Keys
- Activating your Operator Node
- Stop Being an Operator
- Glossary
Operators perform critical functions for the network and, as such, have strict uptime requirements. This document contains information about the recommended setup and maintenance of a Polymesh operator node. The intended audience for this document is the operator’s IT team; however, some business considerations are included for completeness and to provide the operator’s IT team with the necessary context.
Polymesh is a permissioned network, meaning that potential operators must go through a governance process before they are permitted to work with the Polymesh network. The governance process is on-chain and managed via the Polymesh Improvement Proposal (PIP) mechanism.
To become an operator on Polymesh, you may need to bond (lock) POLYX in the system. This facilitates the economic incentives that the security of Polymesh relies on. The account that stores your bonded POLYX is called your Stash account and the account that decides what to do with the bonded POLYX is called your Controller account. Rewards that are generated for running an operator node can be paid to the Stash account or another specified account.
You do not need to bond all of the POLYX in your Stash account and you can always bond more POLYX later. However, withdrawing any bonded POLYX requires waiting for the duration of the unbonding period, which is currently 28 days.
NB - Please note that operators should use a Ledger Nano X, not a Ledger Nano S.
The Nano S doesn't support setting session keys, so it is not suitable for operators.
There are three main types of keys that an operator must manage:
- Session keys
- Controller key
- Stash key
The session keys are the only type of keys that the operator node needs access to. The other two keys should be kept securely in a supported hardware wallet.
The session keys are the keys that an operator node uses to sign data needed for consensus. These keys are stored on the operator node itself. Session keys don’t hold any funds but they can be used to perform actions that will result in a penalty, like double signing. Hence, it is important to keep these keys secure.
These keys can either be generated offline and injected into the operator node or be generated within the operator node by calling the appropriate RPC method. Once generated, the session keys should be persisted.
In the future, Polymesh will support signing payloads outside the client so that keys can be stored on another device, e.g. a hardware security module (HSM) or a secure enclave. For the time being, however, session keys must be either stored within the client or be mounted from secure storage via external methods.
The controller key is used to manage bonded funds, vote with bonded funds and do similar actions on chain. This key is not directly needed by the operator node and hence must never be shared with the operator node. It should be a multisig account or a supported hardware wallet. These keys can hold funds and directly control funds bonded by the operator and therefore these should be kept very securely. Consider these keys to be a semi-cold wallet.
The stash key belongs to the account that holds the bonded POLYX and, optionally, receives the operator rewards. It should be kept in a cold wallet that is never attached to the operator node.
The recommended secure operator setup for Testnet / Mainnet consists of the following:
- An active operator node with configured session keys
- A warm spare operator node configured like an operator node but without session keys
A minimum recommended Testnet setup would include just a single operator node.
The `--reserved-only` flag in conjunction with the `--reserved-nodes` parameter lets you set up a whitelist of nodes that the node may peer with.
To operate properly your Polymesh nodes should have at least the following traffic whitelisted:
- All nodes:
  - NTP egress: System clock drift can cause a node to fail to produce blocks due to mismatched timestamps between the node and the network. Ensure your nodes are synchronised with a reliable NTP server.
  - Port 443 egress (HTTPS) (optional but recommended): Used to send basic telemetry to Polymath servers.
- Operator nodes:
  - Libp2p ingress/egress: Operator nodes should be able to send and receive p2p events from WAN or a trusted set of other nodes that do have WAN connectivity.
The network is resilient to temporary outages of some of its operator nodes. Any one operator node can experience a few minutes of downtime for upgrades, but it should not have frequent or extended downtime, or it risks getting slashed.
It is imperative that only one operator node is active with the same session keys. If multiple operator nodes with the same session keys do end up online at the same time then they will end up signing conflicting blocks and will thus get penalised for equivocation. We recommend that you do not configure automatic failover and instead maintain only a warm spare that is failed over in a supervised manner.
There are two possible failover methods:
- Shared session key
- Unique session key
With the shared session key method the operator node session keys are added to the warm spare in case of a primary operator node failure. In this case the primary node must not come back online. The penalty for equivocation is much higher than the penalty for being offline.
The unique session key method uses different session keys for different instances of operator nodes. If the primary operator node goes down for some reason, the controller will need to change the active session keys on the blockchain for the secondary node to become active. Since a key change takes effect only in the next session, you may still get penalised for being offline for one session if your primary node went down without producing any blocks in that session. However, this approach eliminates the risk of equivocation penalties.
It is not recommended that you store your controller keys on a server for the automated signing of the key change transaction. However, you can pre-sign an immortal transaction (a transaction without a timeout) and store the signed transaction on a server that will broadcast it if the primary node goes down. Please see Upgrading or Replacing a Node for more details.
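As a rough illustration of the broadcast step, the sketch below wraps a pre-signed transaction in a JSON-RPC request and posts it to a node. It assumes the standard Substrate `author_submitExtrinsic` RPC method is available on your node, that the request is sent to a node that is still online (for example the warm spare, not the failed primary), and that the RPC endpoint is the `http://localhost:9933` address used elsewhere in this guide. The function names are hypothetical, and detecting that the primary node is down is deliberately left out.

```shell
# Build the JSON-RPC request body that submits an already-signed extrinsic.
# $1 = hex-encoded signed transaction (0x-prefixed), prepared offline.
submit_extrinsic_body() {
  printf '{"id":1, "jsonrpc":"2.0", "method": "author_submitExtrinsic", "params":["%s"]}' "$1"
}

# Post the request to a reachable node.
# $1 = signed transaction hex, $2 = RPC URL (assumed default: local node).
broadcast_failover() {
  curl -s -H "Content-Type: application/json" \
    -d "$(submit_extrinsic_body "$1")" "${2:-http://localhost:9933}"
}
```

A supervising operator or a watchdog of your choice would call `broadcast_failover "$SIGNED_TX"` once the primary node is confirmed to be down.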
All Polymesh nodes use the same binary and only differ in the parameters used to run them.
There are a number of ways to get and deploy the node binary:
- Fetch the prebuilt container image from the Polymath Docker Hub repository. There are two flavours available: `debian` and `distroless`. The latter has no shell and thus provides a reduced attack surface, whereas the former's shell helps with debugging during the initial setup. The images are tagged with `<flavour>` and `<flavour>-<version>`. We recommend using the latter for deterministic versioning, but the former can be used if you set your image pull policy to always pull. We have also published sample docker-compose files. The two release flavours (`debian` and `distroless`) are interchangeable in terms of operation: a setup running the `debian` flavour can be changed to use the `distroless` flavour by only changing the container tag, and vice versa.
- Fetch the precompiled binary from our GitHub releases page. In addition to the release source code files we publish four files: the Polymesh binary and its checksum (identified by the `-linux-amd64` suffix indicating the CPU platform it is compiled for) and an archive of Polymesh runtimes and its checksum. You do not need the runtime files as they are already included in the binary.
- Build your own binary from the release branch of our source code.
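If you download the precompiled binary, it is worth comparing it against the published checksum before deploying it. The sketch below assumes the checksum is a SHA-256 digest and that `sha256sum` is installed; neither is stated in this guide, so check the release page for the actual algorithm. The function name and file names are hypothetical.

```shell
# Compare a file's SHA-256 digest with an expected hex digest.
# $1 = path to the downloaded file, $2 = expected digest (lowercase hex).
# Returns 0 when they match, non-zero otherwise.
verify_sha256() {
  actual="$(sha256sum "$1" | awk '{print $1}')"
  [ "$actual" = "$2" ]
}
```

For example, `verify_sha256 ./polymesh-linux-amd64 "$PUBLISHED_DIGEST" || echo "checksum mismatch"` refuses to trust a binary whose digest differs from the published one.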
The following resources should be allocated to each Polymesh node:
Resource | Minimum Value | Recommended Value |
---|---|---|
CPU | 2 CPU | 4 CPU |
RAM | 8 GB | 8+ GB |
Storage | 80 GB SSD | 100+ GB low latency SSD (e.g. local NVMe) |
The storage requirements will increase over time as the blockchain grows. Sufficient spare storage (or expandable volumes) and adequate monitoring measures should be put in place to ensure continued operations of the node. A long-running node will keep a large amount of write-ahead logs (WAL) in the database directory. These logs are compacted on node restart. It is recommended that you reserve an additional 40GB of disk space for the WAL.
It is not recommended that more than one node share the same resources, i.e. it is preferable to run two 2 CPU/8 GB RAM instances with one Polymesh node each than to run one 4 CPU/16 GB RAM instance with two Polymesh nodes.
Best practices for securing your instances should be followed at all times. These include (but are not limited to):
- Disabling password-based SSH access
- Setting up and enabling a network firewall
- Only opening ports that are needed
- Disabling unnecessary services
- Not using the root user and disabling root login
- Keeping your system up to date
- Turning on SELinux
- Monitoring logs and metrics for signs of malicious activity
- Running periodic CIS benchmarks against your systems
Be advised: due to the constantly changing landscape of cybersecurity, the above list is not and cannot be comprehensive. Node operators are responsible for keeping the security of their nodes up to date with current best practices.
The recommended upgrade process for operator nodes is to perform a failover to the warm spare operator node. As mentioned in the High Availability section the unique key approach is preferable to the shared-key approach.
The warm spare operator node should be upgraded first. Since this node is not actively validating you can simply stop the Polymesh client running on it, perform the necessary upgrade, and then resume operation.
Once your warm spare operator node is upgraded and fully synchronised, you should make it the active node by submitting the change on the blockchain using your controller account.
To do so:
- (If not done already) Generate a new set of session keys for the warm spare operator node
- Go to Staking > Account Actions
- Click on "Set Session Key" against your bonding account
- Enter the session keys from the warm node in the field and click on "Set Session Key"
See Running an Operator Node for instructions on using the `author_rotateKeys` RPC method to generate node session keys.
The change in operator session keys only applies in the next session. For safety, we recommend that you wait at least 2 sessions before continuing. In other words, if the current session is `N`, you should wait until session `N + 2` before proceeding with the steps below.
At this point your warm spare and active operator nodes have switched roles: The previous warm spare is now the active operator node and vice-versa. Be sure to treat them accordingly henceforth. Alternatively you may perform the failover operation again to restore your original active node as the current active node and the original warm spare as the current warm spare.
On Testnet you may perform an in-place upgrade if you do not have a warm spare. We do not recommend in-place upgrades for Mainnet because a failed upgrade can cause downtime and therefore penalisation.
Since Polymesh is a public blockchain, you do not necessarily need to back up your nodes. You can always synchronize from scratch.
It takes quite a bit of time to synchronize a node from scratch. To minimise the time between node creation and node readiness you may choose to back up the full blockchain DB regularly. This process does not need to be done on every node: a database backup performed on one node may be used on another node as long as they have the same setting for the `--pruning` parameter. Since operator nodes run with an implicit `--pruning archive` setting we recommend that you make that parameter explicit on all nodes so that they can share a single database backup.
Backing up the database should be done on an offline node. A typical approach to do this would be:
- Stop the polymesh process on the backup node
- Snapshot the database directory
- Restart the polymesh process
- Sync the database snapshot to offsite storage
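The first three steps above could be scripted roughly as follows. This is a sketch under several assumptions: the node runs under the `polymesh.service` systemd unit described in Auto Restarting Nodes, the snapshot is a plain `tar` archive of the `db` directory, and the `run` and `backup_node` helpers are hypothetical names. Setting `DRY_RUN=1` prints the commands instead of executing them, which is useful for reviewing the plan first.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# backup_node BASE_PATH SNAPSHOT_FILE
# Stops the node, archives the database directory, then restarts the node
# even if the archive step failed, and reports the archive step's status.
backup_node() {
  base_path="$1"; snapshot="$2"; status=0
  run systemctl stop polymesh.service || return 1
  run tar -C "$base_path/chains/mainnet" -czf "$snapshot" db || status=$?
  run systemctl start polymesh.service || return 1
  return "$status"
}
```

The offsite sync (step four) is left to whichever tool you already use for that purpose, run after `backup_node` returns successfully.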
The database snapshot contains no confidential information as long as only the `/<base path>/chains/mainnet/db` directory is backed up.
Because of the way the database is stored in files, stopping and starting the Polymesh process will create partial database files. Since an excessive number of files in a directory can cause performance issues, we recommend either limiting snapshots to a daily frequency or periodically resetting the backup node's database to a fresh sync from the chain.
All your nodes should automatically restart in the case of an intermittent failure.
For container-based nodes use your container runtime's features: `restart_policy.condition: any` for `docker-compose`, `restartPolicy: Always` for `kubernetes`, etc.
If running the node as a binary we recommend using a supervisor process to ensure that the binary is restarted if terminated abnormally. Most contemporary Linux distributions use `systemd` for this purpose, so we will focus on that, but you are not limited to using it if your infrastructure uses a different supervisor process.
To get started, create a new systemd unit file called `polymesh.service` in the `/etc/systemd/system/` directory. The following content should be in this unit file:
[Unit]
Description=Polymesh Node
[Service]
ExecStart=<path to polymesh binary> <polymesh parameters>
Restart=always
MemoryLimit=<2/3 the available system RAM, e.g. ~6GB for a system with 8GB RAM>
[Install]
WantedBy=multi-user.target
To enable this service to automatically start on bootup run
systemctl daemon-reload && systemctl enable polymesh.service
You can also `start`, `stop`, `restart`, and check the `status` of the service with the respective `systemctl` commands, e.g.
systemctl start polymesh.service
The `journalctl` command can be used to read the systemd unit logs:
journalctl -u polymesh
See the man pages for `journalctl` for more details on how to use that command.
To run a polymesh node we recommend that you make use of the following options:
- `--name <name>` (optional but recommended): Human-readable name of the node that is reported to the telemetry services
- `--pruning archive`: Ensure that the node maintains a full copy of the blockchain
- `--chain mainnet`: Run a mainnet node. If this parameter is excluded, the default is to connect to the testnet network
- `--wasm-execution compiled`: Use compiled wasm to improve performance
- `--base-path <path>` (optional): Specify where Polymesh will look for its DB files and keystore
- `--db-cache <cache size in MiB>` (optional): Improve the performance of the polymesh process by increasing its in-memory cache above the default `128` MiB. On a node with 8 GB RAM available a reasonable value is in the ballpark of `4096`.
Note - the `<name>` parameter above will be publicly visible when sending telemetry to Polymath's servers is enabled (on by default).
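Put together, the common parameters produce a command line like the one printed by the sketch below, which can then be pasted into the `ExecStart=` line of a systemd unit or into a container command. The function name and the example paths are hypothetical, and values containing spaces are not handled.

```shell
# Print a node command line using the common parameters listed above.
# $1 = path to the polymesh binary, $2 = node name,
# $3 = base path, $4 = DB cache size in MiB.
polymesh_command() {
  echo "$1 --name $2 --pruning archive --chain mainnet --wasm-execution compiled --base-path $3 --db-cache $4"
}
```

For instance, `polymesh_command /usr/local/bin/polymesh my-node /var/lib/polymesh 4096` prints the full invocation for a node with 8 GB of RAM.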
To run an operator node you will need to use the following options in addition to the common parameters:
- `--operator`: Enable the operator flag on the node.
If you wish to connect to just a trusted set of other nodes, you can use the below flags to control this:
- `--reserved-only`: Only connect to reserved peers.
- `--reserved-nodes`: This parameter takes a space separated list of libp2p peer addresses in the form of `/ip4/<PEER_IP_ADDRESS>/tcp/30333/p2p/<PEER_NODE_IDENTITY>` or `/dns4/<PEER_RESOLVABLE_HOSTNAME>/tcp/30333/p2p/<PEER_NODE_IDENTITY>` to which the operator node will connect. If left out then the peers must be provided via the `system_addReservedPeer` RPC method. Failure to provide peers via either this parameter or the RPC method will cause the operator node to remain disconnected from the chain.
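To make the address format concrete, the helper below assembles a reserved peer address from its parts. The function name and the example IP address are hypothetical; the peer identity is the sample value from the node log shown in Getting the Identity of a Node, and port `30333` is the one used in the address forms above.

```shell
# Print a libp2p peer address for use with --reserved-nodes.
# $1 = address type ("ip4" or "dns4"), $2 = IP address or hostname,
# $3 = the peer's node identity.
reserved_peer_addr() {
  echo "/$1/$2/tcp/30333/p2p/$3"
}
```

An operator node restricted to a single trusted peer could then be started with `--reserved-only --reserved-nodes "$(reserved_peer_addr ip4 10.0.0.5 12D3KoovCz7QpYsHMug7XLZynqKcueKVWWoTxFqBCRQ487YSrrDG)"` in addition to its other parameters.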
Next we will generate the node's session keys.
The `author_rotateKeys` method can be called against a running operator node to generate session keys.
$ curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "author_rotateKeys", "params":[]}' http://localhost:9933 | jq -r .result
You will get an output similar to:
0x2bd908203ae740b513f5907fdcc2e29a6bd2835618da917c03d2cfe65d96745\
b54d59fe4dc5a106c130be0e677596eb023164c314d6fb5cc62ead1bcaee6a443\
fe5df859fc1de372580abaa98a22fee962bcff580bf57138adc12955aa698a5fa\
a923978d9c16014205af96da9d2e213083aefcb53982927a2756ffa83d81658
Take note of this string: it contains the public portion of your session keys. The private keys are stored in a keystore on your operator server in the `/<base path>/chains/mainnet/keystore/` directory. The filenames of those keys are the public key portion of the respective session key, and the contents of the files are the private key portion.
Do not activate your operator node until all your nodes are fully synced with the chain and you have made sure that everything is production ready.
Our official container images contain a small binary to rotate the session keys without requiring the installation of curl either in the container itself or in a sidecar. This binary is located in `/usr/local/bin/rotate` and when executed will produce a newline-terminated string containing the public session keys used for bonding.
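Because a truncated or mis-pasted key string is easy to miss, a quick format check before submitting it on chain can help. The sketch below assumes the string has the same shape as the sample output above, namely `0x` followed by 256 hexadecimal digits; that length is inferred from the sample and may differ on other runtime versions. The function name is hypothetical.

```shell
# Succeed only if $1 looks like the sample author_rotateKeys result:
# "0x" followed by exactly 256 hexadecimal digits (assumed length).
is_session_key_string() {
  printf '%s' "$1" | grep -Eq '^0x[0-9a-fA-F]{256}$'
}
```

Note that the sample output above is wrapped with trailing backslashes for display; join the lines into a single string before checking it.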
There are two simple methods for getting the public identity of a node:
- From the operator node logs
- Via an RPC call
To get the node identity from the operator node logs start the node process and wait until the line containing the string `Local node identity` is printed:
2020-03-02 11:19:20 Polymesh Node
2020-03-02 11:19:20 version 2.0.0-a8676cab-x86_64-linux-gnu
2020-03-02 11:19:20 by Polymath, 2018-2020
2020-03-02 11:19:20 Chain specification: Local Testnet
2020-03-02 11:19:20 Node name: dirty-vase-9822
2020-03-02 11:19:20 Roles: AUTHORITY
2020-03-02 11:19:20 Local node identity is: 12D3KoovCz7QpYsHMug7XLZynqKcueKVWWoTxFqBCRQ487YSrrDG
2020-03-02 11:19:20 Starting BABE Authorship worker
2020-03-02 11:19:20 Grafana data source server started at 127.0.0.1:9955
...
The above sample log tells us that the node's identity is `12D3KoovCz7QpYsHMug7XLZynqKcueKVWWoTxFqBCRQ487YSrrDG` - your node's identity will be different. Please save this for later and terminate the operator node process.
To get the node identity via RPC, call the `system_localPeerId` method and read the `result` value:
$ curl -s -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "system_localPeerId", "params":[]}' http://localhost:9933 | jq -r .result
12D3KoovCz7QpYsHMug7XLZynqKcueKVWWoTxFqBCRQ487YSrrDG
$
The recommended approach to getting metrics from the Polymesh node is via its built-in prometheus exporter endpoint. This endpoint can be scraped with a prometheus-compatible server or agent.
By default the prometheus exporter will bind to `localhost` on port `9615`. You can expose the exporter port to additional interfaces with the `--prometheus-external` flag to enable network based scraping or use a local agent such as `telegraf`, `grafana-cloud-agent`, or `victoria-metrics-agent` to collect the metrics and push them to the prometheus server.
The basic health of a node can be assessed by monitoring the following metrics:
Metric | Used for | Operational Range | Additional Notes |
---|---|---|---|
`polymesh_block_height{status="finalized"}` | Finalised block number | +/- 3 from rest of the network | The block number for the rest of the network should be fetched from an external source. |
`polymesh_block_height{status="best"}` | Best block time | 6s +/- 2s | The block time is the difference between the best block timestamps. The ideal mean time is 6 seconds, but some jitter (less than 2s) is acceptable due to network latency. |
`polymesh_ready_transactions_number` | Transactions in ready queue | 0-150 | A healthy node should have zero or near-zero transactions in its ready queue. A ready queue with a growing number of transactions can be an indicator of excessive node latency. |
`polymesh_sub_libp2p_peers_count` | Number of peers | Number of peers | Operator nodes that are well connected with other operator nodes will experience lower on-chain lag compared to operator nodes with a limited set of operator peers. Operator nodes should be connected to all other operator nodes with a maximum of three hops (two intermediate nodes between operators). |
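For a quick manual check without a Prometheus server, the block height can be read straight from the exporter's text output. The sketch below assumes the exporter serves the usual Prometheus text format at the `/metrics` path and that each sample line ends with its value; the helper names are hypothetical. The second helper applies the +/- 3 range from the table above to a network height that you obtain from an external source.

```shell
# Read Prometheus text output on stdin and print the polymesh_block_height
# value whose status label equals $1 ("best" or "finalized").
block_height() {
  awk -v want="status=\"$1\"" '
    index($0, "polymesh_block_height{") == 1 && index($0, want) > 0 { print $NF; exit }'
}

# Succeed if the local finalised height ($1) is within 3 blocks
# of the network's finalised height ($2).
finalized_in_range() {
  lag=$(($1 - $2))
  [ "${lag#-}" -le 3 ]
}
```

A typical invocation would be `curl -s http://localhost:9615/metrics | block_height finalized`, with the result passed to `finalized_in_range` together with the externally fetched height.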
We have published a Grafana dashboard to monitor the metrics exposed by the Polymesh node via its Prometheus exporter. You may download it here. In order to use this dashboard you will need to scrape the metrics from the Prometheus exporter and collect them in a Prometheus server to which Grafana will connect.
In addition to the Polymesh metrics you should also monitor basic node metrics available from generic node exporters or monitoring agents:
Metric | Operational Range | Additional Notes |
---|---|---|
Free disk space | 30 GB+ or > 20% volume capacity | There should always be some free disk space for the Polymesh node to consume. |
Free RAM | 1 GB+ | Spikes in RAM usage are acceptable but on average, there should be at least 1 GB of free RAM available on the system for the node to consume. |
CPU usage | 5-50% (overall) | This is the overall CPU usage and not per core usage. Occasional spikes above 50% are acceptable but more cores should be added if the CPU usage continuously stays above 50%. |
Network connectivity | >0.1 Mbps bandwidth | Nodes should be online and reachable at all times. If they are being DDoS’d and cannot respond to queries, new nodes should be deployed, or the operator's connectivity limited to trusted nodes. |
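The free disk space rule from the table above can be expressed as a small check. This sketch reads the rule literally as "at least 30 GB free, or more than 20% of the volume free" and treats 30 GB as 30 GiB; both readings are assumptions. The function name is hypothetical, and the inputs are the available and total sizes in 1 KiB blocks as reported by `df -Pk`.

```shell
# Succeed if free space meets the operational range from the table:
# at least 30 GiB available, or more than 20% of the volume available.
# $1 = available space in KiB, $2 = total volume size in KiB.
disk_space_ok() {
  min_kb=$((30 * 1024 * 1024))
  [ "$1" -ge "$min_kb" ] || [ $(($1 * 100 / $2)) -gt 20 ]
}
```

The two inputs can be taken from the fourth and second columns of the data row printed by `df -Pk <base path>`.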
You should ensure that your Polymesh nodes have synced with the chain and are healthy before proceeding with this section. Failure to do so may result in operator penalties.
To become an operator on Polymesh, you need to bond (lock) some POLYX in the system. The account that stores your bonded funds is called the stash account and the account that decides what to do with the bonded funds is called the controller account.
It is highly recommended that you use two separate accounts for your controller and stash. To do so, create two accounts and make sure each of them has at least enough funds to pay the fees for making transactions. Keep most of your funds in the stash account since it is meant to be the custodian of your staking funds.
For Testnet you can use the same account for the Stash account and the Controller account.
To bond your funds,
- Go to Staking section
- Click on "Account Actions"
- Click on the "+ Stash" button
- Stash account: Select your Stash account. In this example, we will bond 100 milli POLYX - make sure that your Stash account contains at least this much. You can, of course, stake more than this.
- Controller account: Select the Controller account created earlier. This account will also need a small amount of POLYX in order to start and stop validating.
- Value bonded: How much POLYX from the Stash account you want to bond/stake. You do not need to bond all of the POLYX in that account. Also, note that you can always bond more POLYX later. However, withdrawing any bonded amount requires waiting for the duration of the unbonding period.
- Payment destination: The account where the rewards from validating are sent.
Once everything is filled in properly, click Bond and sign the transaction with your Stash account.
After a few seconds, you should see an `ExtrinsicSuccess` message. You should now see a new card with all your accounts (note: you may need to refresh the screen). The bonded amount on the right corresponds to the funds bonded by the Stash account.
You need to tell the Polymesh blockchain what your session keys are. This is what associates your operator with your Controller account. If you ever want to switch your operator node, you just need to change your active session keys to the new session keys and wait for the change to become active in the next session.
Remember the session keys we generated while setting up the operator node? It’s now time to use those keys.
To set your Session Keys,
- Go to Staking section
- Click on "Account Actions"
- Click on the "Session Key" button on the bonding account you generated earlier
- Enter the result of `author_rotateKeys` that we saved earlier in the field and click "Set Session Key"
- Submit this extrinsic and you are now ready to start validating
Before moving forward, please make sure that everything is set up properly via the telemetry we set up earlier. Once this step is complete, an improper setup may lead to penalties.
If everything looks good, go ahead and click on "Validate" in the UI.
Enter the reward commission percentage and click on Validate.
Congratulations! Your operator has been added to the queue and will become active in the next session.
To stop being an operator on the Polymesh chain,
- Go to Staking > Account Actions
- Click on "Stop Validating" against your bonding account
You will be removed from the operator set in the next session. You can then safely terminate all your operator nodes. Failure to terminate safely (e.g. by terminating before the next session) may result in penalties.
Term | Definition |
---|---|
Controller key | Key used to manage bonded funds, vote with bonded funds and do similar actions on chain. |
Equivocation | When an operator node commits to two or more conflicting states. |
Era | An Era is a whole number of sessions. It is the period over which operator and nominator sets are calculated, and rewards paid out. |
Immortal transaction | A transaction that is valid at any time. |
Operator node | Operator nodes are permissioned network participants responsible for producing new blocks and finalising the block chain. |
Sentry node | Sentry nodes are full archive nodes which operator nodes use as a proxy to the wider network, limiting the operator nodes' exposure to the public internet and providing data redundancy. |
Session | A session is a period of time that has a constant set of operators. Operators can only join or exit the operator set at a session change. |
Session keys | Keys that an operator node uses to sign data needed for consensus. |
Stash key | Account where the operator rewards are sent. |
Warm spare node | A node that is online and synced but not configured to be an operator. A warm spare requires manual intervention to become an active operator. |