From d3c4dfed4933d7d2fef338b65f1291f12416dc1d Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 29 Oct 2021 10:55:36 -0500 Subject: [PATCH 1/2] Added missing entries and reorganized rsyslog events/alerts tables --- content/rs/administering/logging/_index.md | 2 +- .../_index.md} | 38 ++------------ .../logging/rsyslog-logging/bdb-events.md | 51 +++++++++++++++++++ .../logging/rsyslog-logging/cluster-events.md | 51 +++++++++++++++++++ .../logging/rsyslog-logging/node-events.md | 36 +++++++++++++ .../logging/rsyslog-logging/user-events.md | 20 ++++++++ 6 files changed, 162 insertions(+), 36 deletions(-) rename content/rs/administering/logging/{rsyslog-logging.md => rsyslog-logging/_index.md} (74%) create mode 100644 content/rs/administering/logging/rsyslog-logging/bdb-events.md create mode 100644 content/rs/administering/logging/rsyslog-logging/cluster-events.md create mode 100644 content/rs/administering/logging/rsyslog-logging/node-events.md create mode 100644 content/rs/administering/logging/rsyslog-logging/user-events.md diff --git a/content/rs/administering/logging/_index.md b/content/rs/administering/logging/_index.md index 65305f452cd..d639c6a16f1 100644 --- a/content/rs/administering/logging/_index.md +++ b/content/rs/administering/logging/_index.md @@ -33,7 +33,7 @@ done, e.g. edited a DB configuration, this is where you could look. 
- [Redis slow log]({{< relref "/rs/administering/logging/redis-slow-log.md" >}}) -- [rsyslog logging]({{< relref "/rs/administering/logging/rsyslog-logging.md" >}}) +- [rsyslog logging]({{}}) ## Viewing logs in the admin console diff --git a/content/rs/administering/logging/rsyslog-logging.md b/content/rs/administering/logging/rsyslog-logging/_index.md similarity index 74% rename from content/rs/administering/logging/rsyslog-logging.md rename to content/rs/administering/logging/rsyslog-logging/_index.md index 14acdb20f28..5e5d1d4a589 100644 --- a/content/rs/administering/logging/rsyslog-logging.md +++ b/content/rs/administering/logging/rsyslog-logging/_index.md @@ -4,6 +4,9 @@ description: weight: $weight alwaysopen: false categories: ["RS"] +aliases: /rs/administering/logging/rsyslog-logging/ + /rs/administering/logging/rsyslog-logging.md + /rs/administering/logging/rsyslog-logging/_index.md --- This document explains the structure of Redis Enterprise Software log entries that go into `rsyslog` and how to use these log entries to identify events. @@ -309,38 +312,3 @@ false,"time":1434365471,"disk":705667072,"type": this specific event, see full mapping in the Mapping UI events and alerts to log entries section below - -## Mapping UI events and alerts to log entries - -### Cluster and node related events - -| **Event as shown in the UI** | **Event code­name** | **Object type** | **Category** | **Severity** | **Notes** | -|------------|-----------------|------------|-----------------|------------|-----------------| -| Node failed | failed | node | alert | critical | | -| Node joined | node_joined | cluster | event | info | | -| Node removed | node_remove_completed
node_remove_failed
node_remove_abort_completed
node_remove_abort_failed | cluster | event | info
error
info
error | The remove node is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | -| Node memory has reached % of its capacity | memory | node | alert | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -| Persistent storage has reached % of its capacity | persistent_storage | node | alert | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -| Ephemeral storage has reached | ephemeral_storage | node | alert | true: warning
false: info | Has global_threshold parameter in the % of its capacity key/value section of the log entry. | -| CPU utilization has reached % | cpu_utilization | node | alert | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -| Network throughput has reached MB/s | net_throughput | node | alert | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | -| Node has insufficient disk space for AOF rewrite | insufficient_disk_aofrw | node | alert | true: error
false: info | | -| Redis performance is degraded as result of disk I/O limits | aof_slow_disk_io | node | alert | true: error
false: info | | -| Cluster capacity is less than total memory allocated to its databases | ram_overcommit | cluster | alert | true: error
false: info | | -| Nodes rebalanced | rebalance_failed
rebalance_completed
rebalance_abort_failed
rebalance_abort_completed | cluster | event | error
info
error
info | The nodes rebalance is a process that can fail and can also be aborted. If aborted, the abort can succeed or fail. | -| Database replication requires at least two nodes in cluster | too_few_nodes_for_replication | cluster | alert | true: warning
false: info | | -| True high availability requires an odd number of nodes with a minimum of three nodes | even_node_count | cluster | alert | true: warning
false: info | | -| Not all nodes in the cluster are running the same Redis Enterprise Cluster version | inconsistent_rl_sw | cluster | alert | true: warning
false: info | -| Not all databases are running the same open source version | inconsistent_redis_sw | cluster | alert | true: warning
false: info | | - -### Database related events - -| **Event as shown in the UI** | **Event code-­name** | **Object type** | **Category** | **Severity** | **Notes** | -|------------|-----------------|------------|-----------------|------------|-----------------| -| Dataset size has reached % of the memory limit | size | bdb | alert | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| Throughput is higher than RPS (requests per second) | high_throughput | bdb | alert | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| Throughput is lower than RPS (requests per second) | low_throughput | bdb | alert | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| Latency is higher than msec | high_latency | bdb | alert | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | -| Periodic backup has been delayed for longer than minutes | backup_delayed | bdb | alert | true: warning
false: info | Has threshold parameter in the data: section of the log entry. | -| Replica Of ­database unable to sync with source | syncer_connection_error
syncer_general_error | bdb | alert | error
error | -| Replica Of sync lag is higher than seconds | high_syncer_lag | bdb | alert | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. | diff --git a/content/rs/administering/logging/rsyslog-logging/bdb-events.md b/content/rs/administering/logging/rsyslog-logging/bdb-events.md new file mode 100644 index 00000000000..5034b3175ac --- /dev/null +++ b/content/rs/administering/logging/rsyslog-logging/bdb-events.md @@ -0,0 +1,51 @@ +--- +Title: Logged database alerts and events +linkTitle: Database alerts/events +description: Logged database alerts and events +weight: 50 +alwaysopen: false +categories: ["RS"] +--- + +The following database (BDB) alerts and events can appear in `syslog`. + +## UI alerts + +Logged alerts that appear in the UI + +| Alert code name | Alert as shown in the UI | Severity | Notes | +|-----------------|--------------------------|----------|-------| +backup_delayed | Periodic backup has been delayed for longer than minutes | true: warning
false: info | Has threshold parameter in the data section of the log entry. +high_latency | Latency is higher than msec | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +high_syncer_lag | Replica of - sync lag is higher than seconds | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +high_throughput | Throughput is higher than RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +low_throughput | Throughput is lower than RPS (requests per second) | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +ram_dataset_overhead | RAM Dataset overhead in a shard has reached % of its RAM limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +ram_values | Percent of values in a shard’s RAM is lower than % of its key count | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +shard_num_ram_values | Number of values in a shard’s RAM is lower than values | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +size | Dataset size has reached % of the memory limit | true: warning
false: info | Has threshold parameter in the key/value section of the log entry. +syncer_connection_error | Replica of - database unable to sync with source | error | +syncer_general_error | Replica of - database unable to sync with source | error | + +## Non-UI events + +Logged events that do not appear in the UI + +| Event code name | Severity | Notes | +|-----------------|----------|-------| +| authentication_err | error | Replica of - Error authenticating with the source database | +| backup_failed | error | | +| backup_started | info | | +| backup_succeeded | info | | +| bdb_created | info | | +| bdb_deleted | info | | +| bdb_updated | info | Indicates that a BDB configuration has been updated | +| compression_unsup_err | error | Replica of - Compression not supported by sync destination | +| crossslot_err | error | Replica of - Sharded destination does not support operation executed on source | +| export_failed | error | | +| export_started | info | | +| export_succeeded | info | | +| import_failed | error | | +| import_started | info | | +| import_succeeded | info | | +| oom_err | error | Replica of - Replication source/target out of memory | \ No newline at end of file diff --git a/content/rs/administering/logging/rsyslog-logging/cluster-events.md b/content/rs/administering/logging/rsyslog-logging/cluster-events.md new file mode 100644 index 00000000000..099767683e6 --- /dev/null +++ b/content/rs/administering/logging/rsyslog-logging/cluster-events.md @@ -0,0 +1,51 @@ +--- +Title: Logged cluster alerts and events +linkTitle: Cluster alerts/events +description: Logged cluster alerts and events +weight: 50 +alwaysopen: false +categories: ["RS"] +--- + +The following cluster alerts and events can appear in `syslog`. 
+ +## UI alerts + +Logged alerts that appear in the UI + +| Alert code name | Alert as shown in the UI | Severity | Notes | +|-----------------|--------------------------|----------|-------| +even_node_count | True high availability requires an odd number of nodes with a minimum of three nodes | true: warning
false: info | +inconsistent_redis_sw | Not all databases are running the same open source version | true: warning
false: info | +inconsistent_rl_sw | Not all nodes in the cluster are running the same Redis Enterprise Cluster version | true: warning
false: info | +internal_bdb | Issues with internal cluster databases | true: warning
false: info | +multiple_nodes_down | Multiple cluster nodes are down - this might cause data loss | true: warning
false: info | +ram_overcommit | Cluster capacity is less than total memory allocated to its databases | true: error
false: info | +too_few_nodes_for_replication | Database replication requires at least two nodes in cluster | true: warning
false: info | + +## UI events + +Logged events that appear in the UI + +| Event code name | Event as shown in the UI | Severity | Notes | +|-----------------|--------------------------|----------|-------| +| node_joined | Node joined | info | | +| node_remove_abort_completed | Node removed | info | Node removal is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| node_remove_abort_failed | Node removed | error | Node removal is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| node_remove_completed | Node removed | info | Node removal is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| node_remove_failed | Node removed | error | Node removal is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| rebalance_abort_completed | Nodes rebalanced | info | Node rebalancing is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| rebalance_abort_failed | Nodes rebalanced | error | Node rebalancing is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| rebalance_completed | Nodes rebalanced | info | Node rebalancing is a process that can fail and can also be cancelled. A cancellation can also succeed or fail. | +| rebalance_failed | Nodes rebalanced | error | Node rebalancing is a process that can fail and can also be cancelled. A cancellation can also succeed or fail.
| + +## Non-UI events + +Logged events that do not appear in the UI + +| Event code name | Severity | Notes | +|-----------------|----------|-------| +| cluster_updated | info | Indicates that cluster settings have been updated | +| license_added | info | | +| license_deleted | info | | +| license_updated | info | | \ No newline at end of file diff --git a/content/rs/administering/logging/rsyslog-logging/node-events.md b/content/rs/administering/logging/rsyslog-logging/node-events.md new file mode 100644 index 00000000000..898342e3f70 --- /dev/null +++ b/content/rs/administering/logging/rsyslog-logging/node-events.md @@ -0,0 +1,36 @@ +--- +Title: Logged node alerts and events +linkTitle: Node alerts/events +description: Logged node alerts and events +weight: 50 +alwaysopen: false +categories: ["RS"] +--- + +The following node alerts and events can appear in `syslog`. + +## UI alerts + +Logged alerts that appear in the UI + +| Alert code name | Alert as shown in the UI | Severity | Notes | +|-----------------|--------------------------|----------|-------| +aof_slow_disk_io | Redis performance is degraded as result of disk I/O limits | true: error
false: info | +cpu_utilization | CPU utilization has reached % | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. +ephemeral_storage | Ephemeral storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | +failed | Node failed | critical | +free_flash | Flash storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. +insufficient_disk_aofrw | Node has insufficient disk space for AOF rewrite | true: error
false: info | +memory | Node memory has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | +net_throughput | Network throughput has reached MB/s | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | +persistent_storage | Persistent storage has reached % of its capacity | true: warning
false: info | Has global_threshold parameter in the key/value section of the log entry. | + +## Non-UI events + +Logged events that do not appear in the UI + +| Event code name | Severity | Notes | +|-----------------|----------|-------| +| checks_error | error | Indicates that one or more node checks have failed | +| node_abort_remove_request | info | | +| node_remove_request | info | | \ No newline at end of file diff --git a/content/rs/administering/logging/rsyslog-logging/user-events.md b/content/rs/administering/logging/rsyslog-logging/user-events.md new file mode 100644 index 00000000000..daf901e54ab --- /dev/null +++ b/content/rs/administering/logging/rsyslog-logging/user-events.md @@ -0,0 +1,20 @@ +--- +Title: Logged user events +linkTitle: User events +description: Logged user events +weight: 50 +alwaysopen: false +categories: ["RS"] +--- + +The following user events can appear in `syslog`. + +## Non-UI events + +Logged events that do not appear in the UI + +| Event code name | Severity | Notes | +|-----------------|----------|-------| +| user_created | info | | +| user_deleted | info | | +| user_updated | info | Indicates that a user configuration has been updated | From 504951a02a77b19a44a16dcab85b2a65e4d2d6ce Mon Sep 17 00:00:00 2001 From: Rachel Elledge Date: Fri, 29 Oct 2021 15:49:33 -0500 Subject: [PATCH 2/2] rsyslog logging edits --- .../logging/rsyslog-logging/_index.md | 486 ++++++++---------- 1 file changed, 220 insertions(+), 266 deletions(-) diff --git a/content/rs/administering/logging/rsyslog-logging/_index.md b/content/rs/administering/logging/rsyslog-logging/_index.md index 5e5d1d4a589..d32e74f5048 100644 --- a/content/rs/administering/logging/rsyslog-logging/_index.md +++ b/content/rs/administering/logging/rsyslog-logging/_index.md @@ -1,6 +1,6 @@ --- Title: rsyslog logging -description: +description: This document explains the structure of Redis Enterprise Software log entries in `rsyslog` and how to use these log entries to identify 
events. weight: $weight alwaysopen: false categories: ["RS"] @@ -8,307 +8,261 @@ aliases: /rs/administering/logging/rsyslog-logging/ /rs/administering/logging/rsyslog-logging.md /rs/administering/logging/rsyslog-logging/_index.md --- -This document explains the structure of Redis Enterprise Software log entries that go into `rsyslog` -and how to use these log entries to identify events. -Also, we recommend that you [secure your logs]({{< relref "/rs/security/logging.md" >}}) with a remote logging server and log rotation. +This document explains the structure of Redis Enterprise Software log entries in `rsyslog` and how to use these log entries to identify events. -## Logging concepts +{{}} +You can also [secure your logs]({{}}) with a remote logging server and log rotation. +{{}} -Redis Enterprise Software updates logs with entries from a variety of components in response to a host of actions and events that take place within the cluster. +## Log concepts -Individual events may generate multiple log entries, even though they stem from a single action. +Redis Enterprise Software logs information from a variety of components in response to actions and events that occur within the cluster. -For example, in order for the cluster to decide that a cluster node is -down there might be various log entries added by different cluster -components from various nodes with different descriptions, until the -cluster gets to a final decision that the node is actually down. In -other cases, similar entries might be added to the log and the cluster -eventually gets to a decision that the node is -actually not down. +In some cases, a single action, such as removing a node from the cluster, may actually consist of several events. These actions may generate multiple log entries. 
-In addition, some actions that might seem to the user as an atomic -action, like removing a node from the cluster, are actually made up of -several different events that take place in a sequence, and might also -fail in the process. +All log entries displayed in the admin console are also written to `syslog`. You can configure `rsyslog` to monitor `syslog`. Enabled alerts are logged to `syslog` and appear with other log entries. -As a result, all log entries displayed in the admin console are also written to `syslog`. You can configure `rsyslog` to monitor `syslog`. Enabled alerts are logged to `syslog` along and appear with other log entries. +### Types of log entries +Log entries are categorized into events and alerts. Both types of entries appear in the logs, but alert log entries also include a boolean `"state"` parameter that indicates whether the alert is on (`true`) or off (`false`). -The log entries can be categorized into events and alerts. Events just -get logged, while alerts have a state attached to them. In the Mapping -UI events and alerts to log entries section below, there is a Category -column that calls out for each event whether it is an event or alert. RS -log entries include information about the specific event that occurred -as detailed below in the Log entry structure section. In addition, -rsyslog can be configured to add other information, like the event -severity for example. +Log entries include information about the specific event that occurred. See the log entry tables for [clusters]({{}}), [databases]({{}}), [nodes]({{}}), and [users]({{}}) for more details.
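As a rough illustration of this distinction, the following Python sketch labels an entry as an alert or an event by checking its JSON key-value part for a boolean `"state"` key. It assumes you have already isolated that JSON text from the log line. The helper name `classify_entry` is hypothetical, and the second payload is an invented example rather than a captured log line.

```python
import json


def classify_entry(payload: str) -> str:
    """Hypothetical helper: label the JSON part of an event_log entry.

    Only alert entries carry a boolean "state" key, so its presence
    is used to tell alerts apart from plain events.
    """
    fields = json.loads(payload)
    return "alert" if isinstance(fields.get("state"), bool) else "event"


# Key-value pairs from the even_node_count sample in this document
alert_payload = ('{"object": "cluster", "state": true, "time": 1434284700, '
                 '"node_count": 1, "type": "even_node_count"}')
# Invented payload for an event; its exact fields are an assumption
event_payload = '{"object": "cluster", "time": 1434284700, "type": "cluster_updated"}'

print(classify_entry(alert_payload))  # alert
print(classify_entry(event_payload))  # event
```

Checking the value's type, not just the key's presence, keeps a hypothetical non-boolean `"state"` field from being mistaken for an alert flag.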
-Since rsyslog entries do not include the severity information by -default, you can use the following instructions in order to log that -information (in Ubuntu): -Add the following line to /etc/rsyslog.conf -$template TraditionalFormatWithPRI,"%pri-text%:%timegenerated%:%HOSTNAME%:%syslogtag%:%msg:::drop-last-lf%\n" +### Severity -And modify $ActionFileDefaultTemplate to use your new template -$ActionFileDefaultTemplateTraditionalFormatWithPRI -Make sure to save the changes and restart rsyslog in order for the -changes to take effect. You can see the alerts and events under /var/log -in messages log file. +You can also configure `rsyslog` to add other information, such as the event severity. + +Since `rsyslog` entries do not include severity by default, you can follow these steps to enable it: + +1. Add the following line to `/etc/rsyslog.conf`: + ``` + $template TraditionalFormatWithPRI,"%pri-text%: %timegenerated% %HOSTNAME% %syslogtag%%msg:::drop-last-lf%\n" + ``` + +2. Set `$ActionFileDefaultTemplate` to your new template: `$ActionFileDefaultTemplate TraditionalFormatWithPRI` + +3. Save these changes and restart `rsyslog` to apply them. + +You can see the log entries for alerts and events in the `/var/log/messages` file. **Command components:** -- \%pri­text% ­adds the severity -- \%timegenerated% ­adds the timestamp -- \%HOSTNAME% ­adds the machine name -- \%syslogtag% ­the Redis Enterprise Software message as detailed below in the Log entry - structure section - below. -- \%msg:::drop­last­lf%n ­ removes duplicated log entries +- `%pri-text%` adds the severity +- `%timegenerated%` adds the timestamp +- `%HOSTNAME%` adds the machine name +- `%syslogtag%` adds the syslog tag, which is `event_log` followed by the process ID (for example, `event_log[3464]:`). See the [log entry structure](#log-entry-structure) section for more details.
+- `%msg:::drop-last-lf%\n` adds the message (the list of key-value pairs), drops its trailing line feed if it has one, and ends the entry with a newline -### Log entry structure +## Log entry structure The log entries have the following basic structure: -event_log\[\]:{\} - -- event_log­ plain static text is always shown at the beginning - of the entry. -- processid­ the id of the process the logging in running under. -- listofkeyvaluepairsinanyorder­ a list of key value pairs describing - the - specific event. The key­values pairs can appear in any order. Some - key­value pairs are always shown, and some appear depending on the specific event. - - **Key­value pairs that always appear:** - - "type" A unique code­name identifying the event logged. For - the list of - codenames relevant for this purpose please review the event - code­name - column in the Mapping UI events and alerts to log entries - section below. - - "object" has the format of "\[:\]". Defines the - object type, and id if relevant, of the object this event is - related to. For - example cluster, node with id, bdb with id, etc'. - - "time" unix time, can be ignored in this context. - - **Key­value pairs that might appear depending on the specific - entry:** - - "state" boolean with value true or false. This is relevant - only for - entries from category alert. True means that the alert is - on. False means - that the alert is off. - - "global_threshold" a value of a threshold for alerts - related to the - "cluster"or "node"objects. - - "threshold" a value of a threshold for alerts related to the - "bdb" - object. - -### Log entry samples - -Below are examples of log entries that include the rsyslog configuration -mentioned above that add the severity, timestamp and machine name. + + event_log[]:{} + +- **event_log**: Plain static text that always appears at the beginning of the entry. +- **process id**: The ID of the logging process. +- **list of key-value pairs in any order**: A list of key-value pairs that describe the specific event. They can appear in any order.
Some key-value pairs are always shown, and others appear only for specific events. + - **Key-value pairs that always appear:** + - `"type"`: A unique code name for the logged event. For the list of code names, see the logged events and alerts tables for [clusters]({{}}), [databases]({{}}), [nodes]({{}}), and [users]({{}}). + - `"object"`: The type and, if relevant, the ID of the object this event relates to, such as the cluster, a node with its ID, or a BDB with its ID. Has the format `[:]`. + - `"time"`: Unix epoch time, which can be ignored in this context. + - **Key-value pairs that might appear depending on the specific entry:** + - `"state"`: A boolean where `true` means the alert is on and `false` means the alert is off. Only alert log entries include this key. + - `"global_threshold"`: The threshold value for alerts related to cluster or node objects. + - `"threshold"`: The threshold value for [alerts related to a BDB object]({{}}). + +## Log entry samples + +This section provides examples of log entries written with the [`rsyslog` configuration](#severity) that adds the severity, timestamp, and machine name. ### Ephemeral storage passed threshold -**Alert on" log entry sample:** -daemon.warning:Jun1414:49:20node1event_log\[3464\]:{"storage_util": -90.061643120001,"global_threshold":"70″,"object":"node:1″,"state": -true,"time":1434282560,"type":"ephemeral_storage"} +#### "Alert on" log entry sample -The log entry above is an example of when the alert for node with id 1 -"Ephemeral storage has reached 70% of its capacity" has been raised as -result of storage utilization reaching the value of \~90%.
+``` +daemon.warning: Jun 14 14:49:20 node1 event_log[3464]: +{ + "storage_util": 90.061643120001, + "global_threshold": "70", + "object": "node:1", + "state": true, + "time": 1434282560, + "type": "ephemeral_storage" +} +``` + +In this example, the storage utilization on node 1 reached the value of ~90%, which triggered the alert for "Ephemeral storage has reached 70% of its capacity." **Log entry components:** -- daemon.warning ­ severity of entry is warning -- Jun1414:49:20­ the timestamp of the event -- node1­ machine name -- event_log­ static text that always appears -- \[3464\]­ process id -- "storage_util":90.061643120001­ current ephemeral storage - utilization, in this - case \~90% -- "global_threshold":"70″­ the user configured threshold above which - the alert is - raised, in this case it is 70% -- "object":"node:1″­ the object for which this alert has been raised - for, in this case it - is node with id 1 -- "state":true­ current state of the alert, in this case it is on -- "time":1434282560­ can be ignored -- "type":"ephemeral_storage"­ is the code name identifier of this - specific event, see - full mapping in the Mapping UI events and alerts to log entries - section below - -**Alert off" log entry sample:** - -daemon.info:Jun1414:51:35node1event_log\[3464\]:{"storage_util": -60.051723520008,"global_threshold":"70″,"object":"node:1″,"state": -false,"time":1434283480,"type":"ephemeral_storage"} - -The log entry above is an example of when the alert for node with id 1 -"Ephemeral storage has reached 70% of its capacity" has been turned off -as result of storage utilization reaching the value of \~60%. 
+- `daemon.warning` - Severity of entry is `warning` +- `Jun 14 14:49:20` - The timestamp of the event +- `node1` - Machine name +- `event_log` - Static text that always appears +- `[3464]` - Process ID +- `"storage_util":90.061643120001` - Current ephemeral storage utilization +- `"global_threshold":"70"` - The user-configured threshold above which the alert is raised +- `"object":"node:1"` - The object related to this alert +- `"state":true` - Current state of the alert +- `"time":1434282560` - Can be ignored +- `"type":"ephemeral_storage"` - The code name of this specific event. See [logged node alerts and events]({{}}) for more details. + +#### "Alert off" log entry sample + +``` +daemon.info: Jun 14 14:51:35 node1 event_log[3464]: +{ + "storage_util": 60.051723520008, + "global_threshold": "70", + "object": "node:1", + "state": false, + "time": 1434283480, + "type": "ephemeral_storage" +} +``` + +This log entry is an example of when the alert for the node with ID 1 "Ephemeral storage has reached 70% of its capacity" has been turned off as a result of storage utilization dropping to ~60%.
**Log entry components**: -- daemon.info ­ severity of entry is info -- Jun1414:51:35­ the timestamp of the event -- node1­ machine name -- event_log­ static text that always appears -- \[3464\]­ process id -- "storage_util":60.051723520008­ current ephemeral storage - utilization, in this - case \~60% -- "global_threshold":"70″­ the user configured threshold above which - the alert is - raised, in this case it is 70% -- "object":"node:1″­ the object for which this alert has been raised - for, in this case it - is node with id 1 -- "state":false­ current state of the alert, in this case it is on -- "time":1434283480­ can be ignored -- "type":"ephemeral_storage"­ is the code name identifier of this - specific event, see - full mapping in the Mapping UI events and alerts to log entries - section below - Odd number of nodes with a minimum of three nodes alert - -**Alert on" log entry sample:** - -daemon.warning:Jun1415:25:00node1event_log\[8310\]:{"object": -"cluster","state":true,"time":1434284700,"node_count":1,"type": -"even_node_count"} - -The log entry above is an example of when the alert for "True high -availability requires an odd -number of nodes with a minimum of three nodes" has been turned on as -result of the cluster -having only one node. +- `daemon.info` - Severity of entry is `info` +- `Jun 14 14:51:35` - The timestamp of the event +- `node1` - Machine name +- `event_log` - Static text that always appears +- `[3464]` - Process ID +- `"storage_util":60.051723520008` - Current ephemeral storage utilization +- `"global_threshold":"70"` - The user-configured threshold above which the alert is raised (70% in this case) +- `"object":"node:1"` - The object related to this alert +- `"state":false` - Current state of the alert +- `"time":1434283480` - Can be ignored +- `"type":"ephemeral_storage"` - The code name of this specific event. See [logged node alerts and events]({{}}) for more details.
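To process these entries programmatically, a minimal Python sketch like the following can split one line into the components listed above. It assumes the line was written with the `TraditionalFormatWithPRI` template from the Severity section, that the key-value pairs arrive as single-line JSON (the samples here are spread over several lines only for readability), and that the timestamp uses the traditional `Mmm dd hh:mm:ss` form. It also tolerates an optional numeric suffix such as `<28>` after the severity, in case your `rsyslog` version appends one. `parse_event_log_line` and `LogEntry` are hypothetical names, not part of Redis Enterprise Software.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

# Expected shape (assumed from the samples in this document):
# <facility>.<severity>: <Mmm dd hh:mm:ss> <host> event_log[<pid>]: {<json>}
LINE_RE = re.compile(
    r"^(?P<facility>\w+)\.(?P<severity>\w+)(?:<\d+>)?:\s+"
    r"(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"event_log\[(?P<pid>\d+)\]:\s*"
    r"(?P<payload>\{.*\})\s*$"
)


@dataclass
class LogEntry:
    """Hypothetical container for one parsed event_log line."""
    facility: str
    severity: str
    timestamp: str
    host: str
    pid: int
    fields: dict
    object_type: str
    object_id: Optional[int]


def parse_event_log_line(line: str) -> LogEntry:
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not an event_log line: {line!r}")
    fields = json.loads(match["payload"])
    # "object" is either a bare type ("cluster") or "<type>:<id>" ("node:1")
    object_type, _, object_id = fields["object"].partition(":")
    return LogEntry(
        facility=match["facility"],
        severity=match["severity"],
        timestamp=match["timestamp"],
        host=match["host"],
        pid=int(match["pid"]),
        fields=fields,
        object_type=object_type,
        object_id=int(object_id) if object_id else None,
    )


# The "Alert off" sample above, written on a single line
line = ('daemon.info: Jun 14 14:51:35 node1 event_log[3464]: '
        '{"storage_util": 60.051723520008, "global_threshold": "70", '
        '"object": "node:1", "state": false, "time": 1434283480, '
        '"type": "ephemeral_storage"}')
entry = parse_event_log_line(line)
print(entry.severity, entry.object_type, entry.object_id, entry.fields["state"])
# info node 1 False
```

Because `"object"` is split on its first colon, cluster-level entries such as `even_node_count` come back with `object_id` set to `None`.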
+ +### Odd number of nodes with a minimum of three nodes alert + +#### "Alert on" log entry sample + +``` +daemon.warning: Jun 14 15:25:00 node1 event_log[8310]: +{ + "object": "cluster", + "state": true, + "time": 1434284700, + "node_count": 1, + "type": "even_node_count" +} +``` + +This log entry is an example of when the alert for "True high availability requires an odd number of nodes with a minimum of three nodes" has been turned on as a result of the cluster having only one node. **Log entry components:** -- daemon.warning­ severity of entry is warning -- Jun1415:25:00­ the timestamp of the event -- node1­ machine name -- event_log­ static text that always appears -- \[8310\]­ process id -- "object":"cluster"­ the object for which this alert has been raised - for, in this case - it is the cluster -- "state":true­ current state of the alert, in this case it is on -- "time":1434284700­ can be ignored -- "node_count":1­ the number of nodes in the cluster, in this case 1 -- "type":"even_node_count"­ is the code name identifier of this - specific event, see - full mapping in the Mapping UI events and alerts to log entries - section below - -**Alert off" log entry sample:** - -daemon.warning:Jun1415:30:40node1event_log\[8310\]:{"object": -"cluster","state":false,"time":1434285200,"node_count":3,"type": -"even_node_count"} - -The log entry above is an example of when the alert for "True high -availability requires an odd -number of nodes with a minimum of three nodes" has been turned off as -result of the cluster -having 3 nodes.
+- `daemon.warning` - Severity of entry is `warning` +- `Jun 14 15:25:00` - The timestamp of the event +- `node1` - Machine name +- `event_log` - Static text that always appears +- `[8310]` - Process ID +- `"object":"cluster"` - The object related to this alert +- `"state":true` - Current state of the alert +- `"time":1434284700` - Can be ignored +- `"node_count":1` - The number of nodes in the cluster +- `"type":"even_node_count"` - The code name identifier of this specific event. See [logged cluster alerts and events]({{}}) for more details. + +#### "Alert off" log entry sample + +``` +daemon.warning: Jun 14 15:30:40 node1 event_log[8310]: +{ + "object": "cluster", + "state": false, + "time": 1434285200, + "node_count": 3, + "type": "even_node_count" +} +``` + +This log entry is an example of when the alert for "True high availability requires an odd number of nodes with a minimum of three nodes" has been turned off as a result of the cluster having three nodes. **Log entry components:** -- daemon.info­ severity of entry is warning -- Jun1415:30:40­ the timestamp of the event -- node1­ machine name -- event_log­ static text that always appears -- \[8310\]­ process id -- "object":"cluster"­ the object for which this alert has been raised - for, in this case - it is the cluster -- "state":false­ current state of the alert, in this case it is off -- "time":1434285200­ can be ignored -- "node_count":3­ the number of nodes in the cluster, in this case 3 -- "type":"even_node_count"­ is the code name identifier of this - specific event, see - full mapping in the Mapping UI events and alerts to log entries - section below - Node has insufficient disk space for AOF rewrite - -**Alert on" log entry sample:** - -daemon.err:Jun1513:51:23node1event_log\[34252\]:{"used":23457188, -"missing":604602126,"object":"node:1″,"free":9867264,"needed": -637926578,"state":true,"time":1434365483,"disk":705667072,"type": -"insufficient_disk_aofrw"} - -The log entry above is an example of
when the alert for "Node has -insufficient disk space for -AOF rewrite" has been turned on as result of not having enough -persistent storage disk space -for AOF rewrite purposes. It is missing 604602126 bytes. +- `daemon.warning` - Severity of entry is `warning` +- `Jun 14 15:30:40` - The timestamp of the event +- `node1` - Machine name +- `event_log` - Static text that always appears +- `[8310]` - Process ID +- `"object":"cluster"` - The object related to this alert +- `"state":false` - Current state of the alert +- `"time":1434285200` - Can be ignored +- `"node_count":3` - The number of nodes in the cluster +- `"type":"even_node_count"` - The code name of this specific event. See [logged cluster alerts and events]({{}}) for more details. + +### Node has insufficient disk space for AOF rewrite + +#### "Alert on" log entry sample + +``` +daemon.err: Jun 15 13:51:23 node1 event_log[34252]: +{ + "used": 23457188, + "missing": 604602126, + "object": "node:1", + "free": 9867264, + "needed": 637926578, + "state": true, + "time": 1434365483, + "disk": 705667072, + "type": "insufficient_disk_aofrw" +} +``` + +This log entry is an example of when the alert for "Node has insufficient disk space for AOF rewrite" has been turned on as a result of not having enough persistent storage disk space for AOF rewrite purposes. In this case, 604602126 bytes are missing.
**Log entry components:** -- daemon.err­ severity of entry is err -- Jun1513:51:23­ the timestamp of the event -- node1­ machine name -- event_log­ static text that always appears -- \[34252\]­ process id -- "used":23457188­ the amount of disk space in bytes currently used - for AOF files -- "missing":604602126­ the amount of disk space in bytes that is - currently missing for - AOF rewrite purposes -- "object":"node:1″­ the object for which this alert has been raised - for, in this case it - is node with id 1 -- "free":9867264­ the amount of disk space in bytes that is currently +- `daemon.err` - Severity of entry is `err` (error) +- `Jun 15 13:51:23` - The timestamp of the event +- `node1` - Machine name +- `event_log` - Static text that always appears +- `[34252]` - Process ID +- `"used":23457188` - The amount of disk space in bytes currently used for AOF files +- `"missing":604602126` - The amount of disk space in bytes that is currently missing for AOF rewrite purposes +- `"object":"node:1"` - The object related to this alert +- `"free":9867264` - The amount of disk space in bytes that is currently free -- "needed":637926578­ the amount of total disk space in bytes that is - needed for AOF - rewrite purposes -- state":true­ current state of the alert, in this case it is on -- "time":1434365483­ can be ignored -- "disk":705667072­ the total size in bytes of the persistent storage -- "type":"insufficient_disk_aofrw"­ is the code name identifier of - this specific - event, see full mapping in the Mapping UI events and alerts to log entries - section below - -"Alert off" log entry sample: -daemon.info:Jun1513:51:11node1event_log\[34252\]:{"used":0,"missing": -‐21614592,"object":"node:1″,"free":21614592,"needed":0,"state": -false,"time":1434365471,"disk":705667072,"type": -"insufficient_disk_aofrw"} +- `"needed":637926578` - The amount of total disk space in bytes that is needed for AOF rewrite purposes +- `"state":true` - Current state of the alert +-
`"time":1434365483` - Can be ignored +- `"disk":705667072` - The total size in bytes of the persistent storage +- `"type":"insufficient_disk_aofrw"` - The code name of this specific event. See [logged node alerts and events]({{}}) for more details. + +#### "Alert off" log entry sample + +``` +daemon.info: Jun 15 13:51:11 node1 event_log[34252]: +{ + "used": 0, "missing": -21614592, + "object": "node:1", + "free": 21614592, + "needed": 0, + "state": false, + "time": 1434365471, + "disk": 705667072, + "type": "insufficient_disk_aofrw" +} +``` **Log entry components:** -- daemon.info­ severity of entry is info -- Jun1513:51:11­ the timestamp of the event -- node1­ machine name -- event_log­ static text that always appears -- \[34252\]­ process id -- "used":0­ the amount of disk space in bytes currently used for AOF - files -- "missing":‐21614592­ the amount of disk space in bytes that is - currently missing for - AOF rewrite purposes, in this case it is not missing because the - number is negative -- "object":"node:1″­ the object for which this alert has been raised - for, in this case it - is node with id 1 -- "free":21614592­ the amount of disk space in bytes that is currently - free -- "needed":0­ the amount of total disk space in bytes that is needed - for AOF rewrite - purposes, in this case no space is needed -- "state":false­ current state of the alert, in this case it is off -- "time":1434365471­ can be ignored -- "disk":705667072­ the total size in bytes of the persistent storage -- "type":"insufficient_disk_aofrw"­ is the code name identifier of - this specific - event, see full mapping in the Mapping UI events and alerts to log entries - section below +- `daemon.info` - Severity of entry is `info` +- `Jun 15 13:51:11` - The timestamp of the event +- `node1` - Machine name +- `event_log` - Static text that always appears +- `[34252]` - Process ID +- `"used":0` - The amount of disk space in bytes currently used for AOF files +- `"missing":-21614592` - The
amount of disk space in bytes that is currently missing for AOF rewrite purposes. A negative value, as in this case, means that no disk space is missing. +- `"object":"node:1"` - The object related to this alert +- `"free":21614592` - The amount of disk space in bytes that is currently free +- `"needed":0` - The amount of total disk space in bytes that is needed for AOF rewrite purposes. In this case, no space is needed. +- `"state":false` - Current state of the alert +- `"time":1434365471` - Can be ignored +- `"disk":705667072` - The total size in bytes of the persistent storage +- `"type":"insufficient_disk_aofrw"` - The code name of this specific event. See [logged node alerts and events]({{}}) for more details.
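Each entry carries `"type"`, `"object"`, and `"state"`, where `true` means the alert was turned on and `false` means it was turned off. A log consumer can therefore replay entries in order to see which alerts are currently on. The following is a minimal Python sketch, not part of Redis Enterprise Software. It makes three assumptions:

- The JSON payload starts at the first `{` on a single-line entry.
- An alert is identified by its `(type, object)` pair.
- A negative `"missing"` value counts as zero missing bytes, as described for the "Alert off" sample.

The names `payload_of`, `active_alerts`, and `aof_shortfall` are hypothetical.

```python
import json
from typing import Dict, Iterable, Tuple

AlertKey = Tuple[str, str]  # (event "type", alerted "object")


def payload_of(line: str) -> dict:
    """Return the JSON object of a single-line event_log entry.

    Assumes the prefix (severity, timestamp, host, event_log[pid]:) contains
    no "{", so the payload starts at the first brace.
    """
    return json.loads(line[line.index("{"):])


def active_alerts(lines: Iterable[str]) -> Dict[AlertKey, dict]:
    """Replay entries in order and keep the latest payload of each alert that is on.

    "state": true turns an alert on; "state": false turns it off.
    """
    active: Dict[AlertKey, dict] = {}
    for line in lines:
        event = payload_of(line)
        key = (event["type"], event["object"])
        if event["state"]:
            active[key] = event
        else:
            active.pop(key, None)
    return active


def aof_shortfall(event: dict) -> int:
    """Bytes still missing for an AOF rewrite; a negative "missing" means none."""
    return max(event["missing"], 0)
```

Replaying the "Alert on" entries for `even_node_count` and `insufficient_disk_aofrw`, followed by the "Alert off" entry for `even_node_count`, leaves only the AOF alert for `node:1` active. Its shortfall is 604602126 bytes.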