Add broadcast_errors metric #3710

Merged on Nov 28, 2023 (3 commits)
@@ -0,0 +1,3 @@
- Add a new metric `broadcast_errors` which
records the errors observed when broadcasting Txs
([\#3708](https://github.com/informalsystems/hermes/issues/3708))
14 changes: 14 additions & 0 deletions crates/relayer/src/chain/cosmos/retry.rs
@@ -107,6 +107,13 @@ async fn do_send_tx_with_account_sequence_retry(
refreshing account sequence number and retrying once"
);

telemetry!(
    broadcast_errors,
    &account.address.to_string(),
    response.code.into(),
    &response.log,
);

refresh_account_and_retry_send_tx_with_account_sequence(
    rpc_client, config, key_pair, account, tx_memo, messages,
)
@@ -147,6 +154,13 @@ async fn do_send_tx_with_account_sequence_retry(
"failed to broadcast tx with unrecoverable error"
);

telemetry!(
    broadcast_errors,
    &account.address.to_string(),
    code.into(),
    &response.log
);

Ok(response)
}
}
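
The diff does not show how `telemetry!` is defined. The sketch below assumes it forwards its first argument as a method name on a shared telemetry state and passes the remaining arguments through, which would explain why the call sites mirror the `broadcast_errors` method signature. `telemetry_sketch!`, `SketchState`, `global`, `RECORDED`, and `report_failed_broadcast` are hypothetical names used only for illustration, not part of Hermes.

```rust
use std::sync::Mutex;

// Hypothetical global sink standing in for the shared telemetry state.
static RECORDED: Mutex<Vec<(String, u32, String)>> = Mutex::new(Vec::new());

/// Hypothetical stand-in for the telemetry state type.
pub struct SketchState;

impl SketchState {
    /// Same argument order as the method added in this PR:
    /// account address, numeric error code, error log.
    pub fn broadcast_errors(&self, address: &str, error_code: u32, error_description: &str) {
        RECORDED.lock().unwrap().push((
            address.to_string(),
            error_code,
            error_description.to_string(),
        ));
    }
}

/// Hypothetical accessor for the shared state.
pub fn global() -> SketchState {
    SketchState
}

// Assumed shape of the forwarding macro: the first token is the method
// name, the rest are its arguments; a trailing comma is accepted.
macro_rules! telemetry_sketch {
    ($id:ident, $($args:expr),* $(,)?) => {
        global().$id($($args),*)
    };
}

/// Mirrors the call sites in the diff: report the failed broadcast,
/// then let the caller decide whether to retry or return the response.
pub fn report_failed_broadcast(address: &str, code: u32, log: &str) {
    telemetry_sketch!(broadcast_errors, address, code, log,);
}

/// Snapshot of everything recorded so far.
pub fn recorded() -> Vec<(String, u32, String)> {
    RECORDED.lock().unwrap().clone()
}
```

Under this assumption, both hunks above record exactly one event per failed broadcast: once before the sequence-refresh retry and once on the unrecoverable path.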
24 changes: 24 additions & 0 deletions crates/telemetry/src/state.rs
@@ -197,6 +197,9 @@ pub struct TelemetryState {

    /// Sum of rewarded fees over the past FEE_LIFETIME seconds
    period_fees: ObservableGauge<u64>,

    /// Number of errors observed by Hermes when broadcasting a Tx
    broadcast_errors: Counter<u64>,
}

impl TelemetryState {
@@ -371,6 +374,13 @@ impl TelemetryState {
                .u64_observable_gauge("ics29_period_fees")
                .with_description("Amount of ICS29 fees rewarded over the past 7 days")
                .init(),

            broadcast_errors: meter
                .u64_counter("broadcast_errors")
                .with_description(
                    "Number of errors observed by Hermes when broadcasting a Tx",
                )
                .init(),
        }
    }

@@ -1069,6 +1079,20 @@ impl TelemetryState {
    pub fn add_visible_fee_address(&self, address: String) {
        self.visible_fee_addresses.insert(address);
    }

    /// Add an error and its description to the list of errors observed after broadcasting
    /// a Tx with a specific account.
    pub fn broadcast_errors(&self, address: &String, error_code: u32, error_description: &String) {
        let cx = Context::current();

        let labels = &[
            KeyValue::new("account", address.to_string()),
            KeyValue::new("error_code", error_code.to_string()),
            KeyValue::new("error_description", error_description.to_string()),
        ];

        self.broadcast_errors.add(&cx, 1, labels);
    }
}

use std::sync::Arc;
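
The `broadcast_errors` method above increments a single counter by 1 with three labels, so each distinct combination of `account`, `error_code`, and `error_description` accumulates its own total. The following is a minimal std-only sketch of that behaviour, not the OpenTelemetry implementation: `LabelledCounter`, `BroadcastErrorLabels`, and `record_broadcast_error` are hypothetical names, and the addresses, codes, and log strings used with it are illustrative values.

```rust
use std::collections::HashMap;

/// The three labels attached to every increment in the diff.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct BroadcastErrorLabels {
    pub account: String,
    pub error_code: String,
    pub error_description: String,
}

/// Hypothetical stand-in for a labelled `Counter<u64>`:
/// one running total per distinct label set.
#[derive(Default)]
pub struct LabelledCounter {
    series: HashMap<BroadcastErrorLabels, u64>,
}

impl LabelledCounter {
    /// Add `value` to the series identified by `labels`, creating it if needed.
    pub fn add(&mut self, value: u64, labels: BroadcastErrorLabels) {
        *self.series.entry(labels).or_insert(0) += value;
    }

    /// Current total for one label set (0 if never recorded).
    pub fn get(&self, labels: &BroadcastErrorLabels) -> u64 {
        self.series.get(labels).copied().unwrap_or(0)
    }

    /// Number of distinct label sets seen so far.
    pub fn series_count(&self) -> usize {
        self.series.len()
    }
}

/// Same argument order as `TelemetryState::broadcast_errors` in the diff:
/// the numeric code is stringified and the counter is bumped by 1.
pub fn record_broadcast_error(
    counter: &mut LabelledCounter,
    address: &str,
    error_code: u32,
    error_description: &str,
) {
    counter.add(
        1,
        BroadcastErrorLabels {
            account: address.to_string(),
            error_code: error_code.to_string(),
            error_description: error_description.to_string(),
        },
    );
}
```

Recording the same account, code, and log twice raises one series to 2, while a different code or log text on the same account starts a new series. Because the raw log text is one of the labels, the number of series grows with the number of distinct log messages.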
1 change: 1 addition & 0 deletions guide/src/documentation/telemetry/operators.md
@@ -142,6 +142,7 @@ the `backlog_oldest_sequence` that is blocked.
| `tx_latency_submitted` | Latency for all transactions submitted to a chain (i.e., difference between the moment when Hermes received an event until the corresponding transaction(s) were submitted), per chain, counterparty chain, channel and port | `u64` ValueRecorder | None |
| `cleared_send_packet_count_total`  | Number of SendPacket events received during the initial and periodic clearing, per chain, counterparty chain, channel and port | `u64` Counter | Packet workers enabled, and periodic packet clearing or clear on start enabled |
| `cleared_acknowledgment_count_total` | Number of WriteAcknowledgement events received during the initial and periodic clearing, per chain, counterparty chain, channel and port | `u64` Counter | Packet workers enabled, and periodic packet clearing or clear on start enabled |
| `broadcast_errors_total` | Number of errors observed by Hermes when broadcasting a Tx, per error type and account | `u64` Counter | Packet workers enabled |

Notes:
- The two metrics `cleared_send_packet_count_total` and `cleared_acknowledgment_count_total` are only populated if `tx_confirmation = true`.