Enhance metrics tracking and add dependencies #34
Conversation
- Introduced `prometheus` and `lazy_static` dependencies to facilitate metrics collection.
- Added metrics recording for transaction durations in various components, including EIP-7702 and the external bundler.
- Updated the `metrics` module to include functions for tracking transaction states and processing times.
- Enhanced the server to initialize and expose metrics via a dedicated endpoint.

These changes improve observability and performance monitoring across the execution framework.
Walkthrough

This change set introduces Prometheus-based metrics instrumentation across executor components, adds a metrics HTTP endpoint to the server, and provides a full monitoring stack with Docker Compose, Prometheus, and Grafana configuration and dashboards. Executor logic is augmented to record transaction and job processing durations, and metrics are exposed via a new `/metrics` endpoint.
Sequence Diagram(s)

Metrics Recording and Exposure Flow

```mermaid
sequenceDiagram
    participant Client
    participant Executor
    participant MetricsModule
    participant Prometheus
    participant Server
    Executor->>MetricsModule: record_transaction_queued_to_sent/confirmed(...)
    Executor->>MetricsModule: record_eoa_job_processing_time(...)
    Prometheus->>Server: HTTP GET /metrics
    Server->>MetricsModule: export_default_metrics()
    MetricsModule-->>Server: Prometheus metrics text
    Server-->>Prometheus: HTTP 200 with metrics
    Prometheus-->>Grafana: Scrapes metrics for dashboard
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Warning: the review ran into problems; errors were encountered while retrieving linked issues.
- Added a new helper function `calculate_duration_seconds_from_twmq` to compute duration in seconds using timestamps in TWMQ format.
- Updated various components (EIP-7702 executor, external bundler, and EOA worker) to use the new TWMQ duration calculation for recording transaction metrics.
- Enhanced test coverage for the new duration calculation function.

These changes improve the accuracy of metrics tracking related to transaction durations.
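To make the unit convention concrete, here is a minimal, self-contained sketch of what such a helper can look like. It illustrates the semantics described in the review below (TWMQ timestamps in unix seconds, wall-clock timestamps in unix milliseconds); it is not the PR's actual implementation:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current unix time in milliseconds.
pub fn current_timestamp_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_millis() as u64
}

/// Duration in seconds between a TWMQ start timestamp (unix seconds) and an
/// end timestamp (unix milliseconds). Saturates to zero on clock skew.
pub fn calculate_duration_seconds_from_twmq(start_secs: u64, end_ms: u64) -> f64 {
    let start_ms = start_secs.saturating_mul(1000);
    end_ms.saturating_sub(start_ms) as f64 / 1000.0
}
```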
Actionable comments posted: 17
🔭 Outside diff range comments (1)
executors/src/eoa/store/submitted.rs (1)
307-310: Potential bug: passing tuple to ZREM
`pipeline.zrem(key, member)` is typically called with the member string, but here `to_legacy_redis_string_with_nonce()` returns `(String, u64)` and is passed without destructuring. This likely fails or removes the wrong member. Apply:
```diff
- pipeline.zrem(
-     self.keys.submitted_transactions_zset_name(),
-     tx.to_legacy_redis_string_with_nonce(),
- );
+ let (legacy_member, _legacy_nonce) = tx.to_legacy_redis_string_with_nonce();
+ pipeline.zrem(
+     self.keys.submitted_transactions_zset_name(),
+     legacy_member,
+ );
```
🧹 Nitpick comments (11)
monitoring/grafana/provisioning/dashboards/dashboard.yml (1)
6-6: Optional: put dashboards under a named Grafana folder. This improves organization in the Grafana UI.
```diff
- folder: ''
+ folder: 'Thirdweb Engine'
```

monitoring/prometheus.yml (1)
16-18: Trim trailing whitespace. yamllint reports trailing spaces at line 18; cleaning this up avoids noisy CI lint failures.
executors/src/eip7702_executor/confirm.rs (1)
269-275: LGTM on conditional metric recording with presence check. The optional timestamp guard is correct; after fixing the units as above, this block is good.
Minor: consider reusing a single `confirmed_timestamp` for logging/metrics if you log confirmation timing elsewhere, to avoid tiny drift across calls.

docker-compose.monitoring.yml (1)
10-10: Mount configs read-only to reduce tampering risk. Config mounts can be read-only for safety.
```diff
- - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
+ - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards
- - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
+ - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards:ro
+ - ./monitoring/grafana/provisioning:/etc/grafana/provisioning:ro
```

Also applies to: 28-29
executors/src/eip7702_executor/send.rs (1)
285-287: Clarify units in the local variable name. `sent_timestamp` is in milliseconds; rename it to `sent_timestamp_ms` to avoid unit confusion when reviewing/maintaining.
```diff
- let sent_timestamp = current_timestamp_ms();
- let queued_to_sent_duration = calculate_duration_seconds_from_twmq(job.job.created_at, sent_timestamp);
+ let sent_timestamp_ms = current_timestamp_ms();
+ let queued_to_sent_duration =
+     calculate_duration_seconds_from_twmq(job.job.created_at, sent_timestamp_ms);
```

executors/src/external_bundler/send.rs (2)
508-510: Clarify units in the local variable name. Align with the ms-based helper to avoid confusion.
```diff
- let sent_timestamp = current_timestamp_ms();
- let queued_to_sent_duration = calculate_duration_seconds_from_twmq(job.job.created_at, sent_timestamp);
+ let sent_timestamp_ms = current_timestamp_ms();
+ let queued_to_sent_duration =
+     calculate_duration_seconds_from_twmq(job.job.created_at, sent_timestamp_ms);
```
510-510: Consider consolidating executor type labels. "erc4337-external" is a sensible label to distinguish paths, but it's easy to drift. Consider centralizing executor-type label strings (e.g., associated consts on handlers or a shared constants module) to keep labels consistent across modules and dashboards; one possible shape is sketched below.
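A minimal sketch of what such a shared constants module could look like (hypothetical names, not code from this PR):

```rust
/// Central home for executor-type label values, so dashboards and
/// instrumentation code cannot drift apart (illustrative only).
pub mod executor_labels {
    pub const EOA: &str = "eoa";
    pub const EIP7702: &str = "eip7702";
    pub const ERC4337_EXTERNAL: &str = "erc4337-external";
}
```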
server/src/http/routes/admin/metrics.rs (1)
24-29: Add Cache-Control to avoid intermediary caching of metrics. Metrics should generally not be cached by proxies.
```diff
- Ok((
-     StatusCode::OK,
-     [("content-type", "text/plain; version=0.0.4; charset=utf-8")],
-     metrics_output,
- ).into_response())
+ Ok((
+     StatusCode::OK,
+     [
+         ("content-type", "text/plain; version=0.0.4; charset=utf-8"),
+         ("cache-control", "no-cache, no-store, must-revalidate"),
+     ],
+     metrics_output,
+ ).into_response())
```

monitoring/grafana/dashboards/thirdweb-engine.json (1)
326-331: Optional: stabilize throughput rates by aggregating. If you want one time series per executor_type and chain_id (as the legend implies), explicitly sum by those labels; this avoids duplicate overlapping series for ephemeral label combinations.
- "expr": "rate(tw_engine_executor_transaction_queued_to_sent_duration_seconds_count[5m])", + "expr": "sum by (executor_type, chain_id) (rate(tw_engine_executor_transaction_queued_to_sent_duration_seconds_count[5m]))", ... - "expr": "rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_count[5m])", + "expr": "sum by (executor_type, chain_id) (rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_count[5m]))",Also applies to: 338-344
executors/src/metrics.rs (2)
64-70: Make initialize_metrics idempotent to avoid accidental double initDouble initialization can cause split metrics across registries or confusion. Consider ignoring subsequent inits with a warning.
pub fn initialize_metrics(metrics: ExecutorMetrics) { let mut instance = EXECUTOR_METRICS_INSTANCE.write().unwrap(); - *instance = Some(Arc::new(metrics)); + if instance.is_some() { + // Avoid re-initialization; keep the first registry/instance + #[allow(unused_must_use)] + { tracing::warn!("Executor metrics already initialized; ignoring subsequent initialization"); } + return; + } + *instance = Some(Arc::new(metrics)); }Note: this uses tracing; if not available here, you can replace with eprintln! or remove the log.
161-198: Tests mutate global state and can flake under parallel test execution. initialize_metrics sets a global used by other tests; consider serializing these tests or resetting the global between tests to avoid interference. Using serial_test or running these tests in a single-threaded group will avoid flakiness.
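One way to do that, sketched with the `serial_test` crate (a dev-dependency this PR does not add; names are illustrative):

```rust
#[cfg(test)]
mod metrics_tests {
    use serial_test::serial;

    // Tests marked #[serial] never run concurrently with each other,
    // so mutations of the global metrics instance cannot interleave.
    #[test]
    #[serial]
    fn initializes_global_metrics_once() {
        // initialize_metrics(...) and assertions would go here.
    }

    #[test]
    #[serial]
    fn records_durations_against_global_instance() {
        // Further assertions against the shared global state.
    }
}
```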
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (21)
- docker-compose.monitoring.yml (1 hunks)
- executors/Cargo.toml (1 hunks)
- executors/src/eip7702_executor/confirm.rs (3 hunks)
- executors/src/eip7702_executor/send.rs (3 hunks)
- executors/src/eoa/store/borrowed.rs (2 hunks)
- executors/src/eoa/store/submitted.rs (2 hunks)
- executors/src/eoa/worker/confirm.rs (0 hunks)
- executors/src/eoa/worker/mod.rs (2 hunks)
- executors/src/external_bundler/confirm.rs (3 hunks)
- executors/src/external_bundler/send.rs (3 hunks)
- executors/src/lib.rs (1 hunks)
- executors/src/metrics.rs (1 hunks)
- monitoring/grafana/dashboards/thirdweb-engine.json (1 hunks)
- monitoring/grafana/provisioning/dashboards/dashboard.yml (1 hunks)
- monitoring/grafana/provisioning/datasources/prometheus.yml (1 hunks)
- monitoring/prometheus.yml (1 hunks)
- server/Cargo.toml (1 hunks)
- server/src/http/routes/admin/metrics.rs (1 hunks)
- server/src/http/routes/admin/mod.rs (1 hunks)
- server/src/http/server.rs (2 hunks)
- server/src/main.rs (3 hunks)
💤 Files with no reviewable changes (1)
- executors/src/eoa/worker/confirm.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-06T15:44:13.701Z
Learnt from: d4mr
PR: thirdweb-dev/engine-core#5
File: executors/src/eoa/worker.rs:173-176
Timestamp: 2025-07-06T15:44:13.701Z
Learning: The EOA executor store uses comprehensive WATCH-based coordination where every Redis state mutation watches the lock key and validates ownership before proceeding. If the lock is lost during any operation, the transaction fails with LockLost error. This makes aggressive lock acquisition safe because only the actual lock holder can successfully perform state mutations, regardless of who claims the lock.
Applied to files:
executors/src/eoa/worker/mod.rs
🧬 Code Graph Analysis (7)
executors/src/eoa/worker/mod.rs (1)
- executors/src/metrics.rs (4): calculate_duration_seconds (120-122), calculate_duration_seconds_from_twmq (125-128), current_timestamp_ms (131-133), record_eoa_job_processing_time (112-117)

executors/src/external_bundler/confirm.rs (1)
- executors/src/metrics.rs (3): record_transaction_queued_to_confirmed (104-109), current_timestamp_ms (131-133), calculate_duration_seconds_from_twmq (125-128)

executors/src/eoa/store/borrowed.rs (1)
- executors/src/metrics.rs (3): record_transaction_queued_to_sent (97-102), current_timestamp_ms (131-133), calculate_duration_seconds (120-122)

executors/src/eip7702_executor/send.rs (1)
- executors/src/metrics.rs (3): record_transaction_queued_to_sent (97-102), current_timestamp_ms (131-133), calculate_duration_seconds_from_twmq (125-128)

executors/src/eip7702_executor/confirm.rs (1)
- executors/src/metrics.rs (3): record_transaction_queued_to_confirmed (104-109), current_timestamp_ms (131-133), calculate_duration_seconds_from_twmq (125-128)

executors/src/external_bundler/send.rs (1)
- executors/src/metrics.rs (3): record_transaction_queued_to_sent (97-102), current_timestamp_ms (131-133), calculate_duration_seconds_from_twmq (125-128)

executors/src/eoa/store/submitted.rs (1)
- executors/src/metrics.rs (3): record_transaction_queued_to_confirmed (104-109), current_timestamp_ms (131-133), calculate_duration_seconds (120-122)
🪛 YAMLlint (1.37.1)
monitoring/prometheus.yml
[error] 18-18: trailing spaces
(trailing-spaces)
🔇 Additional comments (19)
executors/src/lib.rs (1)
4-4: LGTM on adding the metrics module. Public exposure aligns with the PR's goal of cross-crate instrumentation. Keep as-is if the server depends on these types.
server/Cargo.toml (1)
46-46: LGTM on adding engine-eip7702-core. Keeps the server aligned with executor capabilities.
server/src/http/routes/admin/mod.rs (1)
1-3: LGTM on adding the metrics admin module. Module wiring looks correct.
monitoring/grafana/provisioning/dashboards/dashboard.yml (2)
1-12: LGTM on provisioning setup. Provider configuration and polling interval look sane.
11-12: Writable dashboards mount verified. We've confirmed that in docker-compose.monitoring.yml the dashboards directory is bind-mounted without a `:ro` flag, so /var/lib/grafana/dashboards is writable by default. If you prefer explicitness, you can append `:rw`:

```yaml
services:
  grafana:
    volumes:
      - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards:rw
```

Optional improvement: to keep things organized, consider grouping your dashboard JSON files into subfolders under monitoring/grafana/dashboards (e.g. dashboards/app1/, dashboards/app2/) and update the `options.path` in monitoring/grafana/provisioning/dashboards/dashboard.yml accordingly.

monitoring/grafana/provisioning/datasources/prometheus.yml (1)
1-9: No issues found – looks good.

executors/src/eoa/worker/mod.rs (1)
26-26: Potentially unused import.
`calculate_duration_seconds_from_twmq` is brought into scope but not referenced in this module; if none of the child sub-modules use it via `super::`, the build will warn. Please verify.
102-111: Metrics bootstrap LGTM. Registry creation and global initialization are straightforward and safe.
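For context, the bootstrap under review roughly follows this shape (a sketch; `ExecutorMetrics::new` is an assumed constructor name, and the real code lives in server/src/main.rs):

```rust
use std::sync::Arc;
use prometheus::Registry;

fn main() {
    // Shared registry used by both the executors and the /metrics endpoint.
    let metrics_registry = Arc::new(Registry::new());

    // Register executor metrics against it and publish the global instance.
    // (Constructor name is an assumption for illustration.)
    // let metrics = ExecutorMetrics::new(&metrics_registry).expect("register metrics");
    // executors::metrics::initialize_metrics(metrics);

    let _ = metrics_registry; // handed to the HTTP server state in the real code
}
```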
executors/src/eoa/store/borrowed.rs (1)
15-15: LGTM on metrics import. The import set matches usage. Keep the executor type label stable (“eoa”) across the codebase.
If not already done, verify label consistency across executors to avoid fragmented panels. See script in a later comment.
executors/src/eoa/store/submitted.rs (1)
19-19: LGTM on metrics import. The import aligns with usage and the metrics module API.
executors/src/external_bundler/confirm.rs (2)
218-224: Verify executor label naming consistency. The label used is "erc4337-external". Ensure this aligns with other ERC-4337 instrumentation (e.g., "erc4337" vs. "erc4337-external") to avoid split dashboards and cardinality bloat.
I can run a repo-wide check for label values to confirm consistency (see script below).
18-18: LGTM on metrics import. Imports map to the used helpers and keep the handler lean.
executors/src/eip7702_executor/confirm.rs (1)
21-21: LGTM on metrics import. Imports align with usage; the executor label “eip7702” looks consistent with the module.
executors/src/eip7702_executor/send.rs (2)
284-288: Good addition: queued→sent duration recorded using consistent helpers. The placement after a successful bundler send is correct, using a TWMQ seconds start and a milliseconds end via the provided helpers.
336-336: Propagating the original queued timestamp looks right. Passing original_queued_timestamp from TWMQ (seconds) enables the queued→confirmed duration downstream. Ensure the confirmation handler treats it as seconds, consistent with calculate_duration_seconds_from_twmq.
executors/src/external_bundler/send.rs (2)
507-511: Good instrumentation: queued→sent duration for the external bundler. Correct usage of helpers and consistent labeling; this will populate tw_engine_executor_transaction_queued_to_sent_duration_seconds for "erc4337-external".
546-547: Nice: carrying the original queued timestamp to the confirm stage keeps queued→confirmed metrics accurate across the pipeline. Confirm the confirm handler interprets this as seconds (TWMQ) and not milliseconds.
server/src/http/routes/admin/metrics.rs (2)
24-29: Correct Prometheus exposition response. TextEncoder plus the proper content-type is correct for the exposition format. Looks good.
40-51: Error handling is fine; logs are useful. IntoResponse maps to 500 with a clear message; good operational behavior.
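For reference, the exposition path being approved boils down to the standard prometheus-crate pattern; a self-contained sketch with a demo counter (not the handler's literal code):

```rust
use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

fn main() -> Result<(), prometheus::Error> {
    let registry = Registry::new();
    let counter = IntCounter::new("demo_requests_total", "Demo counter")?;
    registry.register(Box::new(counter.clone()))?;
    counter.inc();

    // Gather all metric families and encode them in the text exposition
    // format; the handler serves this with the version=0.0.4 content-type.
    let mut buffer = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buffer)?;
    println!("{}", String::from_utf8(buffer).expect("utf-8 exposition"));
    Ok(())
}
```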
```yaml
services:
  prometheus:
    image: prom/prometheus:latest
```
🛠️ Refactor suggestion
Pin images to specific versions instead of latest
Using :latest hurts reproducibility and can break environments unexpectedly.
Pin to known-good tags and document update cadence, e.g.:
```diff
- image: prom/prometheus:latest
+ image: prom/prometheus:v2.53.0
- image: grafana/grafana:latest
+ image: grafana/grafana:10.4.3
```

Also applies to: 22-22
```yaml
grafana:
  image: grafana/grafana:latest
  container_name: tw-engine-grafana
  ports:
    - "3000:3000"
  volumes:
    - grafana-data:/var/lib/grafana
    - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards
    - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_USER=admin
    - GF_SECURITY_ADMIN_PASSWORD=admin
    - GF_USERS_ALLOW_SIGN_UP=false
  restart: unless-stopped
```
🛠️ Refactor suggestion
Do not ship default Grafana admin credentials
Hardcoding GF_SECURITY_ADMIN_USER=admin and GF_SECURITY_ADMIN_PASSWORD=admin is unsafe even for demos (accidental exposure is common).
- Move credentials to an `.env` file (gitignored) or use secrets.
- Alternatively, require interactive setup or use `GF_SECURITY_ADMIN_PASSWORD__FILE`.
Example:
```diff
- environment:
-   - GF_SECURITY_ADMIN_USER=admin
-   - GF_SECURITY_ADMIN_PASSWORD=admin
-   - GF_USERS_ALLOW_SIGN_UP=false
+ env_file:
+   - .env.monitoring
+ environment:
+   - GF_USERS_ALLOW_SIGN_UP=false
```

.env.monitoring (not committed):

```
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=strong-unique-password
```
| prometheus = "0.13.4" | ||
| lazy_static = "1.5.0" |
💡 Verification agent
🧩 Analysis chain
Unify Prometheus across the workspace to avoid type duplication and registry fragmentation
Pin Prometheus at the workspace level and consume it via { workspace = true } in member crates. This prevents multiple versions in the graph and ensures metrics registered in executors are visible to the server.
Apply this diff in this Cargo.toml (requires adding the deps to [workspace.dependencies] in the root):
-prometheus = "0.13.4"
-lazy_static = "1.5.0"
+prometheus = { workspace = true }
+lazy_static = { workspace = true }Then, in the root Cargo.toml:
```toml
[workspace.dependencies]
prometheus = "0.13.4"
lazy_static = "1.5.0"
```

Verify all crates use the same Prometheus version:
🏁 Script executed:
```bash
#!/bin/bash
# List all prometheus and lazy_static specs to ensure single sourced versions
rg -n --glob '**/Cargo.toml' $'prometheus\\s*=\\s*' -A 1
rg -n --glob '**/Cargo.toml' $'lazy_static\\s*=\\s*' -A 1
```

Length of output: 338
Workspace-level dependency pinning and lazy_static modernization
We’ve verified that both executors and server currently depend on prometheus = "0.13.4", so there’s no drift today. To guard against future mismatches and ensure a single Prometheus registry, pin it at the workspace level:
– In your root Cargo.toml:

```toml
[workspace.dependencies]
prometheus = "0.13.4"
```

– In each member Cargo.toml:
prometheus = "0.13.4"– In each member Cargo.toml:
-[dependencies]
-prometheus = "0.13.4"
+prometheus = { workspace = true }Since lazy_static only appears in executors, workspace-pinning it is optional. However, you’re on Rust 2021, so consider migrating to the standard LazyLock or the lighter once_cell::sync::Lazy:
-use lazy_static::lazy_static;
-
-lazy_static! {
- static ref METRIC: Metric = build_metric();
-}
+use std::sync::LazyLock; // or once_cell::sync::Lazy
+
+static METRIC: LazyLock<Metric> = LazyLock::new(build_metric);Next steps:
- Add
prometheusto[workspace.dependencies]in the root. - Update each crate’s
prometheusentry to{ workspace = true }. - Replace
lazy_staticusages withstd::sync::LazyLockoronce_cell::sync::Lazy.
| futures = "0.3.31" | ||
| moka = { version = "0.12.10", features = ["future"] } | ||
| prometheus = "0.13.4" | ||
| lazy_static = "1.5.0" |
💡 Verification agent
🧩 Analysis chain
Prefer std::sync::LazyLock or once_cell over lazy_static in Rust 2024
With Edition 2024, you can avoid the macro dependency. Use std::sync::LazyLock (or OnceLock) or the once_cell crate for statics.
If you switch, remove the dependency:
```diff
-lazy_static = { workspace = true }
```

And update metrics statics, e.g.:
```rust
use std::sync::LazyLock;

pub static HISTO: LazyLock<prometheus::Histogram> = LazyLock::new(|| {
    prometheus::Histogram::with_opts(
        prometheus::HistogramOpts::new("tx_duration_seconds", "…"),
    )
    .unwrap()
});
```

Find current usages that would need refactoring:
🏁 Script executed:
```bash
#!/bin/bash
rg -n $'lazy_static!\\(' -A 2
```

Length of output: 29
To catch the usual brace‐style invocations, let’s re‐run the grep with a more appropriate pattern:
🏁 Script executed:
```bash
#!/bin/bash
# Find macro invocations of lazy_static (brace style)
rg -n 'lazy_static!' -A 2
# Find any explicit extern crate imports
rg -n '^extern crate lazy_static' -A 2
```

Length of output: 336
Replace lazy_static with std::sync::LazyLock for your metrics registry
We detected one lazy_static! invocation in executors/src/metrics.rs around line 50. To adopt Rust 2024 without the lazy_static macro:
• In executors/Cargo.toml, remove the lazy_static dependency:
```diff
--- a/executors/Cargo.toml
@@ -29,7 +29,6 @@
 …
-lazy_static = "1.5.0"
```

• In executors/src/metrics.rs, replace the macro block with a LazyLock static:
```diff
--- a/executors/src/metrics.rs
@@ -49,7 +49,9 @@
-lazy_static! {
-    /// Default metrics registry for executors (fallback if no external registry provided)
-    static ref DEFAULT_EXECUTOR_METRICS_REGISTRY: Registry = Registry::new();
-}
+use std::sync::LazyLock;
+
+/// Default metrics registry for executors (fallback if no external registry provided)
+static DEFAULT_EXECUTOR_METRICS_REGISTRY: LazyLock<Registry> =
+    LazyLock::new(|| Registry::new());
```

Adjust the static's visibility (`pub`) if it was previously public. This removes the extra macro dependency and leverages the standard library's lazy initializer.
```rust
/// Original timestamp when the transaction was first queued (unix timestamp in milliseconds)
#[serde(default)]
pub original_queued_timestamp: Option<u64>,
```
🛠️ Refactor suggestion
Timestamp unit mismatch between docstring and usage
Doc says milliseconds, but calculate_duration_seconds_from_twmq expects seconds for the start timestamp.
Pick one and align:
- If the field is milliseconds, switch to `calculate_duration_seconds` in the metrics block.
- If the field is seconds, correct the docstring to "…seconds".
Include a unit test to prevent future regressions.
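A sketch of such a test, assuming the seconds-start/milliseconds-end convention (the helper body is inlined here for self-containment; the real one lives in executors/src/metrics.rs):

```rust
/// Inlined stand-in for the helper under test (assumed semantics:
/// start in unix seconds, end in unix milliseconds, result in seconds).
fn calculate_duration_seconds_from_twmq(start_secs: u64, end_ms: u64) -> f64 {
    end_ms.saturating_sub(start_secs.saturating_mul(1000)) as f64 / 1000.0
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn queued_seconds_to_confirmed_millis() {
        // Queued at t = 1_700_000_000 s, confirmed 2.5 s later.
        let queued_secs = 1_700_000_000_u64;
        let confirmed_ms = queued_secs * 1_000 + 2_500;
        let d = calculate_duration_seconds_from_twmq(queued_secs, confirmed_ms);
        assert!((d - 2.5).abs() < 1e-9);
    }

    #[test]
    fn clock_skew_saturates_to_zero() {
        // An end timestamp earlier than the start must not underflow.
        assert_eq!(calculate_duration_seconds_from_twmq(1_700_000_000, 0), 0.0);
    }
}
```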
| "expr": "histogram_quantile(0.95, rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_bucket[5m]))", | ||
| "instant": false, | ||
| "legendFormat": "p95 Queued to Confirmed - {{executor_type}}", | ||
| "range": true, | ||
| "refId": "A" | ||
| }, |
Same issue for queued→confirmed
Aggregate by le (and executor_type) before histogram_quantile.
- "expr": "histogram_quantile(0.95, rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_bucket[5m]))",
+ "expr": "histogram_quantile(0.95, sum by (le, executor_type) (rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_bucket[5m])))",
...
- "expr": "histogram_quantile(0.50, rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_bucket[5m]))",
+ "expr": "histogram_quantile(0.50, sum by (le, executor_type) (rate(tw_engine_executor_transaction_queued_to_confirmed_duration_seconds_bucket[5m])))",Also applies to: 230-236
| "expr": "histogram_quantile(0.95, rate(tw_engine_eoa_executor_job_processing_duration_seconds_bucket[5m]))", | ||
| "instant": false, | ||
| "legendFormat": "p95 EOA Job Processing - Chain {{chain_id}}", | ||
| "range": true, | ||
| "refId": "A" | ||
| }, |
EOA job processing histograms: add aggregation by le and chain_id
Same histogram rule applies.
- "expr": "histogram_quantile(0.95, rate(tw_engine_eoa_executor_job_processing_duration_seconds_bucket[5m]))",
+ "expr": "histogram_quantile(0.95, sum by (le, chain_id) (rate(tw_engine_eoa_executor_job_processing_duration_seconds_bucket[5m])))",
...
- "expr": "histogram_quantile(0.50, rate(tw_engine_eoa_executor_job_processing_duration_seconds_bucket[5m]))",
+ "expr": "histogram_quantile(0.50, sum by (le, chain_id) (rate(tw_engine_eoa_executor_job_processing_duration_seconds_bucket[5m])))",Also applies to: 446-452
| prometheus = "0.13.4" | ||
| thiserror = "2.0.12" |
💡 Verification agent
🧩 Analysis chain
Single-source prometheus and thiserror via workspace to avoid version drift
Mirror the executors crate and consume from [workspace.dependencies] for consistency.
-prometheus = "0.13.4"
-thiserror = "2.0.12"
+prometheus = { workspace = true }
+thiserror = { workspace = true }Add to root Cargo.toml if not present:
```toml
[workspace.dependencies]
prometheus = "0.13.4"
thiserror = "2.0.12"
```

Confirm there's only one version in the tree and that thiserror is used:
🏁 Script executed:
```bash
#!/bin/bash
rg -n --glob '**/Cargo.toml' $'prometheus\\s*=\\s*' -A 1
rg -n --glob '**/Cargo.toml' $'thiserror\\s*=\\s*' -A 1
# Check usage of thiserror
rg -n $'derive\\(.*Error.*\\)' server -A 2
rg -n $'thiserror::Error' server -A 2
```

Length of output: 1434
Workspace-pin prometheus and thiserror to avoid version drift
Move the version declarations into your root Cargo.toml under [workspace.dependencies] and switch all per-crate entries to workspace = true so there’s a single source of truth.
- Add to root Cargo.toml (if not already present):

```toml
[workspace.dependencies]
prometheus = "0.13.4"
thiserror = "2.0.12"
```

- In each crate's Cargo.toml, replace the hard-pinned versions with workspace references:
  - executors/Cargo.toml and server/Cargo.toml:

```diff
-prometheus = "0.13.4"
+prometheus = { workspace = true }
-thiserror = "2.0.12"
+thiserror = { workspace = true }
```

  - eip7702-core/Cargo.toml (line 14):

```diff
-thiserror = "2.0"
+thiserror = { workspace = true }
```

  - thirdweb-core/Cargo.toml (line 20), twmq/Cargo.toml (line 13), core/Cargo.toml (line 12), and aa-types/Cargo.toml (line 10):

```diff
-thiserror = "2.0.12"
+thiserror = { workspace = true }
```

- Confirmed usage of `thiserror` in server/src/http/routes/admin/metrics.rs.
This ensures a single version of both crates across your entire workspace.
```rust
pub mod queue;
pub mod eoa_diagnostics;
pub mod metrics;
```
💡 Verification agent
🧩 Analysis chain
Gate the metrics admin module behind a feature flag if exposure must be optional
This keeps /metrics opt-in for environments where it shouldn’t be exposed.
```diff
-pub mod metrics;
+#[cfg(feature = "metrics")]
+pub mod metrics;
```

Ensure the router adds the route conditionally as well:
🏁 Script executed:
```bash
#!/bin/bash
rg -n $'/metrics' -A 3 server/src
```

Length of output: 367
Add cfg(feature = "metrics") to the metrics admin module and its route registration
Make the /metrics endpoint fully opt-in by guarding both the module and the router calls behind the metrics feature flag.
Locations to update:
- server/src/http/routes/admin/mod.rs – gate the `metrics` module declaration
- server/src/http/server.rs – conditionally register the /metrics route and state
Proposed diffs:
```diff
--- a/server/src/http/routes/admin/mod.rs
+++ b/server/src/http/routes/admin/mod.rs
@@ Lines 1-3
-pub mod metrics;
+#[cfg(feature = "metrics")]
+pub mod metrics;
```

```diff
--- a/server/src/http/server.rs
+++ b/server/src/http/server.rs
@@ Around line 82
-    .route("/metrics", get(crate::http::routes::admin::metrics::get_metrics))
-    .with_state(state.metrics_registry.clone());
+    #[cfg(feature = "metrics")]
+    .route("/metrics", get(crate::http::routes::admin::metrics::get_metrics))
+    .with_state(state.metrics_registry.clone());
```

With these changes, environments without `--features metrics` won't expose or compile the /metrics endpoint. LGTM once gated appropriately.
Committable suggestion skipped: line range outside the PR's diff.
```diff
 let metrics_router = Router::new()
     .route("/metrics", get(crate::http::routes::admin::metrics::get_metrics))
     .with_state(state.metrics_registry.clone());

 let (router, api) = OpenApiRouter::with_openapi(ApiDoc::openapi())
     .nest("/v1", v1_router)
     .split_for_parts();

 // Merge the hidden diagnostic routes after OpenAPI split
-let router = router.merge(eoa_diagnostics_router);
+let router = router
+    .merge(eoa_diagnostics_router)
+    .merge(metrics_router);
```
Compile-time break: router state mismatch.
metrics_router is defined with state Arc<prometheus::Registry> but is merged into a router whose state is EngineServerState.
axum::Router::merge requires both routers to share the same state type, so this will fail to compile.
Unify the state, e.g. expose the registry via EngineServerState and keep all routers on that type:
```diff
-let metrics_router = Router::new()
-    .route("/metrics", get(crate::http::routes::admin::metrics::get_metrics))
-    .with_state(state.metrics_registry.clone());
+let metrics_router = Router::new()
+    .route("/metrics", get(crate::http::routes::admin::metrics::get_metrics))
+    .with_state(state.clone()); // handler should extract registry from state
```

Update get_metrics to accept State&lt;EngineServerState&gt; and read state.metrics_registry.
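A hedged sketch of that handler change (EngineServerState's fields are assumed; only metrics_registry matters here):

```rust
use std::sync::Arc;
use axum::{extract::State, http::StatusCode, response::IntoResponse};
use prometheus::{Encoder, Registry, TextEncoder};

#[derive(Clone)]
struct EngineServerState {
    metrics_registry: Arc<Registry>,
    // ...other fields elided
}

// Extracts the registry from the shared server state, so the metrics router
// can share the EngineServerState type with the rest of the app.
async fn get_metrics(State(state): State<EngineServerState>) -> impl IntoResponse {
    let mut buffer = Vec::new();
    if let Err(e) = TextEncoder::new().encode(&state.metrics_registry.gather(), &mut buffer) {
        return (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()).into_response();
    }
    (
        StatusCode::OK,
        [("content-type", "text/plain; version=0.0.4; charset=utf-8")],
        buffer,
    )
        .into_response()
}
```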
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let metrics_router = Router::new()
    .route("/metrics", get(crate::http::routes::admin::metrics::get_metrics))
    .with_state(state.clone()); // handler should extract registry from state

let (router, api) = OpenApiRouter::with_openapi(ApiDoc::openapi())
    .nest("/v1", v1_router)
    .split_for_parts();

// Merge the hidden diagnostic routes after OpenAPI split
let router = router
    .merge(eoa_diagnostics_router)
    .merge(metrics_router);
```
Summary by CodeRabbit
New Features
- `/metrics` endpoint for real-time monitoring.

Enhancements

Chores