Improve metrics collection #1415
Conversation
Walkthrough
Switches several per-date metrics from gauges to counters and adds many by-date billing/cluster counters and helpers; removes legacy per-provider storage metrics and the StorageMetrics trait; introduces a provider-aware MetricLayer and replaces per-call timing with per-date counters across object stores; simplifies listing flows and updates query/ingest code to emit new counters; renames a log format and updates tests.
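To make the gauge-to-counter shift concrete, here is a minimal before/after sketch; the metric names, labels, and helper function are illustrative stand-ins rather than the exact identifiers used in the PR.

use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, register_int_gauge_vec, IntCounterVec, IntGaugeVec};

// Before: a gauge is overwritten with set(), so concurrent writers race and a
// restart silently resets the day's figure.
static EVENTS_INGESTED_DATE_GAUGE: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "events_ingested_date_gauge",
        "Events ingested per stream and date (gauge variant)",
        &["stream", "date"]
    )
    .expect("metric can be created")
});

// After: a counter is only ever incremented, which composes safely across
// concurrent ingest paths and suits rate()/increase() queries.
static EVENTS_INGESTED_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "events_ingested_date",
        "Events ingested per stream and date (counter variant)",
        &["stream", "date"]
    )
    .expect("metric can be created")
});

// Both styles shown side by side purely to contrast them.
fn record_batch(stream: &str, date: &str, rows: u64) {
    // Gauge style: last write wins.
    EVENTS_INGESTED_DATE_GAUGE
        .with_label_values(&[stream, date])
        .set(rows as i64);
    // Counter style: additive per batch.
    EVENTS_INGESTED_DATE
        .with_label_values(&[stream, date])
        .inc_by(rows);
}

Counters survive concurrent ingest paths because every writer adds rather than overwrites, which is what makes the per-date billing counters described above reliable.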
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant QueryAPI
participant SchemaProvider
participant Storage
participant Metrics
Client->>QueryAPI: GET /query
activate QueryAPI
QueryAPI->>Metrics: increment_query_calls_by_date(date)
QueryAPI->>SchemaProvider: plan & scan(table, filters)
activate SchemaProvider
SchemaProvider->>Storage: list_dirs_relative(...)
Storage-->>SchemaProvider: prefixes/files
SchemaProvider->>Metrics: increment_files_scanned_in_query_by_date(count,date)
SchemaProvider->>Metrics: increment_bytes_scanned_in_query_by_date(bytes,date)
SchemaProvider-->>QueryAPI: plan
deactivate SchemaProvider
QueryAPI-->>Client: results
deactivate QueryAPI
sequenceDiagram
autonumber
participant Ingest
participant EventProc
participant Metrics
Ingest->>EventProc: Event::process(rows, size, ts)
activate EventProc
EventProc->>Metrics: update_stats(...)
EventProc->>Metrics: increment_events_ingested_by_date(rows, date)
EventProc->>Metrics: increment_events_ingested_size_by_date(size, date)
EventProc-->>Ingest: Ok
deactivate EventProc
sequenceDiagram
autonumber
participant Caller
participant MetricLayer
participant Provider
participant Metrics
Caller->>MetricLayer: op(args)
activate MetricLayer
MetricLayer->>Provider: op(args)
Provider-->>MetricLayer: Result<T, Error>
MetricLayer->>Metrics: STORAGE_REQUEST_RESPONSE_TIME{provider,op,status}.observe(duration)
MetricLayer-->>Caller: Result<T, Error>
deactivate MetricLayer
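The third diagram is the core of the provider-aware MetricLayer idea: time the inner call, then observe the histogram with provider/op/status labels before handing the result back. A minimal sketch of that pattern follows; the helper name observe_storage_op and the simplistic status mapping are assumptions for illustration, not the PR's actual implementation.

use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};
use std::time::Instant;

// Histogram shaped like the one in the diagram: provider/method/status labels.
static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "storage_request_response_time",
        "Storage request latency in seconds",
        &["provider", "method", "status"]
    )
    .expect("metric can be created")
});

// Hypothetical helper: run any async storage call, time it, and record the
// outcome under provider/method/status labels before returning the result.
async fn observe_storage_op<T, E, F>(provider: &str, method: &str, fut: F) -> Result<T, E>
where
    F: std::future::Future<Output = Result<T, E>>,
{
    let start = Instant::now();
    let result = fut.await;
    // Real code would map the concrete error type to an HTTP-like status code.
    let status = if result.is_ok() { "200" } else { "500" };
    STORAGE_REQUEST_RESPONSE_TIME
        .with_label_values(&[provider, method, status])
        .observe(start.elapsed().as_secs_f64());
    result
}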
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/query/stream_schema_provider.rs (1)
578-585
: Fix type mismatch: ObjectStoreUrl::parse expects &str, not url::Url.
This won’t compile. Pass a string slice (or format!) instead.
- ObjectStoreUrl::parse(storage.store_url()).unwrap(),
+ ObjectStoreUrl::parse(storage.store_url().as_str()).unwrap(),
src/storage/gcs.rs (1)
707-756
: Duplicate GET timing and counting inside get_objects; rely on _get_object metrics.
self.get_object(...) already records GET timing and counts. The subsequent REQUEST_RESPONSE_TIME GET observation and FILES_SCANNED GET inc here double-count, and the timing uses the LIST stopwatch, not the GET duration.
- STORAGE_REQUEST_RESPONSE_TIME
-     .with_label_values(&["gcs", "GET", "200"])
-     .observe(list_start.elapsed().as_secs_f64());
- STORAGE_FILES_SCANNED
-     .with_label_values(&["gcs", "GET"])
-     .inc();
src/storage/metrics_layer.rs (1)
294-303
: Fix label cardinality mismatch in StreamMetricWrapper (will panic at runtime).
STORAGE_REQUEST_RESPONSE_TIME expects 3 labels [provider, method, status], but StreamMetricWrapper supplies only ["LIST","200"]. This will trigger a panic on observe. Also, the wrapper stores &'static str labels, preventing passing the dynamic provider string.
Refactor the wrapper to carry provider and emit all three labels.
Apply this diff:
@@ - fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { + fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list(prefix); - let res = StreamMetricWrapper { - time, - labels: ["LIST", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + method: "LIST", + status: "200", + inner, + }; Box::pin(res) } @@ - fn list_with_offset( + fn list_with_offset( &self, prefix: Option<&Path>, offset: &Path, ) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list_with_offset(prefix, offset); - let res = StreamMetricWrapper { - time, - labels: ["LIST_OFFSET", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + method: "LIST_OFFSET", + status: "200", + inner, + }; Box::pin(res) } @@ -struct StreamMetricWrapper<'a, const N: usize, T> { +struct StreamMetricWrapper<'a, T> { time: time::Instant, - labels: [&'static str; N], + provider: &'a str, + method: &'static str, + status: &'static str, inner: BoxStream<'a, T>, } -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) - .observe(self.time.elapsed().as_secs_f64()); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&[self.provider, self.method, self.status]) + .observe(self.time.elapsed().as_secs_f64()); tAlso applies to: 310-319, 402-426
src/storage/azure_blob.rs (1)
763-812
: Remove duplicate and inaccurate GET metrics inside get_objects().
Each self.get_object() already records GET metrics. Additionally, using list_start to time GET inflates durations with the loop’s total runtime.
Apply this diff:
- STORAGE_REQUEST_RESPONSE_TIME
-     .with_label_values(&["azure_blob", "GET", "200"])
-     .observe(list_start.elapsed().as_secs_f64());
- STORAGE_FILES_SCANNED
-     .with_label_values(&["azure_blob", "GET"])
-     .inc();
Also applies to: 792-799
src/storage/s3.rs (2)
343-365
: Avoid panic on GET: handle body read errors and measure full GET latency.
Same issue as Azure: unwrap on resp.bytes() can panic and elapsed is computed before reading the body.
Apply this diff:
async fn _get_object(&self, path: &RelativePath) -> Result<Bytes, ObjectStorageError> { - let time = std::time::Instant::now(); - let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let time = std::time::Instant::now(); + let resp = self.client.get(&to_object_store_path(path)).await; STORAGE_FILES_SCANNED .with_label_values(&["s3", "GET"]) .inc(); - match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(elapsed); - Ok(body) - } - Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", status_code]) - .observe(elapsed); - Err(err.into()) - } - } + match resp { + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", "200"]) + .observe(elapsed); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }, + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } }
874-923
: Remove duplicate and inaccurate GET metrics inside get_objects().
Each self.get_object() already records GET metrics. Using list_start for GET inflates durations.
Apply this diff:
- STORAGE_REQUEST_RESPONSE_TIME
-     .with_label_values(&["s3", "GET", "200"])
-     .observe(list_start.elapsed().as_secs_f64());
- STORAGE_FILES_SCANNED
-     .with_label_values(&["s3", "GET"])
-     .inc();
Also applies to: 903-909
🧹 Nitpick comments (18)
src/catalog/mod.rs (1)
190-206
: Result.map(...).unwrap_or(0) on counter get(): LGTM, but consider unwrap_or_default().
- Using .map(|m| m.get()).unwrap_or(0) over the prometheus Result is concise and correct with IntCounterVec (u64).
- Minor nit: unwrap_or_default() reads slightly clearer and avoids literal zeros.
No functional issues.
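For reference, the read-back pattern under discussion looks roughly like this; the metric name and label set are placeholders rather than the actual definitions in src/catalog/mod.rs.

use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

static EVENTS_INGESTED_BY_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "events_ingested_by_date",
        "Events ingested for a stream on a date",
        &["stream", "format", "date"]
    )
    .expect("metric can be created")
});

// get_metric_with_label_values returns a prometheus::Result, so a label
// cardinality mismatch falls back to 0 instead of panicking.
fn events_for(stream: &str, format: &str, date: &str) -> u64 {
    EVENTS_INGESTED_BY_DATE
        .get_metric_with_label_values(&[stream, format, date])
        .map(|m| m.get())
        .unwrap_or_default()
}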
src/stats.rs (1)
205-221
: delete_with_label_prefix now targets IntCounterVec: LGTM, but be aware of cost.
Iterating MetricFamily and removing by exact label map is correct and avoids guessing dates. This is O(N labels) and fine for admin paths; keep it off hot paths.
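A rough sketch of that removal approach, assuming a counter declared with [stream, format, date] labels; the helper name and label schema are hypothetical and would need to match the real metric declaration.

use std::collections::HashMap;
use prometheus::proto::MetricFamily;
use prometheus::IntCounterVec;

// Walk the gathered families for one counter vec and drop every series whose
// "stream" label matches. Declaration order ["stream", "format", "date"] is assumed.
fn remove_series_for_stream(
    counter: &IntCounterVec,
    families: &[MetricFamily],
    metric_name: &str,
    stream: &str,
) {
    for family in families.iter().filter(|f| f.get_name() == metric_name) {
        for metric in family.get_metric() {
            let labels: HashMap<&str, &str> = metric
                .get_label()
                .iter()
                .map(|l| (l.get_name(), l.get_value()))
                .collect();
            if labels.get("stream") == Some(&stream) {
                // remove_label_values expects values in declaration order.
                let values = [
                    labels.get("stream").copied().unwrap_or_default(),
                    labels.get("format").copied().unwrap_or_default(),
                    labels.get("date").copied().unwrap_or_default(),
                ];
                let _ = counter.remove_label_values(&values);
            }
        }
    }
}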
src/storage/object_storage.rs (1)
148-150
: Filename parsing assumes dot-separated convention; consider a defensive fallback (context-aware).
This split/indexing will panic if the filename deviates. Based on the retrieved learning (Parseable’s server guarantees the date=YYYY-MM-DD.hour=HH.minute=MM pattern), this is safe in normal flows. Still, a debug assert or graceful fallback would make this more robust during migrations or manual uploads.
- let mut file_date_part = filename.split('.').collect::<Vec<&str>>()[0];
- file_date_part = file_date_part.split('=').collect::<Vec<&str>>()[1];
+ let mut file_date_part = filename.split('.').next().unwrap_or(filename);
+ file_date_part = file_date_part
+     .split('=')
+     .nth(1)
+     .unwrap_or_else(|| {
+         debug_assert!(false, "Unexpected parquet filename format: {filename}");
+         "unknown-date"
+     });
src/query/listing_table_builder.rs (1)
98-109
: Best-effort listing instead of hard fail (optional).
A transient failure on one prefix aborts the whole listing. Consider logging and continuing, so older segments still get scanned.
- Err(e) => {
-     return Err(DataFusionError::External(Box::new(e)));
- }
+ Err(e) => {
+     tracing::warn!("list_dirs_relative failed for {}: {e}", prefix);
+     continue;
+ }
src/query/stream_schema_provider.rs (2)
569-573
: Metric semantics: “files scanned” vs planned parquet count.
You increment STORAGE_FILES_SCANNED[.., "GET"] by the number of parquet files from manifests. These are “planned to scan,” not actual GET/HEADs (which DataFusion will perform later). Either move this count where I/O happens or rename/use a planning metric to avoid inflating GET counts.
843-866
: Duplicate manifest collection logic; unify to one helper.
This is nearly identical to enterprise’s collect_manifest_files (see src/enterprise/utils.rs). Consider centralizing it to avoid divergence in error handling and performance.
src/storage/gcs.rs (2)
665-690
: HEAD double-count: STORAGE_FILES_SCANNED incremented twice.
You inc() at Lines 669-671 and again at 677-679 within the Ok branch. Keep only one increment per HEAD to avoid inflation.
- // Record single file accessed - STORAGE_FILES_SCANNED - .with_label_values(&["gcs", "HEAD"]) - .inc();
348-386
: LIST metrics only capture stream creation time, not stream consumption (optional).You observe LIST timing immediately after creating the stream. For truer latency, consider measuring from start to end of consumption or supplement with per-chunk timings (you already do that for GET/DELETE). Not critical if consistency across providers is the goal.
src/storage/localfs.rs (3)
133-137
: Consider not incrementing HEAD for an unimplemented operation.Incrementing “files scanned” for HEAD even though it immediately errors skews counts. Either remove it here or add a dedicated “UNSUPPORTED” status dimension (if you want to track attempts).
- STORAGE_FILES_SCANNED - .with_label_values(&["localfs", "HEAD"]) - .inc();
304-373
: Streaming LIST/GET accounting is consistent; minor nit on 404 mapping.This path reports LIST 200 when the directory exists and per-file GET 200/404 as appropriate. If directory doesn’t exist you surface LIST 404 and return early—fine for now. Consider treating missing dir as empty listing to be more permissive, but not required.
416-440
: DELETE prefix: observe status but no “files scanned” increment (OK).Given this removes a directory, not a file object, omitting a DELETE counter increment makes sense. If you want parity with object-store providers, you could increment by the number of entries removed (requires walking), but that’s optional.
src/storage/metrics_layer.rs (2)
39-66
: Broaden and correct error-to-status mapping.
- Mapping Generic to 400 likely misclassifies server/transport failures as client errors; 500 is safer.
- Consider handling other common variants (e.g., InvalidPath -> 400, InvalidRange -> 416, Timeout/DeadlineExceeded -> 504/408) for better fidelity.
Apply this diff (adjust variants per object_store version you use):
pub fn error_to_status_code(err: &object_store::Error) -> &'static str { match err { - // 400 Bad Request - Client errors - object_store::Error::Generic { .. } => "400", + // 400 Bad Request - obvious client errors + object_store::Error::InvalidPath { .. } => "400", + // 416 Range Not Satisfiable + object_store::Error::InvalidRange { .. } => "416", + // Treat generic/unknown as server-side + object_store::Error::Generic { .. } => "500", // 401 Unauthorized - Authentication required object_store::Error::Unauthenticated { .. } => "401", // 404 Not Found - Resource doesn't exist object_store::Error::NotFound { .. } => "404", // 409 Conflict - Resource already exists object_store::Error::AlreadyExists { .. } => "409", // 412 Precondition Failed - If-Match, If-None-Match, etc. failed object_store::Error::Precondition { .. } => "412", // 304 Not Modified object_store::Error::NotModified { .. } => "304", // 501 Not Implemented - Feature not supported object_store::Error::NotSupported { .. } => "501", + // 504 Gateway Timeout (or choose 408 Request Timeout) + object_store::Error::Timeout { .. } => "504", // 500 Internal Server Error - All other errors _ => "500", } }If the exact variant names differ in your object_store version, I can tailor this.
369-399
: Standardize method labels for conditional ops.Use operation names that match method names to ease querying and dashboards.
Apply this diff:
- .with_label_values(&[&self.provider, "COPY_IF", status]) + .with_label_values(&[&self.provider, "COPY_IF_NOT_EXISTS", status]) @@ - .with_label_values(&[&self.provider, "RENAME_IF", status]) + .with_label_values(&[&self.provider, "RENAME_IF_NOT_EXISTS", status])src/storage/azure_blob.rs (2)
723-746
: HEAD files-scanned should increment regardless of success.Other paths (e.g., check) count attempts, not just successes. Increment once per HEAD call to keep semantics consistent.
Apply this diff:
- match &result { + match &result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "HEAD", "200"]) .observe(head_elapsed); - // Record single file accessed - STORAGE_FILES_SCANNED - .with_label_values(&["azure_blob", "HEAD"]) - .inc(); } Err(err) => { let status_code = error_to_status_code(err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "HEAD", status_code]) .observe(head_elapsed); } } - Ok(result?) + // Count the attempt once + STORAGE_FILES_SCANNED + .with_label_values(&["azure_blob", "HEAD"]) + .inc(); + Ok(result?)
1006-1056
: Consider counting HEAD attempts in list_old_streams() checks.You record HEAD latency per stream but don’t increment STORAGE_FILES_SCANNED for HEAD. Add a single increment per attempt inside the task.
Apply this diff:
let task = async move { let head_start = Instant::now(); let result = self.client.head(&StorePath::from(key)).await; let head_elapsed = head_start.elapsed().as_secs_f64(); match &result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "HEAD", "200"]) .observe(head_elapsed); } Err(err) => { let status_code = error_to_status_code(err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "HEAD", status_code]) .observe(head_elapsed); } } + STORAGE_FILES_SCANNED + .with_label_values(&["azure_blob", "HEAD"]) + .inc(); result.map(|_| ()) };src/storage/s3.rs (2)
666-686
: Standardize multipart INIT label to match Azure/GCS.Use PUT_MULTIPART_INIT for the initiation step for consistency across providers.
Apply this diff:
- let mut async_writer = match async_writer { + let mut async_writer = match async_writer { Ok(writer) => { STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "PUT_MULTIPART", "200"]) + .with_label_values(&["s3", "PUT_MULTIPART_INIT", "200"]) .observe(multipart_elapsed); writer } Err(err) => { let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "PUT_MULTIPART", status_code]) + .with_label_values(&["s3", "PUT_MULTIPART_INIT", status_code]) .observe(multipart_elapsed); return Err(err.into()); } };Also applies to: 672-676
1117-1167
: Consider counting HEAD attempts in list_old_streams() checks.Latency is recorded per HEAD, but files-scanned is not incremented.
Apply this diff:
let task = async move { let head_start = Instant::now(); let result = self.client.head(&StorePath::from(key)).await; let head_elapsed = head_start.elapsed().as_secs_f64(); @@ } - result.map(|_| ()) + STORAGE_FILES_SCANNED + .with_label_values(&["s3", "HEAD"]) + .inc(); + result.map(|_| ()) };Also applies to: 1140-1159
src/metrics/storage.rs (1)
25-33
: Global histogram looks good; consider explicit buckets.Default buckets may not fit object-store latencies. Consider Prometheus-style buckets for sub-ms to tens-of-seconds.
Apply this diff:
pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| { - HistogramVec::new( - HistogramOpts::new("storage_request_response_time", "Storage Request Latency") - .namespace(METRICS_NAMESPACE), - &["provider", "method", "status"], - ) + HistogramVec::new( + HistogramOpts::new("storage_request_response_time", "Storage Request Latency") + .namespace(METRICS_NAMESPACE) + .buckets(vec![ + 0.005, 0.01, 0.025, 0.05, 0.1, + 0.25, 0.5, 1.0, 2.5, 5.0, + 10.0, 30.0, 60.0, + ]), + &["provider", "method", "status"], + ) .expect("metric can be created") });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
src/catalog/mod.rs (1 hunks)
src/metadata.rs (3 hunks)
src/metrics/mod.rs (6 hunks)
src/metrics/storage.rs (1 hunks)
src/query/listing_table_builder.rs (2 hunks)
src/query/mod.rs (1 hunks)
src/query/stream_schema_provider.rs (12 hunks)
src/stats.rs (3 hunks)
src/storage/azure_blob.rs (24 hunks)
src/storage/gcs.rs (24 hunks)
src/storage/localfs.rs (15 hunks)
src/storage/metrics_layer.rs (8 hunks)
src/storage/object_storage.rs (2 hunks)
src/storage/s3.rs (24 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Applied to files:
src/query/mod.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/localfs.rs
src/storage/s3.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/query/stream_schema_provider.rs
🧬 Code graph analysis (9)
src/catalog/mod.rs (1)
src/stats.rs (1)
event_labels
(223-225)
src/metadata.rs (1)
src/catalog/mod.rs (4)
num_rows
(61-61)num_rows
(78-80)ingestion_size
(58-58)ingestion_size
(70-72)
src/storage/gcs.rs (3)
src/storage/metrics_layer.rs (3)
error_to_status_code
(40-66)new
(75-80)head
(255-269)src/storage/object_storage.rs (3)
parseable_json_path
(1281-1283)new
(82-91)head
(209-209)src/storage/s3.rs (3)
from
(1345-1353)from
(1357-1359)head
(831-857)
src/storage/azure_blob.rs (2)
src/storage/metrics_layer.rs (3)
error_to_status_code
(40-66)new
(75-80)head
(255-269)src/storage/gcs.rs (12)
resp
(296-302)resp
(420-426)resp
(1130-1135)resp
(1164-1169)_delete_prefix
(233-282)_list_streams
(284-346)name
(126-128)_list_dates
(348-386)_upload_file
(469-494)head
(664-690)get_ingestor_meta_file_paths
(758-793)check
(885-911)
src/storage/localfs.rs (5)
src/storage/object_storage.rs (7)
get_ingestor_meta_file_paths
(339-341)delete_prefix
(227-227)delete_object
(338-338)check
(228-228)delete_stream
(229-229)try_delete_node_meta
(346-346)list_streams
(230-230)src/storage/metrics_layer.rs (2)
delete_stream
(287-292)copy
(337-351)src/storage/azure_blob.rs (7)
get_ingestor_meta_file_paths
(814-849)delete_prefix
(909-913)delete_object
(915-939)check
(941-967)delete_stream
(969-973)try_delete_node_meta
(975-1000)list_streams
(1002-1004)src/storage/gcs.rs (7)
get_ingestor_meta_file_paths
(758-793)delete_prefix
(853-857)delete_object
(859-883)check
(885-911)delete_stream
(913-917)try_delete_node_meta
(919-944)list_streams
(946-948)src/storage/s3.rs (9)
get_ingestor_meta_file_paths
(925-960)from
(1345-1353)from
(1357-1359)delete_prefix
(1020-1024)delete_object
(1026-1050)check
(1052-1078)delete_stream
(1080-1084)try_delete_node_meta
(1086-1111)list_streams
(1113-1115)
src/storage/s3.rs (3)
src/storage/metrics_layer.rs (3)
error_to_status_code
(40-66)new
(75-80)head
(255-269)src/storage/object_storage.rs (4)
parseable_json_path
(1281-1283)new
(82-91)name
(200-200)head
(209-209)src/storage/azure_blob.rs (8)
resp
(331-337)resp
(455-461)resp
(1190-1195)resp
(1224-1229)_delete_prefix
(268-317)_list_streams
(319-381)name
(165-167)head
(723-746)
src/metrics/storage.rs (1)
src/storage/metrics_layer.rs (1)
new
(75-80)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(723-746)src/storage/gcs.rs (1)
head
(664-690)src/storage/s3.rs (1)
head
(831-857)
src/query/stream_schema_provider.rs (3)
src/parseable/mod.rs (4)
storage
(244-246)new
(146-158)serde_json
(300-300)serde_json
(306-306)src/enterprise/utils.rs (1)
collect_manifest_files
(159-182)src/storage/s3.rs (2)
from
(1345-1353)from
(1357-1359)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (17)
src/metrics/mod.rs (3)
268-276
: Registry updates for new totals: LGTM.
The new TOTAL_*_DATE metrics are registered. Once labels include stream (see prior comment), this remains correct.
33-33
: Help text clarifications: LGTM.
Descriptions now explicitly say “for a stream”, improving clarity and aligning with label cardinality.
Also applies to: 41-46, 53-53, 61-61, 69-75, 83-83, 93-97, 107-107, 119-119
127-137
: The script will print the relevant section for review. Please share the output so we can verify whether the catalog module correctly casts metrics to u64.
src/query/mod.rs (1)
555-566
: The script will dump the collect_manifest_files implementation so we can verify how it handles absolute URLs.
src/stats.rs (1)
25-25
: Imports for counters and TOTAL_* metrics: LGTM.
Prepares this module to manage per-date counters and totals.
Also applies to: 32-34
src/metadata.rs (1)
49-57
: Counters: correct shift to inc_by for per-date metrics.
- update_stats and load_daily_metrics now use inc_by(...) for per-date counters. This matches the new metric types and avoids set(...) races.
One caution: load_daily_metrics should be called only once per process start (or after clearing/removing label series) to avoid double accumulation.
Provide the call chain where load_daily_metrics is invoked to ensure idempotent usage.
Also applies to: 181-189
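One minimal way to make that idempotent is sketched below with std::sync::Once; the wrapper name is hypothetical and the real loader is only indicated by a comment.

use std::sync::Once;

static DAILY_METRICS_LOADED: Once = Once::new();

// Hypothetical wrapper: whatever accumulation load_daily_metrics performs runs
// at most once per process, so repeated calls cannot double-count.
fn load_daily_metrics_once() {
    DAILY_METRICS_LOADED.call_once(|| {
        // load_daily_metrics(); // the real loader would be invoked here
    });
}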
src/storage/object_storage.rs (1)
158-166
: Metric type consistency: counter vs gauge.
EVENTS_STORAGE_SIZE_DATE switched to inc_by (counter), while STORAGE_SIZE / LIFETIME_EVENTS_STORAGE_SIZE still use .add() (gauge). If that’s intentional (lifetime and current gauges vs per-date counters), all good. Please confirm EVENTS_STORAGE_SIZE_DATE is now an IntCounterVec or this won’t compile.
src/query/stream_schema_provider.rs (1)
281-289
: API shift to storage-only listing looks good.
The migration to storage-based listing and removal of the ObjectStore dependency keeps concerns localized. No issues spotted here.
src/storage/localfs.rs (3)
148-176
: GET instrumentation looks solid; status mapping covers 200/404/500.Good balance between observability and behavior: no panics, and single-file GET increments are accurate.
389-408
: PUT metrics and single-file count are correct.Instrumentation patterns align with the rest of the providers.
557-602
: list_streams observability matches provider patterns; LGTM.The LIST status timing and error-to-status mapping are consistent. No issues spotted.
src/storage/metrics_layer.rs (2)
71-80
: Provider label addition looks good.Storing provider on MetricLayer and extending new(inner, provider) is the right move for consistent tagging.
96-110
: Instrumentation approach is clean and consistent.
Measuring elapsed time per call, mapping to status, and returning inner results directly keeps behavior unchanged while improving observability.
Also applies to: 118-131, 154-167, 172-185, 187-201, 203-217, 219-233, 235-253, 255-269, 271-285, 321-335, 337-351, 353-367, 369-383, 385-399
src/storage/azure_blob.rs (2)
245-265
: PUT metrics and counting look good.Timing and status labeling are correct and the files-scanned counter is incremented once per PUT attempt.
267-316
: Instrumentation and counters look consistent across operations.Timing + provider/method/status labels are applied consistently; per-operation files-scanned counters are incremented appropriately.
Also applies to: 319-381, 383-421, 423-502, 504-529, 915-939, 941-967, 977-999, 1069-1095, 1103-1129, 1169-1235
src/storage/s3.rs (1)
372-392
: Instrumentation is broadly consistent and aligns with PR goals.
Per-operation timings, provider/method/status labeling, and files-scanned counters are applied as expected.
Also applies to: 395-444, 446-508, 510-548, 550-629, 631-656, 693-786, 788-821, 925-960, 962-1006, 1026-1050, 1052-1078, 1086-1111, 1179-1241, 1277-1295, 1311-1329
src/metrics/storage.rs (1)
35-45
: Files-scanned counter is a good addition.
Labeling by [provider, operation] should make the dashboards for “files touched” straightforward.
6850fb2 to c88615f
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/query/stream_schema_provider.rs (1)
587-596
: Avoid unwrap on ObjectStoreUrl::parse to prevent rare panics
If storage.store_url() is ever malformed (misconfiguration), unwrap will panic the query path. Map to a DataFusion error instead.
Apply this diff:
- ObjectStoreUrl::parse(storage.store_url()).unwrap(),
+ ObjectStoreUrl::parse(storage.store_url())
+     .map_err(|e| DataFusionError::External(Box::new(e)))?,
src/storage/metrics_layer.rs (1)
295-303
: Fix label cardinality bug in StreamMetricWrapper (2 labels supplied to a 3-label histogram).
STORAGE_REQUEST_RESPONSE_TIME expects 3 labels: provider, operation, status. The StreamMetricWrapper supplies only ["LIST","200"], causing a runtime label-cardinality panic and losing metrics for streaming list operations.
Refactor wrapper to carry provider and emit 3 labels:
@@ - fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { + fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list(prefix); - let res = StreamMetricWrapper { - time, - labels: ["LIST", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + op: "LIST", + status: "200", + inner, + }; Box::pin(res) } @@ - fn list_with_offset( + fn list_with_offset( &self, prefix: Option<&Path>, offset: &Path, ) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list_with_offset(prefix, offset); - let res = StreamMetricWrapper { - time, - labels: ["LIST_OFFSET", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + op: "LIST_OFFSET", + status: "200", + inner, + }; Box::pin(res) } @@ -struct StreamMetricWrapper<'a, const N: usize, T> { - time: time::Instant, - labels: [&'static str; N], - inner: BoxStream<'a, T>, -} +struct StreamMetricWrapper<'a, T> { + time: time::Instant, + provider: &'a str, + op: &'static str, + status: &'static str, + inner: BoxStream<'a, T>, +} @@ -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) - .observe(self.time.elapsed().as_secs_f64()); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&[self.provider, self.op, self.status]) + .observe(self.time.elapsed().as_secs_f64()); tAlso applies to: 305-320, 402-426
src/storage/gcs.rs (1)
760-775
: Remove duplicate GET timing and counting from get_objects loop.
You already time GETs and increment GET files-scanned inside _get_object(). The loop here re-records GET using the list_start timer and increments GET counts again, inflating numbers and mis-measuring latency.
- STORAGE_REQUEST_RESPONSE_TIME
-     .with_label_values(&["gcs", "GET", "200"])
-     .observe(list_start.elapsed().as_secs_f64());
- STORAGE_FILES_SCANNED
-     .with_label_values(&["gcs", "GET"])
-     .inc();
- STORAGE_FILES_SCANNED_DATE
-     .with_label_values(&["gcs", "GET", &Utc::now().date_naive().to_string()])
-     .inc();
♻️ Duplicate comments (7)
src/storage/gcs.rs (3)
559-577
: Consistency: count small-file PUTs in multipart path.In the <5MB branch you perform a single PUT but don’t bump files-scanned. Align with other PUT paths.
let put_start = Instant::now(); let result = self.client.put(location, data.into()).await; let put_elapsed = put_start.elapsed().as_secs_f64(); + STORAGE_FILES_SCANNED + .with_label_values(&["gcs", "PUT"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["gcs", "PUT", &Utc::now().date_naive().to_string()]) + .inc();
185-207
: Avoid panic and measure full GET latency (don’t unwrap body, record after bytes()).resp.bytes().await.unwrap() can panic, and you record latency before the body read. Record GET timing after the body is fully read; map body-read errors to status codes.
async fn _get_object(&self, path: &RelativePath) -> Result<Bytes, ObjectStorageError> { - let time = std::time::Instant::now(); - let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let time = std::time::Instant::now(); + let resp = self.client.get(&to_object_store_path(path)).await; STORAGE_FILES_SCANNED .with_label_values(&["gcs", "GET"]) .inc(); STORAGE_FILES_SCANNED_DATE .with_label_values(&["gcs", "GET", &Utc::now().date_naive().to_string()]) .inc(); match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(elapsed); - Ok(body) - } + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", "200"]) + .observe(elapsed); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }, Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", status_code]) - .observe(elapsed); + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); Err(err.into()) } } }
256-276
: Count deletes on success only; move increment after successful delete.files_deleted is incremented before attempting the delete; this overcounts when deletes fail. Increment only on Ok(_).
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed);src/storage/azure_blob.rs (3)
594-617
: Consistency: count small-file PUTs in multipart path.Add STORAGE_FILES_SCANNED increments for the single PUT path to match other PUT flows.
let put_start = Instant::now(); let result = self.client.put(location, data.into()).await; let put_elapsed = put_start.elapsed().as_secs_f64(); + STORAGE_FILES_SCANNED + .with_label_values(&["azure_blob", "PUT"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["azure_blob", "PUT", &Utc::now().date_naive().to_string()]) + .inc();
220-244
: Avoid panic and measure full GET latency (don’t unwrap body, record after bytes()).Same issue as GCS: unwrap can panic; also you measure elapsed before reading the body. Record GET timing after bytes() and handle errors.
async fn _get_object(&self, path: &RelativePath) -> Result<Bytes, ObjectStorageError> { - let time = std::time::Instant::now(); - let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let time = std::time::Instant::now(); + let resp = self.client.get(&to_object_store_path(path)).await; STORAGE_FILES_SCANNED .with_label_values(&["azure_blob", "GET"]) .inc(); STORAGE_FILES_SCANNED_DATE .with_label_values(&["azure_blob", "GET", &Utc::now().date_naive().to_string()]) .inc(); - match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(elapsed); - Ok(body) - } + match resp { + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", "200"]) + .observe(elapsed); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }, Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", status_code]) - .observe(elapsed); + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); Err(err.into()) } } }
292-312
: Count deletes on success only; move increment after successful delete.files_deleted is incremented before the delete attempt. Increment only when delete returns Ok(_).
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "DELETE", "200"]) .observe(delete_elapsed);src/storage/s3.rs (1)
827-835
: HEAD “files scanned” double-count from previous review is now resolved.
Increment occurs exactly once per HEAD attempt and not again in the Ok branch. This aligns with the guidance in the prior review comment.
Also applies to: 865-874
🧹 Nitpick comments (17)
src/query/stream_schema_provider.rs (7)
327-401
: Avoid over-partitioning when the file list is small
Using all CPUs for partition fan-out can create a lot of empty file groups when there are only a few files, with minor scheduling overhead. Cap partitions by the number of files while keeping a floor of 1.
Apply this diff:
- let target_partition: usize = num_cpus::get(); + let cpu = num_cpus::get(); + // Avoid many empty partitions when the manifest file list is small + let target_partition: usize = std::cmp::max(1, std::cmp::min(cpu, manifest_files.len()));
474-516
: Prefer preserving the original storage error instead of stringifying it
Today get_object_store_format errors are mapped to DataFusionError::Plan with err.to_string(). Consider carrying the source error to keep context (provider, status, etc.) intact.
Apply this diff:
- .await - .map_err(|err| DataFusionError::Plan(err.to_string()))?; + .await + .map_err(|err| DataFusionError::External(Box::new(err)))?;
520-536
: Nit: fix variable name typo for readability (listing_time_fiters → listing_time_filters)
Small readability tweak; avoids future confusion.
Apply this diff:
- if is_overlapping_query(&merged_snapshot.manifest_list, &time_filters) { - let listing_time_fiters = - return_listing_time_filters(&merged_snapshot.manifest_list, &mut time_filters); + if is_overlapping_query(&merged_snapshot.manifest_list, &time_filters) { + let listing_time_filters = + return_listing_time_filters(&merged_snapshot.manifest_list, &mut time_filters); - if let Some(listing_time_filter) = listing_time_fiters { + if let Some(listing_time_filter) = listing_time_filters {
572-583
: Files-scanned counters: confirm intended semantics and avoid potential double-counting
Incrementing STORAGE_FILES_SCANNED(_DATE) by the number of parquet files in the plan captures DataFusion-driven GETs that bypass the storage abstraction. That’s useful, but note:
- Planning may overcount if pruning/early-limit prevents opening some files.
- If we later wrap DataFusion’s object_store client for metrics, we may double count.
If the intent is “planned parquet files to scan,” consider documenting that in the metric help text and/or using a distinct method label (e.g., SCAN) to distinguish from storage-layer GET calls. Otherwise, this looks fine.
Would you like me to open a follow-up to align metric help/labels so dashboards don’t mix “planned scans” with “actual GETs”?
87-99
: Optional: Auto-create stream/schema from storage in Query/Prism mode
Based on the retrieved learnings from PR #1185, when a table is not present in memory, Query/Prism modes should attempt to create the stream and schema from storage before returning None. This avoids false negatives when the process starts without preloaded stream metadata.
Using the pattern from the learnings:
// Pseudocode inside GlobalSchemaProvider::table(...)
if !self.table_exist(name)
    && (PARSEABLE.options.mode == Mode::Query || PARSEABLE.options.mode == Mode::Prism)
{
    if PARSEABLE.create_stream_and_schema_from_storage(name).await.unwrap_or(false) {
        // proceed to return the table as in the happy path
    } else {
        return Ok(None);
    }
}
I can send a concrete diff if you confirm create_stream_and_schema_from_storage is available here.
Note: This suggestion leverages the “retrieved learnings” provided for this repository.
495-513
: Query/Prism: merging snapshots from stream.json is fine; consider error/metrics posture
You’re permissive on errors (ignore failures to read/parse stream.json) which favors availability. If you want observability, at minimum log debug/warn with the failing key and error cause. Also, if the storage layer doesn’t already emit metrics for get_objects here, consider relying on it rather than adding counters locally, to avoid duplication.
I can add structured logs here with provider/path labels if you want to keep the “best effort” behavior but make failures visible in traces.
586-596
: Minor: pass time_partition by reference consistently
A few call sites clone Option; passing as_ref()/cloned() is already mixed above. Not a blocker, just a reminder to keep it consistent for readability.
src/metrics/mod.rs (1)
163-173
: Nit: clarify help text to reflect “global totals across all streams”.
Per the PR discussion and retrieved learnings, TOTAL_*_DATE metrics intentionally aggregate across all streams using labels ["format","date"]. To avoid future confusion with per-stream metrics, tweak the help text to explicitly say “across all streams.”
Opts::new( "total_events_ingested_date", - "total events ingested on a particular date", + "Total events ingested across all streams on a particular date", ) ... Opts::new( "total_events_ingested_size_date", - "Total events ingested size in bytes on a particular date", + "Total events ingested size in bytes across all streams on a particular date", ) ... Opts::new( "total_events_storage_size_date", - "Total events storage size in bytes on a particular date", + "Total events storage size in bytes across all streams on a particular date", )Also applies to: 175-185, 187-197
src/storage/metrics_layer.rs (1)
187-201
: Optional: consider counting “files scanned” in the layer for DataFusion code paths.
Operations invoked via DataFusion use this MetricLayer, not the provider modules. If you want files-scanned to be complete “for all object store APIs,” add STORAGE_FILES_SCANNED/STORAGE_FILES_SCANNED_DATE increments here as well (mirroring provider modules). If intentional to keep counts only in provider code, feel free to ignore.
Also applies to: 203-217, 219-233, 235-253, 255-269, 271-285, 321-335, 337-351, 353-367, 369-399
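If that option were taken, a small shared helper could keep both counters in step. The sketch below declares stand-in counters only to stay self-contained; real code would use the existing STORAGE_FILES_SCANNED and STORAGE_FILES_SCANNED_DATE statics.

use chrono::Utc;
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// Stand-ins for STORAGE_FILES_SCANNED and STORAGE_FILES_SCANNED_DATE.
static FILES_SCANNED: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "storage_files_scanned",
        "Files touched per provider and method",
        &["provider", "method"]
    )
    .expect("metric can be created")
});

static FILES_SCANNED_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "storage_files_scanned_date",
        "Files touched per provider, method and date",
        &["provider", "method", "date"]
    )
    .expect("metric can be created")
});

// One call per operation inside each MetricLayer method would mirror what the
// provider modules already do.
fn record_files_scanned(provider: &str, method: &str) {
    FILES_SCANNED.with_label_values(&[provider, method]).inc();
    FILES_SCANNED_DATE
        .with_label_values(&[provider, method, &Utc::now().date_naive().to_string()])
        .inc();
}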
src/storage/azure_blob.rs (1)
708-737
: HEAD “files scanned” should count attempts, not only successes.
Currently HEAD increments STORAGE_FILES_SCANNED only on Ok. GCS HEAD increments regardless of outcome. For consistency and to match “files-scanned count for all object store APIs,” increment before match.
async fn head(&self, path: &RelativePath) -> Result<ObjectMeta, ObjectStorageError> { let head_start = Instant::now(); let result = self.client.head(&to_object_store_path(path)).await; let head_elapsed = head_start.elapsed().as_secs_f64(); - match &result { + // Count the attempted HEAD + STORAGE_FILES_SCANNED + .with_label_values(&["azure_blob", "HEAD"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["azure_blob", "HEAD", &Utc::now().date_naive().to_string()]) + .inc(); + match &result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "HEAD", "200"]) .observe(head_elapsed); - // Record single file accessed - STORAGE_FILES_SCANNED - .with_label_values(&["azure_blob", "HEAD"]) - .inc(); - STORAGE_FILES_SCANNED_DATE - .with_label_values(&[ - "azure_blob", - "HEAD", - &Utc::now().date_naive().to_string(), - ]) - .inc(); }src/storage/localfs.rs (2)
404-426
: Optional: count PUT attempts regardless of outcome for consistency.Other providers increment STORAGE_FILES_SCANNED for PUT before the result. Here it’s only on Ok. Consider moving the increment above the match.
let res = fs::write(path, resource).await; let put_elapsed = put_start.elapsed().as_secs_f64(); + // Count attempted PUT + STORAGE_FILES_SCANNED + .with_label_values(&["localfs", "PUT"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["localfs", "PUT", &Utc::now().date_naive().to_string()]) + .inc(); match &res { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "PUT", "200"]) .observe(put_elapsed); - // Record single file written successfully - STORAGE_FILES_SCANNED - .with_label_values(&["localfs", "PUT"]) - .inc(); - STORAGE_FILES_SCANNED_DATE - .with_label_values(&["localfs", "PUT", &Utc::now().date_naive().to_string()]) - .inc(); }
838-867
: Avoid blocking the async runtime when copying large files.fs_extra::file::copy is blocking and can stall the Tokio executor under load. Wrap it in spawn_blocking to avoid starving other tasks.
- let result = fs_extra::file::copy(path, to_path, &op); + let to_path_clone = to_path.clone(); + let op_clone = op.clone(); + let result = tokio::task::spawn_blocking(move || { + fs_extra::file::copy(path, to_path_clone, &op_clone) + }) + .await + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?;src/storage/s3.rs (3)
1076-1096
: Count DELETE attempts, not only successes, for consistent “files scanned” semanticsElsewhere (GET/HEAD), you increment
STORAGE_FILES_SCANNED
before knowing the outcome, i.e., per attempt. Here it increments only on success. Move the counters before the match so DELETE aligns with the “attempts” model from the PR objectives.Apply this diff:
- match &result { - Ok(_) => { - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "DELETE", "200"]) - .observe(delete_elapsed); - // Record single file deleted - STORAGE_FILES_SCANNED - .with_label_values(&["s3", "DELETE"]) - .inc(); - STORAGE_FILES_SCANNED_DATE - .with_label_values(&["s3", "DELETE", &Utc::now().date_naive().to_string()]) - .inc(); - } + // Count the DELETE attempt regardless of outcome + STORAGE_FILES_SCANNED + .with_label_values(&["s3", "DELETE"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["s3", "DELETE", &Utc::now().date_naive().to_string()]) + .inc(); + match &result { + Ok(_) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "DELETE", "200"]) + .observe(delete_elapsed); + }
694-713
: Multipart uploads: confirm whether files-scanned should be recorded for PUT_MULTIPART and COMPLETESmall-file fallback increments
PUT
counters, but true multipart flows record only latency (no files-scanned). If the intent is “files-scanned count for all object store APIs,” add a single increment forPUT_MULTIPART
initiation and one forPUT_MULTIPART_COMPLETE
(attempts, not per-part).Suggested adjustments:
let async_writer = self.client.put_multipart(location).await; let multipart_elapsed = multipart_start.elapsed().as_secs_f64(); + // Count the multipart initiation attempt + STORAGE_FILES_SCANNED + .with_label_values(&["s3", "PUT_MULTIPART"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["s3", "PUT_MULTIPART", &Utc::now().date_naive().to_string()]) + .inc(); @@ - if let Err(err) = complete_result { + // Count the multipart completion attempt + STORAGE_FILES_SCANNED + .with_label_values(&["s3", "PUT_MULTIPART_COMPLETE"]) + .inc(); + STORAGE_FILES_SCANNED_DATE + .with_label_values(&["s3", "PUT_MULTIPART_COMPLETE", &Utc::now().date_naive().to_string()]) + .inc(); + if let Err(err) = complete_result { let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "PUT_MULTIPART_COMPLETE", status_code]) .observe(complete_elapsed);Note: I’m intentionally not incrementing for each
PUT_MULTIPART_PART
to avoid inflating counts by part numbers. If you do want that, add a single increment per part beside eachput_part
.Also applies to: 797-813
347-350
: Unify Instant usage; prefer the importedInstant
over fully qualifiedstd::time::Instant
Minor consistency/readability nit: the file already imports
time::Instant
. Use it everywhere.Apply this diff:
- let time = std::time::Instant::now(); + let time = Instant::now(); @@ - let time = std::time::Instant::now(); + let time = Instant::now();Also applies to: 379-382
src/metrics/storage.rs (2)
25-32
: Tune histogram buckets for storage latenciesDefault buckets can be skewed for S3-like latencies. Consider custom buckets (e.g., sub-10ms to tens of seconds) to capture both cache hits and cold-path operations better.
Example:
pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| { - HistogramVec::new( - HistogramOpts::new("storage_request_response_time", "Storage Request Latency") - .namespace(METRICS_NAMESPACE), - &["provider", "method", "status"], - ) + { + let opts = HistogramOpts::new("storage_request_response_time", "Storage Request Latency") + .namespace(METRICS_NAMESPACE) + .buckets(vec![ + 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, + 1.0, 2.0, 5.0, 10.0, 20.0, 30.0, + ]); + HistogramVec::new(opts, &["provider", "method", "status"]) + } .expect("metric can be created") });
63-83
: DRY: centralize metric registration to remove duplicated code across providersAll four impls register the same three collectors. Factor this into a helper to reduce duplication and keep future changes in one place.
One way:
pub trait StorageMetrics { fn register_metrics(&self, handler: &PrometheusMetrics); } +fn register_storage_metrics(handler: &PrometheusMetrics) { + handler + .registry + .register(Box::new(STORAGE_REQUEST_RESPONSE_TIME.clone())) + .expect("metric can be registered"); + handler + .registry + .register(Box::new(STORAGE_FILES_SCANNED.clone())) + .expect("metric can be registered"); + handler + .registry + .register(Box::new(STORAGE_FILES_SCANNED_DATE.clone())) + .expect("metric can be registered"); +} @@ impl StorageMetrics for FSConfig { fn register_metrics(&self, handler: &actix_web_prometheus::PrometheusMetrics) { - handler - .registry - .register(Box::new(STORAGE_REQUEST_RESPONSE_TIME.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED_DATE.clone())) - .expect("metric can be registered"); + register_storage_metrics(handler); } } @@ impl StorageMetrics for S3Config { fn register_metrics(&self, handler: &actix_web_prometheus::PrometheusMetrics) { - handler - .registry - .register(Box::new(STORAGE_REQUEST_RESPONSE_TIME.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED_DATE.clone())) - .expect("metric can be registered"); + register_storage_metrics(handler); } } @@ impl StorageMetrics for AzureBlobConfig { fn register_metrics(&self, handler: &actix_web_prometheus::PrometheusMetrics) { - handler - .registry - .register(Box::new(STORAGE_REQUEST_RESPONSE_TIME.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED_DATE.clone())) - .expect("metric can be registered"); + register_storage_metrics(handler); } } @@ impl StorageMetrics for GcsConfig { fn register_metrics(&self, handler: &actix_web_prometheus::PrometheusMetrics) { - handler - .registry - .register(Box::new(STORAGE_REQUEST_RESPONSE_TIME.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED.clone())) - .expect("metric can be registered"); - handler - .registry - .register(Box::new(STORAGE_FILES_SCANNED_DATE.clone())) - .expect("metric can be registered"); + register_storage_metrics(handler); } }Note: Per the maintainer’s note and learnings, only one provider is initialized per process, so there’s no multi-registration panic risk; this change is purely to reduce duplication.
Also applies to: 86-107, 109-130, 132-153
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
src/catalog/mod.rs (1 hunks)
src/metadata.rs (3 hunks)
src/metrics/mod.rs (6 hunks)
src/metrics/storage.rs (1 hunks)
src/query/listing_table_builder.rs (2 hunks)
src/query/mod.rs (1 hunks)
src/query/stream_schema_provider.rs (12 hunks)
src/stats.rs (3 hunks)
src/storage/azure_blob.rs (24 hunks)
src/storage/gcs.rs (23 hunks)
src/storage/localfs.rs (15 hunks)
src/storage/metrics_layer.rs (8 hunks)
src/storage/object_storage.rs (2 hunks)
src/storage/s3.rs (24 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- src/query/mod.rs
- src/catalog/mod.rs
- src/storage/object_storage.rs
- src/metadata.rs
- src/stats.rs
- src/query/listing_table_builder.rs
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-08-25T01:31:41.768Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.768Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/gcs.rs
src/storage/metrics_layer.rs
src/metrics/mod.rs
src/storage/localfs.rs
src/storage/s3.rs
src/metrics/storage.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-25T01:32:25.937Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.937Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/gcs.rs
src/metrics/mod.rs
src/storage/localfs.rs
src/metrics/storage.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/localfs.rs
src/storage/s3.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/storage/azure_blob.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-25T01:33:36.398Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/storage.rs:51-68
Timestamp: 2025-08-25T01:33:36.398Z
Learning: In the Parseable storage system, only one storage provider (localfs, s3, azureblob, or gcs) is meant to be initialized per process, which ensures that global metrics like STORAGE_REQUEST_RESPONSE_TIME and STORAGE_FILES_SCANNED are registered exactly once without risk of duplicate registration panics.
Applied to files:
src/metrics/storage.rs
🧬 Code graph analysis (6)
src/storage/gcs.rs (2)
src/storage/metrics_layer.rs (3)
error_to_status_code
(40-66)new
(75-80)head
(255-269)src/storage/object_storage.rs (3)
parseable_json_path
(1281-1283)new
(82-91)head
(209-209)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(708-738)src/storage/gcs.rs (1)
head
(695-720)src/storage/s3.rs (3)
head
(865-890)from
(1434-1442)from
(1446-1448)
src/query/stream_schema_provider.rs (3)
src/parseable/mod.rs (4)
storage
(244-246)new
(146-158)serde_json
(300-300)serde_json
(306-306)src/enterprise/utils.rs (1)
collect_manifest_files
(159-182)src/storage/s3.rs (2)
from
(1434-1442)from
(1446-1448)
src/storage/localfs.rs (2)
src/storage/metrics_layer.rs (1)
copy
(337-351)src/storage/azure_blob.rs (3)
get_ingestor_meta_file_paths
(812-850)delete_object
(919-950)check
(952-981)
src/storage/s3.rs (3)
src/storage/metrics_layer.rs (3)
error_to_status_code
(40-66)new
(75-80)head
(255-269)src/storage/localfs.rs (6)
new
(104-106)name
(76-78)from
(985-987)head
(136-150)get_ingestor_meta_file_paths
(190-242)check
(496-520)src/storage/azure_blob.rs (12)
resp
(347-353)resp
(477-483)resp
(1240-1245)resp
(1284-1289)_delete_prefix
(278-333)_list_streams
(335-400)name
(169-171)_list_dates
(402-443)_upload_file
(529-557)head
(708-738)get_ingestor_meta_file_paths
(812-850)check
(952-981)
src/storage/azure_blob.rs (3)
src/storage/metrics_layer.rs (3)
error_to_status_code
(40-66)new
(75-80)head
(255-269)src/storage/object_storage.rs (4)
parseable_json_path
(1281-1283)new
(82-91)name
(200-200)head
(209-209)src/storage/gcs.rs (9)
resp
(312-318)resp
(442-448)resp
(1206-1211)resp
(1246-1251)_delete_prefix
(243-298)_list_streams
(300-365)name
(130-132)_upload_file
(494-522)head
(695-720)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (6)
src/query/stream_schema_provider.rs (3)
19-67
: Imports and storage/metrics wiring look correct
- Moving to PathBuf/RelativePathBuf and the ObjectStorage abstraction is consistent with the PR objectives.
- TryFutureExt is used later (and_then chain), so no unused import.
- Metrics imports (STORAGE_FILES_SCANNED, STORAGE_FILES_SCANNED_DATE) align with the new counters.
284-301
: populate_via_listing now accepts ObjectStorage — good abstraction boundary. Switching ListingTableBuilder::populate_via_listing to take the storage layer rather than a raw object_store keeps the query layer decoupled and lets the storage layer own metrics and provider-specific logic. The TryFutureExt.and_then usage keeps the flow clean.
404-422
: Good: storage plumbed through collect_from_snapshot with proper error mapping. Propagating ObjectStorageError via DataFusionError::External is the right choice here; this preserves the original error chain for diagnostics.
src/metrics/mod.rs (2)
268-276
: Registration of TOTAL_*_DATE metrics looks correct. All three TOTAL_*_DATE metrics are registered in the custom registry; no gaps spotted.
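For context, registration into the custom registry boils down to the pattern below (a minimal sketch; the metric name, labels, and helper are illustrative rather than the crate's actual statics):

```rust
use prometheus::{IntGaugeVec, Opts, Registry};

// Minimal sketch: the metric name and labels are placeholders, not the
// crate's real statics. Registering the same collector twice fails with
// AlreadyReg, so registration happens exactly once at startup.
fn register_date_metrics(registry: &Registry) -> prometheus::Result<()> {
    let total_events_ingested_date = IntGaugeVec::new(
        Opts::new("total_events_ingested_date", "total events ingested per date"),
        &["format", "date"],
    )?;
    registry.register(Box::new(total_events_ingested_date))?;
    Ok(())
}
```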
127-137
: It looks like there are no remaining .set()
calls on those three metrics, so all callsites have been updated to use inc()
/ inc_by().
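For illustration, the counter semantics this relies on look roughly like the sketch below; the metric name and labels are made up here, and real call sites add the batch size with inc_by rather than overwriting a running total:

```rust
use prometheus::{IntCounterVec, Opts};

fn main() {
    // Illustrative per-date counter; the real metrics carry more labels.
    let events_ingested_date = IntCounterVec::new(
        Opts::new("events_ingested_date", "events ingested on a date"),
        &["stream", "format", "date"],
    )
    .expect("valid metric definition");

    // inc_by adds the batch size; a gauge-style .set() would lose prior batches.
    events_ingested_date
        .with_label_values(&["app_logs", "json", "2025-08-25"])
        .inc_by(128);
}
```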
src/storage/metrics_layer.rs (1)
37-66
: Status mapping helper and provider label plumbing look solid. The error_to_status_code coverage is pragmatic and the provider field in MetricLayer enables per-provider partitioning without code duplication. No issues here.
b41cdb7 to edc749a Compare
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/storage/azure_blob.rs (1)
216-248
: Handle body read failures properly to avoid panic. The
resp.bytes().await.unwrap()
on line 228 can panic. The elapsed time should include the body read time, and errors should be handled gracefully.async fn _get_object(&self, path: &RelativePath) -> Result<Bytes, ObjectStorageError> { let time = std::time::Instant::now(); let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "azure_blob", "GET", &Utc::now().date_naive().to_string(), ); match resp { Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) + match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } } Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "GET", status_code]) .observe(elapsed); Err(err.into()) } } }src/storage/s3.rs (1)
777-831
: Multipart upload reads entire file into memory; stream parts to avoid OOM. For large objects this can exhaust memory. Read and upload parts in a loop without
read_to_end
.Apply this diff:
- let mut data = Vec::new(); - file.read_to_end(&mut data).await?; - - // let mut upload_parts = Vec::new(); - - let has_final_partial_part = total_size % MIN_MULTIPART_UPLOAD_SIZE > 0; - let num_full_parts = total_size / MIN_MULTIPART_UPLOAD_SIZE; - let total_parts = num_full_parts + if has_final_partial_part { 1 } else { 0 }; - - // Upload each part with metrics - for part_number in 0..(total_parts) { - let start_pos = part_number * MIN_MULTIPART_UPLOAD_SIZE; - let end_pos = if part_number == num_full_parts && has_final_partial_part { - // Last part might be smaller than 5MB (which is allowed) - total_size - } else { - // All other parts must be at least 5MB - start_pos + MIN_MULTIPART_UPLOAD_SIZE - }; - - // Extract this part's data - let part_data = data[start_pos..end_pos].to_vec(); + // Upload each part with metrics (streamed) + let mut remaining = total_size; + let mut buf = vec![0u8; MIN_MULTIPART_UPLOAD_SIZE]; + while remaining > 0 { + let read_len = remaining.min(MIN_MULTIPART_UPLOAD_SIZE); + buf.resize(read_len, 0); + file.read_exact(&mut buf[..read_len]).await?; + let part_data = buf[..read_len].to_vec(); // Track individual part upload let part_start = Instant::now(); let result = async_writer.put_part(part_data.into()).await; let part_elapsed = part_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "s3", "PUT_MULTIPART", &Utc::now().date_naive().to_string(), ); match result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "PUT_MULTIPART", "200"]) .observe(part_elapsed); increment_files_scanned_in_object_store_calls_by_date( "s3", "PUT_MULTIPART", 1, &Utc::now().date_naive().to_string(), ); } Err(err) => { let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "PUT_MULTIPART", status_code]) .observe(part_elapsed); return Err(err.into()); } } - - // upload_parts.push(part_number as u64 + 1); + remaining -= read_len; }
♻️ Duplicate comments (7)
src/query/listing_table_builder.rs (1)
90-111
: Path construction loses parent prefix - results in invalid query paths. The implementation pushes only child names from
list_dirs_relative
but loses the parent prefix, which will produce invalid URIs when query_prefixes
tries to construct paths.
Apply this diff to preserve the full path:
- // Use storage.list_dirs_relative for all prefixes and flatten results - let mut listing = Vec::new(); - for prefix in prefixes { - match storage.list_dirs_relative(&prefix).await { - Ok(paths) => { - listing.extend(paths.into_iter().map(|p| p.to_string())); - } - Err(e) => { - return Err(DataFusionError::External(Box::new(e))); - } - } - } + // Use storage.list_dirs_relative for all prefixes and flatten results, + // preserving the full path + let mut listing = Vec::new(); + for prefix in prefixes { + let base = prefix.as_str(); + match storage.list_dirs_relative(&prefix).await { + Ok(children) => { + listing.extend(children.into_iter().map(|child| { + format!("{}/{}", base, child) + })); + } + Err(e) => { + return Err(DataFusionError::External(Box::new(e))); + } + } + }src/storage/object_storage.rs (1)
150-154
: Avoid failing the upload flow on metadata errors. If
metadata()
fails after a successful upload, the entire flow fails, leaving the object uploaded but no manifest created. Metrics should be best-effort.- let compressed_size = path - .metadata() - .map(|m| m.len()) - .map_err(|e| ObjectStorageError::Custom(format!("metadata failed for {filename}: {e}")))?; + let compressed_size = match path.metadata() { + Ok(meta) => meta.len(), + Err(e) => { + tracing::warn!("Failed to get file metadata for {}: {}, using 0 for metrics", filename, e); + 0 + } + };src/storage/azure_blob.rs (2)
852-866
: Remove duplicate GET metrics from get_objects loop. The loop records GET metrics that are already handled by the
get_object()
call. This causes double-counting.let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - ); res.push(byts);
647-675
: Track PUT metrics for small file uploads in multipart path. When
total_size < MIN_MULTIPART_UPLOAD_SIZE
, a plain PUT is performed but files scanned metrics aren't incremented consistently.The current implementation already includes the metrics at lines 661-666, so this is correctly handled.
src/storage/gcs.rs (2)
262-262
: Move file deletion counter to success path only.
files_deleted
is incremented before the delete attempt, which causes overcounting when deletes fail.Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "gcs", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed); } Err(err) => { let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", status_code]) .observe(delete_elapsed); error!("Failed to delete object during delete stream: {:?}", err); } } }
189-189
: Avoid panic on body read failure. Similar to Azure,
resp.bytes().await.unwrap()
can panic. Handle the error case properly.Apply the same fix pattern as suggested for Azure:
- let body: Bytes = resp.bytes().await.unwrap(); + match resp.bytes().await { + Ok(body) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", "200"]) + .observe(elapsed); + // ... rest of success path + Ok(body) + } + Err(err) => { + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }src/storage/s3.rs (1)
977-987
: Remove duplicate GET metrics in get_objects(); rely on _get_object(). This double-counts GETs and uses LIST latency for GET. Drop this block;
_get_object()
already emits GET metrics and file-scanned counts.Apply this diff:
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
🧹 Nitpick comments (13)
src/metrics/mod.rs (4)
161-195
: Clarify help text: explicitly say “global totals across all streams”. A small wording tweak avoids confusion with the new billing counters and the per-stream counters.
- "total events ingested on a particular date", + "Global total events ingested across all streams on a date", ... - "Total events ingested size in bytes on a particular date", + "Global total ingested size (bytes) across all streams on a date", ... - "Total events storage size in bytes on a particular date", + "Global total storage size (bytes) across all streams on a date",
339-346
: Latency histogram without buckets — add SLO-aligned buckets. Defaults can be too coarse. Suggest buckets like [5ms..5s] exponential.
- HistogramVec::new(
-     HistogramOpts::new("storage_request_response_time", "Storage Request Latency")
-         .namespace(METRICS_NAMESPACE),
+ HistogramVec::new(
+     HistogramOpts::new("storage_request_response_time", "Storage Request Latency")
+         .namespace(METRICS_NAMESPACE)
+         .buckets(vec![
+             0.005, 0.01, 0.025, 0.05, 0.1,
+             0.25, 0.5, 1.0, 2.5, 5.0
+         ]),
      &["provider", "method", "status"],
498-555
: Helpers take &str dates — consider accepting NaiveDate to centralize formatting. Optional ergonomics: accept chrono::NaiveDate and format inside the helpers to prevent inconsistent date formatting across callsites.
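A possible shape for that refactor, sketched with an illustrative lazy static (assuming a once_cell-style Lazy; the real statics and helper names in src/metrics/mod.rs may differ):

```rust
use chrono::NaiveDate;
use once_cell::sync::Lazy;
use prometheus::{IntCounterVec, Opts};

// Illustrative metric; the real counter lives in the metrics module.
static QUERY_CALLS_BY_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
    IntCounterVec::new(
        Opts::new("query_calls_by_date", "query calls per date"),
        &["date"],
    )
    .expect("valid metric definition")
});

// Callers pass a NaiveDate; the helper owns the string formatting.
pub fn increment_query_calls_by_date(date: NaiveDate) {
    let date = date.format("%Y-%m-%d").to_string();
    QUERY_CALLS_BY_DATE.with_label_values(&[date.as_str()]).inc();
}
```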
229-347
: Rename or clarify BY_DATE counter names: All new *_BY_DATE
counters have live callsites. Consider renaming them to usage_*_by_date
or expanding their help strings to note “cumulative counter” vs “current total gauge.”src/handlers/http/query.rs (1)
347-350
: Align counts() with query(): move increment after auth check. Currently counts() increments before authorization, unlike query(). This will count unauthorized requests and diverge semantics.
- // Track billing metrics for query calls - let current_date = chrono::Utc::now().date_naive().to_string(); - increment_query_calls_by_date(¤t_date); - let creds = extract_session_key_from_req(&req)?; let permissions = Users.get_permissions(&creds); + // Track billing metrics for query calls (after auth, consistent with query()) + let current_date = chrono::Utc::now().date_naive().to_string(); + increment_query_calls_by_date(¤t_date);src/stats.rs (2)
122-140
: Remove per-date counter series once per date, not per manifest. You call remove_label_values for the same [stream,format,date] inside the per-manifest loop. It’s harmless but redundant. Collect distinct dates first and remove once.
- if !manifests.is_empty() { - for manifest in manifests { - let manifest_date = manifest.time_lower_bound.date_naive().to_string(); - let _ = - EVENTS_INGESTED_DATE.remove_label_values(&[stream_name, "json", &manifest_date]); - let _ = EVENTS_INGESTED_SIZE_DATE.remove_label_values(&[ - stream_name, - "json", - &manifest_date, - ]); - let _ = EVENTS_STORAGE_SIZE_DATE.remove_label_values(&[ - "data", - stream_name, - "parquet", - &manifest_date, - ]); + if !manifests.is_empty() { + let mut dates_seen = std::collections::HashSet::new(); + for manifest in &manifests { + dates_seen.insert(manifest.time_lower_bound.date_naive().to_string()); + } + for d in &dates_seen { + let _ = EVENTS_INGESTED_DATE.remove_label_values(&[stream_name, "json", d]); + let _ = EVENTS_INGESTED_SIZE_DATE.remove_label_values(&[stream_name, "json", d]); + let _ = EVENTS_STORAGE_SIZE_DATE.remove_label_values(&["data", stream_name, "parquet", d]); + } + for manifest in manifests { + let manifest_date = manifest.time_lower_bound.date_naive().to_string();
211-227
: Safer matching by label keys, not only values, to avoid false positives. delete_with_label_prefix matches any labels whose values happen to equal the prefix values. Rare, but stream names could collide with a date string or “json/parquet.” Prefer key+value matching.
-fn delete_with_label_prefix(metrics: &IntCounterVec, prefix: &[&str]) { +fn delete_with_label_prefix(metrics: &IntCounterVec, prefix: &[(&str, &str)]) { @@ - // Check if all prefix elements are present in label values - let all_prefixes_found = prefix.iter().all(|p| label_map.values().any(|v| v == p)); + // Check if all (key,value) pairs match + let all_prefixes_found = prefix.iter().all(|(k, v)| label_map.get(k).copied() == Some(*v));Callers would pass e.g.:
delete_with_label_prefix(&EVENTS_INGESTED_DATE, &[("stream", stream_name), ("format", format)]);src/query/stream_schema_provider.rs (1)
329-349
: Consider extracting billing metrics to reduce coupling. The billing metrics logic (lines 349-350, 408-411) is tightly coupled with the partitioning logic. Consider extracting this to a separate method for better separation of concerns.
fn partitioned_files( &self, manifest_files: Vec<File>, ) -> (Vec<Vec<PartitionedFile>>, datafusion::common::Statistics) { let target_partition: usize = num_cpus::get(); let mut partitioned_files = Vec::from_iter((0..target_partition).map(|_| Vec::new())); let mut column_statistics = HashMap::<String, Option<TypedStatistics>>::new(); let mut count = 0; - let mut total_file_size = 0u64; - let mut file_count = 0u64; + let mut billing_tracker = BillingTracker::new(); + for (index, file) in manifest_files .into_iter() .enumerate() .map(|(x, y)| (x % target_partition, y)) { #[allow(unused_mut)] let File { mut file_path, num_rows, columns, file_size, .. } = file; - // Track billing metrics for files scanned in query - file_count += 1; - total_file_size += file_size; + billing_tracker.record_file(file_size); // ... rest of the method } - // Track billing metrics for query scan - let current_date = chrono::Utc::now().date_naive().to_string(); - increment_files_scanned_in_query_by_date(file_count, ¤t_date); - increment_bytes_scanned_in_query_by_date(total_file_size, ¤t_date); + billing_tracker.submit_metrics(); (partitioned_files, statistics) } +struct BillingTracker { + file_count: u64, + total_file_size: u64, +} + +impl BillingTracker { + fn new() -> Self { + Self { file_count: 0, total_file_size: 0 } + } + + fn record_file(&mut self, file_size: u64) { + self.file_count += 1; + self.total_file_size += file_size; + } + + fn submit_metrics(&self) { + let current_date = chrono::Utc::now().date_naive().to_string(); + increment_files_scanned_in_query_by_date(self.file_count, ¤t_date); + increment_bytes_scanned_in_query_by_date(self.total_file_size, ¤t_date); + } +}src/storage/localfs.rs (1)
364-381
: Consider using a single error label for file read failures. Currently using status 404 for all file read errors in get_objects (line 377), but some errors might be permission-related (403) or other I/O errors (500).
Err(err) => { + let status_code = match err.kind() { + std::io::ErrorKind::NotFound => "404", + std::io::ErrorKind::PermissionDenied => "403", + _ => "500", + }; STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["localfs", "GET", "404"]) + .with_label_values(&["localfs", "GET", status_code]) .observe(file_read_elapsed); return Err(err.into()); }src/storage/s3.rs (4)
948-955
: Record LIST call count at start; avoid missing counts on early error. If iteration errors out, the LIST call may never be counted. Increment at start and drop the trailing increment.
Apply this diff:
// Track list operation let list_start = Instant::now(); let mut list_stream = self.client.list(Some(&prefix)); + increment_object_store_calls_by_date("s3", "LIST", &Utc::now().date_naive().to_string()); @@ - increment_object_store_calls_by_date("s3", "LIST", &Utc::now().date_naive().to_string());Also applies to: 1001-1001
957-962
: Emit LIST failure metrics on streaming error paths. When
list_stream.next()
returns Err
, we exit without observing LIST latency or status. Record it before returning.Apply this diff:
- Err(err) => { - return Err(err.into()); - } + Err(err) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "LIST", error_to_status_code(&err)]) + .observe(list_start.elapsed().as_secs_f64()); + return Err(err.into()); + }
343-343
: Use the imported Instant
consistently. Minor style: you already
use std::time::Instant;
. Prefer Instant::now()
over std::time::Instant::now()
.Apply this diff:
- let time = std::time::Instant::now(); + let time = Instant::now();- let time = std::time::Instant::now(); + let time = Instant::now();Also applies to: 379-379
347-360
: Capture date once per operation to avoid cross-midnight label splitsMultiple
Utc::now()
calls within an operation can cross midnight and produce inconsistent per-date labels. Compute once and reuse.Illustrative pattern:
- increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string()); + let date = Utc::now().date_naive().to_string(); + increment_object_store_calls_by_date("s3", "GET", &date); ... - increment_files_scanned_in_object_store_calls_by_date("s3", "GET", 1, &Utc::now().date_naive().to_string()); + increment_files_scanned_in_object_store_calls_by_date("s3", "GET", 1, &date);Please apply similarly across methods that emit multiple metrics within one logical operation.
Also applies to: 382-393, 500-505, 550-566, 665-671, 682-693, 714-738, 804-820, 847-850, 868-879, 909-927, 989-999, 1031-1041, 1073-1090, 1116-1128, 1150-1160, 1219-1223, 1271-1276, 1297-1307, 1337-1347, 1406-1418, 1447-1459
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
src/catalog/mod.rs (1 hunks)
src/event/mod.rs (2 hunks)
src/handlers/http/modal/ingest_server.rs (0 hunks)
src/handlers/http/modal/query_server.rs (0 hunks)
src/handlers/http/modal/server.rs (0 hunks)
src/handlers/http/query.rs (3 hunks)
src/metadata.rs (3 hunks)
src/metrics/mod.rs (9 hunks)
src/metrics/storage.rs (0 hunks)
src/query/listing_table_builder.rs (2 hunks)
src/query/mod.rs (1 hunks)
src/query/stream_schema_provider.rs (13 hunks)
src/stats.rs (3 hunks)
src/storage/azure_blob.rs (23 hunks)
src/storage/gcs.rs (23 hunks)
src/storage/localfs.rs (15 hunks)
src/storage/metrics_layer.rs (8 hunks)
src/storage/object_storage.rs (3 hunks)
src/storage/s3.rs (23 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/server.rs
- src/metrics/storage.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- src/query/mod.rs
- src/catalog/mod.rs
- src/metadata.rs
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/event/mod.rs
src/storage/s3.rs
src/storage/localfs.rs
src/storage/azure_blob.rs
src/storage/metrics_layer.rs
src/storage/object_storage.rs
src/metrics/mod.rs
src/stats.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/event/mod.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/metrics/mod.rs
src/stats.rs
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
Applied to files:
src/handlers/http/query.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/s3.rs
src/storage/azure_blob.rs
src/storage/gcs.rs
src/storage/object_storage.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/s3.rs
src/storage/localfs.rs
src/storage/azure_blob.rs
src/storage/gcs.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/s3.rs
src/storage/azure_blob.rs
src/storage/gcs.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/query/stream_schema_provider.rs
🧬 Code graph analysis (9)
src/event/mod.rs (1)
src/metrics/mod.rs (2)
increment_events_ingested_by_date
(499-503)increment_events_ingested_size_by_date
(505-509)
src/handlers/http/query.rs (1)
src/metrics/mod.rs (1)
increment_query_calls_by_date
(523-525)
src/storage/s3.rs (4)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(545-554)increment_object_store_calls_by_date
(539-543)status_code
(574-578)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/localfs.rs (7)
new
(100-102)name
(76-78)from
(998-1000)head
(132-147)get_ingestor_meta_file_paths
(188-240)get_stream_file_paths
(242-311)check
(504-529)src/storage/azure_blob.rs (13)
resp
(373-379)resp
(517-523)resp
(1353-1358)resp
(1398-1403)_delete_prefix
(287-354)_list_streams
(356-430)name
(169-171)_list_dates
(432-478)_upload_file
(572-605)head
(775-806)get_ingestor_meta_file_paths
(888-930)get_stream_file_paths
(932-983)check
(1036-1070)
src/storage/localfs.rs (3)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(545-554)increment_object_store_calls_by_date
(539-543)status_code
(574-578)src/storage/metrics_layer.rs (1)
copy
(333-347)src/storage/azure_blob.rs (1)
get_ingestor_meta_file_paths
(888-930)
src/storage/azure_blob.rs (5)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(545-554)increment_object_store_calls_by_date
(539-543)status_code
(574-578)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/localfs.rs (5)
new
(100-102)name
(76-78)from
(998-1000)head
(132-147)get_ingestor_meta_file_paths
(188-240)src/storage/gcs.rs (9)
resp
(322-328)resp
(458-464)resp
(1265-1270)resp
(1306-1311)_delete_prefix
(244-307)_list_streams
(309-379)name
(130-132)head
(735-762)get_ingestor_meta_file_paths
(840-878)src/storage/s3.rs (11)
resp
(484-490)resp
(620-626)resp
(1430-1435)resp
(1471-1476)_delete_prefix
(406-469)_list_streams
(471-541)name
(301-303)from
(1485-1493)from
(1497-1499)head
(904-931)get_ingestor_meta_file_paths
(1005-1043)
src/storage/gcs.rs (3)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(545-554)increment_object_store_calls_by_date
(539-543)status_code
(574-578)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/azure_blob.rs (9)
resp
(373-379)resp
(517-523)resp
(1353-1358)resp
(1398-1403)_delete_prefix
(287-354)_list_streams
(356-430)name
(169-171)_upload_file
(572-605)head
(775-806)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(775-806)src/storage/gcs.rs (1)
head
(735-762)src/storage/s3.rs (3)
head
(904-931)from
(1485-1493)from
(1497-1499)
src/storage/object_storage.rs (1)
src/metrics/mod.rs (2)
increment_parquets_stored_by_date
(511-515)increment_parquets_stored_size_by_date
(517-521)
src/query/stream_schema_provider.rs (3)
src/metrics/mod.rs (4)
increment_bytes_scanned_in_query_by_date
(533-537)increment_files_scanned_in_object_store_calls_by_date
(545-554)increment_files_scanned_in_query_by_date
(527-531)get
(558-560)src/parseable/mod.rs (4)
storage
(244-246)new
(146-158)serde_json
(300-300)serde_json
(306-306)src/enterprise/utils.rs (1)
collect_manifest_files
(159-182)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (13)
src/metrics/mod.rs (1)
125-159
: No .set() or .sub() calls remain on per-date counters; all use inc(), inc_by(), or add().
src/event/mod.rs (1)
31-32
: LGTM: ingestion billing counters emitted once per event batch. Date derivation and increments look correct and are after stats update. No issues.
Also applies to: 92-96
src/handlers/http/query.rs (1)
125-128
: LGTM: count query calls after auth in main query path. Placement after user_auth_for_datasets avoids counting unauthorized attempts. Good.
src/stats.rs (1)
144-153
: LGTM: adjusting global daily totals when deleting manifests. Subtracting from TOTAL_*_DATE gauges keeps “current totals by date” accurate post-deletion.
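A minimal sketch of that adjustment, using illustrative names in place of the real TOTAL_*_DATE statics:

```rust
use once_cell::sync::Lazy;
use prometheus::{IntGaugeVec, Opts};

// Illustrative gauge; the real statics live in src/metrics/mod.rs.
static TOTAL_EVENTS_INGESTED_DATE: Lazy<IntGaugeVec> = Lazy::new(|| {
    IntGaugeVec::new(
        Opts::new("total_events_ingested_date", "total events ingested per date"),
        &["format", "date"],
    )
    .expect("valid metric definition")
});

// When a manifest for a date is deleted, subtract its contribution so the
// gauge keeps reflecting the current total for that date.
fn on_manifest_deleted(format: &str, date: &str, events_in_manifest: i64) {
    TOTAL_EVENTS_INGESTED_DATE
        .with_label_values(&[format, date])
        .sub(events_in_manifest);
}
```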
src/query/stream_schema_provider.rs (4)
19-19
: Good: Using PathBuf for robust path handling. The addition of
PathBuf
to the imports enhances cross-platform path handling, which is essential for the storage abstraction layer.
60-63
: Well-structured metrics imports for comprehensive observability. The new metric imports (
increment_bytes_scanned_in_query_by_date
,increment_files_scanned_in_object_store_calls_by_date
,increment_files_scanned_in_query_by_date
) provide excellent granularity for tracking query performance and billing metrics.
90-96
: LGTM: Clean removal of URL field from StandardTableProvider. The removal of the URL field simplifies the table provider structure and aligns well with the storage abstraction changes.
862-885
: Use bounded concurrency and graceful error handling in collect_manifest_files
The implementation now limits concurrent fetches with buffer_unordered, replaces all expect() calls with proper ObjectStorageError mapping (PathError/UnhandledError), and logs per-URL failures, eliminating unbounded task spawning and panics.
src/storage/object_storage.rs (1)
163-169
: Excellent addition of comprehensive storage metrics. The new TOTAL_EVENTS_STORAGE_SIZE_DATE metric and billing metrics (increment_parquets_stored_by_date, increment_parquets_stored_size_by_date) provide valuable insights into storage patterns per date.
src/storage/metrics_layer.rs (2)
35-62
: Well-designed error-to-status mapping utility. The
error_to_status_code
function provides a clean, centralized mapping of object_store errors to HTTP-like status codes. This is excellent for consistent metric labeling across all storage operations.
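Roughly, the mapping amounts to the sketch below; the variant list is abridged and the real function in src/storage/metrics_layer.rs remains the source of truth:

```rust
use object_store::Error as ObjectStoreError;

// Abridged sketch of an object_store error to HTTP-like status-label mapping.
fn error_to_status_code(err: &ObjectStoreError) -> &'static str {
    match err {
        ObjectStoreError::NotFound { .. } => "404",
        ObjectStoreError::AlreadyExists { .. } => "409",
        ObjectStoreError::NotImplemented => "501",
        // Everything else (generic, precondition, join, ...) is reported as 500.
        _ => "500",
    }
}
```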
93-106
: Excellent pattern for capturing operation results before observing metrics. The pattern of capturing the result first, then determining the status, and finally observing the metric ensures accurate status labeling. This is consistently applied across all operations.
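Stripped of the storage-specific types, the pattern is essentially this self-contained sketch (the observe closure stands in for the STORAGE_REQUEST_RESPONSE_TIME histogram):

```rust
use std::future::Future;
use std::time::Instant;

// Run the operation, derive the status label from the captured result,
// then observe exactly one timing sample, on success and failure alike.
async fn measured<T, E>(
    op: impl Future<Output = Result<T, E>>,
    status_of_err: impl Fn(&E) -> &'static str,
    observe: impl Fn(&'static str, f64),
) -> Result<T, E> {
    let start = Instant::now();
    let result = op.await;
    let status = match &result {
        Ok(_) => "200",
        Err(err) => status_of_err(err),
    };
    observe(status, start.elapsed().as_secs_f64());
    result
}
```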
src/storage/localfs.rs (2)
133-146
: Good defensive metrics recording for unimplemented operations. Recording metrics even for unimplemented HEAD operations ensures consistent observability and helps track attempted operations.
452-460
: Comprehensive error status mapping for file operations. The granular error mapping (404 for NotFound, 403 for PermissionDenied, 500 for others) provides excellent visibility into failure modes.
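For reference, the mapping described here amounts to something like the following sketch; the actual localfs code may branch on additional ErrorKind values:

```rust
use std::io::ErrorKind;

// Map a local filesystem error kind to the status label used on the metric.
fn io_error_status(kind: ErrorKind) -> &'static str {
    match kind {
        ErrorKind::NotFound => "404",
        ErrorKind::PermissionDenied => "403",
        _ => "500",
    }
}
```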
efad408 to 83be66f Compare
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/storage/object_storage.rs (1)
123-139
: Don’t fail the upload after a successful object put; move metrics after manifest creation and make them best-effort. If metrics update fails here, we return Err before creating the manifest, leaving the object uploaded without a manifest (inconsistent state) and aborting the batch. Move metrics below manifest creation and swallow metrics errors.
- // Update storage metrics - update_storage_metrics(&path, &stream_name, filename)?; - - // Create manifest entry + // Create manifest entry let absolute_path = store @@ let manifest = catalog::create_from_parquet_file(absolute_path, &path)?; + // Best-effort: update storage metrics after manifest creation + if let Err(e) = update_storage_metrics(&path, &stream_name, filename) { + warn!("update_storage_metrics failed for {}: {}", filename, e); + }src/storage/metrics_layer.rs (1)
398-421
: Missing provider label in StreamMetricWrapper. The
StreamMetricWrapper
still uses hard-coded labels without the provider context. It should incorporate the provider label for consistency.Apply this diff to add provider-aware labeling:
-struct StreamMetricWrapper<'a, const N: usize, T> { +struct StreamMetricWrapper<'a, T> { time: time::Instant, - labels: [&'static str; N], + provider: String, + operation: &'static str, + status: &'static str, inner: BoxStream<'a, T>, } -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; fn poll_next( mut self: std::pin::Pin<&mut Self>, cx: &mut Context<'_>, ) -> Poll<Option<Self::Item>> { match self.inner.poll_next_unpin(cx) { t @ Poll::Ready(None) => { STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) + .with_label_values(&[&self.provider, self.operation, self.status]) .observe(self.time.elapsed().as_secs_f64()); t } t => t, } } }
♻️ Duplicate comments (8)
src/storage/object_storage.rs (1)
148-153
: Avoid panics and make metadata failures non-fatal. Indexing [1] on split('=') can panic on malformed names, and propagating metadata() errors aborts the whole upload flow.
215-248
: Potential panic on body read - handle error properly.The code still uses
resp.bytes().await.unwrap()
which can panic. This needs proper error handling.Apply this diff:
match resp { Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) + match resp.bytes().await { + Ok(body) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } }
287-354
: Files deleted counter increments before delete attempt. The counter increments before the delete operation completes successfully. Move it inside the success branch.
Apply this diff:
match x { Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "azure_blob", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "DELETE", "200"]) .observe(delete_elapsed); }
645-676
: Missing files-scanned increment for small file PUT. The small file PUT path doesn't increment the files-scanned counter on success.
Apply this diff after line 660:
Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "PUT", "200"]) .observe(put_elapsed); increment_files_scanned_in_object_store_calls_by_date( "azure_blob", "PUT", 1, &Utc::now().date_naive().to_string(), ); + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob", + "PUT", + 1, + &Utc::now().date_naive().to_string(), + ); }
852-866
: Remove duplicate GET metrics from get_objects loop. The loop records GET metrics that are already recorded by
get_object()
, causing double counting.Apply this diff:
let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - ); res.push(byts);src/storage/gcs.rs (3)
180-209
: Handle body read errors properly to avoid panic.Same issue as Azure -
resp.bytes().await.unwrap()
can panic and needs proper error handling.Apply this diff:
match resp { Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) + match resp.bytes().await { + Ok(body) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "gcs", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } }
244-307
: Move files_deleted increment to success branch. Similar to Azure, the counter increments before confirming the delete succeeded.
Apply this diff:
match x { Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "gcs", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed); }
808-822
: Remove duplicate GET metrics recording. The get_objects loop duplicates GET metrics already recorded by get_object().
Apply this diff:
let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "gcs", - "GET", - &Utc::now().date_naive().to_string(), - ); res.push(byts);
🧹 Nitpick comments (5)
src/metrics/mod.rs (3)
161-195
: Clarify help text: explicitly state “across all streams” to avoid confusion with per-stream counters. These TOTAL_*_DATE gauges aggregate globally. Make the help text explicit to prevent future misinterpretation.
- "total events ingested on a particular date", + "Total events ingested across all streams on a particular date", @@ - "Total events ingested size in bytes on a particular date", + "Total events ingested size in bytes across all streams on a particular date", @@ - "Total events storage size in bytes on a particular date", + "Total events storage size in bytes across all streams on a particular date",
339-346
: Tune latency buckets for object store histograms. Default buckets are suboptimal for storage APIs. Use exponential buckets (seconds) to capture a 5ms–60s spread, and consider aligning on standard SRE buckets.
-use prometheus::{HistogramOpts, HistogramVec, IntCounterVec, IntGaugeVec, Opts, Registry}; +use prometheus::{HistogramOpts, HistogramVec, IntCounterVec, IntGaugeVec, Opts, Registry, exponential_buckets}; @@ - HistogramVec::new( - HistogramOpts::new("storage_request_response_time", "Storage Request Latency") - .namespace(METRICS_NAMESPACE), - &["provider", "method", "status"], - ) + HistogramVec::new( + HistogramOpts::new("storage_request_response_time", "Storage request latency (seconds)") + .namespace(METRICS_NAMESPACE) + .buckets(exponential_buckets(0.005, 2.0, 15).expect("buckets")), // 5ms .. ~163s + &["provider", "method", "status"], + )
229-337
: Metric naming consistency: “_date” vs “_by_date”. New billing counters use “_by_date” while other date-scoped metrics use “_date”. Consider standardizing for easier discovery (or add a comment noting the naming convention difference).
src/metadata.rs (1)
71-73
: Typo: “compatability” → “compatibility”. Minor doc fix.
-/// In order to support backward compatability with streams created before v1.6.4, +/// In order to support backward compatibility with streams created before v1.6.4,src/storage/metrics_layer.rs (1)
87-106
: Consider extracting the repeated pattern to reduce code duplication. Each method follows the same pattern: measure time, execute operation, determine status, record metric. Consider creating a helper method to reduce duplication across all 15+ methods.
Example helper method:
async fn execute_and_measure<F, R>( &self, operation: &str, fut: F, ) -> ObjectStoreResult<R> where F: Future<Output = ObjectStoreResult<R>>, { let time = time::Instant::now(); let result = fut.await; let elapsed = time.elapsed().as_secs_f64(); let status = match &result { Ok(_) => "200", Err(err) => error_to_status_code(err), }; STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&[&self.provider, operation, status]) .observe(elapsed); result }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
src/catalog/mod.rs (1 hunks)
src/event/mod.rs (2 hunks)
src/handlers/http/modal/ingest_server.rs (0 hunks)
src/handlers/http/modal/query_server.rs (0 hunks)
src/handlers/http/modal/server.rs (0 hunks)
src/handlers/http/query.rs (3 hunks)
src/metadata.rs (3 hunks)
src/metrics/mod.rs (9 hunks)
src/metrics/storage.rs (0 hunks)
src/query/listing_table_builder.rs (2 hunks)
src/query/mod.rs (1 hunks)
src/query/stream_schema_provider.rs (13 hunks)
src/stats.rs (3 hunks)
src/storage/azure_blob.rs (23 hunks)
src/storage/gcs.rs (23 hunks)
src/storage/localfs.rs (15 hunks)
src/storage/metrics_layer.rs (8 hunks)
src/storage/object_storage.rs (3 hunks)
src/storage/s3.rs (23 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/server.rs
- src/metrics/storage.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- src/catalog/mod.rs
- src/query/mod.rs
- src/stats.rs
- src/event/mod.rs
- src/handlers/http/query.rs
- src/query/listing_table_builder.rs
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/metrics_layer.rs
src/storage/object_storage.rs
src/metadata.rs
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/metadata.rs
src/metrics/mod.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/object_storage.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/localfs.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Applied to files:
src/query/stream_schema_provider.rs
🧬 Code graph analysis (8)
src/storage/gcs.rs (5)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (545-554), increment_object_store_calls_by_date (539-543), status_code (574-578)
- src/storage/metrics_layer.rs (3): error_to_status_code (36-62), new (71-76), head (251-265)
- src/storage/localfs.rs (5): new (100-102), name (76-78), from (998-1000), head (132-147), get_ingestor_meta_file_paths (188-240)
- src/storage/azure_blob.rs (11): resp (373-379), resp (517-523), resp (1353-1358), resp (1398-1403), _delete_prefix (287-354), _list_streams (356-430), name (169-171), _list_dates (432-478), _upload_file (572-605), head (775-806), get_ingestor_meta_file_paths (888-930)
- src/storage/s3.rs (13): resp (484-490), resp (620-626), resp (1430-1435), resp (1471-1476), _delete_prefix (406-469), _list_streams (471-541), name (301-303), _list_dates (543-585), from (1485-1493), from (1497-1499), _upload_file (675-704), head (904-931), get_ingestor_meta_file_paths (1005-1043)
src/storage/azure_blob.rs (3)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (545-554), increment_object_store_calls_by_date (539-543), status_code (574-578)
- src/storage/metrics_layer.rs (3): error_to_status_code (36-62), new (71-76), head (251-265)
- src/storage/gcs.rs (13): resp (322-328), resp (458-464), resp (1265-1270), resp (1306-1311), _delete_prefix (244-307), _list_streams (309-379), name (130-132), _list_dates (381-423), _upload_file (513-542), head (735-762), get_ingestor_meta_file_paths (840-878), get_stream_file_paths (880-927), check (976-1006)
src/storage/s3.rs (4)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (545-554), increment_object_store_calls_by_date (539-543), status_code (574-578)
- src/storage/metrics_layer.rs (3): error_to_status_code (36-62), new (71-76), head (251-265)
- src/storage/object_storage.rs (4): parseable_json_path (1284-1286), new (82-91), name (203-203), head (212-212)
- src/storage/azure_blob.rs (9): resp (373-379), resp (517-523), resp (1353-1358), resp (1398-1403), _delete_prefix (287-354), _list_streams (356-430), name (169-171), _upload_file (572-605), head (775-806)
src/storage/metrics_layer.rs (3)
- src/storage/azure_blob.rs (1): head (775-806)
- src/storage/gcs.rs (1): head (735-762)
- src/storage/s3.rs (3): head (904-931), from (1485-1493), from (1497-1499)
src/storage/object_storage.rs (1)
- src/metrics/mod.rs (2): increment_parquets_stored_by_date (511-515), increment_parquets_stored_size_by_date (517-521)
src/query/stream_schema_provider.rs (2)
- src/metrics/mod.rs (4): increment_bytes_scanned_in_query_by_date (533-537), increment_files_scanned_in_object_store_calls_by_date (545-554), increment_files_scanned_in_query_by_date (527-531), get (558-560)
- src/parseable/mod.rs (4): storage (244-246), new (146-158), serde_json (300-300), serde_json (306-306)
src/metadata.rs (1)
- src/catalog/mod.rs (4): num_rows (61-61), num_rows (78-80), ingestion_size (58-58), ingestion_size (70-72)
src/storage/localfs.rs (3)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (545-554), increment_object_store_calls_by_date (539-543), status_code (574-578)
- src/storage/metrics_layer.rs (1): copy (333-347)
- src/storage/azure_blob.rs (3): get_ingestor_meta_file_paths (888-930), delete_object (1003-1034), check (1036-1070)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (21)
src/metrics/mod.rs (1)
125-159: Ensure single execution of load_daily_metrics
Grep shows one call in migration/mod.rs, but migration may run per-stream or per-tenant. Confirm it only runs once at startup (no timers or reloads); if not, guard it with a one-shot check or make it idempotent.
src/metadata.rs (3)
49-56: LGTM: per-stream date counters now use monotonic increments
The switch to inc_by with u64 aligns with counter semantics and avoids negative deltas.
63-68: LGTM: global totals labeled by [format, date]
Matches the intended global aggregation design from earlier discussion/learnings.
181-190: Ensure load_daily_metrics warming is idempotent
load_daily_metrics is only called at src/migration/mod.rs:374, but if that path ever runs more than once in the same process, the counters will double-count on each warmup. Add a one-time cache keyed by (stream_name, manifest_date) or switch to using gauges for warmup and reserve counters for live ingestion.
src/storage/object_storage.rs (1)
163-170: LGTM: global storage total and billing increments
The new TOTAL_*_DATE aggregation and billing counters align with PR goals.
src/storage/metrics_layer.rs (2)
35-62: LGTM! Well-structured error-to-status mapping.
The error_to_status_code function provides good coverage of object store error types with appropriate HTTP status code mappings. The function is correctly marked as public to be used across storage implementations.
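For readers who have not opened src/storage/metrics_layer.rs, a minimal sketch of the shape such a mapping takes; this is illustrative only and covers a reduced set of variants, while the real error_to_status_code handles more cases:

    use object_store::Error as ObjectStoreError;

    // Illustrative only: map a couple of well-known object_store error
    // variants to HTTP-like status labels and fall back to "500".
    fn status_for(err: &ObjectStoreError) -> &'static str {
        match err {
            ObjectStoreError::NotFound { .. } => "404",
            ObjectStoreError::AlreadyExists { .. } => "409",
            _ => "500",
        }
    }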
65-76: LGTM! Provider-aware metrics layer properly configured.
The addition of the provider field enables proper labeling of metrics by storage backend, which aligns with the PR objectives of tracking metrics per provider.
src/storage/gcs.rs (2)
583-611: LGTM! Proper metrics for small file PUT path.
The small file PUT path correctly includes metrics tracking with files-scanned increment.
139-139: LGTM! Consistent provider labeling.
The MetricLayer is properly instantiated with "gcs" provider label for consistent metrics across the GCS backend.
src/storage/localfs.rs (4)
132-141: Don't count files-scanned for an unimplemented HEAD; emit a 501 timing and count the call once.
HEAD isn't implemented here, yet we increment "files scanned." Drop that, and record a single HEAD attempt with status 501 and latency.
[ suggest_essential_refactor ]- async fn head(&self, _path: &RelativePath) -> Result<ObjectMeta, ObjectStorageError> { - // Record attempt to access file (even though operation not implemented) - increment_files_scanned_in_object_store_calls_by_date( - "localfs", - "HEAD", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("localfs", "HEAD", &Utc::now().date_naive().to_string()); - Err(ObjectStorageError::UnhandledError(Box::new( + async fn head(&self, _path: &RelativePath) -> Result<ObjectMeta, ObjectStorageError> { + let start = Instant::now(); + increment_object_store_calls_by_date("localfs", "HEAD", &Utc::now().date_naive().to_string()); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["localfs", "HEAD", "501"]) + .observe(start.elapsed().as_secs_f64()); + Err(ObjectStorageError::UnhandledError(Box::new( std::io::Error::new( std::io::ErrorKind::Unsupported, "Head operation not implemented for LocalFS yet", ), ))) }
151-169: Count object-store calls on attempt, not only on success (missing/late increments).
Several methods increment TOTAL_OBJECT_STORE_CALLS_BY_DATE only on success or after the operation finishes, skipping failed/early-return cases. Move the increment immediately after you start each operation to get accurate call counts.
Examples (apply across similar sites):
async fn get_object(&self, path: &RelativePath) -> Result<Bytes, ObjectStorageError> { let file_path = self.path_in_root(path); - let get_start = Instant::now(); + increment_object_store_calls_by_date("localfs", "GET", &Utc::now().date_naive().to_string()); let file_result = fs::read(file_path).await; let get_elapsed = get_start.elapsed().as_secs_f64(); @@ - increment_object_store_calls_by_date("localfs", "GET", &Utc::now().date_naive().to_string()); Ok(x.into())async fn put_object(&self, path: &RelativePath, resource: Bytes) -> Result<(), ObjectStorageError> { @@ - let res = fs::write(path, resource).await; + increment_object_store_calls_by_date("localfs", "PUT", &Utc::now().date_naive().to_string()); + let res = fs::write(path, resource).await; @@ - increment_object_store_calls_by_date("localfs", "PUT", &Utc::now().date_naive().to_string());async fn delete_object(&self, path: &RelativePath) -> Result<(), ObjectStorageError> { @@ - let result = tokio::fs::remove_file(path).await; + increment_object_store_calls_by_date("localfs", "DELETE", &Utc::now().date_naive().to_string()); + let result = tokio::fs::remove_file(path).await;Similarly add the increment at operation start for:
- delete_prefix
- check
- delete_stream
- try_delete_node_meta
- list_streams / list_old_streams / list_dirs / list_dirs_relative / list_dates (before matching on read_dir results)
- get_ingestor_meta_file_paths / get_stream_file_paths / get_objects (LIST) before calling read_dir.
Also applies to: 410-427, 441-461, 470-487, 509-526, 598-606, 645-654, 682-692, 733-741, 769-783, 391-395, 305-309, 232-239
206-211: Map LIST/GET errors to correct status codes (404/403/500), not hardcoded 404.
Use std::io::ErrorKind to label errors correctly for the histogram.
[ suggest_optional_refactor ]- Err(err) => { - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["localfs", "LIST", "404"]) - .observe(list_elapsed); - return Err(err.into()); - } + Err(err) => { + let status = match err.kind() { + std::io::ErrorKind::NotFound => "404", + std::io::ErrorKind::PermissionDenied => "403", + _ => "500", + }; + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["localfs", "LIST", status]) + .observe(list_elapsed); + return Err(err.into()); + }- Err(err) => { - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["localfs", "GET", "404"]) - .observe(file_read_elapsed); - return Err(err.into()); - } + Err(err) => { + let status = match err.kind() { + std::io::ErrorKind::NotFound => "404", + std::io::ErrorKind::PermissionDenied => "403", + _ => "500", + }; + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["localfs", "GET", status]) + .observe(file_read_elapsed); + return Err(err.into()); + }Also applies to: 264-269, 336-341, 376-380
805-840: Add LIST metrics for list_hours/list_minutes for parity with other backends.
S3/Azure record LIST timings and counts for hours/minutes; LocalFS should do the same for consistency.
[ suggest_optional_refactor ]src/storage/s3.rs (5)
343-361: Fix panic and mislabeling on GET body read (resp.bytes().await.unwrap()).
unwrap() can panic and will incorrectly record 200 on body-read failure. Handle the Result and map status via error_to_status_code.
[ raise_critical_issue ]- match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } + match resp { + Ok(resp) => { + match resp.bytes().await { + Ok(body) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "s3", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let status_code = error_to_status_code(&object_store::Error::Generic { + store: "s3", + source: Box::new(err), + }); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); + Err(ObjectStorageError::UnhandledError(Box::new( + std::io::Error::other("failed to read body"), + ))) + } + } + }
971-987: Remove duplicate/incorrect GET instrumentation inside get_objects().
GET metrics are already emitted by _get_object()/get_object(). This block double-counts and uses LIST latency for GET.
[ suggest_essential_refactor ]- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
714-739: Separate multipart metrics: INIT vs PART vs COMPLETE for cleaner dashboards.
Currently INIT and PART both use PUT_MULTIPART. Use distinct method labels to avoid mixing phases.
[ suggest_optional_refactor ]- increment_object_store_calls_by_date( - "s3", - "PUT_MULTIPART", - &Utc::now().date_naive().to_string(), - ); + increment_object_store_calls_by_date("s3", "PUT_MULTIPART_INIT", &Utc::now().date_naive().to_string()); @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "PUT_MULTIPART", "200"]) + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "PUT_MULTIPART_INIT", "200"]) .observe(multipart_elapsed); @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "PUT_MULTIPART", status_code]) + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "PUT_MULTIPART_INIT", status_code]) .observe(multipart_elapsed);- increment_object_store_calls_by_date( - "s3", - "PUT_MULTIPART", - &Utc::now().date_naive().to_string(), - ); + increment_object_store_calls_by_date("s3", "PUT_MULTIPART_PART", &Utc::now().date_naive().to_string()); @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "PUT_MULTIPART", "200"]) + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "PUT_MULTIPART_PART", "200"]) .observe(part_elapsed); @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "PUT_MULTIPART", status_code]) + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "PUT_MULTIPART_PART", status_code]) .observe(part_elapsed);(Complete keeps PUT_MULTIPART_COMPLETE as-is.)
Also applies to: 804-829, 838-850
406-469: LGTM: delete-prefix streaming instrumentation is solid.
Accurate LIST timing, per-item DELETE timing, and files_scanned/deleted tallies.
543-585: LGTM: LIST/PUT/list_hours/list_minutes/list_dirs instrumentation is consistent and correct.
Also applies to: 675-704, 1288-1367, 1400-1477
src/query/stream_schema_provider.rs (3)
862-885: Eliminate panics and unbounded fan-out in collect_manifest_files().
expect() can crash the process; spawning unbounded tasks scales poorly and loses path context. Bound concurrency, handle errors, and keep path context.
[ raise_critical_issue ]pub async fn collect_manifest_files( storage: Arc<dyn ObjectStorage>, manifest_urls: Vec<String>, ) -> Result<Vec<Manifest>, ObjectStorageError> { - let mut tasks = Vec::new(); - manifest_urls.into_iter().for_each(|path| { - let path = RelativePathBuf::from_path(PathBuf::from(path)).expect("Invalid path"); - let storage = Arc::clone(&storage); - tasks.push(tokio::task::spawn(async move { - storage.get_object(&path).await - })); - }); - - let mut op = Vec::new(); - for task in tasks { - let file = task.await??; - op.push(file); - } - - Ok(op - .into_iter() - .map(|res| serde_json::from_slice(&res).expect("Data is invalid for Manifest")) - .collect()) + let concurrency = std::cmp::min(32, std::cmp::max(1, num_cpus::get() * 2)); + let results = futures_util::stream::iter(manifest_urls.into_iter()) + .map(|path| { + let storage = Arc::clone(&storage); + async move { + let rel = RelativePathBuf::from_path(PathBuf::from(path)) + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?; + storage.get_object(&rel).await + } + }) + .buffer_unordered(concurrency) + .collect::<Vec<_>>() + .await; + + let mut manifests = Vec::with_capacity(results.len()); + for res in results { + let bytes = res?; + let manifest: Manifest = + serde_json::from_slice(&bytes) + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?; + manifests.push(manifest); + } + Ok(manifests) }Add import (outside this hunk):
use futures_util::stream::StreamExt;
585-591: Avoid double-counting GET "files scanned" here; MetricLayer already tracks real GETs.
This pre-emptively increments GET counts based on planned parquet files, which will be counted again by the storage layer during actual reads (and may diverge due to pruning/errors). Recommend removing.
[ suggest_optional_refactor ]- let parquet_files_to_scan = manifest_files.len(); - increment_files_scanned_in_object_store_calls_by_date( - PARSEABLE.storage().name(), - "GET", - parquet_files_to_scan as u64, - &Utc::now().date_naive().to_string(), - );If you intend to track “planned reads,” consider a separate metric (e.g., QUERY_PLANNED_PARQUET_FILES) to avoid conflating with backend GETs.
408-412: LGTM: query billing metrics (files/bytes) are computed once per scan and tagged by date.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/localfs.rs (1)
842-877: Missing instrumentation for list_hours and list_minutes.
These directory listing operations lack the same comprehensive instrumentation applied to other listing methods in this file.
Apply consistent instrumentation to both methods:
async fn list_hours( &self, stream_name: &str, date: &str, ) -> Result<Vec<String>, ObjectStorageError> { let path = self.root.join(stream_name).join(date); - let directories = ReadDirStream::new(fs::read_dir(&path).await?); + let list_start = Instant::now(); + let result = fs::read_dir(&path).await; + let list_elapsed = list_start.elapsed().as_secs_f64(); + + let read_dir = match result { + Ok(read_dir) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["localfs", "LIST", "200"]) + .observe(list_elapsed); + read_dir + } + Err(err) => { + let status_code = match err.kind() { + std::io::ErrorKind::NotFound => "404", + std::io::ErrorKind::PermissionDenied => "403", + _ => "500", + }; + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["localfs", "LIST", status_code]) + .observe(list_elapsed); + return Err(err.into()); + } + }; + + let directories = ReadDirStream::new(read_dir);
🧹 Nitpick comments (1)
src/storage/localfs.rs (1)
455-482: DELETE prefix operation should track files affected.
The delete_prefix operation removes entire directory trees but doesn't track how many files were actually deleted. This makes the metrics less useful for understanding the scale of delete operations. Consider tracking the number of files/directories affected by the delete operation:
 async fn delete_prefix(&self, path: &RelativePath) -> Result<(), ObjectStorageError> {
     let path = self.path_in_root(path);
+
+    // Count files before deletion
+    let files_count = count_files_recursively(&path).await.unwrap_or(0);
     let delete_start = Instant::now();
     let result = tokio::fs::remove_dir_all(path).await;
     let delete_elapsed = delete_start.elapsed().as_secs_f64();
     match &result {
         Ok(_) => {
             STORAGE_REQUEST_RESPONSE_TIME
                 .with_label_values(&["localfs", "DELETE", "200"])
                 .observe(delete_elapsed);
+            increment_files_scanned_in_object_store_calls_by_date(
+                "localfs",
+                "DELETE",
+                files_count,
+                &Utc::now().date_naive().to_string(),
+            );
+            increment_object_store_calls_by_date(
+                "localfs",
+                "DELETE",
+                &Utc::now().date_naive().to_string(),
+            );
         }
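The diff above assumes a count_files_recursively helper that does not exist in the codebase yet; a minimal sketch of what it could look like, using an iterative traversal over tokio::fs so no boxed async recursion is needed:

    use std::path::{Path, PathBuf};

    // Hypothetical helper assumed by the suggestion above: counts regular
    // files under `root`, descending into subdirectories iteratively.
    async fn count_files_recursively(root: &Path) -> std::io::Result<u64> {
        let mut pending: Vec<PathBuf> = vec![root.to_path_buf()];
        let mut count = 0u64;
        while let Some(dir) = pending.pop() {
            let mut entries = tokio::fs::read_dir(&dir).await?;
            while let Some(entry) = entries.next_entry().await? {
                if entry.file_type().await?.is_dir() {
                    pending.push(entry.path());
                } else {
                    count += 1;
                }
            }
        }
        Ok(count)
    }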
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/storage/localfs.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/localfs.rs
🧬 Code graph analysis (1)
src/storage/localfs.rs (4)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (545-554), increment_object_store_calls_by_date (539-543), status_code (574-578)
- src/storage/azure_blob.rs (4): get_ingestor_meta_file_paths (888-930), get_stream_file_paths (932-983), delete_object (1003-1034), check (1036-1070)
- src/storage/gcs.rs (4): get_ingestor_meta_file_paths (840-878), get_stream_file_paths (880-927), delete_object (947-974), check (976-1006)
- src/storage/s3.rs (6): get_ingestor_meta_file_paths (1005-1043), get_stream_file_paths (1045-1092), from (1485-1493), from (1497-1499), delete_object (1112-1139), check (1141-1171)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (12)
src/storage/localfs.rs (12)
28-28: LGTM! Proper imports for centralized metrics.
The addition of chrono::Utc and centralized storage metrics imports aligns with the PR's objective to standardize metrics collection across all storage providers.
Also applies to: 42-45
152-194: LGTM! Comprehensive instrumentation with proper error handling.
The get_object method now includes proper timing measurement and status code mapping (200/404/500). The implementation correctly increments metrics only on successful operations and maps filesystem errors to appropriate HTTP status codes.
196-253: LGTM! Proper LIST operation instrumentation.
The get_ingestor_meta_file_paths method correctly instruments both the list operation timing and files scanned counting. The implementation follows the same pattern as other storage providers in the codebase.
255-324: LGTM! Consistent instrumentation pattern.
The get_stream_file_paths method properly instruments the LIST operation and tracks files scanned, following the established pattern across storage providers.
326-411: LGTM! Proper instrumentation for complex operation.
The get_objects method correctly separates timing for the initial LIST operation and individual GET operations. Files scanned counting is properly accumulated across both phases of the operation.
413-453: LGTM! Proper PUT operation instrumentation.
The put_object method includes comprehensive timing measurement and appropriate status code observation (200 for success, 500 for errors).
484-523: LGTM! Comprehensive DELETE operation instrumentation.
The delete_object method includes proper timing, status code mapping (200/404/403/500), and tracks the single file deletion in metrics.
525-554: LGTM! Appropriate instrumentation for check operation.
The check method correctly instruments the directory creation operation and uses HEAD as the method type, which is consistent with other storage providers since this is a health check operation.
612-662: LGTM! Proper LIST operation instrumentation.
Both list_streams and list_old_streams methods correctly instrument the directory listing operations with timing measurement, status code mapping, and call counting.
Also applies to: 664-713
715-754: LGTM! Consistent instrumentation across directory listing methods.
Both list_dirs and list_dirs_relative methods include appropriate timing and status code instrumentation, maintaining consistency across the storage interface.
Also applies to: 756-800
802-840: LGTM! Proper date listing instrumentation.
The list_dates method includes both timing instrumentation and call counting, following the established pattern.
887-921: LGTM! Comprehensive upload file instrumentation.
The upload_file method includes proper timing measurement, status code mapping (200/500), and call counting for successful operations.
Actionable comments posted: 1
🧹 Nitpick comments (4)
resources/formats.json (2)
1469-1469: Consider preserving backward compatibility for the renamed format.
Renaming "rust_server_logs" to "parseable_server_logs" is a breaking change for existing users referencing the old name. Either keep an alias or retain a duplicate entry with the old name (and the same regex list) until a deprecation window passes.
1472-1472: Make timestamp regex accept timezone offsets as well as Z.
Some deployments emit offsets like +00:00 or -0700. Allow both.
Apply this diff:
- "pattern": "^(?P<customer_id>\\S+)\\s+(?P<deployment_id>\\S+)\\s+(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+Z?)\\s+(?P<level>\\w+)\\s+(?P<logger_context>\\S+)\\s+(?P<thread_id>ThreadId\\(\\d+\\))\\s+(?P<module>.*?):(?P<line_number>\\d+):\\s+(?P<body>.*)", + "pattern": "^(?P<customer_id>\\S+)\\s+(?P<deployment_id>\\S+)\\s+(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+(?:Z|[-+]\\d{2}:?\\d{2})?)\\s+(?P<level>\\w+)\\s+(?P<logger_context>\\S+)\\s+(?P<thread_id>ThreadId\\(\\d+\\))\\s+(?P<module>.*?):(?P<line_number>\\d+):\\s+(?P<body>.*)",src/metrics/mod.rs (2)
39-44: Fix help text grammar: use "size in bytes".
Minor clarity tweak across help strings.
Apply this diff:
- "events_ingested_size", - "Events ingested size bytes for a stream", + "events_ingested_size", + "Events ingested size in bytes for a stream", @@ - Opts::new("storage_size", "Storage size bytes for a stream").namespace(METRICS_NAMESPACE), + Opts::new("storage_size", "Storage size in bytes for a stream").namespace(METRICS_NAMESPACE), @@ - "events_deleted_size", - "Events deleted size bytes for a stream", + "events_deleted_size", + "Events deleted size in bytes for a stream", @@ - "deleted_events_storage_size", - "Deleted events storage size bytes for a stream", + "deleted_events_storage_size", + "Deleted events storage size in bytes for a stream", @@ - "lifetime_events_ingested_size", - "Lifetime events ingested size bytes for a stream", + "lifetime_events_ingested_size", + "Lifetime events ingested size in bytes for a stream", @@ - "lifetime_events_storage_size", - "Lifetime events storage size bytes for a stream", + "lifetime_events_storage_size", + "Lifetime events storage size in bytes for a stream",Also applies to: 51-51, 67-72, 81-82, 104-106, 116-118
450-457: Name latency histograms with "_seconds" and set buckets.
Aligns with Prometheus best practices and yields actionable distributions.
Apply this diff:
-pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| { - HistogramVec::new( - HistogramOpts::new("storage_request_response_time", "Storage Request Latency") - .namespace(METRICS_NAMESPACE), - &["provider", "method", "status"], - ) +pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| { + HistogramVec::new( + HistogramOpts::new( + "storage_request_latency_seconds", + "Storage request latency in seconds", + ) + .buckets(vec![ + 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, + ]) + .namespace(METRICS_NAMESPACE), + &["provider", "method", "status"], + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- resources/formats.json (1 hunks)
- src/metrics/mod.rs (9 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/metrics/mod.rs
🧬 Code graph analysis (1)
src/metrics/mod.rs (2)
- src/storage/metrics_layer.rs (1): new (71-76)
- src/handlers/http/cluster/utils.rs (4): new (39-51), new (66-84), new (99-117), new (129-136)
🔇 Additional comments (3)
src/metrics/mod.rs (3)
125-135: Good change: per-date metrics switched to counters.
Using IntCounterVec for daily increments is correct and avoids gauge reset confusion.
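A minimal sketch of the counter-per-date pattern being approved here; the metric and label names are illustrative rather than the exact ones registered in src/metrics/mod.rs, and registration with the registry is omitted:

    use once_cell::sync::Lazy;
    use prometheus::{opts, IntCounterVec};

    // Illustrative counter keyed by stream and calendar date.
    static EVENTS_INGESTED_BY_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
        IntCounterVec::new(
            opts!("events_ingested_by_date", "Events ingested per calendar date"),
            &["stream", "date"],
        )
        .expect("metric can be created")
    });

    fn record_ingest(stream: &str, rows: u64) {
        let date = chrono::Utc::now().date_naive().to_string();
        // inc_by only moves forward, so a restart or re-scrape can never make
        // the series go down the way a re-set gauge could.
        EVENTS_INGESTED_BY_DATE
            .with_label_values(&[stream, &date])
            .inc_by(rows);
    }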
161-195: ACK: TOTAL_*_DATE are global (not per-stream).
Labels ["format","date"] match the intended global-aggregate design. No change needed.
339-448: No writer found for TOTAL_CLUSTER_* gauges
I can see the gauges declared and registered in src/metrics/mod.rs, and a cluster metrics scheduler that ingests "cluster-metrics" into the internal stream (src/handlers/http/cluster/mod.rs), but my search found no calls that set/inc the TOTAL_CLUSTER_* statics. Confirm where the cluster-wide gauges are populated (or add an aggregator/updater that sets/inc them).
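If an aggregator is the way to go, a rough sketch of what an updater could look like; the gauge name and the shape of the per-node input are assumptions for illustration, not the actual TOTAL_CLUSTER_* statics or node metric types:

    use std::collections::HashMap;
    use once_cell::sync::Lazy;
    use prometheus::{opts, IntGaugeVec};

    // Hypothetical cluster-wide gauge; registration with the registry omitted.
    static CLUSTER_EVENTS_INGESTED_BY_DATE: Lazy<IntGaugeVec> = Lazy::new(|| {
        IntGaugeVec::new(
            opts!("cluster_events_ingested_by_date", "Cluster-wide events ingested per date"),
            &["date"],
        )
        .expect("metric can be created")
    });

    // Hypothetical per-node sample collected by the cluster-metrics scheduler.
    struct NodeSample {
        date: String,
        events_ingested: i64,
    }

    fn refresh_cluster_gauges(samples: &[NodeSample]) {
        let mut totals: HashMap<&str, i64> = HashMap::new();
        for s in samples {
            *totals.entry(s.date.as_str()).or_insert(0) += s.events_ingested;
        }
        for (date, total) in totals {
            // set(), not inc(): the gauge is overwritten with the latest aggregate.
            CLUSTER_EVENTS_INGESTED_BY_DATE
                .with_label_values(&[date])
                .set(total);
        }
    }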
c50d0c9 to 8eceae3 Compare
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/metrics_layer.rs (1)
290-315
: Panic risk: label cardinality mismatch in streaming wrappers; provider label missing.
StreamMetricWrapper
emits only ["LIST","200"] (2 labels) toSTORAGE_REQUEST_RESPONSE_TIME
, which expects [provider, operation, status] (3 labels). This will panic at runtime. Also, streaming errors aren’t reflected in status.Fix by carrying provider/op/status in the wrapper and updating status on first error:
@@ - fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { - let time = time::Instant::now(); - let inner = self.inner.list(prefix); - let res = StreamMetricWrapper { - time, - labels: ["LIST", "200"], - inner, - }; - Box::pin(res) - } + fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { + let time = time::Instant::now(); + let inner = self.inner.list(prefix); + let res = StreamMetricWrapper { + time, + provider: self.provider.clone(), + op: "LIST", + status: "200", + inner, + }; + Box::pin(res) + } @@ - fn list_with_offset( + fn list_with_offset( &self, prefix: Option<&Path>, offset: &Path, ) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list_with_offset(prefix, offset); - let res = StreamMetricWrapper { - time, - labels: ["LIST_OFFSET", "200"], - inner, - }; - - Box::pin(res) + let res = StreamMetricWrapper { + time, + provider: self.provider.clone(), + op: "LIST_OFFSET", + status: "200", + inner, + }; + Box::pin(res) } @@ -struct StreamMetricWrapper<'a, const N: usize, T> { - time: time::Instant, - labels: [&'static str; N], - inner: BoxStream<'a, T>, -} +struct StreamMetricWrapper<'a> { + time: time::Instant, + provider: String, + op: &'static str, + status: &'static str, + inner: BoxStream<'a, ObjectStoreResult<ObjectMeta>>, +} @@ -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { - type Item = T; +impl<'a> Stream for StreamMetricWrapper<'a> { + type Item = ObjectStoreResult<ObjectMeta>; @@ - match self.inner.poll_next_unpin(cx) { - t @ Poll::Ready(None) => { - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) - .observe(self.time.elapsed().as_secs_f64()); - t - } - t => t, - } + let t = self.inner.poll_next_unpin(cx); + if let Poll::Ready(Some(Err(ref e))) = t { + // Capture first error status + self.status = error_to_status_code(e); + } + if let Poll::Ready(None) = t { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&[&self.provider, self.op, self.status]) + .observe(self.time.elapsed().as_secs_f64()); + } + t } }Optionally align op labels to method names for clarity:
- "COPY_IF" -> "COPY_IF_NOT_EXISTS"
- "RENAME_IF" -> "RENAME_IF_NOT_EXISTS"
Also applies to: 398-422
♻️ Duplicate comments (8)
src/storage/object_storage.rs (1)
146-150
: Don’t fail upload if metadata() fails; make size metrics best-effort.Propagating the
metadata()
error aborts the flow after upload, leaving uploaded objects without manifests. Log and continue with size=0 (or move metrics post-manifest) so retries remain possible and invariants hold. This mirrors prior guidance.Apply:
- let compressed_size = path - .metadata() - .map(|m| m.len()) - .map_err(|e| ObjectStorageError::Custom(format!("metadata failed for {filename}: {e}")))?; + let compressed_size = match path.metadata().map(|m| m.len()) { + Ok(len) => len, + Err(e) => { + warn!("metadata() failed for {filename}: {e}; skipping size-based metrics"); + 0 + } + };src/query/listing_table_builder.rs (1)
90-111
: Bug: child entries lose their parent prefix; join base with child.The flattened
listing
currently includes only child names (e.g., “minute=00”), producing invalid URIs downstream. Join each child with its base prefix.- // Use storage.list_dirs_relative for all prefixes and flatten results - let mut listing = Vec::new(); - for prefix in prefixes { - match storage.list_dirs_relative(&prefix).await { - Ok(paths) => { - listing.extend(paths.into_iter().map(|p| p.to_string())); - } - Err(e) => { - return Err(DataFusionError::External(Box::new(e))); - } - } - } + // Use storage.list_dirs_relative for all prefixes and flatten results, + // preserving the full relative path + let mut listing = Vec::new(); + for prefix in prefixes { + let base = prefix.as_str().to_owned(); + match storage.list_dirs_relative(&prefix).await { + Ok(children) => { + listing.extend(children.into_iter().map(|c| format!("{}/{}", base, c))); + } + Err(e) => { + return Err(DataFusionError::External(Box::new(e))); + } + } + }src/storage/azure_blob.rs (2)
683-697
: Remove duplicate GET metrics inside get_objects loop._get_object already records GET latency and counters; this double-counts and uses LIST duration, skewing latency.
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - );
215-237
: Avoid panic and measure full GET latency; handle body read errors.resp.bytes().await.unwrap() can panic; also elapsed is captured before reading the body, underreporting latency and recording “200” before validating success.
Apply:
- let time = std::time::Instant::now(); - let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let time = std::time::Instant::now(); + let resp = self.client.get(&to_object_store_path(path)).await; @@ - match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } - Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", status_code]) - .observe(elapsed); - Err(err.into()) - } - } + match resp { + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }, + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }src/storage/gcs.rs (3)
647-660
: Remove duplicate GET metrics inside get_objects loop.get_object() already emits GET metrics; this causes double counting and skewed latency.
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "gcs", - "GET", - &Utc::now().date_naive().to_string(), - );
261-283
: Count deletions only on success; current code overcounts.files_deleted is incremented before delete.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); @@ - Ok(_) => { + Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed); }
180-199
: Avoid panic and measure full GET latency; handle body read errors.Same issues as Azure: unwrap and early elapsed.
- let time = std::time::Instant::now(); - let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let time = std::time::Instant::now(); + let resp = self.client.get(&to_object_store_path(path)).await; @@ - Ok(resp) => { - let body: Bytes = resp.bytes().await.unwrap(); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "gcs", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }src/query/stream_schema_provider.rs (1)
879-901
: Eliminate panics, bound concurrency, and preserve error context in manifest collection.expect() can panic on bad paths/JSON; unbounded spawning scales poorly.
@@ -pub async fn collect_manifest_files( - storage: Arc<dyn ObjectStorage>, - manifest_urls: Vec<String>, -) -> Result<Vec<Manifest>, ObjectStorageError> { - let mut tasks = Vec::new(); - manifest_urls.into_iter().for_each(|path| { - let path = RelativePathBuf::from_path(PathBuf::from(path)).expect("Invalid path"); - let storage = Arc::clone(&storage); - tasks.push(tokio::task::spawn(async move { - storage.get_object(&path).await - })); - }); - - let mut op = Vec::new(); - for task in tasks { - let file = task.await??; - op.push(file); - } - - Ok(op - .into_iter() - .map(|res| serde_json::from_slice(&res).expect("Data is invalid for Manifest")) - .collect()) -} +pub async fn collect_manifest_files( + storage: Arc<dyn ObjectStorage>, + manifest_urls: Vec<String>, +) -> Result<Vec<Manifest>, ObjectStorageError> { + let concurrency = std::cmp::min(32, std::cmp::max(1, num_cpus::get() * 2)); + let results = futures_util::stream::iter(manifest_urls.into_iter()) + .map(|path| { + let storage = Arc::clone(&storage); + async move { + let path = RelativePathBuf::from_path(PathBuf::from(path)) + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?; + storage.get_object(&path).await + } + }) + .buffer_unordered(concurrency) + .collect::<Vec<_>>() + .await; + + let mut manifests = Vec::with_capacity(results.len()); + for res in results { + let bytes = res?; + let manifest: Manifest = serde_json::from_slice(&bytes) + .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?; + manifests.push(manifest); + } + Ok(manifests) +}Add import:
- use futures_util::TryFutureExt; + use futures_util::{stream::StreamExt, TryFutureExt};
🧹 Nitpick comments (6)
src/stats.rs (1)
211-227: Tighten delete_with_label_prefix matching to reduce accidental removals.
Current match checks only for value presence (across any label name). If label values like "data"/"parquet" appear in other metrics with different semantics, we could remove unrelated series. Prefer matching by known label names (e.g., stream, format) where possible.
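One way to tighten the match, sketched against the prometheus proto types; the label name "stream" is an assumption about which label should gate deletion:

    use prometheus::proto::Metric;

    // Only treat a series as belonging to `stream` when an explicit stream
    // label carries that value, instead of matching any label whose value
    // merely happens to contain it.
    fn series_belongs_to_stream(metric: &Metric, stream: &str) -> bool {
        metric
            .get_label()
            .iter()
            .any(|l| l.get_name() == "stream" && l.get_value() == stream)
    }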
src/storage/metrics_layer.rs (1)
365-395: Op label consistency.
Consider renaming "COPY_IF" and "RENAME_IF" to the full method names for easier dashboards/search.
src/storage/azure_blob.rs (1)
566-578
: Multipart completion isn’t counted; add call and “file scanned” increments.Add a call counter for PUT_MULTIPART_COMPLETE and record a single “file scanned” for the whole upload on success to match the single-PUT path.
let complete_elapsed = complete_start.elapsed().as_secs_f64(); + let date = &Utc::now().date_naive().to_string(); + increment_object_store_calls_by_date("azure_blob", "PUT_MULTIPART_COMPLETE", date); if let Err(err) = complete_result { let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "PUT_MULTIPART_COMPLETE", status_code]) .observe(complete_elapsed); error!("Failed to complete multipart upload. {:?}", err); async_writer.abort().await?; return Err(err.into()); } else { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "PUT_MULTIPART_COMPLETE", "200"]) .observe(complete_elapsed); + // Count one logical file upload + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob", + "PUT", + 1, + date, + ); }src/storage/gcs.rs (1)
499-521
: Multipart completion isn’t counted; add call and “file scanned” increments.Mirror the Azure suggestion for consistent billing counters.
- let complete_elapsed = complete_start.elapsed().as_secs_f64(); + let complete_elapsed = complete_start.elapsed().as_secs_f64(); + let date = &Utc::now().date_naive().to_string(); + increment_object_store_calls_by_date("gcs", "PUT_MULTIPART_COMPLETE", date); @@ - } else { + } else { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "PUT_MULTIPART_COMPLETE", "200"]) .observe(complete_elapsed); + increment_files_scanned_in_object_store_calls_by_date("gcs", "PUT", 1, date); }src/metrics/mod.rs (2)
450-457: Histogram labels OK; consider explicit buckets.
Optional: set buckets (e.g., sub-100ms to multi-second) to improve latency SLO tracking.
643-699: Optional: centralize UTC date label formatting.
Date strings are constructed ad hoc across the codebase; a small helper (e.g., fn today_utc_date()) would reduce drift and mistakes.
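A sketch of the suggested helper; the name today_utc_date comes from the comment above and does not exist in the codebase yet:

    use chrono::Utc;

    /// Single place to format the UTC calendar date used as a metric label.
    pub fn today_utc_date() -> String {
        Utc::now().date_naive().to_string()
    }

    // Call sites then shrink to something like:
    // increment_object_store_calls_by_date("s3", "GET", &today_utc_date());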
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- resources/formats.json (1 hunks)
- src/catalog/mod.rs (1 hunks)
- src/event/mod.rs (2 hunks)
- src/handlers/http/modal/ingest_server.rs (0 hunks)
- src/handlers/http/modal/query_server.rs (0 hunks)
- src/handlers/http/modal/server.rs (0 hunks)
- src/handlers/http/query.rs (3 hunks)
- src/metadata.rs (3 hunks)
- src/metrics/mod.rs (9 hunks)
- src/metrics/storage.rs (0 hunks)
- src/query/listing_table_builder.rs (2 hunks)
- src/query/stream_schema_provider.rs (9 hunks)
- src/stats.rs (3 hunks)
- src/storage/azure_blob.rs (21 hunks)
- src/storage/gcs.rs (21 hunks)
- src/storage/localfs.rs (14 hunks)
- src/storage/metrics_layer.rs (8 hunks)
- src/storage/object_storage.rs (3 hunks)
- src/storage/s3.rs (21 hunks)
- src/users/dashboards.rs (1 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/server.rs
- src/metrics/storage.rs
✅ Files skipped from review due to trivial changes (1)
- src/users/dashboards.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- src/handlers/http/query.rs
- src/catalog/mod.rs
- src/metadata.rs
- resources/formats.json
- src/storage/localfs.rs
- src/storage/s3.rs
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/event/mod.rs
src/storage/object_storage.rs
src/storage/metrics_layer.rs
src/stats.rs
src/metrics/mod.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/azure_blob.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/azure_blob.rs
src/storage/metrics_layer.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/event/mod.rs
src/storage/object_storage.rs
src/stats.rs
src/metrics/mod.rs
🧬 Code graph analysis (6)
src/storage/gcs.rs (7)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (690-699), increment_object_store_calls_by_date (684-688), status_code (719-723)
- src/storage/metrics_layer.rs (3): error_to_status_code (36-62), new (71-76), head (251-265)
- src/storage/object_storage.rs (5): parseable_json_path (1063-1065), new (78-87), head (208-208), get_ingestor_meta_file_paths (262-264), check (227-227)
- src/storage/mod.rs (4): to_object_store_path (293-295), new (204-206), new (217-223), from (179-185)
- src/storage/localfs.rs (5): new (100-102), head (132-151), get_ingestor_meta_file_paths (217-274), from (993-995), check (475-504)
- src/storage/azure_blob.rs (8): resp (1134-1139), resp (1179-1184), _delete_prefix (286-353), _list_dates (355-401), _upload_file (403-436), head (606-637), get_ingestor_meta_file_paths (719-761), check (814-848)
- src/storage/s3.rs (10): resp (1223-1228), resp (1264-1269), _delete_prefix (405-468), _list_dates (470-512), _upload_file (514-543), head (743-770), get_ingestor_meta_file_paths (844-882), from (1285-1293), from (1297-1299), check (931-961)
src/storage/azure_blob.rs (4)
- src/metrics/mod.rs (3): increment_files_scanned_in_object_store_calls_by_date (690-699), increment_object_store_calls_by_date (684-688), status_code (719-723)
- src/storage/metrics_layer.rs (3): error_to_status_code (36-62), new (71-76), head (251-265)
- src/storage/gcs.rs (7): _delete_prefix (243-306), _list_dates (308-350), upload_multipart (566-572), head (574-601), get_ingestor_meta_file_paths (679-717), check (766-796), stream_json_check (905-905)
- src/storage/s3.rs (9): _delete_prefix (405-468), _list_dates (470-512), upload_multipart (735-741), head (743-770), get_ingestor_meta_file_paths (844-882), from (1285-1293), from (1297-1299), check (931-961), stream_json_check (1070-1070)
src/event/mod.rs (1)
- src/metrics/mod.rs (2): increment_events_ingested_by_date (644-648), increment_events_ingested_size_by_date (650-654)
src/storage/object_storage.rs (1)
- src/metrics/mod.rs (2): increment_parquets_stored_by_date (656-660), increment_parquets_stored_size_by_date (662-666)
src/storage/metrics_layer.rs (3)
- src/storage/azure_blob.rs (1): head (606-637)
- src/storage/gcs.rs (1): head (574-601)
- src/storage/s3.rs (3): head (743-770), from (1285-1293), from (1297-1299)
src/query/stream_schema_provider.rs (3)
- src/correlation.rs (1): path (227-234)
- src/metrics/mod.rs (3): increment_bytes_scanned_in_query_by_date (678-682), increment_files_scanned_in_object_store_calls_by_date (690-699), increment_files_scanned_in_query_by_date (672-676)
- src/storage/s3.rs (2): from (1285-1293), from (1297-1299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (13)
src/event/mod.rs (1)
92-96
: Billing metrics label cardinality verified — no mismatch found.TOTAL_EVENTS_INGESTED_DATE / TOTAL_EVENTS_INGESTED_SIZE_DATE / TOTAL_EVENTS_STORAGE_SIZE_DATE are IntGaugeVec with &["format","date"]; billing counters (TOTAL_EVENTS_INGESTED_BY_DATE, TOTAL_EVENTS_INGESTED_SIZE_BY_DATE, TOTAL_PARQUETS_STORED_BY_DATE, TOTAL_PARQUETS_STORED_SIZE_BY_DATE) are IntCounterVec with &["date"]; helper increment_* functions call .with_label_values(&[date]) and the registry registers both — no runtime label-count mismatch detected.
src/stats.rs (2)
127-139
: Removing per-stream day series resets the entire day; ensure full-day deletions only.
remove_label_values
on EVENTS_*_DATE wipes the whole stream+format+date series. If any manifests for that day remain, this will undercount. Please confirmdates
always represent entire-day purges for the stream; otherwise, we need a recompute path instead of removal.
144-152
: Good: subtract from global totals instead of removing.Switching to
.sub(...)
on TOTAL_*_DATE avoids clearing other streams’ contributions. Keep this pattern for global totals.src/storage/object_storage.rs (1)
159-166
: LGTM: global storage-size totals and billing counters.
TOTAL_EVENTS_STORAGE_SIZE_DATE
and theincrement_parquets_*_by_date
calls align with the new global/billing metrics model.src/storage/metrics_layer.rs (2)
35-62
: Error -> status mapping: solid and consistent.The mapping covers common object_store errors with sensible HTTP-like codes.
33-34
: STORAGE_REQUEST_RESPONSE_TIME is registered with labels [provider, method, status] (not "operation"). Defined in src/metrics/mod.rs (~line 450); call sites pass values as [provider, , status] by position (e.g. src/storage/metrics_layer.rs:103), so behavior is correct — rename "method" → "operation" only if you want naming consistency.Likely an incorrect or invalid review comment.
src/storage/azure_blob.rs (2)
176-177
: Good: MetricsLayer wrapped around Azure store.Provider label aligned to "azure_blob" here; ensures DataFusion ops are timed uniformly.
168-171
: Align provider name with metrics label ("azure_blob")ObjectStorageProvider::name returns "blob-store" while metrics in this module use "azure_blob" — causes split aggregation; change the provider name to "azure_blob".
- fn name(&self) -> &'static str { - "blob-store" - } + fn name(&self) -> &'static str { + "azure_blob" + }Verify CLI/helm references that use "blob-store" (src/cli.rs, helm/values.yaml) and update mappings/docs if they must remain consistent.
src/storage/gcs.rs (1)
138-139
: Good: MetricsLayer wrapped around GCS store.Ensures provider-scoped latency labels.
src/query/stream_schema_provider.rs (2)
328-351
: Nice: query billing counters for files and bytes scanned.Counters are updated once per plan build; aligns with PR goals.
Also applies to: 408-412
601-607
: Verify Azure provider label used for metrics (azure_blob vs blob-store)PARSEABLE.storage().name() is used to increment metrics; confirm Azure's provider name() — if it currently returns "blob-store" change it to "azure_blob" or normalize the name at metric recording so Azure and other providers aggregate correctly.
Location: src/query/stream_schema_provider.rs:601-607.
Actions: verify AzureBlobConfig::name() and either (a) update AzureBlobConfig::name() -> "azure_blob" or (b) map "blob-store" → "azure_blob" when calling increment_files_scanned_in_object_store_calls_by_date.src/metrics/mod.rs (2)
229-338
: Billing counters look good and cohesive.Clear separation of per-date counters for events, parquet, queries, and object store ops.
339-457
: Cluster gauge metrics registration looks consistent.Good symmetry with node-level counters; labels are minimal.
Actionable comments posted: 2
♻️ Duplicate comments (5)
src/storage/gcs.rs (3)
647-660
: Remove duplicate GET metrics in get_objects (double count; wrong latency source).
get_object()
already emits GET metrics. This block double-counts and uses LIST stopwatch for GET latency.- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "gcs", - "GET", - &Utc::now().date_naive().to_string(), - );
261-276
: Count deleted files on success only.
files_deleted.fetch_add(1, ...)
runs before the delete request and will overcount on failures. Move it into theOk(_)
branch.- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "gcs", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed); }
186-205
: Record GET status on body-read failures; don’t short-circuit metrics with?
.If
resp.bytes().await
fails, we currently return via?
without emitting latency/status. Match the result to record failure with the correct status and only increment files-scanned on success.- Ok(resp) => { - let body: Bytes = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } + Ok(resp) => { + match resp.bytes().await { + Ok(body) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "gcs", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } + }src/storage/s3.rs (2)
816-826
: Remove duplicate GET metrics in get_objects (already recorded by_get_object
).Same duplication as GCS; also uses LIST stopwatch for GET latency.
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
349-367
: Record GET status on body-read failures; don’t short-circuit metrics with?
.Mirror the GCS fix: handle
resp.bytes().await
with a match, emit failure status and latency, and only increment files-scanned on success.- Ok(resp) => { - let body = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } + Ok(resp) => { + match resp.bytes().await { + Ok(body) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "s3", + "GET", + 1, + &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } + }
🧹 Nitpick comments (12)
src/query/stream_schema_provider.rs (4)
327-333
: Accumulating scan stats here is fine; minor nit on types.
- Using
usize
for CPU count is fine;file_count/total_file_size
asu64
is appropriate for metrics.- Note: if
file_size
can be absent/unknown for any file, consider guarding the add to avoid inflating bytes.Also applies to: 343-350
407-411
: Centralize “current_date” to avoid drift and pass it in.You compute
Utc::now().date_naive()
insidepartitioned_files
. If a long query crosses midnight, attribution can drift across days per call site. Suggest capturing the date once at scan() start and threading it through.Apply:
- fn partitioned_files( + fn partitioned_files( &self, manifest_files: Vec<File>, - ) -> (Vec<Vec<PartitionedFile>>, datafusion::common::Statistics) { + current_date: &str, + ) -> (Vec<Vec<PartitionedFile>>, datafusion::common::Statistics) { ... - let current_date = chrono::Utc::now().date_naive().to_string(); - increment_files_scanned_in_query_by_date(file_count, ¤t_date); - increment_bytes_scanned_in_query_by_date(total_file_size, ¤t_date); + increment_files_scanned_in_query_by_date(file_count, current_date); + increment_bytes_scanned_in_query_by_date(total_file_size, current_date);And pass
&Utc::now().date_naive().to_string()
from the two call sites.
600-607
: Avoid over/duplicate counting: move GET call metrics to storage layer.Counting “files scanned” for GET at planning time likely diverges from reality (row-group pruning, retries, range GETs), and may double-count if storage backends already instrument GET calls. Prefer emitting this metric inside the actual GET call path (ObjectStorage impl), or rename this to “files_planned_for_scan” to avoid conflation.
Can you confirm whether GET/HEAD/LIST are already instrumented in
src/storage/object_storage.rs
? If yes, drop this increment to prevent double-counting.
219-270
: Include staging parquet bytes/files in query totals (if intended).Staging parquet branch constructs
PartitionedFile
s but doesn’t callpartitioned_files
, so the “in_query_by_date” counters omit those scans. If the daily query metrics are meant to be global (across all storage tiers), increment there as well.This depends on whether staging should be billable/visible in these totals. If yes, I can propose a minimal helper to compute
file_count/bytes
during the staging loop and call the same increment fns.src/storage/gcs.rs (3)
474-496
: Add per-part PUT_MULTIPART files-scanned increment (align with s3.rs).S3 increments for each successful part; GCS currently doesn’t. Add it to keep parity across providers.
Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "PUT_MULTIPART", "200"]) .observe(part_elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "gcs", + "PUT_MULTIPART", + 1, + &Utc::now().date_naive().to_string(), + ); }
383-523
: Avoid loading entire file into memory for multipart uploads.
file.read_to_end(...)
reads the full object; stream fixed-size chunks instead to cap memory.Example (outside this hunk) of chunked upload:
let mut buf = vec![0u8; MIN_MULTIPART_UPLOAD_SIZE]; loop { let n = file.read(&mut buf).await?; if n == 0 { break; } let part = Bytes::copy_from_slice(&buf[..n]); // emit metrics around put_part(part.clone()) }
243-306
: Optional: bound delete concurrency to avoid provider throttling.
for_each_concurrent(None, ...)
can fan out deletes unbounded. Consider a cap (e.g.,Some(super::MAX_OBJECT_STORE_REQUESTS)
).src/storage/s3.rs (2)
585-610
: Stream multipart data; avoidread_to_end
to cap memory.Load-and-split reads the entire file into RAM. Stream fixed-size chunks to
put_part
.// inside the large-file branch, replace read_to_end + slicing with a loop: let mut buf = vec![0u8; MIN_MULTIPART_UPLOAD_SIZE]; loop { let n = file.read(&mut buf).await?; if n == 0 { break; } let part = Bytes::copy_from_slice(&buf[..n]); // measure/emit metrics per part, as you already do async_writer.put_part(part.into()).await?; }Also applies to: 639-667
405-468
: Optional: bound delete concurrency to avoid throttling.Consider
for_each_concurrent(Some(super::MAX_OBJECT_STORE_REQUESTS), ...)
.src/event/format/known_schema.rs (3)
518-523
: Preferexpect
overunwrap
for clearer failure context.Use a descriptive
expect
so failures point directly to a missing or misnamed schema.- let schema = processor - .schema_definitions - .get("parseable_server_logs") - .unwrap(); + let schema = processor + .schema_definitions + .get("parseable_server_logs") + .expect("parseable_server_logs schema not found in resources/formats.json");
539-545
: Update stale error text ("rust server log") → "parseable server log".Matches the renamed test and new format.
- "Failed to extract fields from rust server log {}: {}", + "Failed to extract fields from parseable server log {}: {}",
547-586
: Strengthen assertions: verify prefixes and format flag; simplify conditional.
- Assert customer_id and deployment_id are parsed.
- Assert FORMAT_VERIFY_KEY is set to "true" on success.
- Drop the
if i < 4
guard; in this test set all lines follow the same format.// Verify basic fields that should be present in all formats assert!( obj.contains_key("timestamp"), "Missing timestamp field for log {}", i + 1 ); assert!( obj.contains_key("level"), "Missing level field for log {}", i + 1 ); assert!( obj.contains_key("body"), "Missing body field for log {}", i + 1 ); assert!( obj.contains_key("module"), "Missing module field for log {}", i + 1 ); + + // Verify provider-specific prefixes exist and match the first two tokens + let tokens: Vec<_> = log_text.split_whitespace().collect(); + assert!( + obj.contains_key("customer_id") && obj.contains_key("deployment_id"), + "Missing customer_id/deployment_id for log {}", + i + 1 + ); + assert_eq!( + obj.get("customer_id").unwrap().as_str().unwrap(), + tokens[0], + "customer_id mismatch for log {}", + i + 1 + ); + assert_eq!( + obj.get("deployment_id").unwrap().as_str().unwrap(), + tokens[1], + "deployment_id mismatch for log {}", + i + 1 + ); + + // FORMAT_VERIFY_KEY should be true on successful extraction + assert_eq!( + obj.get(FORMAT_VERIFY_KEY).and_then(|v| v.as_str()), + Some("true"), + "FORMAT_VERIFY_KEY should be 'true' for log {}", + i + 1 + ); - // ThreadId and line_number are only present in the first format (logs 0-3) - if i < 4 { - assert!( - obj.contains_key("logger_context"), - "Missing logger_context field for log {}", - i + 1 - ); - assert!( - obj.contains_key("thread_id"), - "Missing thread_id field for log {}", - i + 1 - ); - assert!( - obj.contains_key("line_number"), - "Missing line_number field for log {}", - i + 1 - ); - } + // For this format, these should be present for all logs in the set + assert!( + obj.contains_key("logger_context"), + "Missing logger_context field for log {}", + i + 1 + ); + assert!( + obj.contains_key("thread_id"), + "Missing thread_id field for log {}", + i + 1 + ); + assert!( + obj.contains_key("line_number"), + "Missing line_number field for log {}", + i + 1 + );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/event/format/known_schema.rs
(1 hunks)src/query/stream_schema_provider.rs
(7 hunks)src/storage/azure_blob.rs
(21 hunks)src/storage/gcs.rs
(21 hunks)src/storage/s3.rs
(21 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/storage/azure_blob.rs
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/gcs.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/query/stream_schema_provider.rs
src/storage/s3.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
🧬 Code graph analysis (3)
src/storage/gcs.rs (3)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_object_store_calls_by_date
(684-688)status_code
(719-723)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/azure_blob.rs (8)
resp
(1125-1130)resp
(1170-1175)_delete_prefix
(286-353)_list_dates
(355-401)_upload_file
(403-436)head
(606-637)get_ingestor_meta_file_paths
(719-761)check
(814-848)
src/query/stream_schema_provider.rs (3)
src/metrics/mod.rs (3)
increment_bytes_scanned_in_query_by_date
(678-682)increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_files_scanned_in_query_by_date
(672-676)src/query/listing_table_builder.rs (1)
new
(47-52)src/catalog/mod.rs (2)
file_size
(60-60)file_size
(74-76)
src/storage/s3.rs (5)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_object_store_calls_by_date
(684-688)status_code
(719-723)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/localfs.rs (4)
new
(100-102)head
(132-151)get_ingestor_meta_file_paths
(217-274)from
(993-995)src/storage/azure_blob.rs (6)
resp
(1125-1130)resp
(1170-1175)_delete_prefix
(286-353)_list_dates
(355-401)head
(606-637)get_ingestor_meta_file_paths
(719-761)src/storage/gcs.rs (6)
resp
(1049-1054)resp
(1090-1095)_delete_prefix
(243-306)_list_dates
(308-350)head
(574-601)get_ingestor_meta_file_paths
(679-717)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (9)
src/query/stream_schema_provider.rs (5)
47-47
: LGTM: futures import matches usage.
TryFutureExt
is used by.and_then(...)
below; import is justified.
54-54
: LGTM: explicit File import.Brings the manifest
File
type into scope for partitioning and metrics.
59-63
: LGTM: metrics wiring imports look correct.Signatures match the helpers in metrics/mod.rs and label cardinality aligns with the “by date” counters.
285-286
: Signature change adopted correctly for listing.
populate_via_listing(glob_storage, time_filters)
aligns with the new builder API; the chained.and_then
withTryFutureExt
is fine.
407-411
: Label cardinality check vs learnings.Per your earlier design, daily totals without stream labels are aggregated globally; these two increments follow that pattern. Just calling out that this is by-date-only (no stream label), which matches the retrieved learnings on totals vs per-stream metrics.
Also applies to: 600-607
src/storage/s3.rs (3)
299-316
: LGTM: MetricLayer wiring for DataFusion.Provider-labeled MetricLayer around the limited store looks good.
107-156
: LGTM: SSE-C parsing and validation.Solid input validation and error variants for SSEC. No changes needed.
697-733
: LGTM: Buffered reader HEAD instrumentation.HEAD timing/status and per-date counters look consistent with the metrics model.
src/event/format/known_schema.rs (1)
522-523
: Remove the redundant guard — formats.json already contains all expected fieldsVerification output: Actual fields = ['body','customer_id','deployment_id','level','line_number','logger_context','module','thread_id','timestamp'] — no missing/extra fields. Remove the runtime guard at src/event/format/known_schema.rs:522-523.
Force-pushed from f9f4ab3 to 169da99
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/stats.rs (1)
211-227
: Fix: use IntCounterVec::remove_label_values with order-aware prefix matching (src/stats.rs)metrics.remove(&label_map) won't compile and value-only HashMap matching is too permissive — use the registered-order label values and IntCounterVec::remove_label_values instead.
Location: src/stats.rs — calls at ~197–199; function definition at ~211–227.
-fn delete_with_label_prefix(metrics: &IntCounterVec, prefix: &[&str]) { - let families: Vec<MetricFamily> = metrics.collect().into_iter().collect(); - for metric in families.iter().flat_map(|m| m.get_metric()) { - let label_map: HashMap<&str, &str> = metric - .get_label() - .iter() - .map(|l| (l.get_name(), l.get_value())) - .collect(); - - // Check if all prefix elements are present in label values - let all_prefixes_found = prefix.iter().all(|p| label_map.values().any(|v| v == p)); - - if all_prefixes_found && let Err(err) = metrics.remove(&label_map) { - warn!("Error removing metric with labels {:?}: {err}", label_map); - } - } -} +fn delete_with_label_prefix(metrics: &IntCounterVec, prefix: &[&str]) { + // Collect then iterate metrics in this Vec + let families: Vec<MetricFamily> = metrics.collect().into_iter().collect(); + for metric in families.iter().flat_map(|m| m.get_metric()) { + // Preserve registration order of labels for correct removal + let label_values: Vec<&str> = metric.get_label().iter().map(|l| l.get_value()).collect(); + // Match only when the leading labels equal the provided prefix (order-aware) + if label_values.len() >= prefix.len() + && label_values[..prefix.len()] == prefix[..] + { + if let Err(err) = metrics.remove_label_values(&label_values) { + warn!("Error removing metric with labels {:?}: {err}", label_values); + } + } + } +}src/query/stream_schema_provider.rs (1)
365-365
: Fix compile-time bug: moved variable used after destructuring
file
is destructured;file.file_size
no longer exists. Use the boundfile_size
variable.- let pf = PartitionedFile::new(file_path, file.file_size); + let pf = PartitionedFile::new(file_path, file_size);src/storage/metrics_layer.rs (1)
290-299
: Fix histogram label cardinality in streaming list wrappers
STORAGE_REQUEST_RESPONSE_TIME
expects 3 labels [provider, op, status], but StreamMetricWrapper supplies only 2, causing runtime label mismatch. Carry provider explicitly; avoid&'static str
for provider.- fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { - let time = time::Instant::now(); - let inner = self.inner.list(prefix); - let res = StreamMetricWrapper { - time, - labels: ["LIST", "200"], - inner, - }; - Box::pin(res) - } + fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { + let time = time::Instant::now(); + let inner = self.inner.list(prefix); + let res = StreamMetricWrapper { + time, + provider: &self.provider, + operation: "LIST", + status: "200", + inner, + }; + Box::pin(res) + } @@ - fn list_with_offset( + fn list_with_offset( &self, prefix: Option<&Path>, offset: &Path, ) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list_with_offset(prefix, offset); - let res = StreamMetricWrapper { - time, - labels: ["LIST_OFFSET", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + operation: "LIST_OFFSET", + status: "200", + inner, + }; Box::pin(res) } @@ -struct StreamMetricWrapper<'a, const N: usize, T> { +struct StreamMetricWrapper<'a, T> { time: time::Instant, - labels: [&'static str; N], + provider: &'a str, + operation: &'static str, + status: &'static str, inner: BoxStream<'a, T>, } @@ -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; @@ - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&[self.provider, self.operation, self.status]) .observe(self.time.elapsed().as_secs_f64());Also applies to: 301-315, 398-421
♻️ Duplicate comments (10)
src/stats.rs (1)
32-34
: Using global TOTAL_*_DATE metrics here is correct (keep them stream-agnostic).This resolves the earlier concern about deleting totals; you now adjust them via arithmetic instead of removing the series.
src/query/stream_schema_provider.rs (1)
862-872
: Do not unwrap timestamp parsing; handle malformed input and multiple formatsUnwraps can panic. Also,
NaiveDateTime::from_str
won’t parse ISO “T” format used in tests. Parse safely and support both “%Y-%m-%d %H:%M:%S” and “%Y-%m-%dT%H:%M:%S”.- ScalarValue::TimestampMillisecond(Some(value), _) => Some(( - binexpr.op, - DateTime::from_timestamp_millis(*value).unwrap().naive_utc(), - )), + ScalarValue::TimestampMillisecond(Some(value), _) => { + DateTime::from_timestamp_millis(*value).map(|dt| (binexpr.op, dt.naive_utc())) + }, ScalarValue::TimestampNanosecond(Some(value), _) => Some(( binexpr.op, DateTime::from_timestamp_nanos(*value).naive_utc(), )), - ScalarValue::Utf8(Some(str_value)) if is_time_partition => { - Some((binexpr.op, str_value.parse().unwrap())) - } + ScalarValue::Utf8(Some(str_value)) if is_time_partition => { + // Try common formats; return None if parsing fails. + if let Ok(dt) = NaiveDateTime::parse_from_str(str_value, "%Y-%m-%dT%H:%M:%S") { + Some((binexpr.op, dt)) + } else if let Ok(dt) = + NaiveDateTime::parse_from_str(str_value, "%Y-%m-%d %H:%M:%S") + { + Some((binexpr.op, dt)) + } else { + None + } + }src/storage/azure_blob.rs (3)
215-237
: Measure full GET latency (include body read) and avoid premature timing
elapsed
is taken beforeresp.bytes().await
, under-reporting GET latency. Move timing/observe into the success/error arms after body read.- let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let resp = self.client.get(&to_object_store_path(path)).await; @@ - Ok(resp) => { - let body: Bytes = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(elapsed); + Ok(resp) => { + match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob", "GET", 1, &Utc::now().date_naive().to_string(), + ); + return Ok(body); + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + return Err(err.into()); + } + } - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) } - Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", status_code]) - .observe(elapsed); + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); Err(err.into()) }
308-331
: Count deletes only on success
files_deleted
is incremented before the delete; move it into theOk(_)
arm.- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); @@ - Ok(_) => { + Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "DELETE", "200"]) .observe(delete_elapsed);
683-697
: Remove duplicate GET metrics inside list loop
get_object()
already records GET timing and counts. The extra observations here double-count and use LIST timer for GET latency.- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - );src/storage/localfs.rs (2)
132-151
: Don’t emit metrics for unimplemented HEADRecording HEAD metrics here is misleading; the operation always errors.
- // Record attempt to access file (even though operation not implemented) - increment_files_scanned_in_object_store_calls_by_date( - "localfs", - "HEAD", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "localfs", - "HEAD", - &Utc::now().date_naive().to_string(), - );
506-532
: Also increment DELETE call counters for delete_stream and try_delete_node_metaAlign with other backends’ behavior.
- match &result { + match &result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "DELETE", "200"]) .observe(delete_elapsed); } Err(err) => { let status_code = match err.kind() { std::io::ErrorKind::NotFound => "404", std::io::ErrorKind::PermissionDenied => "403", _ => "500", }; STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "DELETE", status_code]) .observe(delete_elapsed); } } + increment_object_store_calls_by_date( + "localfs", + "DELETE", + &Utc::now().date_naive().to_string(), + );Apply the same pattern to try_delete_node_meta after observing timing.
Also applies to: 534-560
src/storage/s3.rs (3)
341-370
: Measure full GET latency (include body read) and handle body-read errorsTake
elapsed
after reading the body and record the correct status on read failure.- let resp = self.client.get(&to_object_store_path(path)).await; - let elapsed = time.elapsed().as_secs_f64(); + let resp = self.client.get(&to_object_store_path(path)).await; @@ - Ok(resp) => { - let body = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "s3", "GET", 1, &Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + } - Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", status_code]) - .observe(elapsed); + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); Err(err.into()) }
423-446
: Count deletes only on successMove
files_deleted.fetch_add(1, ...)
into the success arm of the inner delete.- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); @@ - Ok(_) => { + Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "DELETE", "200"]) .observe(delete_elapsed);
816-841
: Remove duplicate GET metrics inside get_objects()
_get_object()
already emits GET metrics; this block double-counts and uses LIST timer for GET latency.- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
🧹 Nitpick comments (5)
resources/formats.json (2)
1472-1472
: Broaden timestamp to accept no-fraction and TZ offsets (avoid silent mismatches).Current pattern mandates fractional seconds and only optional 'Z'. Suggest supporting optional fraction and ±HH:MM/±HHMM.
Apply:
- "pattern": "^(?P<customer_id>\\S+)\\s+(?P<deployment_id>\\S+)\\s+(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+Z?)\\s+(?P<level>\\w+)\\s+(?P<logger_context>\\S+)\\s+(?P<thread_id>ThreadId\\(\\d+\\))\\s+(?P<module>.*?):(?P<line_number>\\d+):\\s+(?P<body>.*)", + "pattern": "^(?P<customer_id>\\S+)\\s+(?P<deployment_id>\\S+)\\s+(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:?\\d{2})?)\\s+(?P<level>\\w+)\\s+(?P<logger_context>\\S+)\\s+(?P<thread_id>ThreadId\\(\\d+\\))\\s+(?P<module>.+):(?P<line_number>\\d+):\\s+(?P<body>.*)",
1474-1484
: Downstream awareness: customer_id/deployment_id only exist for pattern 1.Lines not matching pattern 1 will parse via later patterns without these fields, potentially degrading per-origin metrics.
Confirm ingest tolerates null/missing customer_id/deployment_id for older lines. If needed, I can propose an enrichment fallback (e.g., default values from stream metadata) or add a secondary pattern with optional leading IDs.
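A minimal sketch of that enrichment fallback, assuming the extracted fields arrive as a serde_json object and that the defaults come from stream metadata (both assumptions, not current behavior; the function name is hypothetical):

```rust
use serde_json::{json, Map, Value};

/// Fill customer_id / deployment_id with stream-level defaults when a line
/// matched one of the later patterns that do not capture these prefixes.
fn enrich_with_defaults(fields: &mut Map<String, Value>, customer_id: &str, deployment_id: &str) {
    fields
        .entry("customer_id")
        .or_insert_with(|| json!(customer_id));
    fields
        .entry("deployment_id")
        .or_insert_with(|| json!(deployment_id));
}

fn main() {
    // A line parsed by an older pattern: no customer/deployment prefix captured.
    let mut fields = Map::new();
    fields.insert("level".into(), json!("INFO"));
    enrich_with_defaults(&mut fields, "unknown-customer", "unknown-deployment");
    assert_eq!(fields.get("customer_id"), Some(&json!("unknown-customer")));
}
```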
src/query/stream_schema_provider.rs (2)
401-405
: Provide total_byte_size to DataFusion for better planningYou already track
total_file_size
; feed it intoStatistics.total_byte_size
for better optimizer hints.- let statistics = datafusion::common::Statistics { - num_rows: Precision::Exact(count as usize), - total_byte_size: Precision::Absent, + let statistics = datafusion::common::Statistics { + num_rows: Precision::Exact(count as usize), + total_byte_size: Precision::Exact(total_file_size as usize), column_statistics: statistics, };Also applies to: 331-333
171-218
: Typo in method nameRename
get_hottier_exectuion_plan
toget_hottier_execution_plan
(and call sites).- async fn get_hottier_exectuion_plan( + async fn get_hottier_execution_plan( ... - self.get_hottier_exectuion_plan( + self.get_hottier_execution_plan(Also applies to: 579-594
src/storage/localfs.rs (1)
405-432
: Add DELETE call counter for delete_prefix (consistency with other ops)You record timing but never increment the per-date call counter.
match &result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "DELETE", "200"]) .observe(delete_elapsed); } Err(err) => { let status_code = match err.kind() { std::io::ErrorKind::NotFound => "404", std::io::ErrorKind::PermissionDenied => "403", _ => "500", }; STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "DELETE", status_code]) .observe(delete_elapsed); } } + increment_object_store_calls_by_date( + "localfs", + "DELETE", + &Utc::now().date_naive().to_string(), + );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
resources/formats.json
(1 hunks)src/catalog/mod.rs
(1 hunks)src/event/format/known_schema.rs
(1 hunks)src/event/mod.rs
(2 hunks)src/handlers/http/modal/ingest_server.rs
(0 hunks)src/handlers/http/modal/query_server.rs
(0 hunks)src/handlers/http/modal/server.rs
(0 hunks)src/handlers/http/query.rs
(3 hunks)src/metadata.rs
(3 hunks)src/metrics/mod.rs
(9 hunks)src/metrics/storage.rs
(0 hunks)src/query/listing_table_builder.rs
(2 hunks)src/query/stream_schema_provider.rs
(7 hunks)src/stats.rs
(3 hunks)src/storage/azure_blob.rs
(21 hunks)src/storage/gcs.rs
(21 hunks)src/storage/localfs.rs
(14 hunks)src/storage/metrics_layer.rs
(8 hunks)src/storage/object_storage.rs
(3 hunks)src/storage/s3.rs
(21 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/server.rs
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/ingest_server.rs
- src/metrics/storage.rs
🚧 Files skipped from review as they are similar to previous changes (9)
- src/handlers/http/query.rs
- src/event/mod.rs
- src/catalog/mod.rs
- src/event/format/known_schema.rs
- src/storage/gcs.rs
- src/metadata.rs
- src/query/listing_table_builder.rs
- src/storage/object_storage.rs
- src/metrics/mod.rs
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/stats.rs
src/storage/metrics_layer.rs
src/storage/localfs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/stats.rs
src/storage/localfs.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/metrics_layer.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/query/stream_schema_provider.rs
src/storage/azure_blob.rs
src/storage/s3.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.
Applied to files:
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/localfs.rs
src/storage/azure_blob.rs
src/storage/s3.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/azure_blob.rs
src/storage/s3.rs
🧬 Code graph analysis (5)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(606-637)src/storage/gcs.rs (1)
head
(574-601)src/storage/s3.rs (3)
head
(743-770)from
(1276-1284)from
(1288-1290)
src/query/stream_schema_provider.rs (2)
src/metrics/mod.rs (3)
increment_bytes_scanned_in_query_by_date
(678-682)increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_files_scanned_in_query_by_date
(672-676)src/catalog/mod.rs (2)
file_size
(60-60)file_size
(74-76)
src/storage/localfs.rs (3)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_object_store_calls_by_date
(684-688)status_code
(719-723)src/storage/metrics_layer.rs (1)
copy
(333-347)src/storage/azure_blob.rs (3)
get_ingestor_meta_file_paths
(719-761)delete_object
(781-812)check
(814-848)
src/storage/azure_blob.rs (7)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_object_store_calls_by_date
(684-688)status_code
(719-723)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/object_storage.rs (5)
parseable_json_path
(1063-1065)new
(78-87)upload_multipart
(216-220)head
(208-208)get_ingestor_meta_file_paths
(262-264)src/storage/mod.rs (4)
to_object_store_path
(293-295)new
(204-206)new
(217-223)from
(179-185)src/storage/localfs.rs (5)
new
(100-102)upload_multipart
(111-120)head
(132-151)get_ingestor_meta_file_paths
(217-274)from
(993-995)src/storage/gcs.rs (7)
resp
(1049-1054)resp
(1090-1095)_delete_prefix
(243-306)_list_dates
(308-350)upload_multipart
(566-572)head
(574-601)get_ingestor_meta_file_paths
(679-717)src/storage/s3.rs (9)
resp
(1214-1219)resp
(1255-1260)_delete_prefix
(405-468)_list_dates
(470-512)upload_multipart
(735-741)head
(743-770)get_ingestor_meta_file_paths
(844-882)from
(1276-1284)from
(1288-1290)
src/storage/s3.rs (5)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(690-699)increment_object_store_calls_by_date
(684-688)status_code
(719-723)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/object_storage.rs (5)
parseable_json_path
(1063-1065)new
(78-87)head
(208-208)get_ingestor_meta_file_paths
(262-264)check
(227-227)src/storage/mod.rs (4)
to_object_store_path
(293-295)new
(204-206)new
(217-223)from
(179-185)src/storage/azure_blob.rs (7)
resp
(1125-1130)resp
(1170-1175)_delete_prefix
(286-353)_list_dates
(355-401)head
(606-637)get_ingestor_meta_file_paths
(719-761)check
(814-848)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
🔇 Additional comments (2)
resources/formats.json (1)
1469-1471
: Rename ripple effects — verified: norust_server_logs
references;parseable_server_logs
present.
- resources/formats.json — line 1469:
"name": "parseable_server_logs"
.- src/event/format/known_schema.rs — test_parseable_server_logs() references
.get("parseable_server_logs")
(around lines 518–522).No matches for
rust_server_logs
found by the repository search.src/stats.rs (1)
25-25
: Importing IntCounterVec is fine.Needed for date-bucketed counters. No issues.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/metrics/mod.rs (1)
670-739
: Defensive check for date label format to avoid accidental high cardinality.These helpers accept &str; a stray RFC3339 timestamp would create per‑timestamp series. Add a debug assertion for “YYYY‑MM‑DD”.
@@ -// Helper functions for tracking billing metrics +// Helper functions for tracking billing metrics +#[inline] +fn assert_date_label(date: &str) { + debug_assert!( + date.len() == 10 && date.as_bytes()[4] == b'-' && date.as_bytes()[7] == b'-', + "date label must be normalized UTC in YYYY-MM-DD" + ); +} @@ pub fn increment_events_ingested_by_date(count: u64, date: &str) { + assert_date_label(date); TOTAL_EVENTS_INGESTED_BY_DATE .with_label_values(&[date]) .inc_by(count); } @@ pub fn increment_events_ingested_size_by_date(size: u64, date: &str) { + assert_date_label(date); TOTAL_EVENTS_INGESTED_SIZE_BY_DATE .with_label_values(&[date]) .inc_by(size); } @@ pub fn increment_parquets_stored_by_date(date: &str) { + assert_date_label(date); TOTAL_PARQUETS_STORED_BY_DATE .with_label_values(&[date]) .inc(); } @@ pub fn increment_parquets_stored_size_by_date(size: u64, date: &str) { + assert_date_label(date); TOTAL_PARQUETS_STORED_SIZE_BY_DATE .with_label_values(&[date]) .inc_by(size); } @@ pub fn increment_query_calls_by_date(date: &str) { + assert_date_label(date); TOTAL_QUERY_CALLS_BY_DATE.with_label_values(&[date]).inc(); } @@ pub fn increment_files_scanned_in_query_by_date(count: u64, date: &str) { + assert_date_label(date); TOTAL_FILES_SCANNED_IN_QUERY_BY_DATE .with_label_values(&[date]) .inc_by(count); } @@ pub fn increment_bytes_scanned_in_query_by_date(bytes: u64, date: &str) { + assert_date_label(date); TOTAL_BYTES_SCANNED_IN_QUERY_BY_DATE .with_label_values(&[date]) .inc_by(bytes); } @@ pub fn increment_object_store_calls_by_date(provider: &str, method: &str, date: &str) { + assert_date_label(date); TOTAL_OBJECT_STORE_CALLS_BY_DATE .with_label_values(&[provider, method, date]) .inc(); } @@ pub fn increment_files_scanned_in_object_store_calls_by_date( provider: &str, method: &str, count: u64, date: &str, ) { + assert_date_label(date); TOTAL_FILES_SCANNED_IN_OBJECT_STORE_CALLS_BY_DATE .with_label_values(&[provider, method, date]) .inc_by(count); } @@ pub fn increment_input_llm_tokens_by_date(provider: &str, model: &str, tokens: u64, date: &str) { + assert_date_label(date); TOTAL_INPUT_LLM_TOKENS_BY_DATE .with_label_values(&[provider, model, date]) .inc_by(tokens); } @@ pub fn increment_output_llm_tokens_by_date(provider: &str, model: &str, tokens: u64, date: &str) { + assert_date_label(date); TOTAL_OUTPUT_LLM_TOKENS_BY_DATE .with_label_values(&[provider, model, date]) .inc_by(tokens); }
🧹 Nitpick comments (4)
src/metrics/mod.rs (4)
161-195
: Clarify TOTAL_*_DATE help text to indicate “across all streams”.Given these are global per‑date totals (by format), tweak the help to prevent future confusion.
- "total_events_ingested_date", - "total events ingested on a particular date", + "total_events_ingested_date", + "Total events ingested on a particular date (across all streams)",Also consider similarly clarifying the two size variants’ help strings.
39-44
: Metric help strings: make units explicit (“in bytes”) and keep style consistent.Minor wording nits to improve clarity and consistency.
- "events_ingested_size", - "Events ingested size bytes for a stream", + "events_ingested_size", + "Events ingested size in bytes for a stream",- Opts::new("storage_size", "Storage size bytes for a stream").namespace(METRICS_NAMESPACE), + Opts::new("storage_size", "Storage size in bytes for a stream").namespace(METRICS_NAMESPACE),- "events_deleted_size", - "Events deleted size bytes for a stream", + "events_deleted_size", + "Events deleted size in bytes for a stream",- "Deleted events storage size bytes for a stream", + "Deleted events storage size in bytes for a stream",- "Lifetime events ingested size bytes for a stream", + "Lifetime events ingested size in bytes for a stream",- "Lifetime events storage size bytes for a stream", + "Lifetime events storage size in bytes for a stream",Also applies to: 51-52, 67-72, 81-82, 105-106, 117-118
471-478
: Latency histogram: set explicit buckets and include units in help.Default buckets often don’t match object‑store latencies. Also make units obvious in the help text.
-pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| { - HistogramVec::new( - HistogramOpts::new("storage_request_response_time", "Storage Request Latency") - .namespace(METRICS_NAMESPACE), - &["provider", "method", "status"], - ) - .expect("metric can be created") -}); +pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| { + let buckets = prometheus::exponential_buckets(0.005, 2.0, 13).expect("buckets"); // 5ms .. ~40s + let mut opts = HistogramOpts::new( + "storage_request_response_time", + "Storage request latency in seconds", + ) + .namespace(METRICS_NAMESPACE); + opts.buckets = buckets; + HistogramVec::new(opts, &["provider", "method", "status"]).expect("metric can be created") +});
336-358
: LLM token metrics: guard model/provider cardinality.Free‑form model names can create high series counts. Consider a sanitizer/allow‑list or mapping (e.g., “gpt‑4.1‑mini” → “gpt-4.1-mini”), and avoid including org/project suffixes.
Would you like a tiny helper to normalize provider/model labels?
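As a sketch, assuming an allow-list approach (the model names below are placeholders, not the set Parseable actually supports):

```rust
/// Collapse free-form provider/model strings into a bounded label set so the
/// per-date LLM token series count stays small.
fn normalize_llm_labels(provider: &str, model: &str) -> (String, String) {
    // Hypothetical allow-list; anything unknown is bucketed as "other".
    const KNOWN_MODELS: &[&str] = &["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"];

    let provider = provider.trim().to_ascii_lowercase();
    let model = model.trim().to_ascii_lowercase();
    let model = if KNOWN_MODELS.contains(&model.as_str()) {
        model
    } else {
        "other".to_string()
    };
    (provider, model)
}

fn main() {
    assert_eq!(
        normalize_llm_labels(" OpenAI ", "GPT-4o"),
        ("openai".to_string(), "gpt-4o".to_string())
    );
    // Fine-tune or org-suffixed names collapse into one series instead of many.
    assert_eq!(normalize_llm_labels("openai", "my-fine-tune-42").1, "other");
}
```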
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/metrics/mod.rs
(9 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/metrics/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (4)
src/metrics/mod.rs (4)
125-159
: Good move: switch per-date series to IntCounterVec.Counters for EVENTS_*_DATE align with monotonic growth and avoid gauge scrape races. Looks correct.
311-335
: Confirm “status” is a low‑cardinality code, not error strings.If “status” sometimes contains verbose error text, cardinality will explode. Recommend numeric HTTP‑style codes or a small enum (e.g., ok, timeout, 4xx, 5xx).
Also applies to: 471-478
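For illustration, one way to keep that label bounded — a small classifier applied before .with_label_values; the raw inputs and class names here are assumptions, not the current error_to_status_code mapping:

```rust
/// Bounded status classes for the `status` label.
#[derive(Clone, Copy)]
enum StatusClass {
    Ok,
    Timeout,
    ClientError,
    ServerError,
}

impl StatusClass {
    fn as_label(self) -> &'static str {
        match self {
            StatusClass::Ok => "ok",
            StatusClass::Timeout => "timeout",
            StatusClass::ClientError => "4xx",
            StatusClass::ServerError => "5xx",
        }
    }
}

/// Collapse whatever the backend reports into one of four classes, so an
/// unexpected error string can never mint a new label value.
fn classify(raw: &str) -> StatusClass {
    match raw {
        "200" | "204" => StatusClass::Ok,
        "timeout" => StatusClass::Timeout,
        s if s.starts_with('4') => StatusClass::ClientError,
        _ => StatusClass::ServerError,
    }
}

fn main() {
    assert_eq!(classify("404").as_label(), "4xx");
    assert_eq!(classify("connection reset by peer").as_label(), "5xx");
}
```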
360-469
: Cluster billing gauges look consistent with per‑node counters.Label sets and types make sense for aggregator set semantics. No issues spotted.
517-611
: Registration completeness LGTM.All newly added metrics are registered; ordering is fine.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/metrics/mod.rs (1)
700-769
: Normalize “date” label formatting via typed UTC helpersAdd typed helpers in src/metrics/mod.rs that accept NaiveDate / DateTime, format to "YYYY-MM-DD", and expose new increment_*_by_utc_date wrappers; keep the existing &str variants for backward compatibility and migrate callsites to the typed APIs.
- Actionable changes:
- Replace usages that pass non‑UTC or unverified strings:
- src/event/mod.rs — change the parsed_timestamp.date().to_string() call (around the increment_events_ingested_by_date call) to use a UTC/NaiveDate helper.
- src/storage/object_storage.rs — ensure file_date_part is produced/normalized as a NaiveDate/UTC date before calling increment_parquets_stored_by_date(...) (calls at lines ~164–165).
- Note: most storage backends already use Utc::now().date_naive().to_string() (no change needed there), but unify all callsites to the new typed helpers to avoid split‑day series and double counting.
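A minimal sketch of such a typed wrapper, assuming chrono types at the call sites; the wrapper name is illustrative, and the plain &str helper it delegates to stands in for the existing one:

```rust
use chrono::{DateTime, NaiveDate, Utc};

// Stand-in for the existing &str helper in src/metrics/mod.rs.
fn increment_events_ingested_by_date(count: u64, date: &str) {
    let _ = (count, date);
}

/// Normalize any chrono date to the canonical "YYYY-MM-DD" label.
fn date_label(date: NaiveDate) -> String {
    date.format("%Y-%m-%d").to_string()
}

/// Typed wrapper: call sites pass a timestamp, never a hand-built string,
/// so one event cannot land in a differently formatted (split-day) series.
pub fn increment_events_ingested_by_utc_date(count: u64, ts: DateTime<Utc>) {
    increment_events_ingested_by_date(count, &date_label(ts.date_naive()));
}

fn main() {
    increment_events_ingested_by_utc_date(42, Utc::now());
}
```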
🧹 Nitpick comments (5)
src/metrics/mod.rs (5)
161-195
: Global totals by format/date look correct.Label set ["format","date"] for TOTAL_*_DATE matches the intended global aggregation (not per‑stream). Good.
Minor nit: align help string capitalization for consistency.
- "total events ingested on a particular date", + "Total events ingested on a particular date",
229-359
: Naming clarity: distinguish “billing” vs “cluster billing” vs “totals” with subsystem.Right now we have three families with very similar names:
- TOTAL_*_DATE (global-by-format gauges)
- TOTAL_*_BY_DATE (billing counters)
- TOTAL_CLUSTER_*_BY_DATE (cluster gauges)
To avoid confusion and dashboards mixing series, set a Prometheus subsystem:
- subsystem("billing") for TOTAL_*_BY_DATE
- subsystem("cluster_billing") for TOTAL_CLUSTER_*_BY_DATE
Apply pattern like:
- Opts::new( - "total_events_ingested_by_date", - "Total events ingested by date", - ) - .namespace(METRICS_NAMESPACE), + Opts::new( + "total_events_ingested_by_date", + "Total events ingested by date", + ) + .namespace(METRICS_NAMESPACE) + .subsystem("billing"),- Opts::new( - "total_cluster_events_ingested_by_date", - "Total cluster events ingested by date (Gauge for cluster billing)", - ) - .namespace(METRICS_NAMESPACE), + Opts::new( + "total_cluster_events_ingested_by_date", + "Total cluster events ingested by date (Gauge for cluster billing)", + ) + .namespace(METRICS_NAMESPACE) + .subsystem("cluster_billing"),Repeat for the rest of the billing/cluster metrics.
Also applies to: 360-493
495-502
: Latency histogram: set explicit buckets and constrain status codes.
- Default buckets may not suit object‑store latencies; define buckets.
- Ensure status label is a canonical 3‑digit code to avoid cardinality spikes.
Example buckets (seconds):
-pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| {
-    HistogramVec::new(
-        HistogramOpts::new("storage_request_response_time", "Storage Request Latency")
-            .namespace(METRICS_NAMESPACE),
-        &["provider", "method", "status"],
-    )
+pub static STORAGE_REQUEST_RESPONSE_TIME: Lazy<HistogramVec> = Lazy::new(|| {
+    let opts = HistogramOpts::new("storage_request_response_time", "Storage Request Latency")
+        .namespace(METRICS_NAMESPACE)
+        .buckets(vec![
+            0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0,
+        ]);
+    HistogramVec::new(
+        opts,
+        &["provider", "method", "status"],
+    )

Please confirm your metrics layer emits numeric status codes (e.g., "200","404","500").
311-334
: Consider counting outcomes by status for error‑rate SLOs.
Counters for object‑store calls and files‑scanned are not partitioned by status, only the latency histogram is. If you need rates like error_ratio per provider/method/date, add a status label to the counters too (or add a separate “total_object_store_errors_by_date”).
Do you plan to derive error rates solely from histograms, or do you want explicit counters by status?
Also applies to: 445-469
229-359
: Help text polish.
Small nits for consistency and clarity (e.g., “bytes” suffix, pluralization). Optional, but makes dashboards clearer.
Suggested examples:
- "Total files scanned in queries by date", + "Total files scanned in queries (count) by date", - "Total bytes scanned in queries by date", + "Total bytes scanned in queries by date (bytes)",Apply similarly as you see fit.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/metrics/mod.rs
(9 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/metrics/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
🔇 Additional comments (1)
src/metrics/mod.rs (1)
125-159
: Per-date metrics converted to counters — verify no .set()/.dec() callsites remainThese were Gauges before; callers must use .inc()/.inc_by() (remove any .set()/.dec()) for:
- EVENTS_INGESTED_DATE
- EVENTS_INGESTED_SIZE_DATE
- EVENTS_STORAGE_SIZE_DATE
Run locally (expect no matches):
rg -n -C2 -P '(EVENTS_INGESTED_DATE|EVENTS_INGESTED_SIZE_DATE|EVENTS_STORAGE_SIZE_DATE).*?\.(set|dec)\s*\('
607ffa0
to
5c990c9
Compare
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/query/stream_schema_provider.rs (1)
862-865
: Don’t unwrap timestamp millis (panic risk on out-of-range).
Map None to None to preserve the Option contract.
- ScalarValue::TimestampMillisecond(Some(value), _) => Some((
-     binexpr.op,
-     DateTime::from_timestamp_millis(*value).unwrap().naive_utc(),
- )),
+ ScalarValue::TimestampMillisecond(Some(value), _) => {
+     DateTime::from_timestamp_millis(*value)
+         .map(|dt| (binexpr.op, dt.naive_utc()))
+ }

src/storage/object_storage.rs (1)
146-148
: Swallow metrics errors at call site.
Even if update_storage_metrics returns Err in other scenarios, don’t abort the upload.
- // Update storage metrics
- update_storage_metrics(&path, &stream_name, filename)?;
+ // Update storage metrics (best-effort)
+ if let Err(e) = update_storage_metrics(&path, &stream_name, filename) {
+     warn!("update_storage_metrics failed for {filename}: {e}");
+ }

src/metadata.rs (1)
175-191
: Make load_daily_metrics idempotent — avoid double-counting counters.
EVENTS_INGESTED_DATE / EVENTS_INGESTED_SIZE_DATE / EVENTS_STORAGE_SIZE_DATE are IntCounterVecs; load_daily_metrics (src/metadata.rs) is called from migration.setup_logstream_metadata (src/migration/mod.rs:382) and currently .inc_by(...)s each manifest — repeated invocations will double-count. Compute and apply deltas (or ensure the function runs once per stream) instead.
Suggested change (replace the three inc_by calls):
let labels = &[stream_name, "json", &manifest_date]; let curr = EVENTS_INGESTED_DATE .get_metric_with_label_values(labels) .map(|m| m.get() as u64) .unwrap_or(0); if events_ingested > curr { EVENTS_INGESTED_DATE.with_label_values(labels).inc_by(events_ingested - curr); } let size_labels = &[stream_name, "json", &manifest_date]; let curr_size = EVENTS_INGESTED_SIZE_DATE .get_metric_with_label_values(size_labels) .map(|m| m.get() as u64) .unwrap_or(0); if ingestion_size > curr_size { EVENTS_INGESTED_SIZE_DATE.with_label_values(size_labels) .inc_by(ingestion_size - curr_size); } let storage_labels = &["data", stream_name, "parquet", &manifest_date]; let curr_storage = EVENTS_STORAGE_SIZE_DATE .get_metric_with_label_values(storage_labels) .map(|m| m.get() as u64) .unwrap_or(0); if storage_size > curr_storage { EVENTS_STORAGE_SIZE_DATE.with_label_values(storage_labels) .inc_by(storage_size - curr_storage); }
♻️ Duplicate comments (10)
src/query/stream_schema_provider.rs (2)
600-606
: Remove query-layer pre-increment of object-store GETs (double-counts provider metrics).
Provider layers already emit GET/HEAD; this inflates totals and can mislabel providers.
- let parquet_files_to_scan = manifest_files.len();
- increment_files_scanned_in_object_store_calls_by_date(
-     PARSEABLE.storage().name(),
-     "GET",
-     parquet_files_to_scan as u64,
-     &Utc::now().date_naive().to_string(),
- );
871-872
: Avoid unwrap on string timestamp parse (panic on malformed input).
Return None on parse failure.
- ScalarValue::Utf8(Some(str_value)) if is_time_partition => {
-     Some((binexpr.op, str_value.parse().unwrap()))
- }
+ ScalarValue::Utf8(Some(str_value)) if is_time_partition => {
+     match str_value.parse() {
+         Ok(dt) => Some((binexpr.op, dt)),
+         Err(_) => None,
+     }
+ }

src/storage/object_storage.rs (1)
173-189
: Don’t fail uploads on local metadata read; make metrics best-effort.
An Err here aborts manifest creation after a successful upload, leaving orphaned objects.
- let compressed_size = path
-     .metadata()
-     .map(|m| m.len())
-     .map_err(|e| ObjectStorageError::Custom(format!("metadata failed for {filename}: {e}")))?;
+ let compressed_size = match path.metadata().map(|m| m.len()) {
+     Ok(len) => len,
+     Err(e) => {
+         warn!("metadata() failed for {filename}: {e}; skipping size-based metrics");
+         0
+     }
+ };
@@
- TOTAL_EVENTS_STORAGE_SIZE_DATE
-     .with_label_values(&["parquet", file_date_part])
-     .add(compressed_size as i64);
+ if compressed_size > 0 {
+     TOTAL_EVENTS_STORAGE_SIZE_DATE
+         .with_label_values(&["parquet", file_date_part])
+         .add(compressed_size as i64);
+     // billing metrics for parquet storage
+     increment_parquets_stored_by_date(file_date_part);
+     increment_parquets_stored_size_by_date(compressed_size, file_date_part);
+ }
-
- // billing metrics for parquet storage
- increment_parquets_stored_by_date(file_date_part);
- increment_parquets_stored_size_by_date(compressed_size, file_date_part);

src/query/listing_table_builder.rs (1)
98-111
: Preserve parent prefix when flattening listings (fixes invalid URIs).
Join child names with their base prefix before adding to listing.
- // Use storage.list_dirs_relative for all prefixes and flatten results - let mut listing = Vec::new(); - for prefix in prefixes { - match storage.list_dirs_relative(&prefix).await { - Ok(paths) => { - listing.extend(paths.into_iter().map(|p| p.to_string())); - } - Err(e) => { - return Err(DataFusionError::External(Box::new(e))); - } - } - } + // Use storage.list_dirs_relative for all prefixes and flatten results, preserving base + let mut listing = Vec::new(); + for prefix in prefixes { + let base = prefix.as_str().to_owned(); + match storage.list_dirs_relative(&prefix).await { + Ok(children) => { + listing.extend(children.into_iter().map(|c| format!("{}/{}", base, c))); + } + Err(e) => { + return Err(DataFusionError::External(Box::new(e))); + } + } + }src/storage/azure_blob.rs (2)
308-309
: Files deleted counter placement issue.
The
files_deleted
counter is incremented before the actual delete operation. If the delete fails, the metric will be incorrect.
Move the increment to after successful deletion:
Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "azure_blob", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "DELETE", "200"]) .observe(delete_elapsed); }
683-696
: Remove duplicate GET metrics in get_objects loop.
The loop records GET metrics that are already being recorded by the
get_object()
call. This causes double-counting of GET operations and incorrect latency attribution (using list_start for GET timing).
Remove the duplicate metrics:
let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - ); res.push(byts);src/storage/gcs.rs (2)
261-261
: Files deleted counter placement issue.
Same issue as in Azure - the counter is incremented before the delete operation.
Move the increment after successful deletion:
Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "gcs", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed); }
647-660
: Remove duplicate GET metrics in get_objects loop.
Same double-counting issue as in the Azure implementation.
Remove the duplicate metrics:
let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "gcs", - "GET", - &Utc::now().date_naive().to_string(), - ); res.push(byts);src/storage/s3.rs (2)
423-423
: Files deleted counter placement issue.
Same pre-delete increment issue as in other storage backends.
Move the increment after successful deletion:
Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); increment_object_store_calls_by_date( "s3", "DELETE", &Utc::now().date_naive().to_string(), ); match delete_resp { Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "DELETE", "200"]) .observe(delete_elapsed); }
816-825
: Remove duplicate GET metrics in get_objects loop.
Same double-counting issue present in all three storage backends.
Remove the duplicate metrics:
let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string()); res.push(byts);
🧹 Nitpick comments (13)
resources/formats.json (2)
1472-1483
: Make timestamp TZ-flexible (support offsets as well as Z, still optional).
Broaden parsing to accept both "Z" and "+/-HH:MM" (or none), matching existing behavior while future‑proofing.
- "pattern": "^(?P<customer_id>\\S+)\\s+(?P<deployment_id>\\S+)\\s+(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+Z?)\\s+(?P<level>\\w+)\\s+(?P<logger_context>\\S+)\\s+(?P<thread_id>ThreadId\\(\\d+\\))\\s+(?P<module>.*?):(?P<line_number>\\d+):\\s+(?P<body>.*)", + "pattern": "^(?P<customer_id>\\S+)\\s+(?P<deployment_id>\\S+)\\s+(?P<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d+(?:Z|[+-]\\d{2}:\\d{2})?)\\s+(?P<level>\\w+)\\s+(?P<logger_context>\\S+)\\s+(?P<thread_id>ThreadId\\(\\d+\\))\\s+(?P<module>.*?):(?P<line_number>\\d+):\\s+(?P<body>.*)",
1469-1516
: Consistency: align TZ handling across all parseable_server_logs patterns.
The first and fourth patterns accept optional Z; the second requires Z. Consider normalizing TZ handling across the group for predictability.
src/event/format/known_schema.rs (3)
527-531
: Broaden test coverage: add cases without 'Z' and with TZ offsets.
To lock in the intended tolerance from formats.json, add one line without 'Z' and one with a "+00:00"/"-05:30" offset.
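A couple of hypothetical inputs that would exercise those cases; the field values are invented for illustration and only need to satisfy the regex above:

```rust
// Hypothetical sample lines: the first has no trailing 'Z', the second carries an explicit offset.
let extra_logs = [
    "cust-001 dep-042 2025-09-18T09:52:07.536 INFO parseable::storage ThreadId(4) src/storage/s3.rs:342: upload complete",
    "cust-001 dep-042 2025-09-18T09:52:07.536+05:30 WARN parseable::query ThreadId(2) src/handlers/http/query.rs:126: slow query",
];
```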
541-544
: Fix stale assert message: s/rust/parseable/.
Message still says "rust server log".
- "Failed to extract fields from rust server log {}: {}", + "Failed to extract fields from parseable server log {}: {}",
569-585
: Assert new prefix fields are extracted.
Since the first pattern adds customer_id and deployment_id, assert their presence to prevent regressions.
if i < 4 { + assert!( + obj.contains_key("customer_id"), + "Missing customer_id field for log {}", + i + 1 + ); + assert!( + obj.contains_key("deployment_id"), + "Missing deployment_id field for log {}", + i + 1 + ); assert!( obj.contains_key("logger_context"),src/query/stream_schema_provider.rs (3)
401-405
: Set total_byte_size for better planning.
Provide exact total bytes to DataFusion to improve stats-based planning.
  let statistics = datafusion::common::Statistics {
      num_rows: Precision::Exact(count as usize),
-     total_byte_size: Precision::Absent,
+     total_byte_size: Precision::Exact(total_file_size as usize),
      column_statistics: statistics,
  };

Also applies to: 407-411
172-183
: Fix typo in method name (execution).
Rename to get_hot_tier_execution_plan for clarity.
- async fn get_hottier_exectuion_plan(
+ async fn get_hot_tier_execution_plan(
      &self,
@@
-     self.get_hottier_exectuion_plan(
+     self.get_hot_tier_execution_plan(

Also applies to: 583-593
359-363
: Verify storage accessor consistency (field vs method).
Repo uses both PARSEABLE.storage.name() and PARSEABLE.storage().name(); make them consistent.
Field-access occurrences: src/storage/store_metadata.rs:77, src/query/stream_schema_provider.rs:359, src/metastore/metastores/object_store_metastore.rs:736, src/handlers/http/modal/query_server.rs:92, src/handlers/http/modal/ingest_server.rs:90.
Method-occurrence: src/query/stream_schema_provider.rs:602.
Action: choose one style (recommend PARSEABLE.storage().name()) and update the others, or confirm the PARSEABLE type intentionally exposes both.
src/metrics/mod.rs (2)
161-195
: Unify help text casing and intent.
Minor polish: start sentences with uppercase and align phrasing.
- "total events ingested on a particular date", + "Total events ingested on a particular date", @@ - "Total events ingested size in bytes on a particular date", + "Total events ingested size in bytes on a particular date", @@ - "Total events storage size in bytes on a particular date", + "Total events storage size in bytes on a particular date",
700-769
: Centralize UTC date label formatting to avoid drift.
Add a small helper to produce YYYY-MM-DD (UTC) and use it at call sites over repeating Utc::now().date_naive().to_string().
Example helper to add in this module:
#[inline] pub fn utc_date_label(now: chrono::DateTime<chrono::Utc>) -> String { now.date_naive().to_string() }src/storage/object_storage.rs (1)
171-173
: Safer date extraction from filename.
Avoid panics on unexpected filenames.
- let mut file_date_part = filename.split('.').collect::<Vec<&str>>()[0]; - file_date_part = file_date_part.split('=').collect::<Vec<&str>>()[1]; + let file_date_part = filename + .split('.') + .next() + .and_then(|s| s.split('=').nth(1)) + .ok_or_else(|| ObjectStorageError::Custom(format!("bad parquet filename: {filename}")))?;src/storage/metrics_layer.rs (1)
294-298
: Missing provider label in StreamMetricWrapper.The
StreamMetricWrapper
forlist()
still uses hardcoded labels["LIST", "200"]
without the provider dimension. This breaks consistency with other operations that use[&self.provider, "LIST", status]
.Update the
StreamMetricWrapper
to include the provider label:fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list(prefix); let res = StreamMetricWrapper { time, - labels: ["LIST", "200"], + provider: self.provider.clone(), + operation: "LIST", inner, }; Box::pin(res) }Also update the
StreamMetricWrapper
struct and its Stream implementation accordingly:-struct StreamMetricWrapper<'a, const N: usize, T> { +struct StreamMetricWrapper<'a, T> { time: time::Instant, - labels: [&'static str; N], + provider: String, + operation: &'static str, inner: BoxStream<'a, T>, } -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; fn poll_next( mut self: std::pin::Pin<&mut Self>, cx: &mut Context<'_>, ) -> Poll<Option<Self::Item>> { match self.inner.poll_next_unpin(cx) { t @ Poll::Ready(None) => { STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) + .with_label_values(&[&self.provider, self.operation, "200"]) .observe(self.time.elapsed().as_secs_f64()); t } t => t, } } }src/storage/s3.rs (1)
653-658
: Consider removing per-part files_scanned increment for multipart uploads.The
increment_files_scanned_in_object_store_calls_by_date
call increments by 1 for each part of a multipart upload. This doesn't accurately represent "files scanned" since all parts belong to a single file. Consider tracking this only once per complete file upload rather than per part.Track files scanned only after successful multipart completion:
Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "PUT_MULTIPART", "200"]) .observe(part_elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "PUT_MULTIPART", - 1, - &Utc::now().date_naive().to_string(), - ); }And add it after successful completion:
} else { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "PUT_MULTIPART_COMPLETE", "200"]) .observe(complete_elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "s3", + "PUT_MULTIPART", + 1, + &Utc::now().date_naive().to_string(), + ); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
resources/formats.json
(1 hunks)src/catalog/mod.rs
(1 hunks)src/event/format/known_schema.rs
(1 hunks)src/event/mod.rs
(2 hunks)src/handlers/http/modal/ingest_server.rs
(0 hunks)src/handlers/http/modal/query_server.rs
(0 hunks)src/handlers/http/modal/server.rs
(0 hunks)src/handlers/http/query.rs
(3 hunks)src/metadata.rs
(3 hunks)src/metrics/mod.rs
(9 hunks)src/metrics/storage.rs
(0 hunks)src/query/listing_table_builder.rs
(2 hunks)src/query/stream_schema_provider.rs
(7 hunks)src/stats.rs
(3 hunks)src/storage/azure_blob.rs
(21 hunks)src/storage/gcs.rs
(21 hunks)src/storage/localfs.rs
(14 hunks)src/storage/metrics_layer.rs
(8 hunks)src/storage/object_storage.rs
(3 hunks)src/storage/s3.rs
(21 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/server.rs
- src/handlers/http/modal/query_server.rs
- src/metrics/storage.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- src/catalog/mod.rs
- src/event/mod.rs
- src/storage/localfs.rs
- src/stats.rs
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
Applied to files:
src/handlers/http/query.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/gcs.rs
src/storage/object_storage.rs
src/storage/metrics_layer.rs
src/storage/s3.rs
src/metadata.rs
src/storage/azure_blob.rs
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/gcs.rs
src/storage/object_storage.rs
src/metadata.rs
src/metrics/mod.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/storage/object_storage.rs
src/storage/s3.rs
src/storage/azure_blob.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/storage/azure_blob.rs
📚 Learning: 2025-09-14T15:17:59.234Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1432
File: src/storage/object_storage.rs:124-132
Timestamp: 2025-09-14T15:17:59.234Z
Learning: In Parseable's upload validation system (src/storage/object_storage.rs), the validate_uploaded_parquet_file function should not include bounded retries for metadata consistency issues. Instead, failed validations rely on the 30-second sync cycle for natural retries, with staging files preserved when manifest_file is set to None.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/metrics_layer.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.
Applied to files:
src/query/stream_schema_provider.rs
🧬 Code graph analysis (9)
src/handlers/http/query.rs (1)
src/metrics/mod.rs (1)
increment_query_calls_by_date
(725-727)
src/storage/gcs.rs (4)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (2)
error_to_status_code
(36-62)new
(71-76)src/storage/object_storage.rs (2)
parseable_json_path
(1129-1131)new
(78-87)src/storage/mod.rs (4)
to_object_store_path
(293-295)new
(204-206)new
(217-223)from
(179-185)
src/storage/object_storage.rs (1)
src/metrics/mod.rs (2)
increment_parquets_stored_by_date
(713-717)increment_parquets_stored_size_by_date
(719-723)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(606-637)src/storage/gcs.rs (1)
head
(574-601)src/storage/s3.rs (3)
head
(743-770)from
(1276-1284)from
(1288-1290)
src/storage/s3.rs (2)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (2)
error_to_status_code
(36-62)new
(71-76)
src/metadata.rs (1)
src/catalog/mod.rs (4)
num_rows
(61-61)num_rows
(78-80)ingestion_size
(58-58)ingestion_size
(70-72)
src/storage/azure_blob.rs (5)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/object_storage.rs (5)
parseable_json_path
(1129-1131)new
(78-87)head
(273-273)get_ingestor_meta_file_paths
(327-329)check
(292-292)src/storage/gcs.rs (7)
resp
(1049-1054)resp
(1090-1095)_delete_prefix
(243-306)_list_dates
(308-350)head
(574-601)get_ingestor_meta_file_paths
(679-717)check
(766-796)src/storage/s3.rs (9)
resp
(1214-1219)resp
(1255-1260)_delete_prefix
(405-468)_list_dates
(470-512)head
(743-770)get_ingestor_meta_file_paths
(844-882)from
(1276-1284)from
(1288-1290)check
(931-961)
src/query/stream_schema_provider.rs (2)
src/metrics/mod.rs (3)
increment_bytes_scanned_in_query_by_date
(735-739)increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_files_scanned_in_query_by_date
(729-733)src/catalog/mod.rs (2)
file_size
(60-60)file_size
(74-76)
src/metrics/mod.rs (1)
src/handlers/http/cluster/utils.rs (4)
new
(39-51)new
(66-84)new
(99-117)new
(129-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (9)
src/event/format/known_schema.rs (1)
518-523
: Good rename and schema lookup switchTest now targets the new public name; ok.
src/metadata.rs (1)
63-69
: Global totals wiring looks good.Per design, totals aggregate across streams via ["format","date"] and use gauges to allow decrements.
src/storage/metrics_layer.rs (3)
36-62
: Good addition for error-to-status mapping.The
error_to_status_code
helper provides consistent error-to-HTTP-status mapping across all storage backends. The categorization is logical and well-documented.
71-75
: Constructor signature change looks good.The addition of the
provider
parameter enables provider-scoped metrics as intended.
93-106
: Consistent metric instrumentation pattern.The implementation correctly captures the result first, determines the status, records metrics with provider-aware labels, and returns the original result. This pattern is consistently applied across all methods.
src/storage/azure_blob.rs (2)
227-237
: Good handling of body read errors.The code properly handles potential errors from
resp.bytes().await
instead of usingunwrap()
, which addresses previous concerns about panics.
478-506
: Good implementation of small file PUT in multipart path.The code correctly handles small files by falling back to a simple PUT and includes proper metrics tracking for this case.
src/storage/gcs.rs (1)
188-198
: Proper body read error handling implemented.Good implementation that handles
resp.bytes().await
errors properly instead of usingunwrap()
.src/storage/s3.rs (1)
350-360
: Good implementation of error-safe body reading.The code properly handles potential errors from
resp.bytes().await
instead of using unwrap.
7a68e7b
to
9fe03ff
Compare
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/metrics_layer.rs (1)
290-315
: Critical: label cardinality mismatch for LIST/LIST_OFFSET.
STORAGE_REQUEST_RESPONSE_TIME expects 3 labels [provider, operation, status], but StreamMetricWrapper is initialized with only ["LIST"/"LIST_OFFSET", "200"]. This will panic at runtime.
Apply:
- fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { + fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list(prefix); - let res = StreamMetricWrapper { - time, - labels: ["LIST", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + operation: "LIST", + status: "200", + inner, + }; Box::pin(res) } @@ - fn list_with_offset( + fn list_with_offset( &self, prefix: Option<&Path>, offset: &Path, ) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list_with_offset(prefix, offset); - let res = StreamMetricWrapper { - time, - labels: ["LIST_OFFSET", "200"], - inner, - }; + let res = StreamMetricWrapper { + time, + provider: &self.provider, + operation: "LIST_OFFSET", + status: "200", + inner, + }; Box::pin(res) }And update the wrapper:
-struct StreamMetricWrapper<'a, const N: usize, T> { - time: time::Instant, - labels: [&'static str; N], - inner: BoxStream<'a, T>, -} +struct StreamMetricWrapper<'a, T> { + time: time::Instant, + provider: &'a str, + operation: &'static str, + status: &'static str, + inner: BoxStream<'a, T>, +} @@ -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; @@ - t @ Poll::Ready(None) => { - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) - .observe(self.time.elapsed().as_secs_f64()); + t @ Poll::Ready(None) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&[self.provider, self.operation, self.status]) + .observe(self.time.elapsed().as_secs_f64()); t }
♻️ Duplicate comments (16)
src/handlers/http/query.rs (2)
126-129
: Good: count after auth only.Increment moved post-authorization; prevents billing unauthorized requests.
355-358
: Good: counts API increments after auth.Matches the intended billing semantics.
src/query/listing_table_builder.rs (1)
101-107
: Bug: child entries lose their parent prefix (regression of earlier fix).
list_dirs_relative returns children under the given prefix; pushing p.to_string() drops the base, producing invalid URIs later. Join prefix + child before extending listing.
Apply:
- for prefix in prefixes { - match storage.list_dirs_relative(&prefix).await { + for prefix in prefixes { + let base = prefix.as_str().to_owned(); + match storage.list_dirs_relative(&prefix).await { Ok(paths) => { - listing.extend(paths.into_iter().map(|p| p.to_string())); + listing.extend(paths.into_iter().map(|c| format!("{}/{}", base, c))); } Err(e) => { return Err(DataFusionError::External(Box::new(e))); } } }src/stats.rs (1)
144-152
: Avoid negative global totals; use saturating update instead of sub.
IntGaugeVec can go negative; clamp at zero to keep “TOTAL_*” semantics sane and log once.
Apply:
- TOTAL_EVENTS_INGESTED_DATE - .with_label_values(&["json", &manifest_date]) - .sub(manifest.events_ingested as i64); + { + let g = TOTAL_EVENTS_INGESTED_DATE.with_label_values(&["json", &manifest_date]); + let cur = g.get(); + let dec = manifest.events_ingested as i64; + let newv = cur.saturating_sub(dec); + if newv != cur - dec { + warn!("Clamped TOTAL_EVENTS_INGESTED_DATE below zero for date={}", manifest_date); + } + g.set(newv); + } - TOTAL_EVENTS_INGESTED_SIZE_DATE - .with_label_values(&["json", &manifest_date]) - .sub(manifest.ingestion_size as i64); + { + let g = TOTAL_EVENTS_INGESTED_SIZE_DATE.with_label_values(&["json", &manifest_date]); + let cur = g.get(); + let dec = manifest.ingestion_size as i64; + let newv = cur.saturating_sub(dec); + if newv != cur - dec { + warn!("Clamped TOTAL_EVENTS_INGESTED_SIZE_DATE below zero for date={}", manifest_date); + } + g.set(newv); + } - TOTAL_EVENTS_STORAGE_SIZE_DATE - .with_label_values(&["parquet", &manifest_date]) - .sub(manifest.storage_size as i64); + { + let g = TOTAL_EVENTS_STORAGE_SIZE_DATE.with_label_values(&["parquet", &manifest_date]); + let cur = g.get(); + let dec = manifest.storage_size as i64; + let newv = cur.saturating_sub(dec); + if newv != cur - dec { + warn!("Clamped TOTAL_EVENTS_STORAGE_SIZE_DATE below zero for date={}", manifest_date); + } + g.set(newv); + }src/storage/azure_blob.rs (3)
308-330
: Count deletions only on success.files_deleted is incremented before the delete; move it into the Ok(_) arm post-delete.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); @@ - Ok(_) => { + Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "DELETE", "200"]) .observe(delete_elapsed); }
683-697
: Remove duplicate GET metrics in loop; _get_object() already records them.This double-counts GETs and skews latency (uses list_start). Drop this block.
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - );
215-244
: Measure full GET latency and handle body-read errors without pre-timing.elapsed is captured before reading the body; 200 is emitted even if body read fails. Compute elapsed after bytes().await and on the error path too.
- let elapsed = time.elapsed().as_secs_f64(); + // elapsed is computed after body read or error increment_object_store_calls_by_date( "azure_blob", "GET", &Utc::now().date_naive().to_string(), ); match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", "200"]) - .observe(elapsed); + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "azure_blob","GET",1,&Utc::now().date_naive().to_string(), + ); + return Ok(body); + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); + return Err(err.into()); + } + }, - increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } Err(err) => { - let status_code = error_to_status_code(&err); - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["azure_blob", "GET", status_code]) - .observe(elapsed); + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["azure_blob", "GET", status_code]) + .observe(elapsed); Err(err.into()) } }src/storage/gcs.rs (3)
179-208
: Measure full GET latency and handle body-read errors without pre-timing.elapsed is computed before bytes().await; record timing after body read and on error.
- let elapsed = time.elapsed().as_secs_f64(); + // compute elapsed after body read or error increment_object_store_calls_by_date("gcs", "GET", &Utc::now().date_naive().to_string()); match resp { - Ok(resp) => { - let body: Bytes = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(elapsed); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) - } + Ok(resp) => match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "gcs","GET",1,&Utc::now().date_naive().to_string(), + ); + Ok(body) + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["gcs", "GET", status_code]) + .observe(elapsed); + Err(err.into()) + } + }, Err(err) => { - let status_code = error_to_status_code(&err); + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "GET", status_code]) .observe(elapsed); Err(err.into()) } }
261-283
: Count deletions only on success.Move files_deleted.fetch_add into the Ok(_) arm after a successful delete.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); @@ - Ok(_) => { + Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["gcs", "DELETE", "200"]) .observe(delete_elapsed); }
647-660
: Remove duplicate GET metrics inside get_objects().Let get_object() be the single source of GET instrumentation.
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["gcs", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "gcs", - "GET", - &Utc::now().date_naive().to_string(), - );src/storage/localfs.rs (2)
133-151
: Don’t record metrics for an unimplemented operation.head() always errors; drop the increments to avoid misleading data.
- // Record attempt to access file (even though operation not implemented) - increment_files_scanned_in_object_store_calls_by_date( - "localfs", - "HEAD", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "localfs", - "HEAD", - &Utc::now().date_naive().to_string(), - );
408-431
: Emit DELETE call counter for delete_prefix.Add increment_object_store_calls_by_date("localfs","DELETE",date) alongside timing.
match &result { Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "DELETE", "200"]) .observe(delete_elapsed); + increment_object_store_calls_by_date( + "localfs","DELETE",&Utc::now().date_naive().to_string(), + ); } Err(err) => { @@ STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "DELETE", status_code]) .observe(delete_elapsed); + increment_object_store_calls_by_date( + "localfs","DELETE",&Utc::now().date_naive().to_string(), + ); } }src/storage/s3.rs (3)
421-446
: Count deletions only on success.Move files_deleted.fetch_add into the successful delete branch.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_start = Instant::now(); let delete_resp = self.client.delete(&obj.location).await; let delete_elapsed = delete_start.elapsed().as_secs_f64(); @@ - Ok(_) => { + Ok(_) => { + files_deleted.fetch_add(1, Ordering::Relaxed); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "DELETE", "200"]) .observe(delete_elapsed); }
816-826
: Remove duplicate GET metrics inside get_objects().These duplicate metrics are already emitted by get_object().
- STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(list_start.elapsed().as_secs_f64()); - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
342-369
: Measure full GET latency and handle body-read errors without pre-timing.Compute elapsed after bytes().await and on error; avoid pre-200 on body-read failure.
- let elapsed = time.elapsed().as_secs_f64(); + // compute elapsed after body read or error increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string()); match resp { Ok(resp) => { - let body = resp.bytes().await?; - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&["s3", "GET", "200"]) - .observe(elapsed); + match resp.bytes().await { + Ok(body) => { + let elapsed = time.elapsed().as_secs_f64(); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", "200"]) + .observe(elapsed); + increment_files_scanned_in_object_store_calls_by_date( + "s3","GET",1,&Utc::now().date_naive().to_string(), + ); + return Ok(body); + } + Err(err) => { + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&["s3", "GET", status_code]) + .observe(elapsed); + return Err(err.into()); + } + } - increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - Ok(body) } Err(err) => { - let status_code = error_to_status_code(&err); + let elapsed = time.elapsed().as_secs_f64(); + let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["s3", "GET", status_code]) .observe(elapsed); Err(err.into()) } }src/metrics/mod.rs (1)
161-195
: Global totals by date (gauge) with ["format","date"]: confirmed intention.
This aligns with the documented design to aggregate across all streams for a given date. Ensure these are set() by a single authority to avoid contention.
Are there dedicated setters (or a single task) that computes and set()s these gauges once per date roll? If not, consider adding small helpers to prevent accidental inc_by on gauges elsewhere.
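If it helps, a minimal sketch of such a setter; the helper name is hypothetical, and the label order follows the ["format","date"] set described above:

```rust
/// Hypothetical single-writer helper: the aggregator sets the day's total once,
/// instead of scattered inc_by/dec calls touching the gauge directly.
pub fn set_total_events_ingested_by_date(total: i64, format: &str, date: &str) {
    TOTAL_EVENTS_INGESTED_DATE
        .with_label_values(&[format, date])
        .set(total);
}
```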
🧹 Nitpick comments (20)
src/query/listing_table_builder.rs (1)
111-111
: Optional: de-dup results before sorting.If multiple prefixes can yield the same child, dedup to avoid redundant paths.
- let listing = listing.into_iter().sorted().rev().collect_vec(); + let mut listing = listing; + listing.sort_unstable(); + listing.dedup(); + listing.reverse();src/handlers/http/query.rs (1)
126-129
: Nit: use imported Utc to be consistent.Minor consistency touch-up.
- let current_date = chrono::Utc::now().date_naive().to_string(); + let current_date = Utc::now().date_naive().to_string();Also applies to: 355-358
src/storage/metrics_layer.rs (1)
35-62
: Optional: reconsider mapping Generic -> 400.object_store::Error::Generic often wraps server-side faults; 500 may be a safer default to avoid misclassifying provider issues as client errors.
src/storage/azure_blob.rs (2)
561-578
: Emit call counter for multipart completion.PUT_MULTIPART_COMPLETE lacks increment_object_store_calls_by_date; add it before evaluating complete_result for consistency with other ops.
let complete_result = async_writer.complete().await; let complete_elapsed = complete_start.elapsed().as_secs_f64(); + increment_object_store_calls_by_date( + "azure_blob", + "PUT_MULTIPART_COMPLETE", + &Utc::now().date_naive().to_string(), + ); if let Err(err) = complete_result { let status_code = error_to_status_code(&err); STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["azure_blob", "PUT_MULTIPART_COMPLETE", status_code]) .observe(complete_elapsed);
302-337
: Bound delete concurrency; avoid unbounded parallel deletes.for_each_concurrent(None, …) can overwhelm the backend; cap with MAX_OBJECT_STORE_REQUESTS.
- object_stream - .for_each_concurrent(None, |x| async { + object_stream + .for_each_concurrent(Some(super::MAX_OBJECT_STORE_REQUESTS), |x| async {src/query/stream_schema_provider.rs (2)
852-866
: Avoid unwrap on out-of-range timestamps.DateTime::from_timestamp_millis(...).unwrap() can panic on invalid ranges; mirror the Utf8 path: return None on invalid millis.
- ScalarValue::TimestampMillisecond(Some(value), _) => Some(( - binexpr.op, - DateTime::from_timestamp_millis(*value).unwrap().naive_utc(), - )), + ScalarValue::TimestampMillisecond(Some(value), _) => { + DateTime::from_timestamp_millis(*value) + .map(|dt| (binexpr.op, dt.naive_utc())) + }
326-332
: Tiny nit: compute target_partition once; name typo nearby.OK as-is. Optional: fix “get_hottier_exectuion_plan” typo and consider caching num_cpus() if called often.
src/storage/gcs.rs (2)
500-521
: Emit call counter for multipart completion.Add increment_object_store_calls_by_date for PUT_MULTIPART_COMPLETE to mirror Azure/S3 patterns.
let complete_result = async_writer.complete().await; let complete_elapsed = complete_start.elapsed().as_secs_f64(); + increment_object_store_calls_by_date( + "gcs", + "PUT_MULTIPART_COMPLETE", + &Utc::now().date_naive().to_string(), + ); if let Err(err) = complete_result {
255-290
: Bound delete concurrency.Use a concurrency cap to prevent thundering-herd deletes.
- object_stream - .for_each_concurrent(None, |x| async { + object_stream + .for_each_concurrent(Some(super::MAX_OBJECT_STORE_REQUESTS), |x| async {src/storage/localfs.rs (4)
316-321
: Avoid expect() on non‑UTF8 filenames.Use to_string_lossy or return an error; expect can panic.
- .to_str() - .expect("file name is parseable to str") - .to_owned(); + .to_string_lossy() + .into_owned();
347-359
: Fix GET counters: count only attempted reads and increment calls per attempt.Currently counts all dir entries as GET “files scanned” and increments call counter once. Count only filtered files and bump call counter per read.
- let mut files_scanned = 0; + let mut get_attempts = 0; @@ - files_scanned += 1; let ingestor_file = filter_func(path); if !ingestor_file { continue; } + get_attempts += 1; + increment_object_store_calls_by_date( + "localfs","GET",&Utc::now().date_naive().to_string(), + ); let file_read_start = Instant::now(); let file_result = fs::read(entry.path()).await; let file_read_elapsed = file_read_start.elapsed().as_secs_f64(); @@ // Record total files scanned increment_files_scanned_in_object_store_calls_by_date( "localfs", "GET", - files_scanned as u64, + get_attempts as u64, &Utc::now().date_naive().to_string(), ); - increment_object_store_calls_by_date( - "localfs", - "GET", - &Utc::now().date_naive().to_string(), - );Also applies to: 327-345
854-872
: Also increment “files scanned” for PUT on success.Align with S3/Azure/GCS: add a PUT files_scanned increment.
Ok(_) => { STORAGE_REQUEST_RESPONSE_TIME .with_label_values(&["localfs", "PUT", "200"]) .observe(upload_elapsed); increment_object_store_calls_by_date( "localfs", "PUT", &Utc::now().date_naive().to_string(), ); + increment_files_scanned_in_object_store_calls_by_date( + "localfs","PUT",1,&Utc::now().date_naive().to_string(), + ); Ok(()) }
680-698
: Optional: increment LIST call counters for list_dirs to match other providers.*You already record timings; add increment_object_store_calls_by_date("localfs","LIST",date) on success for parity.
Also applies to: 722-745
src/storage/s3.rs (3)
672-690
: Emit call counter for multipart completion.Add increment_object_store_calls_by_date("s3","PUT_MULTIPART_COMPLETE",date).
let complete_result = async_writer.complete().await; let complete_elapsed = complete_start.elapsed().as_secs_f64(); + increment_object_store_calls_by_date( + "s3", + "PUT_MULTIPART_COMPLETE", + &Utc::now().date_naive().to_string(), + ); if let Err(err) = complete_result {
417-452
: Bound delete concurrency.Use Some(MAX_OBJECT_STORE_REQUESTS) instead of unlimited concurrency for deletes.
- object_stream - .for_each_concurrent(None, |x| async { + object_stream + .for_each_concurrent(Some(super::MAX_OBJECT_STORE_REQUESTS), |x| async {
653-659
: Unify semantics: avoid treating multipart “parts” as “files scanned”.Other providers don’t increment files_scanned per part. Consider removing this increment or switching to a dedicated parts counter if needed.
src/metrics/mod.rs (4)
229-359
: Cardinality guardrails for provider/method/model/date labels.
Daily series are fine, but please ensure provider/method/model values are from bounded enums to prevent unbounded time‑series growth (e.g., map SDK method names to a fixed set).
If helpful, I can add constants and a mapping layer in metrics_layer to sanitize these labels.
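A minimal sketch of such a mapping, assuming the operation names currently emitted (GET, PUT, DELETE, LIST, HEAD, PUT_MULTIPART); anything unexpected collapses into one bucket:

```rust
/// Bound the "method" label to a fixed vocabulary before it reaches Prometheus.
fn sanitize_method_label(method: &str) -> &'static str {
    match method {
        "GET" => "GET",
        "PUT" => "PUT",
        "DELETE" => "DELETE",
        "LIST" => "LIST",
        "HEAD" => "HEAD",
        "PUT_MULTIPART" => "PUT_MULTIPART",
        _ => "OTHER", // unexpected values share a single series
    }
}
```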
360-494
: Cluster totals as gauges: clarify writer and rollover.Gauge choice makes sense for externally aggregated cluster numbers. Confirm:
- a single writer updates them;
- rollover behavior at UTC midnight is defined (e.g., recompute vs. reset).
Consider lightweight setters here (e.g., set_cluster_*_by_date) to keep write patterns consistent and discourage local inc/dec.
495-502
: Name should include unit suffix “_seconds”.Prometheus best practice: duration metrics end with _seconds. Recommend renaming now before dashboards depend on it.
Apply:
- HistogramOpts::new("storage_request_response_time", "Storage Request Latency") + HistogramOpts::new( + "storage_request_response_seconds", + "Storage request latency in seconds" + ) .namespace(METRICS_NAMESPACE),Optionally add custom buckets later if S3/GCS tail latencies warrant it.
700-769
: Date label hygiene: consider centralizing UTC formatting.
Helpers accept &str; easy to pass non‑UTC or non‑YYYY‑MM‑DD strings by mistake. Consider adding overloads that accept DateTime and format internally, or add a debug_assert on format.
I can add small helpers (e.g., increment_*_by_utc_date(dt: DateTime)) and update callsites on request.
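For the debug_assert variant, a minimal sketch (helper name hypothetical):

```rust
/// Cheap guard for the date label; compiled out in release builds.
fn debug_assert_date_label(date: &str) {
    debug_assert!(
        chrono::NaiveDate::parse_from_str(date, "%Y-%m-%d").is_ok(),
        "metric date label must be YYYY-MM-DD, got {date}"
    );
}
```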
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
resources/formats.json
(1 hunks)src/catalog/mod.rs
(1 hunks)src/event/format/known_schema.rs
(1 hunks)src/event/mod.rs
(2 hunks)src/handlers/http/modal/ingest_server.rs
(0 hunks)src/handlers/http/modal/query_server.rs
(0 hunks)src/handlers/http/modal/server.rs
(0 hunks)src/handlers/http/query.rs
(3 hunks)src/metadata.rs
(3 hunks)src/metrics/mod.rs
(9 hunks)src/metrics/storage.rs
(0 hunks)src/query/listing_table_builder.rs
(2 hunks)src/query/stream_schema_provider.rs
(6 hunks)src/stats.rs
(3 hunks)src/storage/azure_blob.rs
(21 hunks)src/storage/gcs.rs
(21 hunks)src/storage/localfs.rs
(14 hunks)src/storage/metrics_layer.rs
(8 hunks)src/storage/object_storage.rs
(3 hunks)src/storage/s3.rs
(21 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/server.rs
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/ingest_server.rs
- src/metrics/storage.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/event/mod.rs
- src/storage/object_storage.rs
- src/catalog/mod.rs
- resources/formats.json
- src/event/format/known_schema.rs
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/gcs.rs
src/stats.rs
src/storage/localfs.rs
src/metadata.rs
src/storage/azure_blob.rs
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/gcs.rs
src/stats.rs
src/storage/localfs.rs
src/metadata.rs
src/storage/azure_blob.rs
src/metrics/mod.rs
📚 Learning: 2025-09-18T09:52:07.536Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.536Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/storage/gcs.rs
src/storage/metrics_layer.rs
src/storage/s3.rs
src/query/stream_schema_provider.rs
src/storage/localfs.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/query/stream_schema_provider.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/gcs.rs
src/storage/s3.rs
src/storage/localfs.rs
src/storage/azure_blob.rs
📚 Learning: 2025-09-18T09:59:20.167Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.167Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
Applied to files:
src/stats.rs
src/metadata.rs
src/metrics/mod.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/metrics_layer.rs
src/storage/azure_blob.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.
Applied to files:
src/query/stream_schema_provider.rs
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
Applied to files:
src/handlers/http/query.rs
📚 Learning: 2025-08-21T12:04:38.398Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T12:04:38.398Z
Learning: In Parseable's timestamp extraction (extract_timestamp_for_date in src/storage/object_storage.rs), minute-level precision is sufficient for both first and latest event timestamps. Seconds are always set to 0, meaning both first and latest events within the same minute will have identical timestamps, which is the intended behavior.
Applied to files:
src/metrics/mod.rs
🧬 Code graph analysis (8)
src/storage/gcs.rs (5)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/localfs.rs (4)
new
(100-102)head
(132-151)get_ingestor_meta_file_paths
(217-274)check
(475-504)src/storage/azure_blob.rs (6)
_delete_prefix
(286-353)_list_dates
(355-401)_upload_file
(403-436)head
(606-637)get_ingestor_meta_file_paths
(719-761)check
(814-848)src/storage/s3.rs (6)
_delete_prefix
(405-468)_list_dates
(470-512)_upload_file
(514-543)head
(743-770)get_ingestor_meta_file_paths
(844-882)check
(931-961)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(606-637)src/storage/gcs.rs (1)
head
(574-601)src/storage/s3.rs (3)
head
(743-770)from
(1276-1284)from
(1288-1290)
src/storage/s3.rs (5)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/object_storage.rs (5)
parseable_json_path
(1129-1131)new
(78-87)head
(273-273)get_ingestor_meta_file_paths
(327-329)check
(292-292)src/storage/mod.rs (4)
to_object_store_path
(293-295)new
(204-206)new
(217-223)from
(179-185)src/storage/azure_blob.rs (7)
resp
(1125-1130)resp
(1170-1175)_delete_prefix
(286-353)_list_dates
(355-401)head
(606-637)get_ingestor_meta_file_paths
(719-761)check
(814-848)
src/query/stream_schema_provider.rs (2)
src/metrics/mod.rs (2)
increment_bytes_scanned_in_query_by_date
(735-739)increment_files_scanned_in_query_by_date
(729-733)src/catalog/mod.rs (2)
file_size
(60-60)file_size
(74-76)
src/storage/localfs.rs (3)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (1)
copy
(333-347)src/storage/azure_blob.rs (1)
get_ingestor_meta_file_paths
(719-761)
src/metadata.rs (1)
src/catalog/mod.rs (4)
num_rows
(61-61)num_rows
(78-80)ingestion_size
(58-58)ingestion_size
(70-72)
src/storage/azure_blob.rs (6)
src/metrics/mod.rs (3)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)status_code
(788-792)src/storage/metrics_layer.rs (3)
error_to_status_code
(36-62)new
(71-76)head
(251-265)src/storage/object_storage.rs (5)
parseable_json_path
(1129-1131)new
(78-87)upload_multipart
(281-285)head
(273-273)get_ingestor_meta_file_paths
(327-329)src/storage/localfs.rs (5)
new
(100-102)upload_multipart
(111-120)head
(132-151)get_ingestor_meta_file_paths
(217-274)from
(1003-1005)src/storage/gcs.rs (5)
_delete_prefix
(243-306)_list_dates
(308-350)upload_multipart
(566-572)head
(574-601)get_ingestor_meta_file_paths
(679-717)src/storage/s3.rs (7)
_delete_prefix
(405-468)_list_dates
(470-512)upload_multipart
(735-741)head
(743-770)get_ingestor_meta_file_paths
(844-882)from
(1276-1284)from
(1288-1290)
src/handlers/http/query.rs (1)
src/metrics/mod.rs (1)
increment_query_calls_by_date
(725-727)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (7)
src/query/listing_table_builder.rs (1)
90-97
: LGTM: building relative prefixes per-stream is correct.
src/metadata.rs (3)
49-56
: LGTM: switched per-date metrics to counters. inc_by on EVENTS_INGESTED_DATE / EVENTS_INGESTED_SIZE_DATE aligns with counter semantics.
181-190
: Confirm call frequency of load_daily_metrics. Since it now accumulates via inc_by, calling it multiple times per process will overcount. If it can be invoked more than once, reset/remove label values first or switch to set.
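If it can run more than once, one hedged option is to clear the series before replaying it; a sketch, with the [stream, format, date] label order assumed from the per-stream convention:

```rust
use prometheus::IntCounterVec;

/// Clear any existing per-date series before replaying a stored value, so a second
/// call to load_daily_metrics does not stack on top of the first.
fn reload_daily_counter(counter: &IntCounterVec, stream: &str, format: &str, date: &str, value: u64) {
    // Err simply means no prior series existed for these labels; that is fine.
    let _ = counter.remove_label_values(&[stream, format, date]);
    counter.with_label_values(&[stream, format, date]).inc_by(value);
}
```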
63-69
: No double-count — global TOTAL_*_DATE only updated in metadata path. Verified: TOTAL_EVENTS_*_DATE is incremented in src/metadata.rs (and src/storage/object_storage.rs); src/event/mod.rs only bumps per-stream metrics via increment_events_ingested_by_date (uses UTC-normalized parsed_date), so the global totals will not be double-counted.
src/stats.rs (1)
211-227
: LGTM: generic delete_with_label_prefix now targets counters. The label-prefix removal across all dates is appropriate for per-date counter series.
Please confirm the prometheus crate version supports IntCounterVec::remove(&HashMap<&str,&str>); some versions only offer remove_label_values(&[&str]).
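For reference, both deletion forms side by side (a sketch; the stream/format/date label names are assumptions, and in practice only one form would be used):

```rust
use std::collections::HashMap;

use prometheus::IntCounterVec;

fn remove_series(counter: &IntCounterVec, stream: &str, format: &str, date: &str) {
    // HashMap form: keyed by label *name*, available on newer prometheus versions.
    let by_name: HashMap<&str, &str> =
        HashMap::from([("stream", stream), ("format", format), ("date", date)]);
    let _ = counter.remove(&by_name);

    // Slice form: the full label tuple in declaration order, available everywhere.
    let _ = counter.remove_label_values(&[stream, format, date]);
}
```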
src/query/stream_schema_provider.rs (1)
406-410
: Metrics alignment looks good. Files and bytes scanned by date are emitted once per scan, post-partitioning. Matches PR intent.
src/metrics/mod.rs (1)
125-159
: Per-date per-stream counters added — ensure callsites increment them and are idempotent. Grep shows current increments call the global TOTAL_* per-date helpers only (not the new per-stream IntCounterVecs): src/event/mod.rs:94-95, src/storage/object_storage.rs:191-192, src/query/stream_schema_provider.rs:408-409. If you intend per-stream metrics, update these ingestion/billing paths to also increment the per-stream counters (use [stream, format, date] labels) and ensure those increments run only on real ingestion (not during backfills/rebuilds) to avoid double counting.
Actionable comments posted: 1
♻️ Duplicate comments (6)
src/storage/gcs.rs (2)
511-521
: Remove duplicate GET metrics in get_objects; _get_object already emits them. These increments double-count GET calls and files. Let get_object/_get_object be the single source of GET metrics.
- increment_files_scanned_in_object_store_calls_by_date( - "gcs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "gcs", - "GET", - &Utc::now().date_naive().to_string(), - );
229-241
: Count deleted files on success only in delete-prefix stream. files_deleted is incremented before attempting delete; failed deletes inflate metrics. Increment only when delete_resp is Ok.
- files_deleted.fetch_add(1, Ordering::Relaxed); let delete_resp = self.client.delete(&obj.location).await; increment_object_store_calls_by_date( "gcs", "DELETE", &Utc::now().date_naive().to_string(), ); - if delete_resp.is_err() { - error!( - "Failed to delete object during delete stream: {:?}", - delete_resp - ); - } + if delete_resp.is_ok() { + files_deleted.fetch_add(1, Ordering::Relaxed); + } else { + error!( + "Failed to delete object during delete stream: {:?}", + delete_resp + ); + }Also applies to: 256-261
src/storage/azure_blob.rs (2)
556-565
: Remove duplicate GET metrics in get_objects; _get_object already emits them. This loop double-counts GET calls and files. Keep LIST metrics here; let get_object handle GET.
- increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - );
277-289
: Count deleted files on success only in delete-prefix stream. files_deleted is incremented before delete; move it to run only when delete_resp is Ok.
- files_deleted.fetch_add(1, Ordering::Relaxed); let delete_resp = self.client.delete(&obj.location).await; increment_object_store_calls_by_date( "azure_blob", "DELETE", &Utc::now().date_naive().to_string(), ); - if delete_resp.is_err() { - error!( - "Failed to delete object during delete stream: {:?}", - delete_resp - ); - } + if delete_resp.is_ok() { + files_deleted.fetch_add(1, Ordering::Relaxed); + } else { + error!( + "Failed to delete object during delete stream: {:?}", + delete_resp + ); + }Also applies to: 298-309
src/storage/s3.rs (2)
673-680
: Remove duplicate GET metrics in get_objects; _get_object already emits them. Drop these to avoid double-counting and skewed totals.
- increment_files_scanned_in_object_store_calls_by_date( - "s3", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
393-405
: Count deleted files on success only in delete-prefix stream. Move files_deleted.fetch_add into the success path of delete_resp.
- files_deleted.fetch_add(1, Ordering::Relaxed); let delete_resp = self.client.delete(&obj.location).await; increment_object_store_calls_by_date( "s3", "DELETE", &Utc::now().date_naive().to_string(), ); - if delete_resp.is_err() { - error!( - "Failed to delete object during delete stream: {:?}", - delete_resp - ); - } + if delete_resp.is_ok() { + files_deleted.fetch_add(1, Ordering::Relaxed); + } else { + error!( + "Failed to delete object during delete stream: {:?}", + delete_resp + ); + }Also applies to: 414-425
🧹 Nitpick comments (5)
src/storage/gcs.rs (1)
325-337
: Optional: record multipart completion as a distinct call for visibility. Consider incrementing a "PUT_MULTIPART_COMPLETE" (or reuse "PUT_MULTIPART") counter on complete(). Helps reconcile initiated vs completed uploads.
let complete_result = async_writer.complete().await; if let Err(err) = complete_result { ... } + increment_object_store_calls_by_date( + "gcs", + "PUT_MULTIPART", + &Utc::now().date_naive().to_string(), + );Also applies to: 400-411
src/storage/azure_blob.rs (1)
381-392
: Optional: increment call count on multipart complete() for parity. Add a counter after successful complete() (e.g., PUT_MULTIPART) to reconcile initiated vs completed uploads.
let complete_result = async_writer.complete().await; if let Err(err) = complete_result { error!("Failed to complete multipart upload. {:?}", err); async_writer.abort().await?; return Err(err.into()); } + increment_object_store_calls_by_date( + "azure_blob", + "PUT_MULTIPART", + &Utc::now().date_naive().to_string(), + );Also applies to: 460-466
src/storage/localfs.rs (2)
337-346
: Ensure call counters increment on attempts, not only successes (DELETE, HEAD). Other providers increment TOTAL_OBJECT_STORE_CALLS_BY_DATE on each attempt. LocalFS currently increments only on Ok, skewing cross-provider comparisons.
Example for delete_object():
- let result = tokio::fs::remove_file(path).await; - if result.is_ok() { + let result = tokio::fs::remove_file(path).await; + increment_object_store_calls_by_date( + "localfs", "DELETE", &Utc::now().date_naive().to_string(), + ); + if result.is_ok() { // Record single file deleted successfully increment_files_scanned_in_object_store_calls_by_date( "localfs", "DELETE", 1, &Utc::now().date_naive().to_string(), ); - increment_object_store_calls_by_date( - "localfs", - "DELETE", - &Utc::now().date_naive().to_string(), - ); }Apply the same attempt-first pattern in delete_prefix, delete_stream, try_delete_node_meta, and check().
Also applies to: 349-369, 385-398, 403-413
624-635
: Avoid blocking the async runtime with fs_extra::file::copy. fs_extra::file::copy is blocking; wrap it in spawn_blocking to prevent starving the reactor.
- let result = fs_extra::file::copy(path, to_path, &op);
+ let result = tokio::task::spawn_blocking({
+     let path = path.to_path_buf();
+     let to_path = to_path.clone();
+     let op = op.clone();
+     move || fs_extra::file::copy(&path, to_path, &op)
+ })
+ .await
+ .map_err(|e| ObjectStorageError::UnhandledError(Box::new(e)))?;

src/storage/s3.rs (1)
489-501
: Optional: increment a call on multipart completion for parity with initiation/parts. Adds visibility to completed multipart uploads.
let complete_result = async_writer.complete().await; if let Err(err) = complete_result { error!("Failed to complete multipart upload. {:?}", err); async_writer.abort().await?; return Err(err.into()); } + increment_object_store_calls_by_date( + "s3", + "PUT_MULTIPART", + &Utc::now().date_naive().to_string(), + );Also applies to: 567-573
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/storage/azure_blob.rs (21 hunks)
- src/storage/gcs.rs (21 hunks)
- src/storage/localfs.rs (13 hunks)
- src/storage/s3.rs (21 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/localfs.rs
src/storage/gcs.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/azure_blob.rs
src/storage/localfs.rs
src/storage/gcs.rs
📚 Learning: 2025-09-18T09:52:07.536Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.536Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/localfs.rs
src/storage/gcs.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/gcs.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/azure_blob.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/gcs.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/azure_blob.rs
src/storage/s3.rs
src/storage/localfs.rs
src/storage/gcs.rs
🧬 Code graph analysis (4)
src/storage/azure_blob.rs (5)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/metrics_layer.rs (2)
new
(71-76)head
(251-265)src/storage/object_storage.rs (6)
new
(78-87)upload_multipart
(281-285)head
(273-273)get_ingestor_meta_file_paths
(327-329)check
(292-292)parseable_json_path
(1129-1131)src/storage/mod.rs (4)
new
(204-206)new
(217-223)to_object_store_path
(293-295)from
(179-185)src/storage/gcs.rs (8)
resp
(818-823)resp
(850-855)_delete_prefix
(217-263)_list_dates
(265-296)upload_multipart
(446-452)head
(454-467)get_ingestor_meta_file_paths
(536-568)check
(603-620)
src/storage/s3.rs (4)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/localfs.rs (5)
new
(98-100)head
(130-137)get_ingestor_meta_file_paths
(191-236)from
(766-768)check
(372-383)src/storage/metrics_layer.rs (2)
new
(71-76)head
(251-265)src/storage/azure_blob.rs (8)
resp
(902-907)resp
(937-942)_delete_prefix
(261-311)_list_dates
(313-348)head
(494-511)get_ingestor_meta_file_paths
(584-620)check
(659-679)stream_json_check
(761-761)
src/storage/localfs.rs (4)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/azure_blob.rs (4)
get_ingestor_meta_file_paths
(584-620)delete_object
(640-657)check
(659-679)try_delete_node_meta
(687-708)src/storage/gcs.rs (4)
get_ingestor_meta_file_paths
(536-568)delete_object
(588-601)check
(603-620)try_delete_node_meta
(628-645)src/storage/s3.rs (4)
get_ingestor_meta_file_paths
(693-725)delete_object
(745-758)check
(760-777)try_delete_node_meta
(785-802)
src/storage/gcs.rs (7)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/localfs.rs (6)
new
(98-100)upload_multipart
(109-118)head
(130-137)get_ingestor_meta_file_paths
(191-236)from
(766-768)check
(372-383)src/storage/metrics_layer.rs (2)
new
(71-76)head
(251-265)src/storage/object_storage.rs (6)
new
(78-87)upload_multipart
(281-285)head
(273-273)get_ingestor_meta_file_paths
(327-329)check
(292-292)parseable_json_path
(1129-1131)src/storage/mod.rs (4)
new
(204-206)new
(217-223)to_object_store_path
(293-295)from
(179-185)src/storage/azure_blob.rs (9)
resp
(902-907)resp
(937-942)_delete_prefix
(261-311)_list_dates
(313-348)_upload_file
(350-371)upload_multipart
(486-492)head
(494-511)get_ingestor_meta_file_paths
(584-620)check
(659-679)src/storage/s3.rs (11)
resp
(974-979)resp
(1006-1011)_delete_prefix
(380-427)_list_dates
(429-460)_upload_file
(462-479)upload_multipart
(608-614)head
(616-629)get_ingestor_meta_file_paths
(693-725)from
(1027-1035)from
(1039-1041)check
(760-777)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
… object store calls
239c078
to
b87431f
Compare
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/storage/metrics_layer.rs (1)
398-416
: Bug: LIST/offset histograms miss provider label (will not compile). STORAGE_REQUEST_RESPONSE_TIME expects 3 labels [provider, operation, status], but StreamMetricWrapper passes only 2. Add provider to the wrapper and use 3 labels.
-struct StreamMetricWrapper<'a, const N: usize, T> { +struct StreamMetricWrapper<'a, T> { time: time::Instant, - labels: [&'static str; N], + provider: String, + op: &'static str, inner: BoxStream<'a, T>, } @@ - fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { + fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list(prefix); - let res = StreamMetricWrapper { - time, - labels: ["LIST", "200"], - inner, - }; + let res = StreamMetricWrapper { time, provider: self.provider.clone(), op: "LIST", inner }; Box::pin(res) } @@ - fn list_with_offset( + fn list_with_offset( &self, prefix: Option<&Path>, offset: &Path, ) -> BoxStream<'_, ObjectStoreResult<ObjectMeta>> { let time = time::Instant::now(); let inner = self.inner.list_with_offset(prefix, offset); - let res = StreamMetricWrapper { - time, - labels: ["LIST_OFFSET", "200"], - inner, - }; + let res = StreamMetricWrapper { time, provider: self.provider.clone(), op: "LIST_OFFSET", inner }; Box::pin(res) } @@ -impl<T, const N: usize> Stream for StreamMetricWrapper<'_, N, T> { +impl<T> Stream for StreamMetricWrapper<'_, T> { type Item = T; @@ - t @ Poll::Ready(None) => { - STORAGE_REQUEST_RESPONSE_TIME - .with_label_values(&self.labels) - .observe(self.time.elapsed().as_secs_f64()); + t @ Poll::Ready(None) => { + STORAGE_REQUEST_RESPONSE_TIME + .with_label_values(&[&self.provider, self.op, "200"]) + .observe(self.time.elapsed().as_secs_f64()); t }Also applies to: 290-299, 301-315
🧹 Nitpick comments (17)
src/metrics/mod.rs (3)
229-359
: Clarify naming to avoid “DATE” vs “BY_DATE” ambiguity. The coexistence of TOTAL_DATE (gauges) and TOTAL_BY_DATE (counters) is easy to misread. Consider tightening help text to spell out scope and mutability (global gauge vs append‑only billing counter) without renaming metrics.
Apply help‑text tweaks:
- "total_events_ingested_by_date", - "Total events ingested by date", + "total_events_ingested_by_date", + "Cluster-wide events ingested (counter, append-only) for YYYY-MM-DD",Repeat similarly for the other BY_DATE counters.
495-502
: Consider custom histogram buckets for storage latency. Default buckets may not fit object-store RTTs. Suggest SLO-aligned buckets.
- HistogramVec::new( - HistogramOpts::new("storage_request_response_time", "Storage Request Latency") + HistogramVec::new( + HistogramOpts::new("storage_request_response_time", "Storage Request Latency") + .buckets(vec![0.01, 0.02, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]) .namespace(METRICS_NAMESPACE),
700-768
: Optional: centralize date-label formatting. Many callsites pass Utc::now().date_naive().to_string(). Consider a small helper to standardize YYYY-MM-DD and reduce mistakes.
Would you like a follow-up patch adding a metrics::date_label_utc() helper and wiring callsites?
src/stats.rs (1)
211-227
: Prefix-based delete risks unintended matches. Filtering by “values contain prefix elements” can remove unrelated series if a value coincidentally matches (e.g., a stream named “json”). Prefer matching explicit key–value pairs (stream/format[/type]) or exact tuple prefix.
If you want, I can send a small patch that adds an overload accepting (&[(&str,&str)]) and checks label_map.get(key)==Some(value).
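For reference, the matching core of that overload could look like this sketch (label names are whatever the vectors declare; the map type mirrors a collected label set):

```rust
use std::collections::HashMap;

/// A series is selected only when every requested label *name* carries the
/// requested value, instead of matching on label values alone.
fn labels_match(label_map: &HashMap<String, String>, wanted: &[(&str, &str)]) -> bool {
    wanted
        .iter()
        .all(|(key, value)| label_map.get(*key).map(String::as_str) == Some(*value))
}
```

A caller would then pass something like &[("stream", "app_logs"), ("format", "json")] (names hypothetical) and leave the date label unconstrained so every date for that stream is removed.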
src/storage/object_storage.rs (1)
171-176
: Safer date extraction from filename. Relying on fixed indices can panic. Use chained split with expect for invariant, or split_once to reduce allocations.
- let mut file_date_part = filename.split('.').collect::<Vec<&str>>()[0]; - file_date_part = file_date_part.split('=').collect::<Vec<&str>>()[1]; + let file_date_part = filename + .split('.') + .next() + .and_then(|s| s.split('=').nth(1)) + .expect("filename contains date=YYYY-MM-DD");src/storage/localfs.rs (8)
162-186
: Count object store call regardless of outcome in get_object. "Calls" should reflect attempts. Increment before matching on read result; keep files_scanned on success only.
- let file_result = fs::read(file_path).await; - let res: Result<Bytes, ObjectStorageError> = match file_result { + let file_result = fs::read(file_path).await; + increment_object_store_calls_by_date("drive", "GET", &Utc::now().date_naive().to_string()); + let res: Result<Bytes, ObjectStorageError> = match file_result { Ok(x) => { - // Record single file accessed successfully - increment_files_scanned_in_object_store_calls_by_date( - "localfs", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "localfs", - "GET", - &Utc::now().date_naive().to_string(), - ); + increment_files_scanned_in_object_store_calls_by_date("drive","GET",1,&Utc::now().date_naive().to_string()); Ok(x.into()) } Err(e) => {
197-203
: LIST: increment call on attempt, not only on success. Move the LIST call counter to immediately after read_dir() starts; keep files_scanned at end.
- let entries_result = fs::read_dir(&self.root).await; + let entries_result = fs::read_dir(&self.root).await; + increment_object_store_calls_by_date("drive","LIST",&Utc::now().date_naive().to_string()); let mut entries = match entries_result { Ok(entries) => entries, Err(err) => { return Err(err.into()); } }; @@ - increment_object_store_calls_by_date( - "localfs", - "LIST", - &Utc::now().date_naive().to_string(), - );Also applies to: 223-234
250-256
: get_objects: good GET vs LIST split; also count LIST call at start. Add LIST call counter before scanning so failed reads still count the attempt.
- let entries_result = fs::read_dir(&prefix).await; + let entries_result = fs::read_dir(&prefix).await; + increment_object_store_calls_by_date("drive","LIST",&Utc::now().date_naive().to_string()); let mut entries = match entries_result {And drop the trailing LIST call increment at 307-311 (already counted at start).
- increment_object_store_calls_by_date( - "localfs", - "LIST", - &Utc::now().date_naive().to_string(), - );Also applies to: 301-311
326-340
: put_object: count call on attempt; keep files_scanned on success.
- let res = fs::write(path, resource).await; - if res.is_ok() { + let res = fs::write(path, resource).await; + increment_object_store_calls_by_date("drive","PUT",&Utc::now().date_naive().to_string()); + if res.is_ok() { // Record single file written successfully - increment_files_scanned_in_object_store_calls_by_date( - "localfs", - "PUT", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "localfs", - "PUT", - &Utc::now().date_naive().to_string(), - ); + increment_files_scanned_in_object_store_calls_by_date("drive","PUT",1,&Utc::now().date_naive().to_string()); }
348-357
: DELETE/HEAD metrics: count call even on failures. Currently calls are incremented only on success. Emit increment_object_store_calls_by_date unconditionally after the async op, files_scanned only on success.
Example for delete_object:
- let result = tokio::fs::remove_file(path).await; - if result.is_ok() { + let result = tokio::fs::remove_file(path).await; + increment_object_store_calls_by_date("drive","DELETE",&Utc::now().date_naive().to_string()); + if result.is_ok() { increment_files_scanned_in_object_store_calls_by_date("drive","DELETE",1,&Utc::now().date_naive().to_string()); - increment_object_store_calls_by_date("localfs","DELETE",&Utc::now().date_naive().to_string()); }Apply same pattern to delete_prefix, delete_stream, try_delete_node_meta, and check (HEAD).
Also applies to: 363-377, 383-394, 399-409, 414-424
435-448
: LIST helpers: record LIST call at start; count files scanned.
- list_streams/list_old_streams/list_dirs/list_dates: move/add LIST call increment before obtaining ReadDirStream, not gated by Ok.
- list_dirs_relative: add LIST metrics (currently missing); count files_scanned as number of DirEntry read.
Example for list_dirs_relative:
- let read_dir = match result { - Ok(read_dir) => read_dir, + let read_dir = match result { + Ok(read_dir) => { + increment_object_store_calls_by_date("drive","LIST",&Utc::now().date_naive().to_string()); + read_dir + } Err(err) => { return Err(err.into()); } }; - - let dirs = ReadDirStream::new(read_dir) - .try_collect::<Vec<DirEntry>>() + let entries: Vec<DirEntry> = ReadDirStream::new(read_dir) + .try_collect::<Vec<DirEntry>>() .await? - .into_iter() - .map(dir_name); + .into_iter() + .collect(); + increment_files_scanned_in_object_store_calls_by_date("drive","LIST", entries.len() as u64, &Utc::now().date_naive().to_string()); + let dirs = entries.into_iter().map(dir_name);Also applies to: 499-514, 564-577
587-622
: Add LIST metrics for list_hours/list_minutes. These are LIST operations; record one LIST call and files_scanned = entries.len().
- let directories = ReadDirStream::new(fs::read_dir(&path).await?); - let entries: Vec<DirEntry> = directories.try_collect().await?; + let rd = fs::read_dir(&path).await?; + increment_object_store_calls_by_date("drive","LIST",&Utc::now().date_naive().to_string()); + let directories = ReadDirStream::new(rd); + let entries: Vec<DirEntry> = directories.try_collect().await?; + increment_files_scanned_in_object_store_calls_by_date("drive","LIST", entries.len() as u64, &Utc::now().date_naive().to_string());Repeat analogously in list_minutes.
635-646
: upload_file: also record files_scanned on success.
- Ok(_) => { - increment_object_store_calls_by_date( - "localfs", - "PUT", - &Utc::now().date_naive().to_string(), - ); + Ok(_) => { + increment_object_store_calls_by_date("drive","PUT",&Utc::now().date_naive().to_string()); + increment_files_scanned_in_object_store_calls_by_date("drive","PUT",1,&Utc::now().date_naive().to_string()); Ok(()) }
src/storage/gcs.rs (2)
484-496
: LIST call count should be recorded at start, not only at end. Increment LIST call once before consuming the stream; keep files_scanned at end. If list fails early, the attempt should still be counted.
- let mut list_stream = self.client.list(Some(&prefix)); + let mut list_stream = self.client.list(Some(&prefix)); + increment_object_store_calls_by_date("gcs","LIST",&Utc::now().date_naive().to_string()); @@ - increment_object_store_calls_by_date("gcs", "LIST", &Utc::now().date_naive().to_string()); + // already counted at startAlso applies to: 525-533
672-693
: HEAD files_scanned in list_old_streams should reflect success only. You're adding dirs.len() to HEAD files_scanned regardless of head outcome. Track a success counter inside tasks (AtomicU64) and increment it only on Ok.
Would you like a patch to switch to an AtomicU64 success counter in this block?
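For reference, the shape of that change could be (a simplified, generic sketch with a stand-in error type; the real code would keep its existing task spawning):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

use futures::future::join_all;

/// Run the HEAD-style checks concurrently, bump the shared counter only on Ok,
/// and report the success total once at the end; that total (not dirs.len())
/// is what should feed the files_scanned metric.
async fn count_successful<F>(tasks: Vec<F>) -> u64
where
    F: std::future::Future<Output = Result<(), ()>>,
{
    let successes = AtomicU64::new(0);
    join_all(tasks.into_iter().map(|task| {
        let successes = &successes;
        async move {
            if task.await.is_ok() {
                successes.fetch_add(1, Ordering::Relaxed);
            }
        }
    }))
    .await;
    successes.load(Ordering::Relaxed)
}
```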
src/query/stream_schema_provider.rs (1)
852-866
: Safer string timestamp parsing (no unwrap): nice. Add a negative test. The match avoids panics. Suggest adding a test with an invalid string to assert None is returned.
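A sketch of that negative test, with the parser stubbed to show the intended shape (the real function name and format string live in stream_schema_provider):

```rust
#[cfg(test)]
mod string_timestamp_tests {
    use chrono::NaiveDateTime;

    /// Stand-in for the string-timestamp parsing under review: match/ok instead of unwrap.
    fn parse_timestamp(value: &str) -> Option<NaiveDateTime> {
        NaiveDateTime::parse_from_str(value, "%Y-%m-%dT%H:%M:%S%.f").ok()
    }

    #[test]
    fn invalid_timestamp_string_returns_none() {
        assert!(parse_timestamp("not-a-timestamp").is_none());
    }

    #[test]
    fn valid_timestamp_string_parses() {
        assert!(parse_timestamp("2025-09-18T10:15:30.000").is_some());
    }
}
```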
src/storage/s3.rs (1)
646-653
: LIST call count should be recorded at start, not only at end. Increment LIST call once before consuming the stream; keep files_scanned at end.
- let mut list_stream = self.client.list(Some(&prefix)); + let mut list_stream = self.client.list(Some(&prefix)); + increment_object_store_calls_by_date("s3","LIST",&Utc::now().date_naive().to_string()); @@ - increment_object_store_calls_by_date("s3", "LIST", &Utc::now().date_naive().to_string()); + // already counted at startAlso applies to: 682-690
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- resources/formats.json (1 hunks)
- src/catalog/mod.rs (1 hunks)
- src/event/format/known_schema.rs (1 hunks)
- src/event/mod.rs (2 hunks)
- src/handlers/http/modal/ingest_server.rs (0 hunks)
- src/handlers/http/modal/query_server.rs (0 hunks)
- src/handlers/http/modal/server.rs (0 hunks)
- src/handlers/http/query.rs (3 hunks)
- src/metadata.rs (3 hunks)
- src/metrics/mod.rs (9 hunks)
- src/metrics/storage.rs (0 hunks)
- src/query/listing_table_builder.rs (2 hunks)
- src/query/stream_schema_provider.rs (6 hunks)
- src/stats.rs (3 hunks)
- src/storage/azure_blob.rs (21 hunks)
- src/storage/gcs.rs (21 hunks)
- src/storage/localfs.rs (13 hunks)
- src/storage/metrics_layer.rs (8 hunks)
- src/storage/object_storage.rs (3 hunks)
- src/storage/s3.rs (21 hunks)
💤 Files with no reviewable changes (4)
- src/handlers/http/modal/server.rs
- src/handlers/http/modal/query_server.rs
- src/handlers/http/modal/ingest_server.rs
- src/metrics/storage.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- src/catalog/mod.rs
- src/metadata.rs
- src/event/mod.rs
- src/handlers/http/query.rs
- resources/formats.json
- src/event/format/known_schema.rs
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/storage/localfs.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/storage/gcs.rs
src/stats.rs
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/storage/localfs.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/storage/gcs.rs
src/stats.rs
src/metrics/mod.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/storage/localfs.rs
src/storage/s3.rs
src/storage/metrics_layer.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/storage/gcs.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/s3.rs
src/storage/azure_blob.rs
src/storage/object_storage.rs
src/storage/gcs.rs
src/query/stream_schema_provider.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.
Applied to files:
src/storage/s3.rs
src/storage/azure_blob.rs
src/storage/gcs.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.
Applied to files:
src/storage/s3.rs
src/storage/azure_blob.rs
src/storage/gcs.rs
📚 Learning: 2025-09-11T06:35:24.721Z
Learnt from: parmesant
PR: parseablehq/parseable#1424
File: src/storage/azure_blob.rs:736-742
Timestamp: 2025-09-11T06:35:24.721Z
Learning: The Azure Blob Storage client's `list_with_delimiter()` method handles Azure-specific implementation details internally, including proper root listing behavior and path normalization, so manual prefix handling is not needed when delegating to this method.
Applied to files:
src/storage/metrics_layer.rs
src/storage/azure_blob.rs
📚 Learning: 2025-09-14T15:17:59.234Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1432
File: src/storage/object_storage.rs:124-132
Timestamp: 2025-09-14T15:17:59.234Z
Learning: In Parseable's upload validation system (src/storage/object_storage.rs), the validate_uploaded_parquet_file function should not include bounded retries for metadata consistency issues. Instead, failed validations rely on the 30-second sync cycle for natural retries, with staging files preserved when manifest_file is set to None.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-18T12:33:16.085Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:145-166
Timestamp: 2025-08-18T12:33:16.085Z
Learning: In Parseable's staging and upload process, parquet file names are guaranteed to always contain the date part in the expected format (date=YYYY-MM-DD). The system ensures no deviation from this naming convention, making defensive parsing unnecessary for date extraction from filenames during storage metrics updates.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
Applied to files:
src/stats.rs
src/metrics/mod.rs
📚 Learning: 2025-08-21T12:04:38.398Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T12:04:38.398Z
Learning: In Parseable's timestamp extraction (extract_timestamp_for_date in src/storage/object_storage.rs), minute-level precision is sufficient for both first and latest event timestamps. Seconds are always set to 0, meaning both first and latest events within the same minute will have identical timestamps, which is the intended behavior.
Applied to files:
src/metrics/mod.rs
🧬 Code graph analysis (7)
src/storage/localfs.rs (5)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/object_storage.rs (6)
get_ingestor_meta_file_paths
(327-329)delete_prefix
(291-291)delete_object
(326-326)check
(292-292)delete_stream
(293-293)try_delete_node_meta
(330-330)src/storage/azure_blob.rs (6)
get_ingestor_meta_file_paths
(584-620)delete_prefix
(634-638)delete_object
(640-657)check
(659-679)delete_stream
(681-685)try_delete_node_meta
(687-708)src/storage/gcs.rs (6)
get_ingestor_meta_file_paths
(536-568)delete_prefix
(582-586)delete_object
(588-601)check
(603-620)delete_stream
(622-626)try_delete_node_meta
(628-645)src/storage/s3.rs (6)
get_ingestor_meta_file_paths
(693-725)delete_prefix
(739-743)delete_object
(745-758)check
(760-777)delete_stream
(779-783)try_delete_node_meta
(785-802)
src/storage/s3.rs (3)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/metrics_layer.rs (2)
new
(71-76)head
(251-265)src/storage/azure_blob.rs (7)
resp
(902-907)resp
(937-942)_delete_prefix
(261-311)_list_dates
(313-348)head
(494-511)get_ingestor_meta_file_paths
(584-620)check
(659-679)
src/storage/metrics_layer.rs (3)
src/storage/azure_blob.rs (1)
head
(494-511)src/storage/gcs.rs (1)
head
(454-467)src/storage/s3.rs (3)
head
(616-629)from
(1027-1035)from
(1039-1041)
src/storage/azure_blob.rs (6)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/localfs.rs (6)
new
(98-100)upload_multipart
(109-118)head
(130-137)get_ingestor_meta_file_paths
(191-236)from
(777-779)check
(383-394)src/storage/metrics_layer.rs (2)
new
(71-76)head
(251-265)src/storage/mod.rs (4)
new
(204-206)new
(217-223)to_object_store_path
(293-295)from
(179-185)src/storage/gcs.rs (8)
resp
(818-823)resp
(850-855)_delete_prefix
(217-263)_list_dates
(265-296)upload_multipart
(446-452)head
(454-467)get_ingestor_meta_file_paths
(536-568)check
(603-620)src/storage/s3.rs (10)
resp
(974-979)resp
(1006-1011)_delete_prefix
(380-427)_list_dates
(429-460)upload_multipart
(608-614)head
(616-629)get_ingestor_meta_file_paths
(693-725)from
(1027-1035)from
(1039-1041)check
(760-777)
src/storage/object_storage.rs (1)
src/metrics/mod.rs (2)
increment_parquets_stored_by_date
(713-717)increment_parquets_stored_size_by_date
(719-723)
src/storage/gcs.rs (3)
src/metrics/mod.rs (2)
increment_files_scanned_in_object_store_calls_by_date
(747-756)increment_object_store_calls_by_date
(741-745)src/storage/metrics_layer.rs (2)
new
(71-76)head
(251-265)src/storage/azure_blob.rs (8)
resp
(902-907)resp
(937-942)_delete_prefix
(261-311)_list_dates
(313-348)_upload_file
(350-371)head
(494-511)get_ingestor_meta_file_paths
(584-620)check
(659-679)
src/query/stream_schema_provider.rs (2)
src/metrics/mod.rs (2)
increment_bytes_scanned_in_query_by_date
(735-739)increment_files_scanned_in_query_by_date
(729-733)src/catalog/mod.rs (2)
file_size
(60-60)file_size
(74-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (18)
src/metrics/mod.rs (2)
125-159
: Per‑date counters for per‑stream look correct. Switching EVENTS_*_DATE and EVENTS_STORAGE_SIZE_DATE to IntCounterVec with explicit date labels aligns with append‑only semantics and simplifies deletions via remove_label_values.
161-195
: Global TOTAL_*_DATE gauges align with design (format,date only). Keeping these as IntGaugeVec for decrement-on-delete matches the intended global aggregation (no stream label).
src/stats.rs (1)
144-152
: Prevent negative global totals on delete (saturating gauge update). sub(...) can drive TOTAL_*_DATE below zero if prior increments were missed or manifests overlap. Use saturating get/set.
- TOTAL_EVENTS_INGESTED_DATE - .with_label_values(&["json", &manifest_date]) - .sub(manifest.events_ingested as i64); + { + let g = TOTAL_EVENTS_INGESTED_DATE.with_label_values(&["json", &manifest_date]); + let cur = g.get(); + let dec = manifest.events_ingested as i64; + g.set(cur.saturating_sub(dec)); + } - TOTAL_EVENTS_INGESTED_SIZE_DATE - .with_label_values(&["json", &manifest_date]) - .sub(manifest.ingestion_size as i64); + { + let g = TOTAL_EVENTS_INGESTED_SIZE_DATE.with_label_values(&["json", &manifest_date]); + let cur = g.get(); + let dec = manifest.ingestion_size as i64; + g.set(cur.saturating_sub(dec)); + } - TOTAL_EVENTS_STORAGE_SIZE_DATE - .with_label_values(&["parquet", &manifest_date]) - .sub(manifest.storage_size as i64); + { + let g = TOTAL_EVENTS_STORAGE_SIZE_DATE.with_label_values(&["parquet", &manifest_date]); + let cur = g.get(); + let dec = manifest.storage_size as i64; + g.set(cur.saturating_sub(dec)); + }src/storage/azure_blob.rs (3)
276-289
: Count DELETE only on success in delete stream. files_deleted is incremented before attempting delete; overcounts on failures. Increment only when delete_resp is Ok.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); - let delete_resp = self.client.delete(&obj.location).await; + Ok(obj) => { + let delete_resp = self.client.delete(&obj.location).await; increment_object_store_calls_by_date( "azure_blob", "DELETE", &Utc::now().date_naive().to_string(), ); - if delete_resp.is_err() { + if delete_resp.is_ok() { + files_deleted.fetch_add(1, Ordering::Relaxed); + } else { error!( "Failed to delete object during delete stream: {:?}", delete_resp ); - } + }Also applies to: 298-309
555-565
: Double-counting GETs in get_objects (dup of _get_object metrics). After get_object(), you increment GET call and scanned again. _get_object already records both. Remove the duplicates.
- increment_files_scanned_in_object_store_calls_by_date( - "azure_blob", - "GET", - 1, - &Utc::now().date_naive().to_string(), - ); - increment_object_store_calls_by_date( - "azure_blob", - "GET", - &Utc::now().date_naive().to_string(), - );
213-233
: Resolve: GET latency is already captured by MetricLayer
The `async fn get` implementation in `MetricLayer<T>` starts a timer (`time::Instant::now()`) before delegating to `inner.get(...)`, so GET requests are instrumented for latency. No changes needed.
src/storage/object_storage.rs (2)
171-193
: Metric updates on upload are correct and complete. Updating per-stream counters, lifetime gauges, global TOTAL_*_DATE gauge, and billing counters (parquet count/size) looks good.
742-786
: First/latest timestamp derivation is efficient and UTC-stable. Directory traversal with min/max reduction and seconds fixed to 00 matches the intended minute precision and UTC normalization.
Also applies to: 788-864
src/storage/localfs.rs (1)
73-89
: LGTM: FSConfig implementation. Provider wiring and runtime env are fine.
src/storage/gcs.rs (3)
217-236
: Count deleted files on success only in delete_prefix. files_deleted is incremented before delete; move it after a successful delete.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_resp = self.client.delete(&obj.location).await; increment_object_store_calls_by_date("gcs","DELETE",&Utc::now().date_naive().to_string()); - if delete_resp.is_err() { + if delete_resp.is_ok() { + files_deleted.fetch_add(1, Ordering::Relaxed); + } else { error!("Failed to delete object during delete stream: {:?}", delete_resp); } }Also applies to: 250-261
489-523
: Remove duplicate GET metrics in get_objects loop. get_object() already emits GET metrics; the loop adds them again.
let byts = self .get_object( RelativePath::from_path(meta.location.as_ref()) .map_err(ObjectStorageError::PathError)?, ) .await?; - increment_files_scanned_in_object_store_calls_by_date("gcs","GET",1,&Utc::now().date_naive().to_string()); - increment_object_store_calls_by_date("gcs","GET",&Utc::now().date_naive().to_string()); res.push(byts);
127-147
: LGTM: Provider wiring and upload_file path. MetricLayer usage and upload_file metrics look good.
Also applies to: 298-315, 775-777
src/storage/metrics_layer.rs (1)
35-62
: Status mapping helper looks good. error_to_status_code covers common object_store errors adequately.
src/query/stream_schema_provider.rs (2)
326-332
: Query billing metrics: solid placement. Counting files and bytes scanned at partitioning time is reasonable; increments use UTC date correctly.
If you want, I can add an integration test to assert these counters after a small query run.
Also applies to: 406-410
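For reference, a rough shape for that assertion, exercising the helpers directly instead of a full query run (parameter order and registry wiring are assumptions to adjust against the real metrics module):

```rust
#[test]
fn query_scan_counters_accumulate() {
    let date = "2025-09-18";
    // Assumed (count, date) parameter order for the helpers referenced above.
    increment_files_scanned_in_query_by_date(3, date);
    increment_bytes_scanned_in_query_by_date(4096, date);

    // Assumes the counters are registered with the default prometheus registry;
    // swap in the custom registry handle if a dedicated one is used.
    let families = prometheus::gather();
    assert!(families
        .iter()
        .any(|f| f.get_name().contains("files_scanned_in_query")));
}
```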
282-299
: Async listing flow changes: LGTM. Switch to populate_via_listing(glob_storage, time_filters) reads clean and aligns with provider-agnostic listing.
src/storage/s3.rs (3)
380-406
: Count deleted files on success only in delete_prefix. files_deleted is incremented before delete; move it after a successful delete.
- Ok(obj) => { - files_deleted.fetch_add(1, Ordering::Relaxed); + Ok(obj) => { let delete_resp = self.client.delete(&obj.location).await; increment_object_store_calls_by_date("s3","DELETE",&Utc::now().date_naive().to_string()); - if delete_resp.is_err() { + if delete_resp.is_ok() { + files_deleted.fetch_add(1, Ordering::Relaxed); + } else { error!("Failed to delete object during delete stream: {:?}", delete_resp); } }Also applies to: 414-425
667-680
: Remove duplicate GET metrics in get_objects loop. get_object() already emits GET metrics; drop the extra increments here.
- increment_files_scanned_in_object_store_calls_by_date("s3","GET",1,&Utc::now().date_naive().to_string()); - increment_object_store_calls_by_date("s3", "GET", &Utc::now().date_naive().to_string());
298-316
: LGTM: Provider wiring and upload_file path. MetricLayer usage and upload_file metrics look good.
Also applies to: 932-934
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/stats.rs (2)
154-162: Avoid negative gauges when deleting: clamp via a saturating set.
A direct sub on IntGaugeVec can drive values below zero under drift or partial deletes. Read the current value, saturating_sub, then set.
Apply:
```diff
     EVENTS_INGESTED
-        .with_label_values(&[stream_name, "json"])
-        .sub(num_row);
+        .with_label_values(&[stream_name, "json"]);
+    {
+        let cur = EVENTS_INGESTED.with_label_values(&[stream_name, "json"]).get();
+        let newv = cur.saturating_sub(num_row);
+        if newv != cur - num_row {
+            warn!("Clamped EVENTS_INGESTED below zero for stream={} format=json", stream_name);
+        }
+        EVENTS_INGESTED.with_label_values(&[stream_name, "json"]).set(newv);
+    }
     EVENTS_INGESTED_SIZE
-        .with_label_values(&[stream_name, "json"])
-        .sub(ingestion_size);
+        .with_label_values(&[stream_name, "json"]);
+    {
+        let cur = EVENTS_INGESTED_SIZE.with_label_values(&[stream_name, "json"]).get();
+        let newv = cur.saturating_sub(ingestion_size);
+        if newv != cur - ingestion_size {
+            warn!("Clamped EVENTS_INGESTED_SIZE below zero for stream={} format=json", stream_name);
+        }
+        EVENTS_INGESTED_SIZE.with_label_values(&[stream_name, "json"]).set(newv);
+    }
     STORAGE_SIZE
-        .with_label_values(&["data", stream_name, "parquet"])
-        .sub(storage_size);
+        .with_label_values(&["data", stream_name, "parquet"]);
+    {
+        let cur = STORAGE_SIZE.with_label_values(&["data", stream_name, "parquet"]).get();
+        let newv = cur.saturating_sub(storage_size);
+        if newv != cur - storage_size {
+            warn!("Clamped STORAGE_SIZE below zero for stream={} format=parquet", stream_name);
+        }
+        STORAGE_SIZE.with_label_values(&["data", stream_name, "parquet"]).set(newv);
+    }
```
201-217: Prefix match by values can over-match; consider key+value matching.
The current logic removes any series whose label values contain all prefix values, regardless of label keys. Low risk here, but if values overlap across labels, unintended series could be removed. Consider an overload that accepts (key, value) pairs.
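A minimal sketch of the suggested overload, assuming the prometheus crate's MetricVec::remove, which matches on the full label map rather than on values alone; the helper name and call sites are hypothetical.

```rust
use std::collections::HashMap;

use prometheus::IntGaugeVec;

// Remove exactly one series identified by explicit (label, value) pairs.
// Because `remove` matches on label names as well as values, series whose
// values merely overlap under different label keys are left untouched.
fn remove_series_by_labels(metric: &IntGaugeVec, pairs: &[(&str, &str)]) {
    let labels: HashMap<&str, &str> = pairs.iter().copied().collect();
    // An Err here just means no series exists for this exact label set.
    let _ = metric.remove(&labels);
}
```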
src/metrics/mod.rs (1)
324-331: Consider custom buckets for storage_request_response_time.
Default histogram buckets may be suboptimal for object-store latencies. Optional: define buckets tailored to your providers (e.g., an ms-to-seconds spread).
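As one possible shape, a sketch using the prometheus crate's exponential_buckets; the metric name and labels mirror the ones discussed in this PR, but the registration code is illustrative, not the project's actual implementation.

```rust
use prometheus::{exponential_buckets, HistogramOpts, HistogramVec};

// ~5 ms to ~20 s (0.005 * 2^12), which covers both fast cache hits and slow
// multi-megabyte object fetches better than the Prometheus defaults.
fn storage_request_response_time() -> HistogramVec {
    let buckets = exponential_buckets(0.005, 2.0, 13).expect("valid bucket parameters");
    HistogramVec::new(
        HistogramOpts::new(
            "storage_request_response_time",
            "Time taken by object store requests (seconds)",
        )
        .namespace("parseable")
        .buckets(buckets),
        &["provider", "method", "status"],
    )
    .expect("metric can be created")
}
```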
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/metadata.rs (2 hunks)
- src/metrics/mod.rs (8 hunks)
- src/stats.rs (3 hunks)
- src/storage/object_storage.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/metadata.rs
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Applied to files:
src/stats.rs
src/storage/object_storage.rs
src/metrics/mod.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
Applied to files:
src/stats.rs
src/storage/object_storage.rs
src/metrics/mod.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.
Applied to files:
src/stats.rs
src/metrics/mod.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-09-14T15:17:59.234Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1432
File: src/storage/object_storage.rs:124-132
Timestamp: 2025-09-14T15:17:59.234Z
Learning: In Parseable's upload validation system (src/storage/object_storage.rs), the validate_uploaded_parquet_file function should not include bounded retries for metadata consistency issues. Instead, failed validations rely on the 30-second sync cycle for natural retries, with staging files preserved when manifest_file is set to None.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-18T12:33:16.085Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1405
File: src/storage/object_storage.rs:145-166
Timestamp: 2025-08-18T12:33:16.085Z
Learning: In Parseable's staging and upload process, parquet file names are guaranteed to always contain the date part in the expected format (date=YYYY-MM-DD). The system ensures no deviation from this naming convention, making defensive parsing unnecessary for date extraction from filenames during storage metrics updates.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.
Applied to files:
src/storage/object_storage.rs
📚 Learning: 2025-08-21T12:04:38.398Z
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T12:04:38.398Z
Learning: In Parseable's timestamp extraction (extract_timestamp_for_date in src/storage/object_storage.rs), minute-level precision is sufficient for both first and latest event timestamps. Seconds are always set to 0, meaning both first and latest events within the same minute will have identical timestamps, which is the intended behavior.
Applied to files:
src/metrics/mod.rs
🧬 Code graph analysis (1)
src/storage/object_storage.rs (1)
src/metrics/mod.rs (2)
- increment_parquets_stored_by_date (493-497)
- increment_parquets_stored_size_by_date (499-503)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (9)
src/stats.rs (1)
127-138: Correct: delete only the targeted stream's per-date series.
Using remove_label_values with ["stream", "format", "date"] (and ["type", "stream", "format", "date"] for storage) ensures we drop just this stream's day series instead of global totals. Alignment with the updated label sets looks right.
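A short sketch of the pattern being approved here; the metric handle is passed in, and the label values are taken from this review text rather than verified against src/stats.rs.

```rust
use prometheus::IntCounterVec;

// Drop only the series belonging to this stream/format/date combination.
// An Err from remove_label_values just means the series was never created.
fn remove_stream_date_series(events_ingested_date: &IntCounterVec, stream: &str, date: &str) {
    let _ = events_ingested_date.remove_label_values(&[stream, "json", date]);
}
```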
src/storage/object_storage.rs (5)
52-54: Good addition: billing counters for parquet storage.
Provider-agnostic helpers for parquets stored and bytes stored by date look correct.
170-176: Getting local size with explicit error mapping is fine given staging guarantees.
Using path.metadata() to obtain the compressed size and failing on error aligns with the earlier guarantee that staging files persist through this step.
Noting prior decision: relying on staging file availability is intentional here (based on team’s guarantee).
179-186: Correct counter semantics for per-date storage size.
Switching to inc_by on EVENTS_STORAGE_SIZE_DATE with ["data", stream, "parquet", date] is consistent with the new counter type and labels.
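Roughly what the approved call site amounts to; the counter handle and label order are assumptions based on this review rather than the source.

```rust
use prometheus::IntCounterVec;

// Per-date storage size as a counter: inc_by only moves forward, so the daily
// total never needs the negative adjustments a gauge would require.
fn record_parquet_storage_size(
    events_storage_size_date: &IntCounterVec,
    stream: &str,
    date: &str, // "YYYY-MM-DD", parsed from the parquet file's date= path segment
    compressed_size: u64,
) {
    events_storage_size_date
        .with_label_values(&["data", stream, "parquet", date])
        .inc_by(compressed_size);
}
```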
186-189: LGTM: global billing counters for parquet storage.
These increments match the BY_DATE helpers and use the parsed YYYY-MM-DD date string.
251-261: Trait surface change: ensure all providers are updated.
Found impls in: src/storage/s3.rs, src/storage/gcs.rs, src/storage/localfs.rs, src/storage/azure_blob.rs. Confirm these compile against the updated ObjectStorageProvider bounds.
src/metrics/mod.rs (3)
125-159: Per-stream daily metrics switched to counters with a stream label.
Using IntCounterVec for EVENTS_*_DATE and adding "stream" matches the per-stream daily totals objective and avoids negative adjustments.
193-331: Confirm intent: global BY_DATE counters now omit "format".
Previously, global totals used ["format", "date"] (as per earlier design decisions). These new TOTAL_*_BY_DATE counters use only ["date"] (or provider/method as applicable). If dropping "format" is intentional for consolidation, this is fine; otherwise, consider adding it back.
480-549: Helper API looks clean and consistent.
The increment_*_by_date helpers centralize label handling and reduce call-site errors.
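To make the shape concrete, a minimal sketch of one such helper; the metric name, namespace, and signature are assumptions, not the exact code at lines 480-549.

```rust
use once_cell::sync::Lazy;
use prometheus::{IntCounterVec, Opts};

// One global counter keyed only by date, wrapped in a helper so call sites
// never assemble label slices themselves.
static TOTAL_PARQUETS_STORED_BY_DATE: Lazy<IntCounterVec> = Lazy::new(|| {
    IntCounterVec::new(
        Opts::new(
            "total_parquets_stored_by_date",
            "Total parquet files stored per date",
        ),
        &["date"],
    )
    .expect("metric can be created")
});

pub fn increment_parquets_stored_by_date(count: u64, date: &str) {
    TOTAL_PARQUETS_STORED_BY_DATE
        .with_label_values(&[date])
        .inc_by(count);
}
```

A call site then only needs the count and a UTC date string, for example increment_parquets_stored_by_date(1, &Utc::now().date_naive().to_string()).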
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/query/listing_table_builder.rs (3)
98-111: Reintroduce bounded concurrency for object-store listings; also sort in place to avoid an extra allocation.
Sequentially listing each prefix can significantly slow queries. Use a bounded concurrent stream and sort in place.
Apply this diff:
```diff
-        // Use storage.list_dirs_relative for all prefixes and flatten results
-        let mut listing = Vec::new();
-        for prefix in prefixes {
-            match storage.list_dirs_relative(&prefix).await {
-                Ok(paths) => {
-                    listing.extend(paths.into_iter().map(|p| prefix.join(p).to_string()));
-                }
-                Err(e) => {
-                    return Err(DataFusionError::External(Box::new(e)));
-                }
-            }
-        }
-
-        let listing = listing.into_iter().sorted().rev().collect_vec();
+        // List prefixes with bounded concurrency and flatten results
+        let mut listing = Vec::new();
+        let concurrency = 16usize;
+        let storage = storage.clone();
+        let mut tasks = stream::iter(prefixes.into_iter().map(move |prefix| {
+            let storage = storage.clone();
+            async move {
+                match storage.list_dirs_relative(&prefix).await {
+                    Ok(children) => Ok::<Vec<String>, _>(
+                        children
+                            .into_iter()
+                            .map(|c| prefix.join(c).to_string())
+                            .collect(),
+                    ),
+                    Err(e) => Err(e),
+                }
+            }
+        }))
+        .buffer_unordered(concurrency);
+
+        while let Some(res) = tasks.next().await {
+            match res {
+                Ok(mut paths) => listing.append(&mut paths),
+                Err(e) => return Err(DataFusionError::External(Box::new(e))),
+            }
+        }
+
+        // Sort descending without extra allocation
+        listing.sort_unstable_by(|a, b| b.cmp(a));
```
Note: after switching to an in-place sort, `itertools::Itertools` may be removable if unused elsewhere in this file.
90-97: Build relative paths via join instead of string formatting.
This avoids double slashes and is more robust and idiomatic.
```diff
-        let prefixes: Vec<_> = prefixes
-            .into_iter()
-            .map(|prefix| {
-                relative_path::RelativePathBuf::from(format!("{}/{}", &self.stream, prefix))
-            })
-            .collect();
+        let base = relative_path::RelativePathBuf::from(self.stream.as_str());
+        let prefixes: Vec<_> = prefixes
+            .into_iter()
+            .map(|p| base.join(p.as_str()))
+            .collect();
```
19-19: Add the futures imports needed for the bounded-concurrency stream.
Required by the refactor above.
```diff
 use std::{ops::Bound, sync::Arc};
+use futures::stream::{self, StreamExt};
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/query/listing_table_builder.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
src/query/listing_table_builder.rs (1)
101-104: Resolved: parent prefix preserved when extending listing.
Using `prefix.join(...)` fixes the previously reported bug where child entries lost their parent prefix. Good catch.
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
Format