chore: find tables from DFParser, schema merge when required #1380
base: main
Conversation
Find the table list from DFParser and create a logical plan for the query; if that fails, create the streams from storage, merge the schemas, then create the logical plan again.
Walkthrough: This update refactors SQL query processing by replacing logical-plan visitor-based table extraction with direct SQL parsing via `DFParser`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Handler
    participant QueryParser
    participant StreamManager
    participant LogicalPlanner
    Client->>Handler: Submit SQL query
    Handler->>QueryParser: resolve_stream_names(sql)
    QueryParser-->>Handler: [stream1, stream2, ...]
    Handler->>StreamManager: (async) create_stream_and_schema for each stream
    StreamManager-->>Handler: (all done in parallel)
    Handler->>LogicalPlanner: try create_logical_plan
    alt Plan creation fails
        Handler->>StreamManager: (async) create_stream_and_schema for missing streams
        StreamManager-->>Handler: (all done in parallel)
        Handler->>LogicalPlanner: retry create_logical_plan
    end
    LogicalPlanner-->>Handler: Logical plan or error
    Handler-->>Client: Result or error
```
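The plan-then-fallback flow above can be sketched as a self-contained toy program; `create_logical_plan` and `create_stream_and_schema_from_storage` here are simplified stand-ins for the real DataFusion planner and storage loader, not the actual Parseable APIs:

```rust
use std::collections::HashSet;

// Stand-in planner: fails if any referenced stream is not yet registered.
fn create_logical_plan(known: &HashSet<String>, streams: &[String]) -> Result<String, String> {
    match streams.iter().find(|s| !known.contains(*s)) {
        Some(missing) => Err(format!("table not found: {missing}")),
        None => Ok(format!("plan over {} stream(s)", streams.len())),
    }
}

// Stand-in for loading a stream's schema from object storage and merging it.
fn create_stream_and_schema_from_storage(known: &mut HashSet<String>, stream: &str) {
    known.insert(stream.to_string());
}

// First attempt the plan; on failure, create the missing streams from
// storage and retry once, mirroring the sequence diagram above.
fn plan_with_fallback(known: &mut HashSet<String>, streams: &[String]) -> Result<String, String> {
    if let Ok(plan) = create_logical_plan(known, streams) {
        return Ok(plan);
    }
    for s in streams {
        if !known.contains(s) {
            create_stream_and_schema_from_storage(known, s);
        }
    }
    create_logical_plan(known, streams)
}
```

The key property is that the fallback is attempted exactly once: either the retry succeeds after the streams are loaded, or the planner error propagates to the client.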
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/handlers/airplane.rs (1)
132-135: Consider including the actual error in the response. While logging the error is good, returning a generic "Invalid SQL query syntax" message might make debugging harder for API users. Consider including the actual parser error in the response.

```diff
 let streams = resolve_stream_names(&ticket.query).map_err(|e| {
     error!("Failed to extract table names from SQL: {}", e);
-    Status::invalid_argument("Invalid SQL query syntax")
+    Status::invalid_argument(format!("Invalid SQL query syntax: {}", e))
 })?;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- src/alerts/alerts_utils.rs (3 hunks)
- src/alerts/mod.rs (3 hunks)
- src/correlation.rs (2 hunks)
- src/handlers/airplane.rs (3 hunks)
- src/handlers/http/correlation.rs (1 hunk)
- src/handlers/http/query.rs (9 hunks)
- src/query/mod.rs (3 hunks)
- src/users/filters.rs (1 hunk)
- src/utils/mod.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: de-sh
PR: parseablehq/parseable#1236
File: src/prism/logstream/mod.rs:332-332
Timestamp: 2025-03-13T11:39:52.587Z
Learning: SQL injection concerns can be ignored in this codebase as all SQL queries are run against immutable data streams, limiting the potential impact of any injection.
src/correlation.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
src/handlers/http/correlation.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
src/users/filters.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
src/handlers/airplane.rs (5)
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
src/alerts/alerts_utils.rs (2)
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
src/utils/mod.rs (6)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
src/query/mod.rs (1)
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
src/handlers/http/query.rs (3)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: parmesant
PR: parseablehq/parseable#1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.
🧬 Code Graph Analysis (4)
src/correlation.rs (1)
- src/utils/mod.rs (1): user_auth_for_datasets (91-150)
src/handlers/http/correlation.rs (1)
- src/utils/mod.rs (1): user_auth_for_datasets (91-150)
src/users/filters.rs (1)
- src/utils/mod.rs (1): user_auth_for_datasets (91-150)
src/utils/mod.rs (2)
- src/handlers/http/query.rs (1): query (111-144)
- src/query/mod.rs (1): resolve_stream_names (488-500)
🔇 Additional comments (13)
src/handlers/http/correlation.rs (1)
57: LGTM! Async authorization check properly awaited. The change correctly awaits the `user_auth_for_datasets` function call, which is now async as part of the broader refactoring to handle stream existence checks asynchronously.
src/users/filters.rs (1)
196: LGTM! Consistent async authorization pattern. The `.await` is correctly added to the `user_auth_for_datasets` call, maintaining consistency with the codebase-wide refactoring to async authorization checks.
src/alerts/mod.rs (3)
22: LGTM! Import added for new error handling. The import for `ParserError` is correctly added to support the new error variant in the `AlertError` enum.
864-865: LGTM! Proper error variant addition. The `ParserError` variant is correctly added with the `#[from]` attribute, enabling automatic conversion from DataFusion's parser errors. This aligns with the broader refactoring that introduces direct SQL parsing.
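As a rough illustration of the pattern (plain std code standing in for the `thiserror` derive and DataFusion's actual `ParserError` type):

```rust
// Toy error enum: a wrapping variant plus a From impl, which is what the
// `#[from]` attribute generates, enabling automatic conversion via `?`.
#[derive(Debug)]
enum AlertError {
    ParserError(String), // wraps the parser's error message
}

impl From<String> for AlertError {
    fn from(e: String) -> Self {
        AlertError::ParserError(e)
    }
}

// Parse failures are the client's fault, hence BAD_REQUEST (400).
fn status_code(err: &AlertError) -> u16 {
    match err {
        AlertError::ParserError(_) => 400,
    }
}
```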
886: LGTM! Appropriate HTTP status mapping. The `ParserError` is correctly mapped to the `BAD_REQUEST` status code, which is appropriate for SQL parsing errors.
src/correlation.rs (2)
90: LGTM! Async authorization properly awaited. The `user_auth_for_datasets` call is correctly awaited in the `list_correlations` method, consistent with the async authorization refactoring.
284: LGTM! Consistent async pattern in validation. The `.await` is correctly added to the `user_auth_for_datasets` call in the `validate` method, maintaining consistency with the codebase-wide async authorization changes.
src/query/mod.rs (3)
27-28: LGTM! New imports for enhanced table resolution. The imports for `resolve_table_references` and `Transformed`, `TreeNode` support the new SQL parsing approach for stream name resolution.
37-38: LGTM! SQL parser imports for direct parsing. The imports for `DFParser` and `PostgreSqlDialect` enable direct SQL parsing, replacing the previous logical plan visitor pattern.
488-500: LGTM! Well-implemented stream name resolution function. The `resolve_stream_names` function is well-designed and provides several improvements:
- SQL normalization: correctly handles backticks by converting them to double quotes
- Robust parsing: uses DataFusion's SQL parser with the PostgreSQL dialect
- Error handling: properly propagates parser errors, which can be handled by the new `ParserError` variant
- Table extraction: efficiently extracts table names from resolved references
This replaces the previous logical plan visitor pattern with a more direct and efficient approach that supports the async stream creation workflow.
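For intuition only, here is a toy version of the idea. The real `resolve_stream_names` parses the statement with `DFParser` and walks the resolved table references; this stand-in merely scans for identifiers after FROM/JOIN, after the same backtick-to-double-quote normalization:

```rust
// Toy sketch, NOT the real parser: extract table names following
// FROM/JOIN keywords from a SQL string.
fn resolve_stream_names(sql: &str) -> Vec<String> {
    // Normalize backticks to double quotes, as the real function does,
    // then strip the quoting entirely for this simplified version.
    let sql = sql.replace('`', "\"");
    let tokens: Vec<String> = sql
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| c == '"' || c == ',' || c == ';').to_string())
        .collect();
    let mut names = Vec::new();
    for pair in tokens.windows(2) {
        let keyword = pair[0].to_ascii_uppercase();
        if (keyword == "FROM" || keyword == "JOIN") && !names.contains(&pair[1]) {
            names.push(pair[1].clone());
        }
    }
    names
}
```

A real SQL parser is still required in production: subqueries, CTEs, aliases, and quoted identifiers with spaces all defeat keyword scanning, which is precisely why the PR delegates to `DFParser`.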
src/alerts/alerts_utils.rs (1)
81-108: Good implementation of parallel stream creation with proper error handling. The use of `JoinSet` for concurrent stream creation is efficient, and the error handling with warnings allows the process to continue even if some streams fail to create.
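The fan-out-and-continue pattern can be sketched with plain OS threads (the real code uses tokio's `JoinSet`; the failure condition below is invented purely so the error path is visible):

```rust
use std::thread;

// Spawn one worker per stream, gather all results, and log failures
// without aborting the batch.
fn create_streams_in_parallel(streams: Vec<String>) -> Vec<(String, Result<(), String>)> {
    let handles: Vec<_> = streams
        .into_iter()
        .map(|name| {
            thread::spawn(move || {
                // Stand-in for create_stream_and_schema_from_storage(&name);
                // assumed to fail for an empty name, for illustration.
                let result = if name.is_empty() {
                    Err("empty stream name".to_string())
                } else {
                    Ok(())
                };
                (name, result)
            })
        })
        .collect();

    let results: Vec<(String, Result<(), String>)> = handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect();

    for (name, result) in &results {
        if let Err(e) = result {
            // Mirrors the warn!-and-continue behaviour: log, don't abort.
            eprintln!("Failed to create stream '{name}': {e}");
        }
    }
    results
}
```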
506-532: Excellent implementation of fallback stream creation. The implementation correctly handles logical plan creation failures by:
- Creating missing streams in parallel using JoinSet
- Logging individual failures without stopping the process
- Retrying logical plan creation after stream creation
This pattern is consistent with the implementation in alerts_utils.rs.
391-402: No action needed: update_schema_when_distributed performance is negligible. Verified that `update_schema_when_distributed` is only ever called with a single-element Vec in:
- src/prism/logstream/mod.rs
- src/logstream/mod.rs
- src/handlers/http/logstream.rs
Iterating over one table has a negligible overhead. If you plan to batch multiple tables in the future, you could add a “skip if unchanged” check or document the behavior, but no change is needed now.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/utils/mod.rs (1)
107-111: Potential issue: variable name mismatch in error message. The error message uses `table_name`, but the stream existence check is performed on the `stream` variable from the permission. This could be confusing when the stream name differs from the table name. Consider referencing the `stream` variable instead, to match what was actually checked:

```diff
-"Stream not found: {table_name}"
+"Stream not found: {stream}"
```
🧹 Nitpick comments (1)
src/alerts/alerts_utils.rs (1)
86-100: Consider the error handling strategy for stream creation failures. While the current implementation logs warnings for failed stream creation, it continues processing. Consider whether certain stream creation failures should abort the entire operation.
The current approach of logging warnings and continuing may be appropriate, but verify whether there are scenarios where a stream creation failure should result in early termination:

```rust
// Current: logs a warning and continues
if let Err(e) = &result {
    warn!("Failed to create stream '{}': {}", stream_name, e);
}

// Alternative: could track critical failures
if let Err(e) = &result {
    if is_critical_error(&e) {
        return Err(AlertError::CustomError(format!(
            "Critical stream creation failure: {}",
            e
        )));
    }
    warn!("Failed to create stream '{}': {}", stream_name, e);
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/alerts/alerts_utils.rs (3 hunks)
- src/handlers/airplane.rs (3 hunks)
- src/handlers/http/query.rs (9 hunks)
- src/users/filters.rs (1 hunk)
- src/utils/mod.rs (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/users/filters.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/handlers/airplane.rs
- src/handlers/http/query.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: de-sh
PR: parseablehq/parseable#1236
File: src/prism/logstream/mod.rs:332-332
Timestamp: 2025-03-13T11:39:52.587Z
Learning: SQL injection concerns can be ignored in this codebase as all SQL queries are run against immutable data streams, limiting the potential impact of any injection.
src/utils/mod.rs (7)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Learnt from: de-sh
PR: parseablehq/parseable#1236
File: src/prism/logstream/mod.rs:332-332
Timestamp: 2025-03-13T11:39:52.587Z
Learning: SQL injection concerns can be ignored in this codebase as all SQL queries are run against immutable data streams, limiting the potential impact of any injection.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
src/alerts/alerts_utils.rs (2)
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: coverage
🔇 Additional comments (7)
src/utils/mod.rs (4)
30: LGTM: Import aligns with the refactoring approach. The import of `resolve_stream_names` from the query module supports the shift from logical-plan visitor-based table extraction to direct SQL parsing.
84-88: LGTM: Improved error handling for stream name extraction. The direct use of `resolve_stream_names` with proper error mapping to `BadRequest` is more efficient than the previous logical plan visitor approach and provides clearer error messages.
91-94: LGTM: Async function signature aligns with stream existence checks. Making `user_auth_for_datasets` async is necessary to support the asynchronous stream existence checks added in the function body.
107-111: No changes needed: `check_or_load_stream` follows the established query vs. standalone pattern. Verified that `check_or_load_stream` only attempts to load from storage when `options.mode` is `Query` or `Prism`, matching all other usages and the intended error conditions in standalone mode.
src/alerts/alerts_utils.rs (3)
32-41: LGTM: Imports support the new async parallel processing. The new imports for `JoinSet`, `warn`, `resolve_stream_names`, and `update_schema_when_distributed` properly support the enhanced functionality.
82-109: LGTM: Robust fallback mechanism for stream creation. The enhanced `prepare_query` function implements a solid fallback strategy:
- Attempts to create the logical plan first
- On failure, concurrently creates missing streams and schemas
- Logs errors appropriately with `warn!` for failed stream creation
- Retries logical plan creation after stream setup
The use of `JoinSet` for parallel stream creation is efficient and the error handling is appropriate.
121-137: LGTM: Consistent stream name extraction and schema updates. The changes to `execute_base_query` properly:
- Use `resolve_stream_names` for consistent stream extraction
- Call `update_schema_when_distributed` for all streams before proceeding
- Maintain proper error handling and propagation
The logic aligns with the broader refactoring approach across the codebase.
- find the table list from DFParser
- create a logical plan for the query
- if it fails, create the streams from storage and merge the schemas
- then create the logical plan again

Similar logic in user_auth_for_dataset:
- check if the stream exists locally; if not, create it from storage
- then perform the check
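A minimal sketch of that check-or-load step, with a `Vec` standing in for the local stream registry and a slice standing in for object storage (both are hypothetical stand-ins, not the Parseable types):

```rust
// Returns true if the stream is usable: either already known locally,
// or successfully loaded from storage. Only after both fail should a
// "Stream not found" error be raised.
fn check_or_load_stream(local: &mut Vec<String>, storage: &[String], name: &str) -> bool {
    if local.iter().any(|s| s.as_str() == name) {
        return true; // already registered locally
    }
    // Unknown locally: attempt to create it from storage before failing.
    if storage.iter().any(|s| s.as_str() == name) {
        local.push(name.to_string());
        return true;
    }
    false
}
```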