-
-
Notifications
You must be signed in to change notification settings - Fork 135
fix: query staging(in-mem) when concerned with the past 5 minutes #1194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces significant changes to the Changes
Sequence Diagram(s)sequenceDiagram
participant QE as Query Engine
participant TP as TableProvider
QE->>TP: extract_primary_filter(filters, time_partition)
TP->>TP: Process filters and return time_filters
QE->>TP: is_within_staging_window(time_filters)
TP->>TP: Evaluate if current time is within staging window
TP-->>QE: Return inclusion decision
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (2)
src/query/stream_schema_provider.rs (2)
957-957
: 🛠️ Refactor suggestionAdd test coverage for
is_within_staging_window
.The new function lacks test coverage. Consider adding tests to verify:
- Behavior with filters within the 5-minute window
- Behavior with filters outside the window
- Behavior with no upper bound
- Edge cases around the window boundary
Here's a suggested test implementation:
#[test] fn test_staging_window() { let now = Utc::now().naive_utc(); let four_mins_ago = (now - TimeDelta::minutes(4)); let six_mins_ago = (now - TimeDelta::minutes(6)); // Test case 1: Filter within 5 minutes let filters = vec![PartialTimeFilter::High(Bound::Included(four_mins_ago))]; assert!(is_within_staging_window(&filters)); // Test case 2: Filter outside 5 minutes let filters = vec![PartialTimeFilter::High(Bound::Included(six_mins_ago))]; assert!(!is_within_staging_window(&filters)); // Test case 3: No upper bound let filters = vec![PartialTimeFilter::Low(Bound::Included(six_mins_ago))]; assert!(is_within_staging_window(&filters)); // Test case 4: Multiple filters let filters = vec![ PartialTimeFilter::Low(Bound::Included(six_mins_ago)), PartialTimeFilter::High(Bound::Included(four_mins_ago)) ]; assert!(is_within_staging_window(&filters)); }
736-757
: 💡 Verification agent🧩 Analysis chain
LGTM! Well-implemented staging window check.
The implementation correctly:
- Calculates a 5-minute window from the current time
- Checks if any filter's upper bound falls within this window
- Handles cases with no upper bound
However, there's a potential edge case to consider:
🏁 Script executed:
#!/bin/bash # Description: Check if there are any tests that verify the behavior of is_within_staging_window # with different time filter combinations. # Search for test cases rg -A 5 "is_within_staging_window" --type rustLength of output: 1906
Staging window check approved – please add explicit unit tests for edge-case scenarios
The implementation correctly:
- Calculates a 5‑minute window from the current time
- Determines when any filter’s high bound (or equality) falls within that window
- Handles cases when no upper bound is present
However, a repository-wide search (using
rg -A 5 "is_within_staging_window" --type rust
) did not reveal dedicated tests verifying this function’s behavior under various time filter combinations. I recommend adding unit tests that cover scenarios such as:
- A time filter exactly on the 5‑minute threshold
- A mix of filters including both high bounds and non‑high bounds
- Cases with no upper bound at all
This will help ensure that the edge-case behavior is validated and maintained over time.
🧹 Nitpick comments (1)
src/query/stream_schema_provider.rs (1)
733-735
: Documentation could be more descriptive.The current documentation doesn't fully explain the function's purpose and behavior. Consider expanding it to include:
- The purpose of the 5-minute window
- The impact on data retrieval
- Examples of when data will/won't be considered
Apply this diff to improve the documentation:
-/// We should consider data in staging for queries concerning a time period, -/// ending within 5 minutes from now. e.g. If current time is 5 -pub fn is_within_staging_window(time_filters: &[PartialTimeFilter]) -> bool { +/// Determines if data should be retrieved from staging based on the query's time filters. +/// +/// This function checks if any of the time filters indicate a query that ends within +/// the last 5 minutes. This ensures that recent data, which might still be in the +/// staging area and not yet written to permanent storage, is included in query results. +/// +/// # Arguments +/// * `time_filters` - A slice of time-based filters from the query +/// +/// # Returns +/// * `true` if any filter's upper bound is within the last 5 minutes or if there's no upper bound +/// * `false` otherwise +/// +/// # Example +/// If current time is 10:05:00: +/// - Query ending at 10:01:00 -> returns true (within 5 minutes) +/// - Query ending at 09:59:00 -> returns false (outside 5 minutes) +/// - Query with no end time -> returns true (assumed current) +pub fn is_within_staging_window(time_filters: &[PartialTimeFilter]) -> bool {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/query/stream_schema_provider.rs
(4 hunks)src/utils/arrow/flight.rs
(2 hunks)
🔇 Additional comments (3)
src/utils/arrow/flight.rs (2)
23-23
: LGTM! Import changes align with the new timestamp handling approach.The imports reflect the shift from
include_now
to the newis_within_staging_window
function, which better handles data within the current minute.
134-136
: LGTM! Improved timestamp handling logic.The changes correctly implement the new approach for handling data within the current minute by:
- Using
extract_primary_filter
to process time filters- Using
is_within_staging_window
to check if data falls within the staging windowsrc/query/stream_schema_provider.rs (1)
829-829
: LGTM! Appropriate visibility change forextract_primary_filter
.Making the function public is necessary as it's now used by the
send_to_ingester
function inflight.rs
.
Fixes #XXXX.
Description
Currently we aren't considering the possibility of data that is in staging due to slow network or slow compaction.
Another PR will consider files inside parquet files not pushed into object store as well.
This PR has:
Summary by CodeRabbit