refactor: schema mismatch check #1259
base: main
Walkthrough
The changes modify the
Changes
Sequence Diagram(s)
sequenceDiagram
participant C as Caller
participant E as EventFormat::into_recordbatch
participant V as Schema Validator
C->>E: Invoke into_recordbatch(new_schema, storage_schema, static_schema_flag)
E->>V: For each field, check using new is_schema_matching
alt Schema mismatch found
V-->>E: Return mismatch error
E-->>C: Return error response ("Schema mismatch")
else All fields valid
V-->>E: Validation passed
E->>E: Prepare new_schema
E-->>C: Return record batch
end
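The flow in the diagram (check every field, fail fast with "Schema mismatch", otherwise go on to prepare the batch) can be sketched in plain Rust. This is a minimal sketch under stated assumptions, not Parseable's implementation: Field is a hypothetical stand-in for an Arrow field, into_recordbatch_sketch is a hypothetical name, and instead of building a record batch it returns the incoming fields that are not yet in storage. Option::is_none_or requires Rust 1.82 or newer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field: only a name and a type label.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical, simplified model of the validate-then-prepare flow.
/// Returns the incoming fields not yet in the storage schema (standing in
/// for the record batch), or an error on a schema mismatch.
fn into_recordbatch_sketch(
    new_schema: &[Field],
    storage_schema: &HashMap<String, Field>,
    static_schema_flag: bool,
) -> Result<Vec<Field>, String> {
    // Validation, enforced only for static schemas: a field mismatches if it
    // is absent from storage or stored with a different definition.
    let mismatch = static_schema_flag
        && new_schema
            .iter()
            .any(|field| storage_schema.get(&field.name).is_none_or(|stored| stored != field));
    if mismatch {
        return Err("Schema mismatch".to_string());
    }

    // Preparation: collect the fields that would be added to the stored schema.
    let new_fields: Vec<Field> = new_schema
        .iter()
        .filter(|field| !storage_schema.contains_key(&field.name))
        .cloned()
        .collect();
    Ok(new_fields)
}
```

Putting the check at the top of the function mirrors the diagram: the error path returns before any preparation work is done.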
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/event/format/mod.rs (1)
174-183: Improved schema validation logic, but consider enhancing the error message.
The refactored code effectively integrates the schema validation directly into the into_recordbatch method, which simplifies the control flow. The use of is_none_or with a predicate is elegant for checking both missing fields and field mismatches in a single operation.
However, the error message "Schema mismatch" is quite generic and doesn't say which fields failed to match, which can make debugging harder.
Consider enhancing the error message to include specific field information:
- return Err(anyhow!("Schema mismatch"));
+ let mismatched_fields: Vec<String> = schema
+     .iter()
+     .filter_map(|field| {
+         let name = field.name();
+         storage_schema
+             .get(name)
+             .filter(|storage_field| **storage_field != *field)
+             .map(|_| name.clone())
+     })
+     .collect();
+ return Err(anyhow!("Schema mismatch on fields: {}", mismatched_fields.join(", ")));
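One detail in the suggested diff: storage_schema.get(name).filter(...) only names fields that exist in storage with a different definition. A field that fails the check because it is absent from storage is dropped, so the list could come back empty even though the check failed. A minimal sketch that reuses the same is_none_or condition for the message, assuming a hypothetical Field stand-in (name plus a type label) and returning a String instead of an anyhow error so it stays dependency-free (Rust 1.82+ for is_none_or):

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field (name plus a type label).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Names of incoming fields that are missing from the storage schema or
/// stored with a different definition (the same condition as the check).
fn mismatched_fields(schema: &[Field], storage_schema: &HashMap<String, Field>) -> Vec<String> {
    schema
        .iter()
        .filter(|field| {
            storage_schema
                .get(&field.name)
                .is_none_or(|stored| stored != *field)
        })
        .map(|field| field.name.clone())
        .collect()
}

/// Builds the more descriptive error text suggested in the review.
fn schema_mismatch_message(schema: &[Field], storage_schema: &HashMap<String, Field>) -> String {
    format!(
        "Schema mismatch on fields: {}",
        mismatched_fields(schema, storage_schema).join(", ")
    )
}
```

With storage holding a: Int64 and b: Utf8, an incoming schema of a: Int64, b: Int64, c: Utf8 reports both b (type differs) and c (not stored).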
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/event/format/mod.rs (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (1)
src/event/format/mod.rs (1)
184-185: Good addition of explanatory comment.
The added comment "prepare the record batch and new fields to be added" clearly explains the purpose of this code block, which aids readability and maintainability.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/event/format/mod.rs (1)
174-177: Consider adding more descriptive context to the error message.
By returning a generic "Schema mismatch" error, it might be harder to debug exactly where or why the mismatch occurs. Providing the mismatched field name or type difference in the error could speed up troubleshooting.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/event/format/mod.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (3)
src/event/format/mod.rs (3)
178-185: Verify ignoring the existing schema during type updates.
Here, existing_schema is passed as None, which means any logic that relies on the actual stored schema for type inference or timestamp overrides will be skipped. Double-check that this is intentional and doesn't hide mismatches or override opportunities that might be relevant for your schema.
212-229: Great clarity and seamless logic for is_schema_matching.
The inline documentation aligns well with the implementation. The use of is_none_or succinctly captures the condition where the storage schema entry is either absent or different. Overall, this improves the maintainability and readability of the static vs. flexible schema checks.
374-490: Commendable test coverage for is_schema_matching.
The tests thoroughly cover corner cases, including static vs. flexible schema modes, missing fields, type mismatches, and empty schemas. This robust suite will greatly reduce the risk of regressions.
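The behavior these comments describe can be sketched as follows, assuming (this page does not confirm it) that a flexible schema always matches and that a static schema matches only when every incoming field is present in storage with an identical definition. Field is a hypothetical stand-in for an Arrow field, and the signature is illustrative rather than the one in src/event/format/mod.rs. Option::is_none_or requires Rust 1.82 or newer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field (name plus a type label).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Sketch of the check: flexible schemas always match; static schemas match
/// only if no incoming field is absent from, or different in, storage.
fn is_schema_matching(
    new_schema: &[Field],
    storage_schema: &HashMap<String, Field>,
    static_schema_flag: bool,
) -> bool {
    if !static_schema_flag {
        return true;
    }
    // is_none_or is true when the entry is missing OR the predicate holds,
    // i.e. exactly the "absent or different" condition.
    !new_schema
        .iter()
        .any(|field| storage_schema.get(&field.name).is_none_or(|stored| stored != field))
}
```

Under this formulation an empty incoming schema matches vacuously, because any() over no fields returns false; the corner cases listed above (flexible vs. static, missing field, type mismatch, empty schema) each map to one assertion.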
Signed-off-by: Devdutt Shenoi <devdutt@parseable.com>
Fixes #XXXX.
Description
ref: #1218 (comment)
This PR has:
Summary by CodeRabbit