feat(core): introduce data control attributes for wren MDL base #1014

Merged
merged 1 commit into from
Dec 26, 2024

Conversation

goldmedal
Contributor

@goldmedal goldmedal commented Dec 26, 2024

Describe

Introduce RowLevelSecurity and ColumnLevelSecurity for MDL.
A column in the MDL JSON can now be defined like this:

        {
          "name": "rls_orderkey",
          "type": "integer",
          "expression": "o_orderkey",
          "rls": {
            "name": "SESSION_STATUS",
            "operator": "EQUALS"
          }
        },
        {
          "name": "cls_orderkey",
          "type": "integer",
          "expression": "o_orderkey",
          "cls": {
            "name": "SESSION_LEVEL",
            "operator": "EQUALS",
            "threshold": "'NORMAL'"
          }
        }
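
As a rough illustration of how the `cls` entry above might be modeled, here is a minimal std-only sketch. The struct layout, the `Operator` enum, and the `eval` method are assumptions made for illustration; only the field names (`name`, `operator`, `threshold`) and the `EQUALS` operator come from the JSON in this description, and the planner integration is still listed as TODO.

```rust
/// Hypothetical operator set; only EQUALS appears in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Equals,
}

/// Hypothetical mirror of the `cls` object in the MDL JSON.
#[derive(Debug)]
struct ColumnLevelSecurity {
    name: String,      // session property name, e.g. SESSION_LEVEL
    operator: Operator,
    threshold: String, // literal to compare against, e.g. 'NORMAL'
}

impl ColumnLevelSecurity {
    /// Compares a session property value against the threshold.
    /// This sketch does a plain string comparison (an assumption).
    fn eval(&self, input: &str) -> bool {
        match self.operator {
            Operator::Equals => input == self.threshold,
        }
    }
}
```

With the values from the JSON, `eval("'NORMAL'")` would allow access and any other session value would not.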

TODO

  • Implement the data control functionality in the planner.


coderabbitai bot commented Dec 26, 2024

Walkthrough

The pull request introduces a comprehensive enhancement to the data model security and expression normalization in the Rust codebase. New procedural macros and structs are added to support row-level and column-level security features. The changes include the implementation of comparison operators, normalized expression handling, and methods for evaluating security constraints. These modifications enable more granular access control and expression validation within the data modeling framework, with support for both Python and non-Python configurations.

Changes

| File | Change Summary |
| --- | --- |
| wren-core-base/manifest-macro/src/lib.rs | Added procedural macros for row/column level security and normalized expressions |
| wren-core-base/src/mdl/builder.rs | Added methods row_level_security and column_level_security to ColumnBuilder |
| wren-core-base/src/mdl/cls.rs | Implemented ColumnLevelSecurity evaluation and NormalizedExpr with comparison methods |
| wren-core-base/src/mdl/manifest.rs | Registered new macros for Python and non-Python bindings |
| wren-core-base/src/mdl/mod.rs | Added public module declaration for cls |
| wren-core-base/tests/data/mdl.json | Added new models with security features and relationships |

Sequence Diagram

sequenceDiagram
    participant ColumnBuilder
    participant NormalizedExpr
    participant ColumnLevelSecurity

    ColumnBuilder->>NormalizedExpr: Create normalized expression
    NormalizedExpr-->>ColumnBuilder: Return normalized expr
    ColumnBuilder->>ColumnLevelSecurity: Set security parameters
    ColumnLevelSecurity->>ColumnLevelSecurity: Evaluate input
    ColumnLevelSecurity-->>ColumnBuilder: Return evaluation result

Poem

🐰 Secure and swift, our data flows,
With macros dancing in neat rows,
Row and column, now controlled,
Security's story sweetly told!
Rabbit's code, a fortress bold! 🔒




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (8)
wren-core-base/src/mdl/cls.rs (2)

26-37: Consider handling parse errors gracefully.
Right now, numeric parsing inside eval can panic if input_expr or the threshold cannot be converted to a float. This might be okay for strictly validated inputs, but in production, it’s safer to handle potential parsing errors.
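
One way to surface parse failures instead of panicking is to return a `Result`. The sketch below is not the code in cls.rs; `eval_numeric_eq` is a hypothetical helper that only shows the error-propagation pattern using `?` on `str::parse`.

```rust
use std::num::ParseFloatError;

/// Hypothetical stand-in for the numeric comparison inside `eval`.
/// Returns an error instead of panicking when either side is not numeric.
fn eval_numeric_eq(input_expr: &str, threshold: &str) -> Result<bool, ParseFloatError> {
    let input: f64 = input_expr.trim().parse()?;
    let limit: f64 = threshold.trim().parse()?;
    Ok(input == limit)
}
```

The caller can then decide whether a malformed value should deny access, be logged, or be rejected when the manifest is loaded.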


72-82: Avoid using unwrap() in Numeric comparisons.
Using unwrap() may cause unwanted panics on invalid numeric inputs. Consider using parse().ok() to handle errors gracefully or to return false when parsing fails.
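
The `parse().ok()` suggestion could look roughly like this. `numeric_gt` is a hypothetical name, and treating an unparsable operand as `false` is the assumption stated in the comment above, not confirmed behavior of the PR.

```rust
/// Hypothetical numeric "greater than" that never panics.
/// Any operand that fails to parse makes the comparison false.
fn numeric_gt(left: &str, right: &str) -> bool {
    let l = left.trim().parse::<f64>().ok();
    let r = right.trim().parse::<f64>().ok();
    match (l, r) {
        (Some(l), Some(r)) => l > r,
        _ => false,
    }
}
```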

wren-core-base/src/mdl/builder.rs (2)

210-216: Encourage stronger validation for rls config.
The row_level_security method sets name and operator; consider verifying that name is not empty if that is a requirement.
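
If an empty name should be rejected, the builder method could return a `Result`. This is a sketch only: the `ColumnBuilder` and `RowLevelSecurity` shapes below are simplified stand-ins (the operator is a plain string here), and the real builder in builder.rs may have a different signature.

```rust
/// Simplified, hypothetical stand-in for the generated struct.
#[derive(Debug, PartialEq)]
struct RowLevelSecurity {
    name: String,
    operator: String,
}

/// Simplified, hypothetical stand-in for the real ColumnBuilder.
#[derive(Debug, Default)]
struct ColumnBuilder {
    rls: Option<RowLevelSecurity>,
}

impl ColumnBuilder {
    /// Rejects blank names instead of silently storing them.
    fn row_level_security(mut self, name: &str, operator: &str) -> Result<Self, String> {
        if name.trim().is_empty() {
            return Err("rls name must not be empty".to_string());
        }
        self.rls = Some(RowLevelSecurity {
            name: name.to_string(),
            operator: operator.to_string(),
        });
        Ok(self)
    }
}
```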


218-230: Check threshold usage in column_level_security.
Currently, NormalizedExpr::new(threshold) will panic on empty strings. Consider handling an empty or invalid numeric threshold gracefully.
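
A fallible constructor is one way to avoid the panic. The sketch below assumes a hypothetical `try_new` and a simplified `NormalizedExpr` with a numeric/string kind; neither is taken from the PR.

```rust
#[derive(Debug, PartialEq)]
enum ExprKind {
    Numeric,
    String,
}

/// Simplified, hypothetical stand-in for NormalizedExpr.
#[derive(Debug, PartialEq)]
struct NormalizedExpr {
    value: String,
    kind: ExprKind,
}

impl NormalizedExpr {
    /// Returns None for empty input or input that is neither a
    /// single-quoted string nor a number, instead of panicking.
    fn try_new(raw: &str) -> Option<Self> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return None;
        }
        if trimmed.len() >= 2 && trimmed.starts_with('\'') && trimmed.ends_with('\'') {
            // The quote is one byte, so slicing off both ends is safe.
            let inner = &trimmed[1..trimmed.len() - 1];
            return Some(Self { value: inner.to_string(), kind: ExprKind::String });
        }
        trimmed
            .parse::<f64>()
            .ok()
            .map(|_| Self { value: trimmed.to_string(), kind: ExprKind::Numeric })
    }
}
```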

wren-core-base/tests/data/mdl.json (4)

67-71: Consider performance implications of nested aggregation

The totalcost calculation performs a nested aggregation through relationships (sum(customer.orders.o_totalprice)). This could impact performance with large datasets. Consider:

  1. Adding appropriate indexes
  2. Implementing materialization if the calculation is frequently used
  3. Adding filters to limit the aggregation scope

113-117: Document the purpose of hash_orderkey

The purpose of hashing the order key isn't clear. If this is for security purposes, MD5 might not be the best choice as it's cryptographically broken.

Consider adding a comment explaining the intended use case.


156-159: Consider explicit column selection in view

Using SELECT * is generally discouraged as it:

  1. Makes the view sensitive to schema changes
  2. Might expose unnecessary columns
  3. Could impact performance

Consider explicitly listing required columns.


161-162: Consider specifying MySQL version compatibility

Adding a minimum supported MySQL version would help ensure compatibility with features used in the schema (especially for security features and functions like MD5).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f79314f and ea8bc72.

📒 Files selected for processing (6)
  • wren-core-base/manifest-macro/src/lib.rs (2 hunks)
  • wren-core-base/src/mdl/builder.rs (6 hunks)
  • wren-core-base/src/mdl/cls.rs (1 hunks)
  • wren-core-base/src/mdl/manifest.rs (3 hunks)
  • wren-core-base/src/mdl/mod.rs (1 hunks)
  • wren-core-base/tests/data/mdl.json (1 hunks)
🔇 Additional comments (14)
wren-core-base/src/mdl/cls.rs (3)

40-59: Validate string detection logic.
is_string relies on single quotes as a strict delimiter. This could be prone to edge cases (e.g., escaped quotes). Consider allowing or validating possible alternative string formats, or re-checking if this is the intended strict usage.
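
For the escaped-quote edge case, a stricter check could require inner quotes to be doubled in the SQL style (`'it''s'`). This is a sketch under that assumption; `is_quoted_string` is a hypothetical name and the PR's `is_string` may intentionally be simpler.

```rust
/// Returns true when `expr` is a single-quoted literal whose inner quotes
/// are all escaped by doubling (SQL style, e.g. 'it''s').
fn is_quoted_string(expr: &str) -> bool {
    let Some(inner) = expr
        .strip_prefix('\'')
        .and_then(|rest| rest.strip_suffix('\''))
    else {
        return false;
    };
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        // A quote inside the literal must be followed by a second quote.
        if c == '\'' && chars.next() != Some('\'') {
            return false;
        }
    }
    true
}
```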


96-102: Check logical consistency of gte and lte.
gte invokes gt or eq; lte invokes lt or eq. This is correct but verify that the type mismatch logic aligns with your overall error handling strategy (i.e., no parse fallback or different data types yield false).
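
The composition described here can be sketched as follows. The `Value` enum is a hypothetical simplification of the normalized expression; the point is only that `gte` and `lte` inherit the "mismatched types yield false" behavior from the primitives they combine.

```rust
/// Hypothetical simplified value type with two kinds.
#[derive(Debug, Clone)]
enum Value {
    Numeric(f64),
    Text(String),
}

impl Value {
    fn eq(&self, other: &Value) -> bool {
        match (self, other) {
            (Value::Numeric(a), Value::Numeric(b)) => a == b,
            (Value::Text(a), Value::Text(b)) => a == b,
            _ => false, // mismatched types never compare equal
        }
    }

    fn gt(&self, other: &Value) -> bool {
        match (self, other) {
            (Value::Numeric(a), Value::Numeric(b)) => a > b,
            (Value::Text(a), Value::Text(b)) => a > b,
            _ => false,
        }
    }

    fn lt(&self, other: &Value) -> bool {
        match (self, other) {
            (Value::Numeric(a), Value::Numeric(b)) => a < b,
            (Value::Text(a), Value::Text(b)) => a < b,
            _ => false,
        }
    }

    /// gte is gt OR eq, so a type mismatch yields false here too.
    fn gte(&self, other: &Value) -> bool {
        self.gt(other) || self.eq(other)
    }

    /// lte is lt OR eq, with the same mismatch behavior.
    fn lte(&self, other: &Value) -> bool {
        self.lt(other) || self.eq(other)
    }
}
```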


122-264: Thorough test coverage.
The test suite appears robust, covering string vs. numeric types, boundary conditions, and scenarios with mismatched types. This is a good practice to ensure correctness.

wren-core-base/src/mdl/manifest.rs (2)

Line range hint 27-52: Macros usage looks consistent.
All newly introduced macros (e.g., column_level_operator, column_level_security) align with the existing pattern. Their usage in Python and non-Python bindings is coherent.


47-52: Validate serde_with usage.
DeserializeFromStr and SerializeDisplay are used for advanced serialization logic. Be sure to confirm that it doesn’t conflict with existing custom deserializers.

wren-core-base/manifest-macro/src/lib.rs (3)

172-173: New fields rls and cls in Column.
Adding optional security fields is a minimally intrusive approach. Confirm that existing code handles these new fields gracefully in older manifests or partial updates.


348-367: Row-level security macro.
Generates a straightforward struct with name and operator. Implementation looks consistent with the pattern used in other macros.


447-467: NormalizedExpr macro approach is flexible.
Using SerializeDisplay and DeserializeFromStr in the generated struct ensures smooth serialization and deserialization. Good job.
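
serde_with's `SerializeDisplay` and `DeserializeFromStr` derives delegate to the type's `Display` and `FromStr` implementations, so the string round trip is what matters. The std-only sketch below shows such a pair for a simplified, hypothetical `NormalizedExpr`; the field names and the quoting rule are assumptions, and the serde derives themselves are left out to keep the example dependency-free.

```rust
use std::fmt;
use std::str::FromStr;

/// Simplified, hypothetical stand-in for the generated struct.
#[derive(Debug, PartialEq)]
struct NormalizedExpr {
    value: String,
    quoted: bool,
}

impl FromStr for NormalizedExpr {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        if s.is_empty() {
            return Err("empty expression".to_string());
        }
        // Strip one pair of surrounding single quotes if both are present.
        match s.strip_prefix('\'').and_then(|rest| rest.strip_suffix('\'')) {
            Some(inner) => Ok(Self { value: inner.to_string(), quoted: true }),
            None => Ok(Self { value: s.to_string(), quoted: false }),
        }
    }
}

impl fmt::Display for NormalizedExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.quoted {
            write!(f, "'{}'", self.value)
        } else {
            write!(f, "{}", self.value)
        }
    }
}
```

Parsing `'NORMAL'` and formatting the result yields the original text again, which is the property a string-based serializer pair depends on.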

wren-core-base/src/mdl/builder.rs (2)

398-399: Test coverage for row_level_security.
The test test_column_roundtrip includes a row-level security scenario. This further confirms that RLS is integrated into the serialization/deserialization flow.


693-708: Full integration test for RLS and CLS.
Verifying the behavior in mdl.json merges row-level and column-level security with expression-based logic. This is a comprehensive integration test.

wren-core-base/src/mdl/mod.rs (1)

21-21: Public cls module export.
Exposing cls publicly is consistent with the code changes for column-level security.

wren-core-base/tests/data/mdl.json (3)

1-4: LGTM!

The schema configuration follows standard patterns.


12-41: LGTM!

The customer model is well-structured with:

  • Appropriate column types
  • Clear calculated column definition
  • Well-documented properties

141-154: LGTM!

Relationships are well-defined with:

  • Clear join conditions using primary keys
  • Appropriate join types (ONE_TO_MANY, ONE_TO_ONE)

@goldmedal goldmedal requested a review from wwwy3y3 December 26, 2024 05:07
@goldmedal goldmedal merged commit cd89fd1 into Canner:main Dec 26, 2024
11 checks passed
@goldmedal goldmedal deleted the feature/add-cls-rls branch December 26, 2024 05:19