Skip to content

Conversation

@OlegWock
Copy link
Collaborator

@OlegWock OlegWock commented Oct 15, 2025

@coderabbitai ignore

Contributes to BLU-4774

There are two changes lumped together in this pull request.

First, we update code to use MaterializedTaskValue where it makes more sense instead of just TaskValue, because TaskValue can have a variant Plan which is not serializable in the current implementation and current version of DataFusion. I.e. we can't serialize it to protobuf to send in messages. Those protofbuf definitions were updated to MaterializedTaskValue which can send either a scalar or table. We didn't send plan across process boundaries anyway, so this change doesn't break anything.

Second change is an update to expression compilation to make it asynchronous. Some expressions might need access to data. This data might not be materialized yet. To materialize this data, a custom executor is used and materialization is an asynchronous function. To call it, we needed to make the whole chain of compilation asynchronous. All call sites which compiled those expressions were already async, so we didn't have to do a lot of changes. But it's just quite a bit of adding async and await where necessary.

Summary by CodeRabbit

  • Refactor

    • Core expression compilation converted to asynchronous flows for improved performance and scalability.
    • Task value serialization now only supports materialized values (scalars/tables); intermediate "plan" values must be materialized before being serialized.
  • Documentation

    • Clarified temporal handling: Spark dates require UTC (comment/docs updated).

@linear
Copy link

linear bot commented Oct 15, 2025

@OlegWock
Copy link
Collaborator Author

@coderabbitai summary

@coderabbitai
Copy link

coderabbitai bot commented Oct 16, 2025

✅ Actions performed

Summary regeneration triggered.

@coderabbitai
Copy link

coderabbitai bot commented Oct 16, 2025

📝 Walkthrough

Walkthrough

This PR renames the protobuf TaskValue to MaterializedTaskValue (removing the Plan variant) and updates all Rust conversions/usages to use the new materialized type. It removes Plan serialization/deserialization paths and associated fingerprinting for Plan values. Separately, the expression compiler entrypoint and nearly all compile_* functions were changed from synchronous to asynchronous (using async recursion), and all callers were updated to await compile results. Minor async fixes applied in task/runtime/transform call sites and tests.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant ProtoLayer as Protobuf
    participant Serializer as Serializer
    participant TaskGraph as TaskGraph

    Note over ProtoLayer,Serializer #DDEBF7: MaterializedTaskValue replaces TaskValue (no Plan)
    Caller->>Serializer: build Task with Value
    Serializer->>ProtoLayer: ProtoMaterializedTaskValue::try_from(&value)
    ProtoLayer-->>Serializer: bytes (scalar|table)
    Serializer->>TaskGraph: embed MaterializedTaskValue in Task proto
    TaskGraph-->>Caller: serialized Task
Loading
sequenceDiagram
    participant Requester
    participant Compile as compile (async)
    participant Variants as compile_* (async)
    participant Eval as eval callers

    Requester->>+Compile: compile(expr,...).await
    Compile->>+Variants: compile_xxx(...).await
    Variants->>Variants: await nested compile(...) calls
    Variants-->>-Compile: Result<Expr>
    Compile-->>-Requester: Result<Expr>
    Note over Eval,Compile #F7F6DD: Call sites updated to use .await? when invoking compile
Loading

Possibly related PRs

Suggested reviewers

  • mfranczel
  • saltenasl

Pre-merge checks

❌ Failed checks (4 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The PR title focuses on addressing TODOs related to a custom executor but does not reflect the substantial migration of TaskValue to MaterializedTaskValue or the widespread conversion of compile functions to async, making it vague and misleading. Rename the PR title to reflect the primary changes, for example “Migrate TaskValue to MaterializedTaskValue and convert compile functions to async,” so it clearly summarizes the main modifications in the changeset.
Linked Issues Check ⚠️ Warning The linked issue BLU-4774 calls for Python-side LogicalPlan serialization fixes, adding tests for the new changes, and resolving scattered TODOs, but this PR mainly performs internal refactors (proto message renaming and async migration) without introducing any tests or addressing the Python API serialization problem. Add tests covering the new TaskValue and async compile code paths, implement or document the LogicalPlan serialization improvements for the Python API, and explicitly resolve the remaining TODOs that relate to those objectives.
Out of Scope Changes Check ⚠️ Warning A significant portion of this PR—such as converting all compiler functions to async and overhauling TaskValue proto types—goes beyond resolving TODOs or addressing Python API issues, indicating that many changes are unrelated to the linked issue’s stated objectives. Split out the large async migrations and proto reorganizations into separate, focused PRs with clear scopes or link them directly to the specific TODOs or API improvements they address.
Docstring Coverage ⚠️ Warning Docstring coverage is 59.65% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Comment @coderabbitai help to get the list of available commands and usage tips.

@OlegWock OlegWock force-pushed the oleh/blu-4774-address-todos branch from ade982e to b1f5d1d Compare October 16, 2025 07:38
@OlegWock OlegWock changed the base branch from oleh/blu-4774-remove-unusable-public-api to main October 16, 2025 07:39
@OlegWock
Copy link
Collaborator Author

@coderabbitai full review but no yapping please, only highlight big issues re security, correctness, or performance

@coderabbitai
Copy link

coderabbitai bot commented Oct 16, 2025

@OlegWock: Triggering full review focused on security, correctness, and performance issues only.

✅ Actions performed

Full review triggered.

@OlegWock OlegWock marked this pull request as ready for review October 16, 2025 10:22
@OlegWock OlegWock requested a review from mfranczel October 16, 2025 11:15
Copy link

@mfranczel mfranczel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; with a few non-blocking comments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider adding a test for plan materialization

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be in separate PR along with other tests for custom executor

@OlegWock OlegWock merged commit 965e9eb into main Oct 17, 2025
9 checks passed
@OlegWock OlegWock deleted the oleh/blu-4774-address-todos branch October 17, 2025 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants