@shcheklein shcheklein commented Oct 31, 2025

Makes it easier to understand what went wrong when a complex object column is passed to a SQL function.

Summary by Sourcery

Disallow complex object columns in SQL functions by raising descriptive errors and add fallback resolution for complex column types.

Enhancements:

  • Guard against passing Pydantic model columns to SQL functions (min, collect, greatest) by raising a DataChainParamsError with guidance on using leaf fields or UDFs
  • Add fallback logic in get_db_col_type to resolve complex object column types via subtree lookup when initial name resolution fails
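
The guard described in the first bullet can be pictured with a small standalone sketch. It does not import datachain: `DataChainParamsError`, `BaseModel`, `File`, `SCHEMA`, and `check_sql_arg` are stand-ins defined here only for illustration, and everything in the message after "doesn't support complex object columns" is assumed wording rather than the text used in the PR.

```python
class DataChainParamsError(Exception):
    """Stand-in for the error named in this PR (not imported from datachain)."""


class BaseModel:
    """Stand-in for pydantic.BaseModel, so the sketch needs only the stdlib."""


class File(BaseModel):
    """Hypothetical complex object with two leaf fields."""

    path: str
    size: int


# Hypothetical flat schema: column name -> Python type.
SCHEMA = {"file": File, "file.path": str, "file.size": int}


def check_sql_arg(func_name: str, col: str) -> type:
    """Return the column type, rejecting model-typed (complex) columns."""
    col_type = SCHEMA[col]
    if isinstance(col_type, type) and issubclass(col_type, BaseModel):
        raise DataChainParamsError(
            f"Function {func_name} doesn't support complex object columns "
            f"('{col}'); use one of its leaf fields or a UDF instead"
        )
    return col_type
```

With this sketch, `check_sql_arg("min", "file.size")` returns `int`, while `check_sql_arg("min", "file")` raises the error, because `file` maps to a model class rather than a scalar type.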

Tests:

  • Add functional tests in aggregate and conditional modules to assert DataChainParamsError is raised for collect, min, and greatest with complex object columns
  • Add unit tests for func.get_column to verify errors on complex object columns
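
The shape of such tests might look roughly like the following. This is a sketch under assumptions, not the tests from the PR: `call_sql_func`, `Model`, and `Meta` are hypothetical stand-ins, and the tests are written as plain functions with bare assertions so they run with or without pytest.

```python
class DataChainParamsError(Exception):
    """Stand-in for the error named in this PR."""


class Model:
    """Stand-in base class marking a column type as a complex object."""


class Meta(Model):
    """Hypothetical complex column type."""


def call_sql_func(name: str, col_type: type) -> str:
    """Hypothetical entry point: rejects complex columns, else returns a label."""
    if issubclass(col_type, Model):
        raise DataChainParamsError(
            f"Function {name} doesn't support complex object columns"
        )
    return f"{name}(col)"


def test_complex_columns_rejected() -> None:
    """Each function named in the PR should refuse a complex column."""
    for name in ("collect", "min", "greatest"):
        try:
            call_sql_func(name, Meta)
        except DataChainParamsError as exc:
            assert name in str(exc)
        else:
            raise AssertionError(f"{name} accepted a complex column")


def test_leaf_columns_accepted() -> None:
    """Scalar (leaf) columns still pass through unchanged."""
    assert call_sql_func("min", int) == "min(col)"
```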

@shcheklein shcheklein requested a review from dreadatour October 31, 2025 23:22
sourcery-ai bot commented Oct 31, 2025

Reviewer's Guide

This PR adds guards to SQL function handling that reject Pydantic-based complex object columns with a clear error message, adds a fallback resolution path for columns that fail initial name resolution, and adds functional tests (aggregate and conditional functions) and unit tests that cover these safeguards.

Sequence diagram for error handling when complex column is passed to SQL function

sequenceDiagram
    participant User
    participant Func
    participant SignalSchema
    participant ModelStore
    participant Error
    User->>Func: Call SQL function with column
    Func->>SignalSchema: get_column_type(arg, with_subtree=True)
    SignalSchema-->>Func: Return column type
    Func->>ModelStore: is_pydantic(column_type)
    ModelStore-->>Func: Return True (if complex)
    Func->>Error: Raise DataChainParamsError with message
    Error-->>User: Error message: "Function X doesn't support complex object columns..."

Class diagram for updated error handling in func.py

classDiagram
    class Func {
        +get_column(signals_schema, label, table)
        +_db_cols
        +name
    }
    class SignalSchema {
        +get_column_type(name, with_subtree)
    }
    class ModelStore {
        +is_pydantic(type)
    }
    class DataChainParamsError
    Func --> SignalSchema : uses
    Func --> ModelStore : uses
    Func --> DataChainParamsError : raises

Class diagram for updated get_db_col_type fallback logic

classDiagram
    class Func {
        +get_db_col_type(signals_schema, col)
    }
    class SignalSchema {
        +get_column_type(name)
        +get_column_type(name, with_subtree)
    }
    class SignalResolvingError
    Func --> SignalSchema : calls
    Func --> SignalResolvingError : handles
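
The fallback path in this diagram can be sketched as follows. Only the names `SignalSchema`, `SignalResolvingError`, `get_column_type`, and the `with_subtree` argument come from the diagram; the toy schema internals, the `Meta` type, and the `resolve_col_type` helper are assumptions made for illustration and are not datachain's implementation.

```python
class SignalResolvingError(Exception):
    """Stand-in for the error named in the diagram (not imported from datachain)."""


class Meta:
    """Hypothetical complex object type."""


class SignalSchema:
    """Toy schema: leaves resolve directly, whole objects only via subtree lookup."""

    def __init__(self, leaves: dict, objects: dict) -> None:
        self._leaves = leaves
        self._objects = objects

    def get_column_type(self, name: str, with_subtree: bool = False) -> type:
        if name in self._leaves:
            return self._leaves[name]
        if with_subtree and name in self._objects:
            return self._objects[name]
        raise SignalResolvingError(f"cannot resolve signal {name!r}")


def resolve_col_type(schema: SignalSchema, name: str) -> type:
    """Try plain resolution first, then fall back to the subtree lookup."""
    try:
        return schema.get_column_type(name)
    except SignalResolvingError:
        return schema.get_column_type(name, with_subtree=True)
```

In this sketch a name that is unknown to both lookups still raises `SignalResolvingError` from the second call, so the fallback widens what can be resolved without hiding genuinely missing columns.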

File-Level Changes

Change: Prevent usage of complex Pydantic object columns in SQL functions and improve type resolution fallbacks
Details:
  • Added a guard in get_column that detects Pydantic types and raises DataChainParamsError with usage guidance
  • Wrapped get_column_type in get_db_col_type in a try/except on SignalResolvingError to fall back to subtree resolution
Files:
  • src/datachain/func/func.py

Change: Add tests for disallowed complex object columns in aggregate, conditional, and unit functions
Details:
  • Inserted pytest cases in test_aggregate for collect and min on complex objects
  • Added a test in test_conditional to assert an error for func.greatest with complex objects
  • Created a unit test in test_func to validate that get_column raises when passed complex types
Files:
  • tests/func/functions/test_aggregate.py
  • tests/func/functions/test_conditional.py
  • tests/unit/test_func.py

@shcheklein shcheklein requested a review from a team October 31, 2025 23:22
@shcheklein shcheklein self-assigned this Oct 31, 2025
@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Consider extracting the complex‐object guard logic into a shared helper so you can reuse it in both get_column and get_db_col_type and keep the error messaging consistent.
  • The SignalResolvingError import inside get_db_col_type could be moved to the module level, and you may want to narrow or log the fallback catch so it doesn’t silently mask other schema resolution issues.
  • It might help to centralize the error message text (e.g. in a constant) to ensure that all SQL functions report the same actionable guidance when a complex column is passed.
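
One way the three suggestions above could fit together is sketched below. This is an illustration under assumptions, not code from the PR: `COMPLEX_COLUMN_MSG`, `ensure_not_complex`, `resolve_with_fallback`, and the `Model` marker class are hypothetical names, the exception classes are stand-ins, and the message wording is illustrative.

```python
import logging

logger = logging.getLogger(__name__)

# Single message template shared by every caller (wording is illustrative).
COMPLEX_COLUMN_MSG = (
    "Function {func} doesn't support complex object columns ({col}); "
    "use a leaf field or a UDF instead"
)


class DataChainParamsError(Exception):
    """Stand-in for the error named in this PR."""


class SignalResolvingError(Exception):
    """Stand-in for the schema resolution error named in this PR."""


class Model:
    """Stand-in marker for complex (model-typed) columns."""


def ensure_not_complex(func_name: str, col: str, col_type: object) -> None:
    """Shared guard: raise one consistent error for complex columns."""
    if isinstance(col_type, type) and issubclass(col_type, Model):
        raise DataChainParamsError(
            COMPLEX_COLUMN_MSG.format(func=func_name, col=col)
        )


def resolve_with_fallback(lookup, name: str) -> type:
    """Call lookup(name, with_subtree); retry with the subtree on one error type.

    Only SignalResolvingError triggers the fallback, and the retry is logged,
    so unrelated failures are neither swallowed nor silent.
    """
    try:
        return lookup(name, False)
    except SignalResolvingError:
        logger.debug("falling back to subtree lookup for %r", name)
        return lookup(name, True)
```

Keeping the template in one constant means a wording change touches a single place, and catching only `SignalResolvingError` lets any other exception from the lookup propagate unchanged.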

@shcheklein shcheklein force-pushed the guard-complex-types-func branch from a18c0de to 2ce1776 Compare October 31, 2025 23:23
cloudflare-workers-and-pages bot commented Oct 31, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 9ee76ff
Status: ✅  Deploy successful!
Preview URL: https://dee042d2.datachain-documentation.pages.dev
Branch Preview URL: https://guard-complex-types-func.datachain-documentation.pages.dev

@shcheklein shcheklein force-pushed the guard-complex-types-func branch from 2ce1776 to f9a9c58 Compare October 31, 2025 23:27
@shcheklein shcheklein force-pushed the guard-complex-types-func branch from f9a9c58 to 9ee76ff Compare November 1, 2025 00:51
codecov bot commented Nov 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.91%. Comparing base (cdf8ee9) to head (9ee76ff).
⚠️ Report is 3 commits behind head on main.

@@           Coverage Diff           @@
##             main    #1441   +/-   ##
=======================================
  Coverage   87.90%   87.91%           
=======================================
  Files         160      160           
  Lines       15300    15309    +9     
  Branches     2206     2210    +4     
=======================================
+ Hits        13450    13459    +9     
  Misses       1336     1336           
  Partials      514      514           
Flag        Coverage Δ
datachain   87.86% <100.00%> (+<0.01%) ⬆️

Files with missing lines     Coverage Δ
src/datachain/func/func.py   85.10% <100.00%> (+0.59%) ⬆️
@dreadatour dreadatour left a comment

Looks good to me! 👍

@shcheklein shcheklein merged commit 5deb278 into main Nov 3, 2025
38 checks passed
@shcheklein shcheklein deleted the guard-complex-types-func branch November 3, 2025 15:32

3 participants