20222: Provide session to the udtf call #230

Open
martin-augment wants to merge 1 commit into main from pr-20222-2026-02-09-07-42-48

Conversation

@martin-augment
Owner

20222: To review by AI

This patch adds the passing of the current session to the UDTF call.
This helps implement session-dependent functions, for example, a function
that returns the list of registered tables.
@coderabbitai
coderabbitai bot commented Feb 9, 2026

Walkthrough

The PR refactors the TableFunctionImpl trait to pass additional context to table functions. A new TableFunctionArgs<'a> struct is introduced containing both the function arguments and a session reference. The trait method signature changes from call(&self, exprs: &[Expr]) to call_with_args(&self, args: TableFunctionArgs). All implementations of TableFunctionImpl across multiple crates are updated to use the new method, with expressions extracted from args.args. Call sites are updated to construct and pass the new TableFunctionArgs structure. The deprecated call method receives a default implementation returning an error. A new example table_list_udtf is added, and documentation is updated to reflect the API changes.
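
The API shape described in the walkthrough can be sketched with stand-in types. This is a minimal illustration, not the DataFusion source: `Expr`, `Session`, `TableProvider`, `DemoSession`, `EchoFunc`, and `invoke` are simplified local placeholders, and only the field names `args` and `session` are taken from the review comments in this thread.

```rust
use std::sync::Arc;

// Stand-in types: simplified placeholders, NOT the DataFusion definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int64(i64),
}

pub trait Session {
    fn session_id(&self) -> &str;
}

pub trait TableProvider {
    fn describe(&self) -> String;
}

pub type Result<T> = std::result::Result<T, String>;

/// Bundles the call arguments with a borrowed session, mirroring the
/// `TableFunctionArgs<'a>` struct named in the walkthrough.
pub struct TableFunctionArgs<'a> {
    pub args: &'a [Expr],
    pub session: &'a dyn Session,
}

pub trait TableFunctionImpl {
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Arc<dyn TableProvider>>;
}

// Hypothetical session and table function, used only for illustration.
pub struct DemoSession;

impl Session for DemoSession {
    fn session_id(&self) -> &str {
        "demo-session"
    }
}

struct EchoTable(String);

impl TableProvider for EchoTable {
    fn describe(&self) -> String {
        self.0.clone()
    }
}

pub struct EchoFunc;

impl TableFunctionImpl for EchoFunc {
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Arc<dyn TableProvider>> {
        // Expressions are read from `args.args`, the session from `args.session`.
        let exprs = args.args;
        let Some(Expr::Int64(value)) = exprs.first() else {
            return Err("expected one Int64 argument".to_string());
        };
        Ok(Arc::new(EchoTable(format!(
            "value={value} session={}",
            args.session.session_id()
        ))))
    }
}

/// Call-site helper: builds the args struct and invokes the function.
pub fn invoke(
    func: &dyn TableFunctionImpl,
    exprs: &[Expr],
    session: &dyn Session,
) -> Result<String> {
    let provider = func.call_with_args(TableFunctionArgs { args: exprs, session })?;
    Ok(provider.describe())
}
```

Bundling the inputs into one struct means a later field can be added without changing the method signature again, which is presumably why the PR prefers it over adding a second parameter.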

@gemini-code-assist
Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the User-Defined Table Function (UDTF) mechanism by providing UDTF implementations with access to the current session context. This change allows UDTFs to become more dynamic and context-aware, enabling them to interact with the DataFusion session state, such as listing available tables or accessing session-specific configurations. The update involves an API modification to the TableFunctionImpl trait, requiring existing UDTFs to adapt to the new signature, and includes a new example demonstrating the utility of this feature.

Highlights

  • UDTF API Change: The TableFunctionImpl trait has been updated to include a new method, call_with_args, which now accepts a TableFunctionArgs struct. This struct encapsulates both the function arguments (Exprs) and a reference to the current Session.
  • Session Context for UDTFs: User-Defined Table Functions (UDTFs) can now access the Session context, enabling them to perform operations that depend on the session state, such as querying registered catalogs, schemas, and tables.
  • Deprecation of Old API: The previous call method in TableFunctionImpl and create_table_provider in TableFunction have been deprecated in favor of the new call_with_args and create_table_provider_with_args methods, respectively.
  • New Example UDTF: A new example UDTF, table_list_udtf, has been added to demonstrate how to leverage the session context to list all tables registered within the current session.
  • FFI Support: The Foreign Function Interface (FFI) for UDTFs has been extended to support passing the session context to external UDTF implementations, ensuring compatibility with foreign language bindings.
Changelog
  • datafusion-cli/src/functions.rs
    • Imported TableFunctionArgs.
    • Updated TableFunctionImpl implementations (e.g., ParquetMetadataFunc, MetadataCacheFunc) to use call_with_args and extract exprs from the new TableFunctionArgs struct.
  • datafusion-examples/examples/udf/main.rs
    • Added table_list_udtf module.
    • Included TableListUdtf in the ExampleKind enum and the example runner.
  • datafusion-examples/examples/udf/simple_udtf.rs
    • Imported TableFunctionArgs.
    • Updated LocalCsvTableFunc to use call_with_args and extract exprs from TableFunctionArgs.
  • datafusion-examples/examples/udf/table_list_udtf.rs
    • Added a new file implementing TableListUdtf, a UDTF that lists tables from the session state, demonstrating the use of the new TableFunctionArgs.
  • datafusion/catalog/src/table.rs
    • Introduced TableFunctionArgs struct to hold UDTF arguments and a reference to the Session.
    • Added call_with_args method to TableFunctionImpl trait and marked the old call method as deprecated.
    • Added create_table_provider_with_args to TableFunction and deprecated create_table_provider.
  • datafusion/core/src/execution/session_state.rs
    • Imported TableFunctionArgs.
    • Modified create_table_source to use create_table_provider_with_args and pass the session context to UDTFs.
  • datafusion/core/tests/user_defined/user_defined_table_functions.rs
    • Imported TableFunctionArgs.
    • Updated SimpleCsvTableFunc, NoOpTableFunc, and TestUDTF implementations to use the new call_with_args signature.
    • Modified test calls to UDTFs to pass TableFunctionArgs including the session context.
  • datafusion/ffi/src/udtf.rs
    • Imported TableFunctionArgs and Session.
    • Added call_with_args to FFI_TableFunction struct for foreign UDTF implementations.
    • Implemented call_with_args_wrapper to correctly handle session passing for FFI-based UDTFs.
    • Modified ForeignTableFunction to use the new call_with_args and pass the session reference.
  • datafusion/functions-table/src/generate_series.rs
    • Imported TableFunctionArgs.
    • Updated GenerateSeriesFuncImpl, GenerateSeriesFunc, and RangeFunc to use call_with_args and extract exprs from TableFunctionArgs.
  • docs/source/library-user-guide/functions/adding-udfs.md
    • Updated documentation to reflect the API change from call to call_with_args and the introduction of TableFunctionArgs.
    • Updated code examples to match the new UDTF implementation signature.
Activity
  • The core TableFunctionImpl trait in datafusion/catalog/src/table.rs was modified to introduce call_with_args and deprecate the old call method, enabling session context passing.
  • The TableFunctionArgs struct was added to encapsulate UDTF arguments and the session reference.
  • All existing UDTF implementations across various DataFusion crates (datafusion-cli, datafusion-examples, datafusion/core, datafusion/ffi, datafusion/functions-table) were updated to conform to the new call_with_args signature.
  • A new example UDTF, table_list_udtf, was added to datafusion-examples/examples/udf/table_list_udtf.rs to showcase the new session-aware capabilities.
  • The FFI layer (datafusion/ffi/src/udtf.rs) was adapted to correctly pass the session context to foreign UDTF implementations.
  • Documentation in docs/source/library-user-guide/functions/adding-udfs.md was updated to guide users on implementing UDTFs with the new API.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant improvement by allowing User-Defined Table Functions (UDTFs) to access session state. This is achieved by introducing a new call_with_args method on the TableFunctionImpl trait, which accepts TableFunctionArgs containing both the expressions and the session. The old call method is deprecated, and a smooth transition path is provided. The changes are consistently applied across the codebase, including core logic, FFI, examples, and documentation.

I've found one critical issue in the new example table_list_udtf.rs, where block_on is used in a way that will likely cause a panic at runtime. This needs to be addressed before merging.

        continue;
    };
    for table_name in schema.table_names() {
        let Some(provider) = block_on(schema.table(&table_name))? else {

critical

Using futures::executor::block_on inside call_with_args is problematic. call_with_args is a synchronous function, but it is called during query planning, which is often initiated from an asynchronous context (e.g., SessionContext::sql). Calling futures::executor::block_on there blocks the calling thread until the future completes. On a Tokio worker thread that stalls the runtime and can deadlock if the future needs that runtime to make progress, and block_on itself panics when nested inside another futures executor. Either way, the query can hang or fail at runtime.

This seems to expose a larger issue with the sync nature of call_with_args and the async nature of other APIs it needs to call, like schema.table(). While a full fix might require making call_with_args async (a larger change), this example should avoid this pattern. Perhaps there's a synchronous way to achieve this, or the example needs to be rethought to not require calling an async method from this synchronous context.

Owner Author

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! It is not a good idea to mix several async runtimes in one application. Since the Tokio runtime is already in use, it would be good to reuse it here too. This prevents runtime errors caused by mixing more than one async runtime.

@claude
claude bot commented Feb 9, 2026

PR Review: Provide session to the UDTF call

Summary

This PR adds session context to User-Defined Table Functions (UDTFs), enabling session-dependent functionality. The implementation is well-structured with proper backward compatibility handling.


✅ Strengths

1. Excellent Backward Compatibility Design

  • Deprecates old call() method instead of breaking changes
  • Provides default implementation that delegates to old method
  • All existing UDTFs continue working without modification
  • Clear migration path with helpful deprecation messages
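
One way such a compatibility shim could be wired is sketched below with stand-in types. The diff excerpt later in this thread shows the deprecated `call` defaulting to an `unimplemented` error. The default body of `call_with_args` that forwards to `call` is an assumption made for this sketch, and `Provider`, `LegacySum`, and `SessionAwareCount` are hypothetical names.

```rust
// Stand-in types: simplified placeholders, NOT the DataFusion definitions.
pub struct Expr(pub i64);
pub struct Session;
pub type Result<T> = std::result::Result<T, String>;
/// Stand-in for `Arc<dyn TableProvider>`.
pub type Provider = i64;

pub struct TableFunctionArgs<'a> {
    pub args: &'a [Expr],
    pub session: &'a Session,
}

pub trait TableFunctionImpl {
    /// Legacy entry point. The default body only reports that nothing
    /// overrides it.
    #[deprecated(note = "Implement `TableFunctionImpl::call_with_args` instead")]
    fn call(&self, _args: &[Expr]) -> Result<Provider> {
        Err("unimplemented".to_string())
    }

    /// New entry point. ASSUMPTION: the default forwards to the legacy
    /// method so that older implementations keep working.
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Provider> {
        #[allow(deprecated)]
        let result = self.call(args.args);
        result
    }
}

/// Older implementation: overrides only the legacy method.
pub struct LegacySum;

impl TableFunctionImpl for LegacySum {
    #[allow(deprecated)]
    fn call(&self, args: &[Expr]) -> Result<Provider> {
        Ok(args.iter().map(|e| e.0).sum())
    }
}

/// Newer implementation: overrides only the session-aware method.
pub struct SessionAwareCount;

impl TableFunctionImpl for SessionAwareCount {
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Provider> {
        let _session = args.session; // available for session-dependent logic
        Ok(args.args.len() as i64)
    }
}
```

With this wiring, `LegacySum` is reachable through `call_with_args` without modification, while calling the legacy `call` on `SessionAwareCount` yields the `unimplemented` error. The second case is the one the FFI comment further down is concerned about.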

2. Good API Design

The new TableFunctionArgs struct (datafusion/catalog/src/table.rs:489-494) is clean and extensible, allowing future additions without breaking changes.

3. Comprehensive FFI Support

The FFI layer (datafusion/ffi/src/udtf.rs:54-58) properly handles the new API with proper session serialization across FFI boundaries.

4. Good Example Implementation

The table_list_udtf.rs example effectively demonstrates the feature with a real use case.


🔍 Issues & Concerns

1. CRITICAL: Blocking Async Call in Example

Location: datafusion-examples/examples/udf/table_list_udtf.rs:99

Using block_on() in a potentially async context can cause deadlocks, poor performance, and runtime panics.

Recommendation: Make call_with_args async, OR document that implementations must not use block_on, OR provide an async-safe alternative.

2. Session Downcasting Fragility

Location: datafusion-examples/examples/udf/table_list_udtf.rs:76-82

The downcast to SessionState is fragile across FFI boundaries. Consider adding helper methods to Session trait for common operations.

3. Missing Memory Efficiency in Example

The example clones catalog/schema/table names repeatedly which could be memory-intensive for large catalogs.

4. FFI Session Handling Complexity

Location: datafusion/ffi/src/udtf.rs:123-136

The session handling pattern is complex and error-prone. Consider refactoring for clarity.


🧪 Test Coverage

What's Covered:

  • Basic UDTF type coercion test updated
  • SimpleCsvTableFunc migrated
  • FFI test updated

What's Missing:

  1. No test for session access pattern
  2. No FFI session test
  3. No concurrent access test

🔒 Security Considerations

No significant security concerns, but note:

  • Session access gives UDTFs broad permissions to enumerate all tables/catalogs
  • FFI downcasting could potentially cause issues with malicious session implementations
  • Consider documenting security implications

✅ Recommendations Summary

Must Fix (P0):

  1. Address block_on() usage in example

Should Fix (P1):
2. Add helper methods to Session trait to avoid fragile downcasting
3. Improve test coverage for session access patterns
4. Add more comprehensive documentation

Nice to Have (P2):
5. Refactor FFI session handling for clarity
6. Optimize example for memory efficiency
7. Add migration guide


🎯 Final Verdict

Overall Assessment: Good implementation with some concerns

The PR successfully adds session access to UDTFs with:

  • ✅ Excellent backward compatibility
  • ✅ Clean API design
  • ✅ Comprehensive FFI support
  • ⚠️ Example has blocking async issue
  • ⚠️ Needs better test coverage
  • ⚠️ Documentation could be more comprehensive

Recommendation: Approve with changes - address the block_on() issue and improve test coverage before merging.

Great work on maintaining backward compatibility! The deprecation strategy is exemplary.

@augmentcode
augmentcode bot commented Feb 9, 2026

🤖 Augment PR Summary

Summary: This PR extends DataFusion’s UDTF/table-function interface to provide access to the calling session, enabling session-aware table functions.

Changes:

  • Introduce TableFunctionArgs (arguments + session) and add TableFunctionImpl::call_with_args, deprecating the legacy call API.
  • Update table-function planning/execution to pass the session into UDTF calls.
  • Migrate existing UDTF implementations (CLI functions, generate_series/range, tests, examples) to the new call_with_args signature.
  • Extend the FFI UDTF ABI with a session-aware call_with_args entrypoint.
  • Add a new table_list UDTF example that lists tables visible in the current session.
  • Refresh the user guide’s UDTF section to show the updated API.

Technical Notes: Existing implementations can keep using call via the provided shims, but session-dependent UDTFs should implement call_with_args and use args.session.

@augmentcode augmentcode bot left a comment

Review completed. 3 suggestions posted.

    note = "Implement `TableFunctionImpl::call_with_args` instead"
)]
fn call(&self, _args: &[Expr]) -> Result<Arc<dyn TableProvider>> {
    internal_err!("unimplemented")

The default deprecated TableFunctionImpl::call implementation returns internal_err!("unimplemented"), which can leak out to callers of the deprecated API as an “internal” engine error. Consider making this error more descriptive (e.g., pointing users to call_with_args) so it’s clearer what action is needed.

Severity: low

Owner Author

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The call() method is marked with the deprecated attribute, but its error message could be improved to mention the new call_with_args() method too. This saves developers from having to search the rustdoc or the sources for the new recommendation.
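
A more descriptive message along the lines suggested here might look like the following sketch. `UdtfError` and `legacy_call_error` are hypothetical names introduced for illustration, and the `function_name` parameter is an assumption: the default trait method shown in the diff has no function name available, so a real fix may have to omit it.

```rust
use std::fmt;

/// Stand-in error type (hypothetical; not DataFusion's error type).
#[derive(Debug, PartialEq)]
pub enum UdtfError {
    NotImplemented(String),
}

impl fmt::Display for UdtfError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UdtfError::NotImplemented(msg) => write!(f, "not implemented: {msg}"),
        }
    }
}

/// Builds the message a default `call` body could return, pointing the
/// reader at the replacement method instead of a bare "unimplemented".
pub fn legacy_call_error(function_name: &str) -> UdtfError {
    UdtfError::NotImplemented(format!(
        "table function '{function_name}' does not override the deprecated \
         `TableFunctionImpl::call`; implement and invoke `call_with_args` instead"
    ))
}
```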

 The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
 argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
-single method, `call`, that takes a slice of `Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.

This section says TableFunctionImpl has a “single method” and that TableFunctionArgs only includes the argument Exprs, but TableFunctionArgs also carries the calling session (and call still exists as deprecated). It may be worth updating the wording to reflect the actual API surface so readers don’t miss the session parameter.

Severity: low

Owner Author

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The documentation is updated by replacing call() with call_with_args(), so it will be good to mention that the session state is also available now.

));

#[expect(deprecated)]
let table_provider = rresult_return!(udtf_inner.call(&args));

FFI_TableFunction::call still dispatches to the deprecated TableFunctionImpl::call, so a UDTF that only implements call_with_args (as now recommended) will return "unimplemented" when invoked via the legacy FFI entrypoint. This seems like it could surprise FFI consumers unless they reliably switch to call_with_args when available.

Severity: medium

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The FFI users should be notified somehow that the call() function is now deprecated and they should use the new call_with_args() method instead.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
datafusion-examples/examples/udf/main.rs (1)

70-81: ⚠️ Potential issue | 🟡 Minor

Update the module-level usage list to include table_list_udtf.
The new example is wired in, but the doc list near Line 24 doesn't mention it, which makes the CLI usage docs stale.

✏️ Suggested doc update
-//! cargo run --example udf -- [all|adv_udaf|adv_udf|adv_udwf|async_udf|udaf|udf|udtf|udwf]
+//! cargo run --example udf -- [all|adv_udaf|adv_udf|adv_udwf|async_udf|udaf|udf|udtf|udwf|table_list_udtf]
@@
 //! - `udtf`
 //!   (file: simple_udtf.rs, desc: Simple UDTF example)
+//!
+//! - `table_list_udtf`
+//!   (file: table_list_udtf.rs, desc: Session-aware UDTF table list example)
🤖 Fix all issues with AI agents
In `@datafusion-examples/examples/udf/table_list_udtf.rs`:
- Line 99: The sync UDTF method call_with_args currently uses
futures::executor::block_on when calling schema.table(&table_name), which will
panic inside a tokio runtime; replace that blocking call with a tokio-safe
approach: either run the async call inside tokio::task::block_in_place combined
with tokio::runtime::Handle::current().block_on(schema.table(&table_name)) or,
better, refactor to avoid async work in call_with_args by precomputing/caching
the table metadata outside the call_with_args implementation and using that
cache instead; update the call site in call_with_args (the
block_on(schema.table(&table_name)) invocation) to one of these tokio-safe
alternatives and ensure error handling remains consistent.

In `@docs/source/library-user-guide/functions/adding-udfs.md`:
- Around line 1391-1395: Update the documentation sentence describing the
`call_with_args` method to note that `TableFunctionArgs` carries both the
function arguments (a slice of `Expr`s) and the `session` field; mention
`session` as the key addition in this PR and, where relevant, suggest using
`session` for any validation or context needed when `call_with_args` parses
`Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
🧹 Nitpick comments (2)
datafusion/ffi/src/udtf.rs (1)

48-58: call_with_args field is private while call is public — is this intentional?

call on line 50 is pub, but call_with_args on line 54 is private. This works because ForeignTableFunction is in the same module, but the asymmetry is notable. If this is intentional (to discourage direct use of the raw FFI function pointer), consider adding a brief comment explaining the rationale.

datafusion-examples/examples/udf/table_list_udtf.rs (1)

76-82: Downcast to SessionState limits portability across session implementations.

The downcast_ref::<SessionState>() will fail for non-standard session types (e.g., ForeignSession from the FFI path). This is acceptable for an example, but consider adding a doc comment noting this limitation so users adapting this pattern for production are aware.
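
The limitation can be illustrated with a small sketch that uses `std::any::Any`. The trait and types below are stand-ins, not the DataFusion definitions: `OtherSession` is a hypothetical non-standard session, and `table_names` is an illustrative helper. The point is that the downcast returns `None` for any session type other than the expected one, so the caller should turn that into a clear error rather than unwrap.

```rust
use std::any::Any;

// Stand-in trait: the real `Session` trait is richer than this.
pub trait Session {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for the concrete session type the example downcasts to.
pub struct SessionState {
    pub tables: Vec<String>,
}

impl Session for SessionState {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical non-standard session, e.g. one arriving over an FFI path.
pub struct OtherSession;

impl Session for OtherSession {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Lists table names, failing with a descriptive error when the session
/// is not the concrete type this function knows how to inspect.
pub fn table_names(session: &dyn Session) -> Result<Vec<String>, String> {
    let Some(state) = session.as_any().downcast_ref::<SessionState>() else {
        return Err("this table function requires a SessionState session".to_string());
    };
    Ok(state.tables.clone())
}
```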

        continue;
    };
    for table_name in schema.table_names() {
        let Some(provider) = block_on(schema.table(&table_name))? else {

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n 'block_on' --type rust -C3 -g '!target/**' | head -100

Repository: martin-augment/datafusion

Length of output: 7540


🏁 Script executed:

# Check if the file actually uses block_on and understand the context
cat datafusion-examples/examples/udf/table_list_udtf.rs | head -130 | tail -50

Repository: martin-augment/datafusion

Length of output: 1766


🏁 Script executed:

# Check if there are other UDTF examples or table functions that handle async differently
find datafusion-examples -name "*.rs" -type f | xargs grep -l "TableFunctionImpl" | head -5

Repository: martin-augment/datafusion

Length of output: 168


🏁 Script executed:

# Check the actual usage in main.rs of the examples to understand execution context
find datafusion-examples -name "main.rs" -type f | xargs grep -l "table_list_udtf\|tokio" | head -3

Repository: martin-augment/datafusion

Length of output: 224


🏁 Script executed:

# Check the full table_list_udtf.rs file to see how it's executed
cat datafusion-examples/examples/udf/table_list_udtf.rs

Repository: martin-augment/datafusion

Length of output: 4380


🏁 Script executed:

# Check the simple_udtf.rs example for comparison
cat datafusion-examples/examples/udf/simple_udtf.rs | head -80

Repository: martin-augment/datafusion

Length of output: 3055


🏁 Script executed:

# Look for any main function or how the example uses tokio
rg -n "tokio::|#\[tokio|#\[test\]|fn main" datafusion-examples/examples/udf/table_list_udtf.rs

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check if there's documentation about TableFunctionImpl trait
rg -n "trait TableFunctionImpl" --type rust -A 10 | head -30

Repository: martin-augment/datafusion

Length of output: 818


🏁 Script executed:

# Look for main.rs files in examples/udf directory
find datafusion-examples/examples/udf -name "*.rs" -type f

Repository: martin-augment/datafusion

Length of output: 547


🏁 Script executed:

# Check if there's an integration test or main file that shows how table_list_udtf is called
rg -n "table_list_udtf\|async fn main" datafusion-examples/examples/ -A 3 | head -50

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check Cargo.toml to understand how examples are meant to be run
cat datafusion-examples/Cargo.toml | head -60

Repository: martin-augment/datafusion

Length of output: 2259


🏁 Script executed:

# Search for any documentation on how UDTF planning happens - is it sync or within async context?
rg -n "call_with_args|planning" datafusion/catalog/src/table.rs -B 2 -A 5 | head -40

Repository: martin-augment/datafusion

Length of output: 1156


🏁 Script executed:

# Check the main.rs file in examples/udf
cat datafusion-examples/examples/udf/main.rs

Repository: martin-augment/datafusion

Length of output: 4102


🏁 Script executed:

# Check if call_with_args is invoked during planning or execution - look at how table functions are invoked
rg -n "call_with_args|TableFunctionImpl" datafusion/core/src/ -B 2 -A 3 | head -60

Repository: martin-augment/datafusion

Length of output: 3928


🏁 Script executed:

# Check if the invocation happens within a tokio context or async context
rg -n "fn analyze_table_function|impl.*Analyzer" datafusion/core/src/ -B 2 -A 8 | head -50

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Find where call_with_args is invoked in the codebase
rg -n "call_with_args" datafusion/core/src/ -B 3 -A 3 | head -80

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check if table functions are resolved during planning or execution
rg -n "TableFunctionImpl|table_function" datafusion-optimizer/src/ -B 2 -A 2 | head -60

Repository: martin-augment/datafusion

Length of output: 134


🏁 Script executed:

# Look for the logical plan building for table functions
rg -n "TableFunction|table_list" datafusion-sql/src/ -B 1 -A 3 | head -50

Repository: martin-augment/datafusion

Length of output: 128


🏁 Script executed:

# Find where call_with_args is invoked
rg -n "call_with_args" --type rust

Repository: martin-augment/datafusion

Length of output: 2945


🏁 Script executed:

# Check which directories exist in datafusion
ls -la datafusion/ | head -20

Repository: martin-augment/datafusion

Length of output: 1205


🏁 Script executed:

# Look for table function resolution during planning
rg -n "TableFunction|\.table\(" datafusion/ -B 2 -A 3 --type rust | head -100

Repository: martin-augment/datafusion

Length of output: 7789


🏁 Script executed:

# Find get_table_function_source implementation
rg -n "get_table_function_source" --type rust -B 2 -A 10 | head -80

Repository: martin-augment/datafusion

Length of output: 4007


🏁 Script executed:

# Check if table functions are resolved sync or async
rg -n "fn get_table_function_source|impl.*ContextProvider" datafusion/core/src/ -A 8 | head -80

Repository: martin-augment/datafusion

Length of output: 5734


🏁 Script executed:

# Look at how SessionState provides table functions
rg -n "table_functions|get_table_function" datafusion/core/src/execution/session_state.rs -B 2 -A 5 | head -80

Repository: martin-augment/datafusion

Length of output: 3651


🏁 Script executed:

# Get the full implementation of get_table_function_source
cat datafusion/core/src/execution/session_state.rs | sed -n '1827,1860p'

Repository: martin-augment/datafusion

Length of output: 1327


🏁 Script executed:

# Check if schema.table() is async
rg -n "async.*fn table\(" datafusion/catalog/src/ --type rust -B 2 -A 5 | head -50

Repository: martin-augment/datafusion

Length of output: 3533


🏁 Script executed:

# Check how ctx.sql() works - does it synchronously plan while in async context?
rg -n "async fn sql\|pub async fn sql" datafusion/core/src/ -A 10 | head -60

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Look for tests of table functions to see if they handle block_on safely
rg -n "block_on|TableFunction" datafusion/core/tests/user_defined/user_defined_table_functions.rs -B 2 -A 5 | head -80

Repository: martin-augment/datafusion

Length of output: 1177


🏁 Script executed:

# Check if the example actually runs and whether it panics
cat datafusion-examples/examples/udf/table_list_udtf.rs | grep -A 5 "pub async fn table_list_udtf"

Repository: martin-augment/datafusion

Length of output: 300


🏁 Script executed:

# Check if there are any tests that actually run table_list_udtf
rg -n "table_list" datafusion/core/tests/ --type rust -B 2 -A 2

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check if block_on can somehow be safe in this context by looking at tokio version
cat Cargo.lock | grep -A 3 "name = \"tokio\"" | head -10

Repository: martin-augment/datafusion

Length of output: 245


🏁 Script executed:

# Verify the actual behavior: does planning happen within tokio context?
cat datafusion/core/src/execution/context/mod.rs | sed -n '1,100p' | grep -A 20 "async fn sql"

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Check if table_list_udtf is tested anywhere
rg -n "table_list_udtf\|TableListUdtf" --type rust

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

# Look for integration tests that would run the example
find . -name "*.rs" -path "*/tests/*" | xargs grep -l "examples\|table_list" 2>/dev/null | head -5

Repository: martin-augment/datafusion

Length of output: 241


🏁 Script executed:

# Check documentation or comments about block_on safety
rg -n "futures::executor::block_on" --type rust -B 5 -A 2 | head -100

Repository: martin-augment/datafusion

Length of output: 3489


Use a tokio-safe alternative to block on async work within a sync context.

futures::executor::block_on() blocks the calling thread until the future completes, which can stall or deadlock an active tokio runtime. Since this UDTF's call_with_args is invoked during SQL planning (which runs within the tokio context from #[tokio::main]), consider using tokio::task::block_in_place() combined with Handle::current().block_on() (block_in_place requires the multi-threaded runtime), or refactor to pre-compute table metadata before planning.

Suggested fix
-use futures::executor::block_on;
+use tokio::task;

At the call site:

-                    let Some(provider) = block_on(schema.table(&table_name))? else {
+                    let Some(provider) = task::block_in_place(|| {
+                        tokio::runtime::Handle::current().block_on(schema.table(&table_name))
+                    })? else {

Alternatively, if you know a tokio runtime is always available, you could cache the table list outside the table function to avoid calling async code in the sync trait method.
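
The caching alternative can be sketched with the standard library alone. This is an illustration under stated assumptions, not code from the PR: `TableNameCache`, `CachedTableListFunc`, and `call_sync` are hypothetical names, and the async code that would fill the cache (after awaiting the catalog lookups) is left out.

```rust
use std::sync::{Arc, RwLock};

/// Shared snapshot of table names, filled in by async code elsewhere.
#[derive(Clone, Default)]
pub struct TableNameCache {
    inner: Arc<RwLock<Vec<String>>>,
}

impl TableNameCache {
    /// Called from async setup code once it has awaited the catalog lookups.
    pub fn replace(&self, names: Vec<String>) {
        *self.inner.write().expect("cache lock poisoned") = names;
    }

    /// Returns a copy of the current snapshot.
    pub fn snapshot(&self) -> Vec<String> {
        self.inner.read().expect("cache lock poisoned").clone()
    }
}

/// Hypothetical table function that reads only from the cache.
pub struct CachedTableListFunc {
    cache: TableNameCache,
}

impl CachedTableListFunc {
    pub fn new(cache: TableNameCache) -> Self {
        Self { cache }
    }

    /// Synchronous body, standing in for `call_with_args`: it never needs
    /// to block on a future because the data is already in memory.
    pub fn call_sync(&self) -> Vec<String> {
        let mut names = self.cache.snapshot();
        names.sort();
        names
    }
}
```

The trade-off is staleness: the function reports whatever was cached last, so tables registered after the most recent `replace` call are not visible until the cache is refreshed.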

🤖 Prompt for AI Agents
In `@datafusion-examples/examples/udf/table_list_udtf.rs` at line 99, the sync
UDTF method call_with_args currently uses futures::executor::block_on when
calling schema.table(&table_name), which blocks the calling thread and can stall
or deadlock a tokio runtime; replace that blocking call with a tokio-safe
approach: either run the async call inside tokio::task::block_in_place combined
with tokio::runtime::Handle::current().block_on(schema.table(&table_name)) or,
better, refactor to avoid async work in call_with_args by precomputing/caching
the table metadata outside the call_with_args implementation and using that
cache instead; update the call site in call_with_args (the
block_on(schema.table(&table_name)) invocation) to one of these tokio-safe
alternatives and ensure error handling remains consistent.

Owner Author

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! It is not a good idea to mix several async runtimes in one application. Since the Tokio runtime is already in use, it would be good to reuse it here too. This prevents runtime errors caused by mixing more than one async runtime.

Comment on lines +1391 to 1395
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
+Passed struct includes function arguments as a slice of `Expr`s.

-In the `call` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
+In the `call_with_args` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
 validation of the input `Expr`s, e.g. checking that the number of arguments is correct.

⚠️ Potential issue | 🟡 Minor

Mention that TableFunctionArgs also carries the session.
The new sentence only references args, but the struct also exposes session, which is the key addition of this PR.

✏️ Suggested wording update
-single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
-Passed struct includes function arguments as a slice of `Expr`s.
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
+The struct includes the function arguments (`args`) and the current session (`session`).
🤖 Prompt for AI Agents
In `@docs/source/library-user-guide/functions/adding-udfs.md` around lines 1391 -
1395, Update the documentation sentence describing the `call_with_args` method
to note that `TableFunctionArgs` carries both the function arguments (a slice of
`Expr`s) and the `session` field; mention `session` as the key addition in this
PR and, where relevant, suggest using `session` for any validation or context
needed when `call_with_args` parses `Expr`s and returns a `Result<Arc<dyn
TableProvider>>`.
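
To make the shape of the new argument concrete, here is a minimal, std-only sketch. It assumes simplified stand-ins for `Expr`, the session, and the trait itself: the real `call_with_args` returns `Result<Arc<dyn TableProvider>>`, while this mock returns a list of names. Only the field names `args` and `session` are taken from the PR; `FirstTables` and everything else is hypothetical.

```rust
use std::collections::BTreeSet;

/// Stand-in for DataFusion's `Expr`; only an integer literal is modelled.
#[derive(Debug, PartialEq)]
enum Expr {
    Literal(i64),
}

/// Stand-in for the session: just a sorted set of registered table names.
struct Session {
    tables: BTreeSet<String>,
}

/// Mirrors the shape described in the PR: arguments plus a session reference.
struct TableFunctionArgs<'a> {
    args: &'a [Expr],
    session: &'a Session,
}

/// Simplified trait; the return type is an assumption made for this mock.
trait TableFunctionImpl {
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Vec<String>, String>;
}

/// Hypothetical function returning the first N registered table names.
struct FirstTables;

impl TableFunctionImpl for FirstTables {
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Vec<String>, String> {
        // Validate the argument count before touching the session.
        let [Expr::Literal(limit)] = args.args else {
            return Err(format!("expected exactly 1 argument, got {}", args.args.len()));
        };
        if *limit < 0 {
            return Err(format!("limit must be non-negative, got {limit}"));
        }
        // Use the session that travels with the arguments.
        Ok(args.session.tables.iter().take(*limit as usize).cloned().collect())
    }
}

fn main() {
    let session = Session {
        tables: ["a", "b", "c"].iter().map(|s| s.to_string()).collect(),
    };
    let exprs = [Expr::Literal(2)];
    let out = FirstTables.call_with_args(TableFunctionArgs { args: &exprs, session: &session });
    assert_eq!(out, Ok(vec!["a".to_string(), "b".to_string()]));

    // A wrong argument count is rejected.
    let err = FirstTables.call_with_args(TableFunctionArgs { args: &[], session: &session });
    assert!(err.is_err());
}
```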

Owner Author

value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The documentation is updated by replacing call() with call_with_args(), so it would be good to mention that the session state is also available now.

@martin-augment
Owner Author

1. CRITICAL: Blocking Async Call in Example

Location: datafusion-examples/examples/udf/table_list_udtf.rs:99

Using block_on() in a potentially async context can cause deadlocks, poor performance, and runtime panics.

Recommendation: Make call_with_args async, OR document that implementations must not use block_on, OR provide an async-safe alternative.

value:useful; category:bug; feedback: The Claude AI reviewer is correct! It is not a good idea to mix several async runtimes in one application. Since the Tokio runtime is already in use, it would be good to reuse it here too. This prevents runtime errors caused by mixing more than one async runtime.

@martin-augment
Owner Author

2. Session Downcasting Fragility

Location: datafusion-examples/examples/udf/table_list_udtf.rs:76-82

The downcast to SessionState is fragile across FFI boundaries. Consider adding helper methods to Session trait for common operations.

3. Missing Memory Efficiency in Example

The example clones catalog/schema/table names repeatedly which could be memory-intensive for large catalogs.

value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! This is just an example that shows how to use the provided session state in a UDTF. It is not meant to be used in an FFI environment or to be optimal.
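
For readers unfamiliar with the downcast under discussion, the following std-only sketch shows the general pattern and when it yields `None`. All types here are simplified stand-ins (including `ForeignSession`, which is hypothetical) and are not DataFusion's definitions.

```rust
use std::any::Any;

/// Simplified stand-in for a session trait that exposes `as_any`.
trait Session {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for the concrete in-process session type.
struct SessionState {
    tables: Vec<String>,
}

impl Session for SessionState {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical second implementation, for example a wrapper around a
/// session that lives on the other side of a library boundary.
struct ForeignSession;

impl Session for ForeignSession {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the table names only when the session is a `SessionState`.
fn table_names(session: &dyn Session) -> Option<&[String]> {
    session
        .as_any()
        .downcast_ref::<SessionState>()
        .map(|state| state.tables.as_slice())
}

fn main() {
    let local = SessionState { tables: vec!["t1".to_string()] };
    // The concrete type matches, so the downcast succeeds.
    assert_eq!(table_names(&local).map(|t| t.len()), Some(1));
    // Any other implementation makes the downcast return `None`.
    assert!(table_names(&ForeignSession).is_none());
}
```

Whether that failure mode matters depends on the caller; for an example that always runs against the in-process session, the `None` branch is not expected to be hit.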

@martin-augment
Owner Author

70-81: ⚠️ Potential issue | 🟡 Minor

Update the module-level usage list to include table_list_udtf. The new example is wired in, but the doc list near Line 24 doesn't mention it, which makes the CLI usage docs stale.

value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! The "usage" of the UDTF examples should be updated to mention the new table_list_udtf example too.
