Skip to content

Conversation

@wmTJc9IK0Q
Copy link

This would allow a table UDF to use an arrow to directly fill a data chunk in the FillChunk callback.

Copy link
Author

@wmTJc9IK0Q wmTJc9IK0Q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests TODO, but opening for feedback in conjunction with the linked duckdb-go PR: duckdb/duckdb-go#75

// The returned DataChunk must be destroyed with DestroyDataChunk.
// The returned ErrorData must be checked for errors and destroyed with DestroyErrorData.
func DataChunkFromArrow(conn Connection, rec arrow.RecordBatch, schema ArrowConvertedSchema) (DataChunk, ErrorData) {
func NewDataChunkFromArrow(conn Connection, rec arrow.RecordBatch) (DataChunk, ErrorData) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was renamed for clarity

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the new name is more clear but unfortunately we need to stay consistent with the function names of the C API header we're wrapping (in case we ever want to generate this package). Also, the duckdb-go-bindings should be as-thin-as-possible of a wrapper of the C API functions - we're now changing the function signature by no longer passing the schema. 🤔

Maybe we can leave the current functions as they are + add one helper functions (MoveArrowToDataChunk)? Maybe two, if we really want to omit the schema, too.

And do we need the helpers on this package's level? I.e., they can't happen higher up, say in duckdb-go?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taniabogatsch Ok, that design goal makes sense. I think in that case then the right path forward would be to just change the existing DataChunkFromArrow to not create a new data chunk and instead take one to be passed in as an arg (and add back the schema arg). I will move forward with this change.

I think we still want this in this package to avoid all of the C.* and alloc calls in the duckdb-go package?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sounds good. 👍

// MoveArrowToDataChunk moves an Arrow RecordBatch into the provided DuckDB DataChunk using the provided Connection and ArrowConvertedSchema.
// The returned DataChunk must be destroyed with DestroyDataChunk.
// The returned ErrorData must be checked for errors and destroyed with DestroyErrorData.
func MoveArrowToDataChunk(conn Connection, rec arrow.RecordBatch, chunk DataChunk) ErrorData {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this also implicitly set the chunk size? Maybe this should happen on the duckdb side?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure - it could be an oversight when exposing the new functionality? If so, then it should be patched on the DuckDB side.

@wmTJc9IK0Q wmTJc9IK0Q marked this pull request as ready for review December 1, 2025 05:35
Copilot AI review requested due to automatic review settings December 1, 2025 05:35
Copilot finished reviewing on behalf of wmTJc9IK0Q December 1, 2025 05:40
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Arrow-to-DataChunk conversion API to support table UDFs that need to populate existing DataChunks with Arrow data. It introduces MoveArrowToDataChunk as a new function and renames the existing DataChunkFromArrow to NewDataChunkFromArrow.

Key Changes

  • Introduced MoveArrowToDataChunk function that moves Arrow RecordBatch data into an existing DataChunk
  • Renamed DataChunkFromArrow to NewDataChunkFromArrow to better reflect its purpose
  • Refactored the implementation to separate chunk creation from data population

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
bindings_arrow.go Adds MoveArrowToDataChunk function and refactors NewDataChunkFromArrow to use it
bindings_arrow_test.go Updates test to call renamed function NewDataChunkFromArrow
darwin-amd64/bindings_arrow.go Platform-specific variant with identical changes
darwin-amd64/bindings_arrow_test.go Platform-specific test update
darwin-arm64/bindings_arrow.go Platform-specific variant with identical changes
darwin-arm64/bindings_arrow_test.go Platform-specific test update
linux-amd64/bindings_arrow.go Platform-specific variant with identical changes
linux-amd64/bindings_arrow_test.go Platform-specific test update
linux-arm64/bindings_arrow.go Platform-specific variant with identical changes
linux-arm64/bindings_arrow_test.go Platform-specific test update
windows-amd64/bindings_arrow.go Platform-specific variant with identical changes
windows-amd64/bindings_arrow_test.go Platform-specific test update

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@wmTJc9IK0Q
Copy link
Author

Copilot lost it's mind and posted the same 2 comments 12 times.

@VGSML
Copy link
Contributor

VGSML commented Dec 1, 2025

Thanks! Good point!

@VGSML
Copy link
Contributor

VGSML commented Dec 1, 2025

For me it would be better to have here the two DuckDB functions: (New)DataChunkFromArrow (with chunk creation) and FillDataChunkFromArrow. @taniabogatsch In any case it should be your decision.

@wmTJc9IK0Q
Copy link
Author

@VGSML I think we can have these both in the duckdb-go layer and only have one here.

@wmTJc9IK0Q
Copy link
Author

This isn't working right yet with the duckdb-go PR

@wmTJc9IK0Q wmTJc9IK0Q marked this pull request as draft December 1, 2025 19:19
Copy link
Collaborator

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small nit and then this is good to go from my side. 👍

@taniabogatsch taniabogatsch marked this pull request as ready for review December 2, 2025 14:15
@taniabogatsch
Copy link
Collaborator

Looks good from my side - could you just check the failing tests? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants