Expose named_struct in python #692

Closed
timsaucer opened this issue May 13, 2024 · 0 comments · Fixed by #700
Labels
enhancement New feature or request

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the only way to create a struct of expressions is datafusion.functions.struct, which assigns the fixed field names c0, c1, and so on. These generated names are hard to work with downstream. The Rust implementation has a named_struct function that would serve this purpose.

Describe the solution you'd like
Ideally, the name of each field in a struct would come from the name of the expression that produces it. It would be great to write something like:

df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))

And then the struct would contain field names a, b, and c.

From a brief look at the code, this may not be simple to implement. If it is not feasible, I would at least like the named_struct function to be exposed in the Python bindings.
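To make the fallback request concrete, here is a rough sketch of the argument shape involved. It assumes named_struct takes alternating field-name and value arguments, as the DataFusion SQL function does. The helper names interleave_names and named_struct_sql are hypothetical and exist only for illustration. The sketch builds a SQL expression string and does not run a query or call any datafusion-python API, whose eventual signature is not specified in this issue.

```python
from typing import Iterable, List


def interleave_names(columns: Iterable[str]) -> List[str]:
    """Hypothetical helper: build the flat name, value, name, value, ...
    argument order assumed for a named_struct-style call."""
    args: List[str] = []
    for name in columns:
        args.append(f"'{name}'")  # field name as a SQL string literal
        args.append(name)         # column reference that supplies the value
    return args


def named_struct_sql(columns: Iterable[str]) -> str:
    """Hypothetical helper: render the call as a SQL expression string."""
    return f"named_struct({', '.join(interleave_names(columns))})"


print(named_struct_sql(["a", "b", "c"]))
# named_struct('a', a, 'b', b, 'c', c)
```

Deriving each field name from the column name, as above, is the same behavior requested for F.struct in the ideal case. An exposed named_struct would let the caller do it explicitly.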

Describe alternatives you've considered
I have not considered any alternatives beyond the two described above.

Additional context
Minimal example showing current state:

from datafusion import SessionContext, col, functions as F
import pyarrow as pa

ctx = SessionContext()

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array([4, 5, 6]), pa.array([7, 8, 9])],
    names=["a", "b", "c"],
)

df = ctx.create_dataframe([[batch]])

df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))

df.show()

This produces:

DataFrame()
+---+---+---+-----------------------+
| a | b | c | d                     |
+---+---+---+-----------------------+
| 1 | 4 | 7 | {c0: 1, c1: 4, c2: 7} |
| 2 | 5 | 8 | {c0: 2, c1: 5, c2: 8} |
| 3 | 6 | 9 | {c0: 3, c1: 6, c2: 9} |
+---+---+---+-----------------------+
timsaucer added the enhancement (New feature or request) label May 13, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue May 15, 2024
andygrove pushed a commit that referenced this issue May 15, 2024