Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove FromSlice in favor of From impl in upstream arrow-rs code #6587

Merged
merged 2 commits into from
Jun 9, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 7, 2023

Which issue does this PR close?

Part of #1754 (see #1754 (comment))

Rationale for this change

I am trying to extract physical_plan into its own crate to improve modularity and compile times -- FromSlice is currently defined in the core crate and used in physical_plan

I am also trying to make sure the organization of the DataFusion crate is easier to understand for beginners (to make it easier to start) and thus the more we can rely on existing functionality (rather than something that is DataFusion specific) the better.

What changes are included in this PR?

Remove the DataFusion specific FromSlice trait in favor of the arrow From impl

I also added some more docs in arrow-rs to try and make them easier to use: apache/arrow-rs#4379

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

from_slice is no longer available (users should use the From impl in arrow-rs instead)

@alamb alamb marked this pull request as ready for review June 7, 2023 17:52
@tustvold
Copy link
Contributor

tustvold commented Jun 7, 2023

As an added bonus for non-nullable primitives this conversion will now be zero-copy

@alamb
Copy link
Contributor Author

alamb commented Jun 8, 2023

Since this is an API change I plan to leave it open for at least 24 hours in case anyone else wants to comment

@alamb alamb merged commit 3d36dfe into apache:main Jun 9, 2023
@alamb alamb deleted the alamb/remove_from_slice branch June 9, 2023 13:29
jayzhan211 pushed a commit to jayzhan211/arrow-datafusion that referenced this pull request Jun 12, 2023
@kazuyukitanimura
Copy link
Contributor

kazuyukitanimura commented Jul 12, 2023

Hi @alamb,
I was using LargeBinaryArray::from_slice for performance reasons (to avoid copying). E.g.

let v = vec![vec![1u8,2,3]];
let lba = LargeBinaryArray::from_slice(&v);

Now the API is gone, I was wondering if you could advise on alternatives. I can copy the old code back into my code or I can contribute to implement From<Vec<Vec<u8>>> to arrow-rs.
I wanted to check if you may have any insight on how other users adapted this change.

Thank you in advance.

@alamb
Copy link
Contributor Author

alamb commented Jul 13, 2023

Hi @kazuyukitanimura

How about using https://docs.rs/arrow/latest/arrow/array/struct.GenericByteArray.html#method.from_iter_values

This works for me:

    let v = vec![vec![1u8,2,3]];
    let lba = LargeBinaryArray::from_iter_values(v);

@kazuyukitanimura
Copy link
Contributor

Thank you @alamb for the reply. I somehow remembered from_iter_values was slower, but now I tested it again. Both performs similarly. Not sure where I got the wrong impression. I will go with from_iter_values. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants