
[C++] Add a compute function to hash inputs #31876

Closed
asfimport opened this issue May 9, 2022 · 3 comments

Comments

@asfimport
Collaborator

asfimport commented May 9, 2022

We have a lot of internal logic for hashing inputs, and it might be nice to expose some of it to users as a compute function (e.g. https://stackoverflow.com/questions/72177022/how-to-get-hash-of-string-column-in-polars-or-pyarrow).

The HashBatch method in key_hash.h (not quite merged, but close) is likely to be the most performant. However, it trades some uniqueness of hashes for performance, so we should make sure to document these trade-offs.
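To make the request concrete, here is a minimal sketch of a batch "hash every value in a string column" operation. It assumes a plain `std::vector<std::string>` in place of an Arrow array and uses 64-bit FNV-1a, a simple non-cryptographic hash swapped in purely for illustration. It is not the algorithm behind `HashBatch`, and the names `Fnv1a64` and `HashStringColumn` are hypothetical, not part of Arrow's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Illustrative only: 64-bit FNV-1a, NOT the algorithm used by Arrow's HashBatch.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ULL;  // 0xcbf29ce484222325
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;               // 0x100000001b3

// Hashes a single value byte by byte.
std::uint64_t Fnv1a64(std::string_view value) {
  std::uint64_t hash = kFnvOffsetBasis;
  for (unsigned char byte : value) {
    hash ^= byte;       // mix in one input byte
    hash *= kFnvPrime;  // spread it across the word (wraps modulo 2^64)
  }
  return hash;
}

// Hypothetical batch helper: produces one 64-bit hash per row of the column.
std::vector<std::uint64_t> HashStringColumn(const std::vector<std::string>& column) {
  std::vector<std::uint64_t> hashes;
  hashes.reserve(column.size());
  for (const auto& value : column) {
    hashes.push_back(Fnv1a64(value));
  }
  return hashes;
}
```

Equal inputs always map to equal hashes, but because every value is squeezed into 64 bits, distinct inputs can collide. That is the kind of uniqueness caveat a user-facing hash function would need to document, and faster hashes generally make this trade-off more aggressively.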

Reporter: Weston Pace / @westonpace

Related issues:

Note: This issue was originally created as ARROW-16513. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Weston Pace / @westonpace:
CC @michalursa

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
Isn't this a duplicate of ARROW-8991?

@asfimport
Collaborator Author

Weston Pace / @westonpace:
Yes it is. Thanks.
