Provide Thread Safe, Fast Hashing Method #30790

Closed
pickypg opened this issue May 22, 2018 · 5 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement

Comments

@pickypg
Member

pickypg commented May 22, 2018

With regulations like GDPR on the horizon, hashing certain data fields at the Elasticsearch level seems like a very attractive feature to support natively. Ideally, whatever we add would be available to Ingest and, hopefully, Painless.

Simply exposing a shared MessageDigest instance is likely to be very inefficient: the class is not thread safe, so concurrent callers would have to synchronize on it. Most systems these days expect SHA-256 rather than MD5 or SHA-1.

Workaround

It's possible to whitelist the MessageDigest class in Painless via a whitelist plugin. Alternatively, you could write a separate plugin and whitelist that code, or implement a custom Ingest processor plugin.
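
As a rough sketch of the "separate plugin" route, the code to whitelist could be a small static helper like the one below. `FieldHasher` and `sha256Hex` are hypothetical names, not anything that exists in Elasticsearch, and the Painless whitelist wiring itself is not shown. The sketch assumes field values are hashed as UTF-8 strings and creates a fresh MessageDigest per call, which is thread safe but allocates every time.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; a real plugin would likely make this class public. */
final class FieldHasher {

    private FieldHasher() {}

    /** Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of value. */
    static String sha256Hex(String value) {
        try {
            // A fresh instance per call: no shared state, so safe across threads.
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));

            // Encode each byte as two lowercase hex characters.
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(Character.forDigit((b >> 4) & 0xF, 16));
                hex.append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every compliant Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `FieldHasher.sha256Hex("abc")` returns the standard test vector `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`.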

@pickypg pickypg added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels May 22, 2018
@pickypg pickypg self-assigned this May 22, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@rjernst
Member

rjernst commented May 22, 2018

I don't think this should be available by default in, e.g., search scripts (filter or scoring). @talevy has previously mentioned wanting to add more whitelisted methods and classes to ingest. This should go in there.

@pickypg
Member Author

pickypg commented May 22, 2018

> I don't think this should be available by default in, e.g., search scripts (filter or scoring).

I completely agree.

@jasontedor
Member

We have org.elasticsearch.common.hash.MessageDigests, which provides thread-local MessageDigest instances. I benchmarked this a long time ago (for use in the security codebase), and doing it this way was the winning play.
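
The thread-local approach described above can be sketched roughly as follows. This is not the source of `org.elasticsearch.common.hash.MessageDigests`; it only illustrates the general pattern, and `ThreadLocalSha256`, `sha256()`, and `hash()` are hypothetical names. Each thread lazily gets its own MessageDigest, so no locking is needed and no instance is allocated per call.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a thread-local digest holder. */
final class ThreadLocalSha256 {

    private ThreadLocalSha256() {}

    // One MessageDigest per thread, created lazily on that thread's first use.
    private static final ThreadLocal<MessageDigest> SHA_256 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every compliant Java platform.
            throw new IllegalStateException(e);
        }
    });

    /** Returns the calling thread's digest, reset and ready for use. */
    static MessageDigest sha256() {
        MessageDigest digest = SHA_256.get();
        digest.reset(); // clear any state left by an earlier, incomplete use
        return digest;
    }

    /** Hashes the UTF-8 bytes of value with the calling thread's digest. */
    static byte[] hash(String value) {
        return sha256().digest(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

The `reset()` call is a defensive choice: `digest()` already resets the instance after completing, but a caller that only called `update()` and then abandoned the computation would otherwise leak state into the next use on the same thread.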

@pickypg pickypg removed their assignment Oct 31, 2018
@martijnvg
Member

Closing in favour of #34085
