[7.x] Fingerprint ingest processor #69042

Merged


danhermann (Contributor)

Adds a fingerprint processor that computes hashes of document content for content fingerprinting use cases.

For example:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "fingerprint": {
          "fields": ["user"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "user": {
          "last_name": "Smith",
          "first_name": "John",
          "date_of_birth": "1980-01-15",
          "is_active": true
        }
      }
    }
  ]
}

Which produces:

"_source" : {
  "fingerprint" : "WbSUPW4zY1PBPehh2AA/sSxiRjw=",
  "user" : {
    "last_name" : "Smith",
    "first_name" : "John",
    "date_of_birth" : "1980-01-15",
    "is_active" : true
  }
}

Supports any number of document fields, nested document content, a choice of hash algorithm (MD5, SHA-1, SHA-256, or SHA-512), and a per-processor salt.
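As a sketch of the hash and salt options, a processor that selects SHA-256 and adds a salt might look like the following. The `method` and `salt` option names are taken from the processor's reference documentation rather than from this description, and `"my-salt-value"` is a hypothetical placeholder, not a recommended value.

```json
{
  "fingerprint": {
    "fields": ["user"],
    "method": "SHA-256",
    "salt": "my-salt-value"
  }
}
```

Substituting this for the processor in the simulate request above should produce a `fingerprint` value different from the one shown, because both the algorithm and the salt feed into the hash.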

Closes #53578, though it addresses only content fingerprinting and not anonymization use cases.

Backport of #68415

danhermann added the >feature, :Data Management/Ingest Node, backport, and v7.12.0 labels on Feb 16, 2021
elasticmachine added the Team:Data Management label on Feb 16, 2021
elasticmachine (Collaborator)

Pinging @elastic/es-core-features (Team:Core/Features)

danhermann merged commit a8669e7 into elastic:7.x on Feb 16, 2021