Enhance multi-field lookup enrichment #44

Open
acchen97 opened this issue Mar 18, 2017 · 9 comments

Comments

@acchen97

Translate supports JSON, CSV, and YAML dictionary files. Each of these formats supports some form of multi-field lookup: for JSON and YAML it's hierarchical, and for CSV, a lookup on a key could reference multiple values in the row.

Currently, these lookups are possible, but they result in a complex object in the "destination" (or self-defined) field. We should allow these multi-field lookups to simply add new top-level fields that enrich the event.
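As a rough sketch of the current behavior, consider a YAML dictionary whose values are hashes. It assumes the nested value is passed through to the destination as-is, as described above. The file path, field names, and user record are hypothetical. A lookup on a username then lands as one nested object, and the nested keys have to be promoted to the top level by hand, for example with a mutate rename:

```
# Hypothetical /etc/logstash/users.yml:
#   "jsmith":
#     full_name: "John Smith"
#     department: "Sales"
filter {
  translate {
    field           => "username"
    destination     => "user_info"
    dictionary_path => "/etc/logstash/users.yml"
  }
  # The whole hash ends up under [user_info]; move each key to the top level manually
  mutate {
    rename => {
      "[user_info][full_name]"  => "full_name"
      "[user_info][department]" => "department"
    }
    # Common option, applied after the rename succeeds, to drop the now-empty parent
    remove_field => ["user_info"]
  }
}
```

The request in this issue is essentially to make that promotion step unnecessary.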

@jordansissel
Contributor

jordansissel commented Mar 18, 2017 via email

@shreyasrk

I agree. Basically, this is what I need:

  translate {
    dictionary_path => '/some/field/path/to/lookup/as/reference (JSON|YAML|CSV)'
    fields => ['event_field_1', 'event_field_2']
    destination => ['new_event_field_1_replaced', 'new_event_field_2_replaced']
  }

The objective is to use the same reference file to replace multiple fields with values.

If I use multiple translate filters, will the same file be loaded multiple times? Kindly confirm.

@Chandanvatsa

+1

@jordansissel
Contributor

> If I use multiple translate filters it will re-load the same file multiple times(?) Kindly confirm.

Yes. What is your concern with this?

@coregear

+1 I need this.

@alesnav

alesnav commented Apr 25, 2018

This would be a nice new feature for data enrichment!

For example, when enriching usernames from a CSV/JSON file, you could add full name, department, office, etc. all at once with a single translate filter.

@guyboertje

It seems like the requested feature links multiple source fields to destination fields.
It would be tricky to validate a one-to-one mapping of field array elements to destination array elements.
We could consider a new mapping setting (a hash):

  translate {
    mapping => {
      "[f1]" => "[d1]"
      "[f2]" => "[d2]"
    }
    ...
  }

This, however, would mean that the dictionary holds keys and values from multiple domains.
I would argue that separate translate filters per domain is a cleaner approach.

On the other hand I can see scenarios where an event has several field values in the same domain, e.g. src_ip/dest_ip or from_id/to_id.
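For the same-domain case, the separate-filters approach might look like the following sketch. It assumes a hypothetical CSV dictionary, /etc/logstash/ip_names.csv, that maps IP addresses to host names. Each translate filter loads that file on its own, which is the trade-off raised earlier in the thread.

```
filter {
  # Same dictionary, two lookups: one translate filter per source field
  translate {
    field           => "src_ip"
    destination     => "src_name"
    dictionary_path => "/etc/logstash/ip_names.csv"
    fallback        => "unknown"
  }
  translate {
    field           => "dest_ip"
    destination     => "dest_name"
    dictionary_path => "/etc/logstash/ip_names.csv"
    fallback        => "unknown"
  }
}
```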

@guyboertje

Regarding the original proposal of adding multi-valued translations to the root of an event, the problem lies with the fallback setting, which is a string.

The question is how to reconcile a multi-field lookup value with a string fallback.
If the fallback is substituted when there is no match, the field could hold an object for matched events but a string for unmatched ones, which causes an Elasticsearch mapping conflict.

My advice would be to use a CSV dictionary followed by a Dissect filter. The lookup value and the fallback value should have the same structure; then the Dissect filter can be applied whether or not the lookup matched.
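A minimal sketch of that advice follows. It assumes a hypothetical users.csv whose second column packs several values into one ";"-delimited string, and a fallback with the same three-part shape so that Dissect always finds its delimiters. The file path and field names are illustrative only.

```
# Hypothetical /etc/logstash/users.csv (key, packed value):
#   jsmith,John Smith;Sales;Berlin
#   adoe,Anna Doe;Finance;Madrid
filter {
  translate {
    field           => "username"
    destination     => "user_info"
    dictionary_path => "/etc/logstash/users.csv"
    # Same structure as a real value, so dissect works on match and no match alike
    fallback        => "unknown;unknown;unknown"
  }
  # Split the packed string into separate top-level fields
  dissect {
    mapping => { "user_info" => "%{full_name};%{department};%{office}" }
  }
}
```

Because matched and unmatched events both yield plain strings in full_name, department, and office, the object-versus-string mapping conflict does not arise.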

@guyboertje

I have created PR #67, which adds support for iterate_on, a new setting that handles fields whose value is an array of strings.

With this, one can achieve multiple field translations:

1. Build a field with array values, say ips, using add_field => { "[ips][0]" => "%{src_ip}" "[ips][1]" => "%{dest_ip}" }.
2. Set iterate_on to ips; the result is a translated array.
3. Use add_field again: add_field => { "[src_name]" => "%{[translated][0]}" "[dest_name]" => "%{[translated][1]}" }
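Put together, those steps could look roughly like this sketch. It assumes the iterate_on behavior described for PR #67: when iterate_on names the same field as field, the value is treated as an array of strings. It also assumes a hypothetical dictionary file. The array is built with add_field's array form, which appends each value to the field, instead of the [ips][0] references.

```
filter {
  # Step 1: collect both IPs into one array field, ips => [src_ip, dest_ip]
  mutate {
    add_field => { "ips" => ["%{src_ip}", "%{dest_ip}"] }
  }
  # Step 2: translate every element of the array in a single lookup pass
  translate {
    field           => "ips"
    iterate_on      => "ips"
    destination     => "translated"
    dictionary_path => "/etc/logstash/ip_names.csv"
  }
  # Step 3: copy the translated elements back out to named fields
  mutate {
    add_field => {
      "src_name"  => "%{[translated][0]}"
      "dest_name" => "%{[translated][1]}"
    }
  }
}
```

The dictionary is loaded once for both IPs, at the cost of two extra mutate steps to pack and unpack the array.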

7 participants