Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Elastic Agent] Add support for EQL based conditions #20994

Merged
merged 5 commits into from
Sep 9, 2020

Conversation

blakerouse
Copy link
Contributor

@blakerouse blakerouse commented Sep 4, 2020

What does this PR do?

This adds the ability for a condition to be defined any where in the inputs configuration to make that dictionary conditional on the resulting EQL evaluation. If the evaluation is false the dictionary is removed from the parent type and if its true the dictionary remains.

This implements EQL with the same variable syntax used in input variable substitution ${ .. }. The following is implemented for EQL.

  • Full PEMDAS math support for + - * / %.
  • Compares < <= >= > == !=
  • Booleans true false
  • and and or
  • Array functions arrayContains
  • Dict functions hasKey (not in EQL)
  • Length functions length
  • Math functions add, subtract, multiply, divide, modulo.
  • String functions concat, endsWith, indexOf, match, number, startsWith, string, stringContains.

Why is it important?

To support condition enablement on inputs or even any part of the input configuration. The same conditions can be applied to processors or streams or anything inside of the inputs configuration.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Add a condition to your inputs and see that they are not rendered when the condition fails.

Condition on input

inputs:
  - type: logfile
    streams:
      - paths:
         - /var/log/syslog
    condition: ${host.platform} == 'linux'

Condition on stream

inputs:
  - type: system/metrics
    streams:
      - metricset: load
        data_stream.dataset: system.cpu
        condition: ${host.platform} != 'windows'

Condition on processor

inputs:
  - type: system/metrics
    streams:
      - metricset: load
        data_stream.dataset: system.cpu
        condition: ${host.platform} != 'windows'
    processors:
      - add_fields:
          fields:
            platform: ${host.platform}
          to: host
        condition: ${host.platform} != 'windows'

Related issues

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 4, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 4, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #20994 reopened]

  • Start Time: 2020-09-09T12:59:15.826+0000

  • Duration: 78 min 42 sec

Test stats 🧪

Test Results
Failed 0
Passed 20174
Skipped 1837
Total 22011

@blakerouse
Copy link
Contributor Author

This is a very large branch, but I didn't want to break master by landing it in chunks. By refactoring Boolexp to be based on EQL, meant that it caused a lot of changes all across the code base.

I might have a little too much fun as well adding as many features to EQL as possible. Like support arrays and dictionaries inside of the condition logic.

@ruflin
Copy link
Member

ruflin commented Sep 7, 2020

@blakerouse I wonder if we could start with a very small subset of EQL conditions and if this would simplify things? I think we could get started with just the compares, booleans and and/or?

@blakerouse
Copy link
Contributor Author

@ruflin Is the reason to start with a subset is because of the size of the PR? That will not actually make it much smaller, as the diff is misleading in size.

All the code in pkg/eql/parser is auto generated by the antlr4 (same as Boolexp) was. We also use the same EQL logic in the spec files so its needs a good amount of support to work correctly.

The diff size is also large do to the large about of unit tests I have created, to cover all parts of the code. I took the approach in this PR (because of the nature of add EQL), if its not unit tests then it doesn't work.

@ruflin
Copy link
Member

ruflin commented Sep 8, 2020

I'm not really concerned about the PR size but about documenting and having to support all these options. Our users will come up with very creative way on how to combine all these together which one one hand is exciting but in case of bugs can also become "interesting". The part I'm most concerned are the functions as I would rather add one by one on specific feature requests. I definitively can see a use case for startsWith on the docker image case.

Perhaps there is an other way here: Get it in as is but define that we only support the ones that are documented?

@blakerouse
Copy link
Contributor Author

blakerouse commented Sep 8, 2020

@ruflin That does make sense, I think we can remove some from this PR. But not too many as some are already used/required in the spec files and/or I think they will be critical for docker/kubernetes use case.

  • math (keep all) add/subtract/multiply/divide/modulo - These all match there operators (being operators were added in 0.8 of EQL, I keep the functions as well just to be like EQL). They share the same language operators code path, so we are not gaining any thing by removing these.

  • arrayContains (keep) - This is currently not used but I think it will be used by docker/kubernetes. The providers for those will push the labels/annotations as an array. So this will allow the package/user to have a conditional like so:

inputs:
  - type: logfile
    streams:
      - paths:
         - /var/lib/docker/containers/${docker.container.id}/*-json.log
    condition: arrayContains(${docker.container.labels}, 'co.elastic.logs/enabled')
  • hasKey (must keep) - Already in-use by the spec files to ensure that the output for that specification is correct and that program should be turned on. I also think this will be useful.

  • length (must keep) - Already in-use by the spec files to ensure that an input exists for that program. Used like so length(${inputs}) > 0.

  • concat (removable) - Don't really have a use-case in mind at the moment. I think it could be removed. It's also rather simple of just joining strings together.

  • endsWith (removable) - Currently don't have a use-case for it, but seems useful. If we keep startsWith we should keep endsWith.

  • indexOf (removable) - No use-case at the moment, idk really when you would need the index of the substring.

  • match (keep) - I think this will be useful with docker/kubernetes and very powerful in general to perform some regex matching on strings.

  • number (removable) - No use-case at the moment.

  • startsWith (removable) - Same as endsWith.

  • string (keep) - Convert any type into a string, I think this is useful and we should keep it.

  • stringContains (removable) - Really the same as startsWith or endsWith. I think it could be useful.

@ruflin Let me know which ones you think I should remove.

@ph ph requested a review from ruflin September 8, 2020 19:42
@ph ph added needs_backport PR is waiting to be backported to other branches. review labels Sep 8, 2020
@ruflin
Copy link
Member

ruflin commented Sep 9, 2020

@blakerouse The list you proposed above SGTM. I would add stringContains/startsWith especially for the docker image name:

condition: startsWith(${docker.image.name}, 'nginx:')

Like this the exact version of the image does not matter. This is often use in the beats conditions today AFAIK.

One function we probably need to add on our end is about comparing versions. But I think @michalpristas has already implemented this.

@@ -0,0 +1,97 @@
// eql.g4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakerouse Would be nice to have a link to where this is copied from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin I didn't copy it, I wrote it. I added on to the Boolexp.g4 that was already present, I assume originally implemented by @ph

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the original implementation before we knew about eql.

I've done a bit more digging, endpoint did create a grammar file for antl4r, in https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/eql/src/main/antlr/EqlBase.g4 but this its more and less at the same time.

I don't think we should use that directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have this reference. I good in keeping them separate as there are small variations like the variable syntax but it is expected.

@@ -57,4 +57,4 @@ rules:
- output
- revision

when: HasAny('fleet') && HasItems(%{[inputs]}) && HasNamespace('output', 'elasticsearch')
when: length(${fleet}) > 0 and length(${inputs}) > 0 and hasKey(${output}, 'elasticsearch')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that our spec files now also use EQL conditions. Makes it much easier to just have 1 implementation.

@blakerouse blakerouse closed this Sep 9, 2020
@mukeshelastic
Copy link

Very exciting to see we leveraging EQL for condition definitions than creating something bespoke.. looking forward to test it when it's available.

@blakerouse blakerouse reopened this Sep 9, 2020
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM.

  • I really like that we can reuse the same implementation / logic for processors, inputs, streams and even the internal spec and get rid of the existing boolexp
  • We need to follow up with docs. Could you file an issue for it so we don't forget.
  • The reading of the conditions is pretty strict and has good error messages. So it should be pretty easy for users to figure out what they did wrong. In combination with the "simulate" commands for the providers, this will make testing of conditions really simple.

@@ -199,21 +207,24 @@ func groupByOutputs(single *transpiler.AST) (map[string]*transpiler.AST, error)
return nil, fmt.Errorf("unknown configuration output with name %s", targetName)
}

streams := config[inputsKey].([]map[string]interface{})
streams := config.config[inputsKey].([]map[string]interface{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config.config? :-D

@costin
Copy link
Member

costin commented Sep 9, 2020

Folks, I've just learned about this effort through the Observability update email.

I don't want to hijack this PR, rather mention some relevant topics since EQL has been mentioned.

  1. EQL reference

Moving forward EQL on Elasticsearch is the EQL reference not EQL Python. In practice this means certain things that were/are possible in EQL Python are not (and may never be) supported in EQL in ES.
A prime example is arrayContains which does not make sense in ES.
The functionality of EQL on ES is documented here (if something is not in the docs, it does not exist):
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/eql-search.html

  1. string case insensitivity

EQL expects certain things to be case-insensitive which affects both operators (==) and (functions). To what degree we can implement that in Elasticsearch and whether it makes sense is still up for debate.
See elastic/elasticsearch#61883

  1. upcoming grammar changes

EQL currently supports 4 ways to declare strings, something under investigation:
elastic/elasticsearch#61659
Moving forward support for ' is going to be dropped in favor of ". It's still undecided what happens to raw string declarations.

I'd like to sync up and learn more about this effort. Tagging @philkra and @sethpayne

@blakerouse
Copy link
Contributor Author

Folks, I've just learned about this effort through the Observability update email.

I don't want to hijack this PR, rather mention some relevant topics since EQL has been mentioned.

  1. EQL reference

Moving forward EQL on Elasticsearch is the EQL reference not EQL Python. In practice this means certain things that were/are possible in EQL Python are not (and may never be) supported in EQL in ES.
A prime example is arrayContains which does not make sense in ES.
The functionality of EQL on ES is documented here (if something is not in the docs, it does not exist):
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/eql-search.html

Thank you for point this out. Following that documentation is rather confusing, is there an overview page that breaks down the entire language?

I think there will be cases where we need to add some functions that EQL doesn't support. Another one we added, because it was needed was hasKey.

I also think the goal is to be similar to EQL, but not exact. As we also are using variable substitution with ${ .. } to match the variable substitution in input strings.

  1. string case insensitivity

EQL expects certain things to be case-insensitive which affects both operators (==) and (functions). To what degree we can implement that in Elasticsearch and whether it makes sense is still up for debate.
See elastic/elasticsearch#61883

Interesting point about string sensitivity. At the moment this code is case-sensitive.

  1. upcoming grammar changes

EQL currently supports 4 ways to declare strings, something under investigation:
elastic/elasticsearch#61659
Moving forward support for ' is going to be dropped in favor of ". It's still undecided what happens to raw string declarations.

I'd like to sync up and learn more about this effort. Tagging @philkra and @sethpayne

At the moment this code only supports ' and ".

@blakerouse blakerouse merged commit af91b5e into elastic:master Sep 9, 2020
@blakerouse blakerouse deleted the agent-input-condition branch September 9, 2020 14:51
blakerouse added a commit to blakerouse/beats that referenced this pull request Sep 9, 2020
* Refactor Boolexp to Eql.

* Connect new Eql to specs and input emitter.

* Fix compare with null.

* Fix notice and go.mod.

(cherry picked from commit af91b5e)
@blakerouse blakerouse added v7.10.0 and removed needs_backport PR is waiting to be backported to other branches. labels Sep 9, 2020
blakerouse added a commit that referenced this pull request Sep 9, 2020
…conditions (#21039)

* [Elastic Agent] Add support for EQL based conditions (#20994)

* Refactor Boolexp to Eql.

* Connect new Eql to specs and input emitter.

* Fix compare with null.

* Fix notice and go.mod.

(cherry picked from commit af91b5e)

* Fix go.mod from cherry-pick resolve.

* Add changelog.
@rw-access
Copy link

We also have the EQL in the Endpoint implementation which is ongoing development and will have the full feature set of EQL, unlike Elasticsearch EQL.

RE: hasKey -- I don't understand the new need for this function. Currently we just do some.field != null, and I think we should aim to be consistent.

I think a little more collaboration could help and make sure we're all doing the same thing. There's also a test suite for EQL, and I don't think that's being used here to test against the EQL specification.

I think we all have unique scenarios we are addressing: Elasticsearch is focused on historical search, Endpoint on realtime detection/prevention, and Beats is using expressions to describe filters. For that reason I think it might be a mistake to have one of those teams fully own EQL, but instead collaborate on what is in the specification, and what is an implementation detail.

@ph
Copy link
Contributor

ph commented Sep 10, 2020

@rw-access Totally agree on that.

Concerning hasKey vs some.field I think we should would have to think about it a bit more, I doubt it will make it for 7.10.

Concerning collaboration, would you be open having a "specification" repository where a representative of Endpoint, Elasticsearch, and Ingest management could collaborate, this would be similar to the dissect effort?

@rw-access
Copy link

@ph
Copy link
Contributor

ph commented Sep 10, 2020

@rw-access can you take the lead on create theses repository, we could revisit that on our end for 7.11.

v1v added a commit to v1v/beats that referenced this pull request Sep 14, 2020
* upstream/master: (362 commits)
  Add vendoring to Google Cloud Functions again (elastic#21070)
  [Elastic Agent] Add fleet.host.id for sending to endpoint. (elastic#21042)
  Do not need Google credentials before using it (elastic#21072)
  [Filebeat][New Module] Zoom webhook module (elastic#20414)
  Add support for GMT timezone offset in decode_cef (elastic#20993)
  Filebeat: Fix random error on harvester close (elastic#21048)
  Add ingress controller dashboards (elastic#21052)
  Fix loggers in composable module. (elastic#21047)
  [Ingest Manager] Increase kibana client timeout to 5 minutes (elastic#21037)
  Add changelog. (elastic#21041)
  [Elastic Agent] Add support for EQL based conditions (elastic#20994)
  Disable Kafka metricsets based on Jolokia (elastic#20989)
  Update apm agent (elastic#21031)
  Add container ECS fields in kubernetes metadata (elastic#20984)
  Sanitize event.host in Metricbeat (elastic#21022)
  Update api-keys.asciidoc - API key prerequisites (elastic#21026)
  [Filebeat][suricata] Map x509 for suricata/eve fileset (elastic#20973)
  [Filebeat][santa] Map x509 fields in santa module (elastic#20976)
  [Filebeat][fortinet] Map x509 ecs fields for fortinet fw fileset (elastic#20983)
  Bump zeek kerberos/ssl/x509 ecs version (elastic#21003)
  ...
v1v added a commit to v1v/beats that referenced this pull request Sep 14, 2020
* upstream/master: (364 commits)
  Add vendoring to Google Cloud Functions again (elastic#21070)
  [Elastic Agent] Add fleet.host.id for sending to endpoint. (elastic#21042)
  Do not need Google credentials before using it (elastic#21072)
  [Filebeat][New Module] Zoom webhook module (elastic#20414)
  Add support for GMT timezone offset in decode_cef (elastic#20993)
  Filebeat: Fix random error on harvester close (elastic#21048)
  Add ingress controller dashboards (elastic#21052)
  Fix loggers in composable module. (elastic#21047)
  [Ingest Manager] Increase kibana client timeout to 5 minutes (elastic#21037)
  Add changelog. (elastic#21041)
  [Elastic Agent] Add support for EQL based conditions (elastic#20994)
  Disable Kafka metricsets based on Jolokia (elastic#20989)
  Update apm agent (elastic#21031)
  Add container ECS fields in kubernetes metadata (elastic#20984)
  Sanitize event.host in Metricbeat (elastic#21022)
  Update api-keys.asciidoc - API key prerequisites (elastic#21026)
  [Filebeat][suricata] Map x509 for suricata/eve fileset (elastic#20973)
  [Filebeat][santa] Map x509 fields in santa module (elastic#20976)
  [Filebeat][fortinet] Map x509 ecs fields for fortinet fw fileset (elastic#20983)
  Bump zeek kerberos/ssl/x509 ecs version (elastic#21003)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Elastic Agent] Support EQL condition on an input
7 participants