Filebeat: ingest Elasticsearch structured audit logs #8852
I was initially thinking of a different implementation in which the JSON decoding happens on the Filebeat side. The advantage is that the user can then filter at ingest time based on these fields. The disadvantage is that we need to find out on the fileset side whether the input is JSON or not. Instead of having all these "if" statements, would it be possible to have two different pipelines, for cleaner code? I think there are multiple options to get to the same result, and I'm not sure yet which one is the best implementation.
This change will also need an addition to the docs and changelog.
    },
    {
      "grok": {
        "if": "ctx.first_char != '{'",
I assume this will require Elasticsearch 6.5 or newer?
Correct.
I would like that as well, but I'm not sure how to achieve it :) Ideally we could use https://www.elastic.co/guide/en/elasticsearch/reference/master/pipeline-processor.html but I'm not sure how to push multiple ingest pipelines into ES for the same fileset. AFAICT that's not currently supported in Filebeat, but maybe we should add support for that?
You are correct that it's not supported at the moment, but we should add it, as this will come up in other places too. We probably still need a "root" pipeline that we send all data to and which then routes the events. Or would you do the separation on the Ingest side already?
Okay, great. I'm going to suspend this PR and create a new one just to introduce this multi-pipeline functionality. This PR will then depend on the new PR.
I was thinking Beats' job would be just to create the necessary pipelines. The separation would then happen in the root pipeline.
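To make the "separation happens in the root pipeline" idea concrete, here is a minimal sketch of what such an entry-point pipeline body could look like, built as a Python dict. It reuses the `ctx.first_char != '{'` condition from the diff under review and assumes an earlier processor has already populated `first_char`. The helper names `build_entry_pipeline` and `route`, and the pipeline names passed to them, are hypothetical and not part of Filebeat.

```python
import json

# Painless conditions keyed on a `first_char` field, as in the diff under review.
# Assumption: an earlier processor has already copied the first character of
# the log line into `ctx.first_char`.
IS_PLAINTEXT = "ctx.first_char != '{'"
IS_JSON = "ctx.first_char == '{'"


def build_entry_pipeline(plain_name: str, json_name: str) -> dict:
    """Return an entry-point pipeline body that only routes events."""
    return {
        "description": "Entry point: route each event by log format",
        "processors": [
            {"pipeline": {"if": IS_PLAINTEXT, "name": plain_name}},
            {"pipeline": {"if": IS_JSON, "name": json_name}},
        ],
    }


def route(message: str, plain_name: str, json_name: str) -> str:
    """Local stand-in for the routing decision Elasticsearch would make."""
    return json_name if message[:1] == "{" else plain_name


if __name__ == "__main__":
    body = build_entry_pipeline("pipeline-plain", "pipeline-json")
    print(json.dumps(body, indent=2))
```

With this shape, the root pipeline contains no parsing logic of its own; each format-specific pipeline can be read and tested in isolation.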
As noted in my previous comment, I've started work on teaching Filebeat to support multiple ingest pipelines here: #8914.
Motivated by #8852 (comment).

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:

- run sub-pipelines via the [`pipeline` processor](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/pipeline-processor.html), and
- conditionally run processors via an [`if` field](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/ingest-processors.html).

These abilities combined present the opportunity for a fileset to ingest the same _logical_ information presented in different formats, e.g. plaintext vs. json versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then conditionally delegates further processing of that log entry, depending on the format, to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the `ingest_pipeline` property in their `manifest.yml`. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

#### Example with multiple pipelines

```yaml
ingest_pipeline:
  - pipeline-ze-boss.json
  - pipeline-plain.json
  - pipeline-json.json
```

#### Example with a single pipeline

_This is just to show that the existing functionality will continue to work as-is._

```yaml
ingest_pipeline: pipeline.json
```

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a `pipeline` processor to do so. This processor's `name` field will need to reference the other pipeline by its name. To ensure correct referencing, the `name` field must be specified as follows:

```json
{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}
```

This will ensure that the specified name gets correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes its "raw" Ingest pipeline names with `filebeat-<version>-<module>-<fileset>-` when loading them into Elasticsearch.
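As an illustration of the name conversion described above, the following sketch resolves `{< IngestPipeline "..." >}` references into prefixed names. It is a regex-based stand-in written for this discussion, not Filebeat's actual implementation; the function name `resolve_pipeline_refs` and the version, module, and fileset values in the usage example are assumptions.

```python
import re

# Matches references of the form {< IngestPipeline "pipeline-plain" >}.
_PIPELINE_REF = re.compile(r'\{<\s*IngestPipeline\s+"([^"]+)"\s*>\}')


def resolve_pipeline_refs(raw: str, version: str, module: str, fileset: str) -> str:
    """Replace each reference with filebeat-<version>-<module>-<fileset>-<name>."""
    prefix = f"filebeat-{version}-{module}-{fileset}-"
    return _PIPELINE_REF.sub(lambda match: prefix + match.group(1), raw)


if __name__ == "__main__":
    raw = '{ "pipeline": { "name": "{< IngestPipeline "pipeline-plain" >}" } }'
    # Hypothetical values for the version, module, and fileset.
    print(resolve_pipeline_refs(raw, "6.7.0", "elasticsearch", "audit"))
```

Note that the raw pipeline file is not valid JSON until the reference has been resolved, because of the nested double quotes inside the template expression.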
#9811) Cherry-pick of PR #8914 to 6.x branch.
Seeing a `needs_backport` label here, I think we should discuss what our compatibility promise is.
If someone collects logs from ES 6.3 with FB 6.7 and sends the data to ES 6.3, I assume the pipeline would stop working? In other words, for a user upgrading FB from 6.3 to 6.7, ingestion would stop.
      }
    },
    {
      "dot_expander": {
Is this only needed to make the event look nicer?
No, actually (and unfortunately IMO), it is required for the next processor (`rename`) to work. If I remove this `dot_expander` processor entry, I will get an error like so from ES when it tries to execute the `rename` processor:
field [elasticsearch.audit.event.type] doesn't exist
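The behaviour above can be mimicked locally. The sketch below assumes that processors resolve a dotted field path by walking nested objects, so a key that literally contains dots is not found until it has been expanded. `get_path` and `dot_expand` are hypothetical helpers that imitate this, not Elasticsearch code.

```python
def get_path(doc: dict, path: str):
    """Resolve a dotted path by walking nested dicts, one segment at a time."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"field [{path}] doesn't exist")
        node = node[part]
    return node


def dot_expand(doc: dict, field: str) -> dict:
    """Turn a literal dotted key into nested objects, in place."""
    value = doc.pop(field)
    *parents, leaf = field.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return doc


if __name__ == "__main__":
    field = "elasticsearch.audit.event.type"
    doc = {field: "access_granted"}  # what a flat JSON log line decodes to
    try:
        get_path(doc, field)
    except KeyError as err:
        print("before expansion:", err)
    dot_expand(doc, field)
    print("after expansion:", get_path(doc, field))
```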
Perhaps we should file an enhancement request around this with ES?
    "elasticsearch.audit.action": "cluster:admin/xpack/security/realm/cache/clear",
    "elasticsearch.audit.event_type": "access_granted",
    "elasticsearch.audit.layer": "transport",
    "elasticsearch.audit.origin_address": "127.0.0.1",
Looks like quite a few fields here we should map to ECS (follow up PR).
Agreed. I've never done an ECS conversion before. Would you mind pointing me to a PR that did a similar conversion, which I could use as a reference? Thanks!
Here you have quite a list of PRs: #8655
Yes, this is true (and not ideal, of course). The whole reason behind wanting to get this change into 6.x was so that the ES team could deprecate the plaintext audit log in 6.7 and then remove it in 7.0.

If you recall, this PR is built on top of #8914, which introduces the ability for Filebeat modules to have multiple ingest pipelines with an entrypoint pipeline. In that PR you had brought up the version compatibility issue as well: #8914 (comment). @urso brought this up with me off-PR as well, so we decided that I would make a follow-up PR to #8914 and add a version check. If the user is running Filebeat against an ES < 6.5.0 and using a Filebeat module with multiple pipelines, we will throw an error and stop.

Now, obviously this means that this is a breaking change in a minor version. However, the only module to use this feature would be the Elasticsearch module and it is currently marked as

Thoughts?
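For reference, the version check described above could look roughly like the following. This is a sketch of the rule only (fail when more than one pipeline is configured and ES is older than 6.5.0); the function names and the error text are assumptions and do not reflect the follow-up PR's actual code.

```python
import re

# Sub-pipelines and conditional processors are available from ES 6.5.0 onwards.
MIN_MULTI_PIPELINE_ES = (6, 5, 0)


def parse_version(version: str) -> tuple:
    """Parse 'major.minor.patch', ignoring any suffix such as '-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_multi_pipeline_support(es_version: str, pipeline_count: int) -> None:
    """Raise if a multi-pipeline fileset is loaded into an ES that is too old."""
    if pipeline_count > 1 and parse_version(es_version) < MIN_MULTI_PIPELINE_ES:
        required = ".".join(str(part) for part in MIN_MULTI_PIPELINE_ES)
        raise RuntimeError(
            f"fileset defines {pipeline_count} ingest pipelines, which requires "
            f"Elasticsearch >= {required} (found {es_version})"
        )


if __name__ == "__main__":
    check_multi_pipeline_support("6.5.0", 3)  # supported: no error
    check_multi_pipeline_support("6.3.0", 1)  # single pipeline: no error
    try:
        check_multi_pipeline_support("6.4.3", 3)
    except RuntimeError as err:
        print(err)
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as "6.10.0" sorting before "6.5.0".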
Ok, let's move forward with this. It also helps that it is still in beta. Let's make sure we state this very clearly in the CHANGELOG and also in our breaking changes list for each release.
@ycombinator An alternative implementation for 6.x would be what we have in LS. But it would mean users have to configure it manually. #9959 We should probably move LS in master to the same behaviour with 1 pipeline with sub-pipelines?
Now waiting for #10317 to be merged. Then I will rebase this PR on 6.x and get it back into good shape.
jenkins, test this
This is a "forward port" of #8852. In #8852, we taught Filebeat to ingest either structured or unstructured ES audit logs but the resulting fields conformed to the 6.x mapping structure. In this PR we also teach Filebeat to ingest either structured or unstructured ES audit logs but the resulting fields conform to the 7.0 (ECS-based) mapping structure.
Resolves #8831.
This PR teaches the `elasticsearch/audit` fileset to ingest structured audit logs in addition to the semi-structured audit logs, which it already knows how to ingest.