Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filebeat: ingest Elasticsearch structured audit logs #8852

Merged
merged 21 commits into from
Jan 26, 2019
Merged

Filebeat: ingest Elasticsearch structured audit logs #8852

merged 21 commits into from
Jan 26, 2019

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Oct 31, 2018

Resolves #8831.

This PR teaches the elasticsearch/audit fileset to ingest structured audit logs in addition to the semi-structured audit logs, which it already knows how to ingest.

@ycombinator ycombinator added Filebeat Filebeat v7.0.0-alpha1 v6.6.0 needs_backport PR is waiting to be backported to other branches. in progress Pull request is currently in progress. review and removed in progress Pull request is currently in progress. labels Oct 31, 2018
@ycombinator ycombinator requested a review from ruflin November 1, 2018 02:05
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially thinking of a different implementation in which the decoding of the json happens on the Filebeat side. The advantage is that then user can filter on ingest time based on these fields. The disadvantage is that we need to find out on the fileset side if it's JSON or not. Would it be possible instead of having all these "if" statements two have 2 different pipelines instead to have a cleaner code? I think there are multiple options to get to the same result and not sure yet which one is the best implementation.

This change will also need an addition to the docs and changelog.

},
{
"grok": {
"if": "ctx.first_char != '{'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will required Elasticsearch 6.5 or newer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

filebeat/module/elasticsearch/audit/test/test.log Outdated Show resolved Hide resolved
@ycombinator
Copy link
Contributor Author

ycombinator commented Nov 1, 2018

Would it be possible instead of having all these "if" statements two have 2 different pipelines instead to have a cleaner code?

I would like that as well, but I'm not sure how to achieve it :) Ideally we could use https://www.elastic.co/guide/en/elasticsearch/reference/master/pipeline-processor.html but I'm not sure how to push multiple ingest pipelines into ES for the same fileset. AFAICT that's not currently supported in Filebeat but maybe we should add support for that?

@ruflin
Copy link
Contributor

ruflin commented Nov 2, 2018

You are correct that at the moment it's not supported but we should add it as this will happen in other places too. We probably still need a "root" pipeline that we send all data to and which routes then the events. Or would you do the separation on the Ingest side already?

@ycombinator
Copy link
Contributor Author

You are correct that at the moment it's not supported but we should add it as this will happen in other places too.

Okay, great. I'm going to suspend this PR and create a new one just to introduce this multi-pipeline functionality. This PR will then depend on the new PR.

We probably still need a "root" pipeline that we send all data to and which routes then the events. Or would you do the separation on the Ingest side already?

I was thinking Beats' job would be just to create the necessary pipelines. The separation would then happen in the root pipeline.

@ycombinator
Copy link
Contributor Author

As noted in my previous comment, I've started work on teaching Filebeat to support multiple ingest pipelines here: #8914.

@ycombinator ycombinator added in progress Pull request is currently in progress. and removed review labels Nov 2, 2018
@ycombinator ycombinator added v6.7.0 and removed v6.6.0 labels Dec 21, 2018
ycombinator added a commit that referenced this pull request Dec 27, 2018
Motivated by #8852 (comment).

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:
- run sub-pipelines via the [`pipeline` processor](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/pipeline-processor.html), and
- conditionally run processors via an [`if` field](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/ingest-processors.html).

These abilities combined present the opportunity for a fileset to ingest the same _logical_ information presented in different formats, e.g. plaintext vs. json versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then conditionally delegates further processing of that log entry, depending on the format, to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the `ingest_pipeline` property in their `manifest.yml`. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

#### Example with multiple pipelines
```yaml
ingest_pipeline:
  - pipeline-ze-boss.json 
  - pipeline-plain.json
  - pipeline-json.json
```
#### Example with a single pipeline
_This is just to show that the existing functionality will continue to work as-is._
```yaml
ingest_pipeline: pipeline.json
```

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a `pipeline` processor to do so. This processor's `name` field will need to reference the other pipeline by its name. To ensure correct referencing, the `name` field must be specified as follows:

```json
{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}
```

This will ensure that the specified name gets correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes it's "raw" Ingest pipeline names with `filebeat-<version>-<module>-<fileset>-` when loading them into Elasticsearch.
ycombinator added a commit that referenced this pull request Dec 28, 2018
#9811)

Cherry-pick of PR #8914 to 6.x branch. Original message: 

Motivated by #8852 (comment).

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:
- run sub-pipelines via the [`pipeline` processor](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/pipeline-processor.html), and
- conditionally run processors via an [`if` field](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/ingest-processors.html).

These abilities combined present the opportunity for a fileset to ingest the same _logical_ information presented in different formats, e.g. plaintext vs. json versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then conditionally delegates further processing of that log entry, depending on the format, to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the `ingest_pipeline` property in their `manifest.yml`. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

#### Example with multiple pipelines
```yaml
ingest_pipeline:
  - pipeline-ze-boss.json 
  - pipeline-plain.json
  - pipeline-json.json
```
#### Example with a single pipeline
_This is just to show that the existing functionality will continue to work as-is._
```yaml
ingest_pipeline: pipeline.json
```

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a `pipeline` processor to do so. This processor's `name` field will need to reference the other pipeline by its name. To ensure correct referencing, the `name` field must be specified as follows:

```json
{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}
```

This will ensure that the specified name gets correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes it's "raw" Ingest pipeline names with `filebeat-<version>-<module>-<fileset>-` when loading them into Elasticsearch.
@ycombinator ycombinator requested a review from a team as a code owner January 2, 2019 18:49
@ycombinator
Copy link
Contributor Author

ycombinator commented Jan 2, 2019

Will rebase on master once #9851 #9855 has been merged.

@ycombinator
Copy link
Contributor Author

@ruflin After #8914 was merged recently, I resurrected this PR to use the functionality introduced in #8914. Would love your review when you get a chance. Thanks!

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing a needs_backport label here I think we should discuss what our compatibility promise here is.

If someone runs gets logs from ES 6.3 with FB 6.7 and sends data to 6.3, I assume the pipeline would stop working? Or in other words, a user upgrading FB from 6.3 to 6.7, the ingestions would stop.

}
},
{
"dot_expander": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only need to make the event look nicer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually (and unfortunately IMO), it is required for the next processor (rename) to work. If I remove this dot_expander processor entry, I will get an error like so from ES when it tries to execute the rename processor:

field [elasticsearch.audit.event.type] doesn't exist

Copy link
Contributor

@ruflin ruflin Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should file an enhancement request around this with ES?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"elasticsearch.audit.action": "cluster:admin/xpack/security/realm/cache/clear",
"elasticsearch.audit.event_type": "access_granted",
"elasticsearch.audit.layer": "transport",
"elasticsearch.audit.origin_address": "127.0.0.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like quite a few fields here we should map to ECS (follow up PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I've never done an ECS conversion before. Would you mind pointing me to a PR that did a similar conversion and I could use as a reference? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you have quite a list of PR's: #8655

@ycombinator
Copy link
Contributor Author

Seeing a needs_backport label here I think we should discuss what our compatibility promise here is.

If someone runs gets logs from ES 6.3 with FB 6.7 and sends data to 6.3, I assume the pipeline would stop working? Or in other words, a user upgrading FB from 6.3 to 6.7, the ingestions would stop.

Yes, this is true (and not ideal, of course).

The whole reason behind wanting to get this change into 6.x was so that the ES team could deprecate the plaintext audit log in 6.7 and then remove it in 7.0.

If you recall, this PR is built on top of #8914, which introduces the ability for Filebeat modules to have multiple ingest pipelines with an entrypoint pipeline. In that PR you had brought up the version compatibility issue as well: #8914 (comment). @urso brought this up with me off-PR as well, so we decided that I would make a follow up PR to #8914 and add a version check. If the user is running Filebeat against an ES < 6.5.0 and using a Filebeat module with multiple pipelines, we will throw an error and stop.

Now, obviously this means that this is a breaking change in a minor version. However, the only module to use this feature would be the Elasticsearch module and it is currently marked as beta. Given the benefit I mentioned earlier about letting the ES team deprecate the plaintext audit log in 6.7.0 and remove it in 7.0.0, I would like to suggest that we allow this breaking change with the Elasticsearch Filebeat module, provided the follow up PR with the version check is done first.

Thoughts?

@ycombinator ycombinator requested review from a team as code owners January 8, 2019 13:00
@ruflin
Copy link
Contributor

ruflin commented Jan 9, 2019

Ok, let's move forward with this. Also that it is still in beta helps here. Let's make sure we state this very cleary in the CHANGELOG and also our breaking changes list for each release.

@ruflin
Copy link
Contributor

ruflin commented Jan 9, 2019

@ycombinator An alternative implementation for 6.x would be what we have in LS. But it would mean users have to configure it manually. #9959 We should probably move LS in master to the same behaviour with 1 pipeline with sub pipelines?

@ycombinator
Copy link
Contributor Author

Now waiting for #10317 to be merged. Then I will rebase this PR on 6.x and get it back into good shape.

@ycombinator
Copy link
Contributor Author

This PR is rebased on 6.x now and in better shape. However, while working on it, I discovered an issue which ought to be fixed first. I'm doing that in #10340. Once #10340 is merged, I'll rebase this PR on 6.x again to get it into its final shape.

@ycombinator
Copy link
Contributor Author

jenkins, test this

1 similar comment
@ycombinator
Copy link
Contributor Author

jenkins, test this

@ycombinator ycombinator merged commit 3334adf into elastic:6.x Jan 26, 2019
@ycombinator ycombinator deleted the filebeat-elasticsearch-structured-audit-log branch January 26, 2019 14:50
ycombinator added a commit that referenced this pull request Jan 29, 2019
This is a "forward port" of #8852.

In #8852, we taught Filebeat to ingest either structured or unstructured ES audit logs but the resulting fields conformed to the 6.x mapping structure.

In this PR we also teach Filebeat to ingest either structured or unstructured ES audit logs but the resulting fields conform to the 7.0 (ECS-based) mapping structure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants