Add the ability to require an ingest pipeline #46847
Conversation
This commit adds the ability to require an ingest pipeline on an index. Today we can have a default pipeline, but that could be overridden by a request pipeline parameter. This commit introduces a new index setting index.required_pipeline that acts similarly to index.default_pipeline, except that it cannot be overridden by a request pipeline parameter. Additionally, a default pipeline and a required pipeline cannot both be set. The required pipeline can be set to _none to ensure that no pipeline ever runs for index requests on that index.
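A minimal sketch of how the new setting might be used, in Console syntax (index and pipeline names here are hypothetical):

```
PUT /my-index
{
  "settings": {
    "index.required_pipeline": "my-required-pipeline"
  }
}

# A request pipeline parameter cannot override the required pipeline,
# so a request like this should be rejected:
PUT /my-index/_doc/1?pipeline=some-other-pipeline
{
  "message": "hello"
}

# Alternatively, _none ensures that no pipeline ever runs for the index:
PUT /my-other-index
{
  "settings": {
    "index.required_pipeline": "_none"
  }
}
```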
Pinging @elastic/es-core-features
@jasontedor could you add a near clone of https://github.com/elastic/elasticsearch/blob/master/modules/ingest-common/src/test/resources/rest-api-spec/test/ingest/200_default_pipeline.yml to help test required pipelines from the various ways to issue index requests.
Looks good. I left a few comments.
@elasticmachine run elasticsearch-ci/2
LGTM - did some manual testing with multi-node forwarding and all works as expected.
LGTM
Extracted ingest pipeline resolution logic into a static method and added unit tests for pipeline resolution logic. Followup from #46847
Being able to force data to run through a pipeline before indexing is really useful. However, I worry about making the required pipeline the only one that executes when set. As a concrete example, we've been talking about using the ingest timestamp instead of the event timestamp for running detection rules (searches/queries, but also ML jobs) in SIEM. This would make sure that we always consider all newly arrived data and are never fooled, e.g. by an attacker manipulating system time to send data pretending to be from the distant past or future. For that, a required ingest pipeline that adds the ingest timestamp to every document would be a natural fit, but it would need to run in addition to whatever pipeline parses the event itself.
I am not sure I follow - is this a trade-off between regular field parsing vs. having an ingest timestamp and using some sort of alternative parsing scheme? For signal purposes, if we are going to use ingest timestamps, we would need to follow the general convention of giving each event both timestamps - ingest time and event time (the latter being the timestamp in the original event being ingested). We would also need the events to be parsed into ECS fields as they are now in order for signals to evaluate them. Independent of the ingest time idea, we always need the original event timestamps in order to create forensic timelines as analysts work on cases. If we had to make a Sophie's choice, the original timestamp is more critical, even if it means running long and expensive searches in order to ensure solving for correctness.
@randomuserid In my example, the ingest timestamp would not replace the event timestamp - the event timestamp would still be on the document; the ingest timestamp would simply be added as a separate field.
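For concreteness, a minimal sketch of such an additive-timestamp pipeline, assuming the ECS field event.ingested as the target (pipeline name is illustrative):

```
PUT _ingest/pipeline/add-ingest-timestamp
{
  "description": "Stamp every document with the time it was ingested",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```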
OK, what are the applications for the ingest pipeline - maybe the ability to make a signal on an event tagged as important or special by an endpoint agent without consideration of time? As an alternative method of ensuring important events with weird timestamps are made into signals?
@randomuserid It could be that, yeah. More straightforward maybe is adding an ingestion timestamp, or dropping a field (e.g. for privacy reasons), and probably many other things. Today, this kind of central control is often exercised in centrally managed Logstash pipelines, but a required pipeline would provide the same kind of control for data sent to Elasticsearch directly.
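As an illustration of the field-dropping case, a minimal sketch using the remove processor (pipeline and field names are hypothetical):

```
PUT _ingest/pipeline/strip-pii
{
  "description": "Drop a sensitive field before indexing",
  "processors": [
    {
      "remove": {
        "field": "user.email",
        "ignore_missing": true
      }
    }
  ]
}
```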
The changes add more granularity for identifying the data ingestion user. The ingest pipeline can now be configured to record the authentication realm and type. It can also record the API key name and ID when one is in use. This improves traceability when data are being ingested from multiple agents and will become more relevant with the upcoming support for required pipelines (#46847) Resolves: #49106
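A hedged sketch of how such a pipeline might be configured via the set_security_user processor, assuming realm and api_key among the recordable properties (pipeline name and target field are hypothetical):

```
PUT _ingest/pipeline/record-ingest-user
{
  "description": "Record who ingested each document",
  "processors": [
    {
      "set_security_user": {
        "field": "event.ingest_user",
        "properties": ["username", "realm", "api_key"]
      }
    }
  ]
}
```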
It's about *final* pipelines (not *required* pipelines) -- see elastic#46847 and elastic#49470 for the history here; 'required' pipelines were renamed to 'final' pipelines.