RiskScoreEngineData client + install ds and other resources for risk scoring #158422
x-pack/plugins/security_solution/server/lib/risk_engine/configurations.ts
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export const retryTransientEsErrors = async <T>(
Have you considered using the `p-retry` utility, which is already available in the Kibana repo?
It has rich configuration options, for example: https://github.com/elastic/kibana/blob/8.8/src/plugins/content_management/server/event_stream/es/init/es_event_stream_initializer.ts#L48-L60
This retry_es.ts is copy-pasted from the alerting plugin; the intention is to reuse the data stream (DS) creation from that plugin directly in the future.
I would rename this file to be more accurate/similar to its origin: retry_transient_es_errors.ts
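For readers following this thread, here is a minimal sketch of what a retry helper for transient errors can look like. It is not the alerting plugin's code and it does not use `p-retry`; the name `retryTransient`, the `statusCode` field, the list of status codes, and the backoff parameters are all assumptions made for illustration.

```typescript
// Hypothetical sketch, not the copied retry_es.ts from the alerting plugin.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: these HTTP status codes are treated as transient in this example.
const TRANSIENT_STATUS_CODES = new Set([408, 429, 502, 503, 504]);

interface ErrorWithStatus {
  statusCode?: number;
}

const isTransient = (e: unknown): boolean => {
  const status = (e as ErrorWithStatus | undefined)?.statusCode;
  return status !== undefined && TRANSIENT_STATUS_CODES.has(status);
};

const retryTransient = async <T>(
  fn: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 10 }: { maxAttempts?: number; baseDelayMs?: number } = {}
): Promise<T> => {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // Give up on non-transient errors or once the attempt budget is spent.
      if (attempt >= maxAttempts || !isTransient(e)) {
        throw e;
      }
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      await delay(baseDelayMs * 2 ** (attempt - 1));
    }
  }
};
```

A call site would wrap each Elasticsearch request, for example `retryTransient(() => someEsCall())`, where `someEsCall` is a placeholder.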
@elasticmachine merge upstream
…-ref HEAD~1..HEAD --fix'
x-pack/plugins/security_solution/server/lib/risk_engine/risk_engine_data_client.ts
namespace,
};

await createOrUpdateIlmPolicy({
If we can perform some of these tasks in parallel via `Promise.all`, we should.
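A rough sketch of that suggestion, under stated assumptions: only `createOrUpdateIlmPolicy` appears in the diff; the other function names are hypothetical, and every function here is a stub that records when it ran. The grouping assumes the ILM policy and component template do not depend on each other, while the index template and data stream depend on the earlier steps.

```typescript
// Records the order in which the stubbed steps start and finish.
const log: string[] = [];

// Hypothetical stand-in for a real install step; it only waits briefly and logs.
const step = (name: string) => async (): Promise<void> => {
  log.push(`start ${name}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  log.push(`end ${name}`);
};

const createOrUpdateIlmPolicy = step('ilm policy');
const createOrUpdateComponentTemplate = step('component template'); // assumed name
const createOrUpdateIndexTemplate = step('index template'); // assumed name
const createDataStream = step('data stream'); // assumed name

const installResources = async (): Promise<void> => {
  // Assumed independent of each other, so they can run concurrently.
  await Promise.all([createOrUpdateIlmPolicy(), createOrUpdateComponentTemplate()]);
  // Assumed to depend on the steps above, so they stay sequential.
  await createOrUpdateIndexTemplate();
  await createDataStream();
};
```

Note that `Promise.all` rejects as soon as any step rejects, so error handling stays the same as in the sequential version.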
 * 2.0.
 */

// This file is a copy of x-pack/plugins/alerting/server/alerts_service/lib/create_concrete_write_index.ts
nit: I would prefer this file be named less ambiguously: utils/create_datastream.ts
await updateIndexMappings({ logger, esClient, totalFieldsLimit, concreteIndices });
}

// check if a concrete write ds already exists
nit:
- // check if a concrete write ds already exists
+ // check if a datastream write index already exists
const es = getService('es');

describe('install risk engine resources', () => {
it('should install resources on startup', async () => {
It would be great to have a test where these resources already exist on startup!
Do you know if we already have such tests? I am not sure that we can do that right now.
Here's an example of a test that loads a legacy signals index.
We don't have the ability to load index templates or ILM via the `es_archiver` service, but you could certainly do that manually with the `es` service. There are lots of integration tests that use `esClient.indices.putIndexTemplate` etc.
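To illustrate the shape such a test could take, here is a sketch that seeds a resource first and then runs the install. It does not talk to a cluster: `TemplateClient` is a minimal structural type covering only the `putIndexTemplate` call mentioned above, and `createFakeClient`, `installIndexTemplate`, and the `stale-pattern` value are hypothetical stand-ins. A real integration test would use the `es` service instead of the fake.

```typescript
// Minimal structural type for the one client call used here (only the fields this sketch needs).
interface TemplateClient {
  indices: {
    putIndexTemplate(params: { name: string; index_patterns: string[] }): Promise<unknown>;
  };
}

// Hypothetical in-memory fake so the sketch runs without Elasticsearch.
const createFakeClient = () => {
  const templates = new Map<string, string[]>();
  const client: TemplateClient = {
    indices: {
      putIndexTemplate: async ({ name, index_patterns }) => {
        templates.set(name, index_patterns); // put semantics: create or overwrite
        return { acknowledged: true };
      },
    },
  };
  return { client, templates };
};

const TEMPLATE_NAME = '.risk-score.risk-score-default-index-template';

// Hypothetical stand-in for the startup install under test.
const installIndexTemplate = async (client: TemplateClient): Promise<void> => {
  await client.indices.putIndexTemplate({
    name: TEMPLATE_NAME,
    index_patterns: ['risk-score.risk-score-default'],
  });
};

// Test shape: seed the resource, run the install, then check the final state.
const installsWhenResourcesExist = async (): Promise<boolean> => {
  const { client, templates } = createFakeClient();
  await client.indices.putIndexTemplate({ name: TEMPLATE_NAME, index_patterns: ['stale-pattern'] });
  await installIndexTemplate(client);
  const patterns = templates.get(TEMPLATE_NAME);
  return patterns !== undefined && patterns[0] === 'risk-score.risk-score-default';
};
```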
LGTM!
I had a few nits about explicit typings and more tests, but nothing that should block this from getting merged. Nice work!
private async initialiseWriter(namespace: string) {
const writer: Writer = {
bulk: async () => {},
It's not necessary, just a suggestion. Being explicit is usually the safest approach, but you're correct that this will be implemented momentarily.
}
}

const isDataStreamsExist = dataStreams.length > 0;
super duper nit: omitting the `is` here makes this more legible; we already have a verb in `Exist`, so the `is` is redundant.
- const isDataStreamsExist = dataStreams.length > 0;
+ const dataStreamsExist = dataStreams.length > 0;
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export const retryTransientEsErrors = async <T>(
If we're going to copy this code, we might as well also copy the tests.
return writer;
}

public async initializeResources({
It would be good to explicitly type the return value on this public method.
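As a small illustration of that point, the sketch below annotates a public async method with `Promise<void>`. The class name, the parameter shape, and the method body are hypothetical, since the diff only shows the method's opening line.

```typescript
// Hypothetical parameter shape; the real one is not visible in the diff.
interface InitializeResourcesParams {
  namespace?: string;
}

class ExampleDataClient {
  private readonly initialized = new Set<string>();

  // The explicit Promise<void> documents the contract at the signature,
  // and an accidental `return someValue` becomes a compile error.
  public async initializeResources({
    namespace = 'default',
  }: InitializeResourcesParams = {}): Promise<void> {
    this.initialized.add(namespace);
  }

  public isInitialized(namespace: string): boolean {
    return this.initialized.has(namespace);
  }
}
```

Without the annotation, the return type is inferred from the body, so a later change to the body can silently change the public contract.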
## Summary

* Introduces a new API, POST `/api/risk_scores/calculate`, that triggers the code introduced here
* As with the [preview route](#155966), this endpoint is behind the `riskScoringRoutesEnabled` feature flag
* We intend to __REMOVE__ this endpoint before 8.10 release; it's mainly a convenience/checkpoint for testing the existing code. The next PR will introduce a scheduled Task Manager task that invokes this code periodically.
* Updates to the /preview route:
  * `data_view_id` is now a required parameter on both endpoints. If a dataview is not found by that ID, the id is used as the general index pattern to the query.
  * Response has been updated to be more similar to the [ECS risk fields](elastic/ecs#2236) powering this data.
* Mappings created by the [Data Client](#158422) have been updated to be aligned to the ECS risk fields (linked above)
* Adds/updates the [OpenAPI spec](https://github.com/elastic/kibana/blob/main/x-pack/plugins/security_solution/server/lib/risk_engine/schema/risk_score_apis.yml) for these endpoints; useful starting point if you're trying to get oriented here.

## Things to review

* [PR Demo environment](https://rylnd-pr-161503-risk-score-task-api.kbndev.co/app/home)
* Preview API and related UI still works as expected
* Calculation/Persistence API correctly bootstraps/persists data
  * correct mappings/ILM are created
  * things work in non-default spaces

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

### Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |
|---------------------------|-------------|----------|-------------------------|
| Multiple Spaces—unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. |
| Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. |
| Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. |
| [See more potential risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx) |

### For maintainers

- [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Risk score resources bootstrap
ES PR: elastic/elasticsearch#96348
This PR introduces RiskEngineDataClient, whose purpose is to install resources per namespace, including the ILM policy, component template, index template and data stream for risk scores.
A short demo/overview of the steps taken to initialise RiskEngineDataClient and its resources:
Screen.Recording.2023-05-26.at.15.31.36.mp4
For the default space, it installs the indices when the security_solution plugin is set up.
For other spaces, it initialises the resources when you call `getWriter`.
This data client is passed to `RequestContextFactory`, so it can be called in any request.
What is generated
GET _ilm/policy/.risk-score-ilm-policy
GET _component_template/risk-score-mappings
GET _index_template/.risk-score.risk-score-default-index-template
GET risk-score.risk-score-default
- where `default` is the space name
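The naming scheme above can be sketched as a small helper. This is an illustration only: the function name `getRiskScoreResourceNames` is hypothetical, and it assumes the ILM policy and component template are shared across spaces because their names, as listed, contain no space name.

```typescript
interface RiskScoreResourceNames {
  ilmPolicy: string;
  componentTemplate: string;
  indexTemplate: string;
  dataStream: string;
}

// Hypothetical helper; the names mirror the four GET requests listed above.
const getRiskScoreResourceNames = (namespace: string): RiskScoreResourceNames => ({
  // Assumed shared across spaces: no namespace appears in these two names.
  ilmPolicy: '.risk-score-ilm-policy',
  componentTemplate: 'risk-score-mappings',
  // Per-space resources embed the space name.
  indexTemplate: `.risk-score.risk-score-${namespace}-index-template`,
  dataStream: `risk-score.risk-score-${namespace}`,
});
```

For the default space, `getRiskScoreResourceNames('default')` yields the four names shown in the requests above.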