[META] Changes to host and user risk score features #2477

nastasha-solomon · 2022-09-20T18:34:04Z

Description

In 8.5, improvements were made to the onboarding workflow for the host and user risk score features. These changes streamline previously complicated steps that required some technical knowledge and sometimes took users outside of the Security app. Now, users can enable both features and generate risk scores with a single click.

In addition, users can upgrade host and user risk score features in a single click. The Upgrade button is displayed anywhere host and user risk scores are available. This includes the:

Host Risk Scores and User Risk Scores sections on the Entity analytics dashboard
Host risk tab on the Hosts page
User risk tab on the Users page
Host risk tab on a host's details page
User risk tab on a user's details page

The following sections outline changes to the host and user risk score features in addition to caveats that may need to be doc'd.

Host risk score

Changes introduced in 8.5:

Host risk score card removed - The host risk score card will no longer display on the Overview dashboard. This will require multiple text and screenshot updates within the Host risk score topic.
Simpler way to enable host risk score - In 8.5, users only need to click the Enable Host Risk Score button in the "Host Risk Scores" section of the Entity Analytics dashboard to turn on the feature. When they click this button, host risk scores are generated as well if alert data is present in the environment. To account for this change, we'll need to revise the docs for deploying host risk score and viewing host risk score data.
New way to upgrade the host risk score feature: If the user had enabled the host risk score feature in 8.4 or earlier, they will see an Upgrade button within the "Host Risk Scores" section instead of the Enable Host Risk Score button. When they click Upgrade, their “old” host risk scores are deleted and new ones are created. The “old” data is not preserved as that data type is no longer supported. This will likely need to be doc'd as a breaking change in the 8.5 release notes.
- IMPORTANT: If the user wants to keep their “old” host risk scores, they need to reindex the data before upgrading the host risk score feature. They can use the [Elasticsearch Reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html) to reindex their "old" data. The "old" data is stored in the ml_{host|user}_risk_score_{spaceId} and ml_{host|user}_risk_score_latest_{spaceId} indices. Here is an example reindexing request:

POST _reindex
{
  "source": {
    "index": "ml_{host|user}_risk_score_{spaceId}"
  },
  "dest": {
    "index": "my_new_ml_{host|user}_risk_score_{spaceId}"
  }
}
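To cover both the history and latest indices for a space, the request above can be generated once per index. This is only a minimal Python sketch: the function name is hypothetical, the index names are the ones listed above, and the `my_new_` destination prefix is copied from the example request.

```python
import json


def legacy_reindex_bodies(space_id, entities=("host", "user"), dest_prefix="my_new_"):
    """Build one _reindex request body per "old" risk score index.

    dest_prefix mirrors the example above; any unused index name would do.
    """
    bodies = []
    for entity in entities:
        for suffix in ("risk_score", "risk_score_latest"):
            source = f"ml_{entity}_{suffix}_{space_id}"
            bodies.append({
                "source": {"index": source},
                "dest": {"index": f"{dest_prefix}{source}"},
            })
    return bodies


if __name__ == "__main__":
    # Print Console-style requests that can be pasted into Dev Tools.
    for body in legacy_reindex_bodies("default"):
        print("POST _reindex")
        print(json.dumps(body, indent=2))
```

Run it once per space that has the module installed, before upgrading any of them.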

User risk score

Changes introduced in 8.5:

Simpler way to enable user risk score - Similar to the host risk score, in 8.5, users only need to click the Enable User Risk Score button in the "User Risk Scores" section of the Entity Analytics dashboard to turn on the feature. We might need to refresh the docs for deploying the user risk score package to account for these changes.
New way to upgrade the user risk score feature: If the user had enabled the user risk score feature in 8.4 or earlier, they will see an Upgrade button within the "User Risk Scores" section instead of the Enable User Risk Score button. When they click Upgrade, their “old” user risk scores are deleted and new ones are created. The “old” data is not preserved as that data type is no longer supported. This will likely need to be doc'd as a breaking change in the 8.5 release notes.
- IMPORTANT: Users will need to reindex "old" user risk scores to keep them. The steps are the same as those described for host risk score.

Related issues/PRs

Onboarding button - Allows users to enable the host and user risk score features with a single click: [SecuritySolution] Onboard hosts and users risk score module kibana#140377
NOTE: The PR description contains useful videos that demo what happens when users enable and upgrade the features.
Upgrading host/user risk score features: https://github.com/elastic/security-team/issues/4899

Additional notes/questions:

Should we be directing users to the Entity Analytics dashboard in order to enable/upgrade the host and user risk score features? Are there any other places users can do this?
Do users still need to enable feature flags for the user and host risk score features to turn them on?
Outside of reindexing "old" risk scores, do users need to perform extra steps before upgrading the host or user risk score features?
In what situations would users encounter a transform error (i.e., the transform already exists) when they install the host or user risk score feature?
Will the host and user risk score docs be linked to in any newly developed error messages? If yes, which ones and which doc sections will be linked in the code?

The deployment has three spaces for a quick view and we can add new spaces. Please note that we've been asked to not add data, enable or upgrade any of the spaces. (Please create a new space if you'd like to try it out)

Default - No data
Installation panel
Legacy data - upgrade panel

peluja1012 · 2022-09-20T18:50:21Z

Hi @nastasha-solomon, we are also adding a feature called "alert enrichments" where we store and display the entity (host/user) risk score. Here is the PR. I was wondering if that should be tracked as part of this github issue or if a separate one should be created.

nastasha-solomon · 2022-09-20T19:48:00Z

@peluja1012 thanks for pinging us about this! I think a separate doc issue is better for organization purposes. This doc issue is mainly for describing what changed in the host and user risk score onboarding/upgrading workflows. Correct me if I'm wrong, but it seems as though the docs for the new alert enrichments feature should describe the type of host/user risk data that's displayed on the Alert details flyout after the host/user risk score features are enabled.

peluja1012 · 2022-09-20T20:15:28Z

> Correct me if I'm wrong, but it seems as though the docs for the elastic/kibana#139478 should describe the type of host/user risk data that's displayed on the Alert details flyout after the host/user risk score features are enabled.

Hey @nastasha-solomon, yes that's correct.

angorayc · 2022-09-22T10:14:08Z

Should we be directing users to the Entity Analytics dashboard in order to enable/upgrade the host and user risk score features? Are there any other places users can do this?

Users can enable both risk scores from the Entity Analytics page.

Users can enable host risk score from the Hosts page or a host's details page.

Users can enable host risk score via Dev Tools by visiting the following URL. The content is appended after any existing Console content, so scroll to the bottom to find it:
{{kibanaUrl}}/s/{spaceId}/app/dev_tools#/console?load_from={{kibanaUrl}}/s/{spaceId}/internal/risk_score/prebuilt_content/dev_tool/enable_host_risk_score

e.g.

http://localhost:5601/s/default/app/dev_tools#/console?load_from=http://localhost:5601/s/default/internal/risk_score/prebuilt_content/dev_tool/enable_host_risk_score

Users can enable user risk score from the Users page or a user's details page.

Users can enable user risk score via Dev Tools by visiting the following URL. The content is appended after any existing Console content, so scroll to the bottom to find it:
{{kibanaUrl}}/s/{spaceId}/app/dev_tools#/console?load_from={{kibanaUrl}}/s/{spaceId}/internal/risk_score/prebuilt_content/dev_tool/enable_user_risk_score

e.g.

http://localhost:5601/s/default/app/dev_tools#/console?load_from=http://localhost:5601/s/default/internal/risk_score/prebuilt_content/dev_tool/enable_user_risk_score
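For other Kibana URLs or spaces, both links can be assembled from the templates above. A minimal Python sketch follows; the function name is hypothetical and the path segments are copied from the templates.

```python
def enable_risk_score_console_url(kibana_url, space_id, entity):
    """Return the Dev Tools URL that preloads the enable-risk-score requests."""
    if entity not in ("host", "user"):
        raise ValueError("entity must be 'host' or 'user'")
    # Both the Console page and the prebuilt content live under the space path.
    base = f"{kibana_url.rstrip('/')}/s/{space_id}"
    content = f"{base}/internal/risk_score/prebuilt_content/dev_tool/enable_{entity}_risk_score"
    return f"{base}/app/dev_tools#/console?load_from={content}"


if __name__ == "__main__":
    print(enable_risk_score_console_url("http://localhost:5601", "default", "host"))
    print(enable_risk_score_console_url("http://localhost:5601", "default", "user"))
```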

Do users still need to enable feature flags for the user and host risk score features to turn them on?

Feature flags are removed in this release. As long as users have a Platinum (or higher) or trial license, they can access the feature.

IMPORTANT: If users set the risk score feature flags in their Kibana configuration before 8.5, they have to remove them before upgrading to 8.5, or the upgrade will fail.

Users must remove xpack.securitySolution.enableExperimental: ['riskyUsersEnabled', 'riskyHostsEnabled'] from their Kibana configuration to upgrade to 8.5.

https://github.com/elastic/security-team/issues/4935
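A quick pre-upgrade check for leftover flags could look like the sketch below. It is a plain text scan, not a YAML parser, and the function name is hypothetical. It only reports whether either flag name still appears on an uncommented line of the kibana.yml content passed in.

```python
REMOVED_FLAGS = ("riskyUsersEnabled", "riskyHostsEnabled")


def find_removed_flags(kibana_yml_text):
    """Return the removed experimental flags still mentioned in uncommented lines."""
    active = "\n".join(
        line for line in kibana_yml_text.splitlines()
        if not line.lstrip().startswith("#")  # skip commented-out settings
    )
    return [flag for flag in REMOVED_FLAGS if flag in active]


if __name__ == "__main__":
    sample = "xpack.securitySolution.enableExperimental: ['riskyUsersEnabled', 'riskyHostsEnabled']\n"
    print(find_removed_flags(sample))
```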

Outside of reindexing "old" risk scores, do users need to perform extra steps before upgrading the host or user risk score features?

Before this release, all the scripts and ingest pipelines were shared across spaces. If users installed the module in more than one space and want to keep the old data, they must reindex for all of those spaces before they start upgrading any single space.

In what situations would users encounter a transform error (i.e., the transform already exists) when they install the host or user risk score feature?

Common errors during installation:

Stored script cannot be installed:
Hosts risk score creates these four scripts: ml_hostriskscore_levels_script_{spaceId}, ml_hostriskscore_init_script_{spaceId}, ml_hostriskscore_map_script_{spaceId}, ml_hostriskscore_reduce_script_{spaceId}

Users risk score creates these three scripts: ml_userriskscore_levels_script_{spaceId}, ml_userriskscore_map_script_{spaceId}, ml_userriskscore_reduce_script_{spaceId}

We use the create or update stored script API to create the scripts. Users must have the manage cluster privilege to use this API.

Note: If users already had stored scripts with the same IDs as above before installation, those scripts are updated (without showing an error).

Ingest pipeline cannot be installed:

Hosts risk score creates this ingest pipeline: ml_hostriskscore_ingest_pipeline_{spaceId}
Users risk score creates this ingest pipeline: ml_userriskscore_ingest_pipeline_{spaceId}

We use the create or update pipeline API to create the ingest pipelines. Users must have the manage cluster privilege to use this API.

Indices cannot be installed:

Hosts risk score creates these indices: ml_host_risk_score_{spaceId}, ml_host_risk_score_latest_{spaceId}
Users risk score creates these indices: ml_user_risk_score_{spaceId}, ml_user_risk_score_latest_{spaceId}

We use the create index API to create the indices. Users must have the create_index or manage index privilege for the target index.

Transforms cannot be installed:

Hosts risk score creates these transforms: ml_hostriskscore_pivot_transform_{spaceId}, ml_hostriskscore_latest_transform_{spaceId}
Users risk score creates these transforms: ml_userriskscore_pivot_transform_{spaceId}, ml_userriskscore_latest_transform_{spaceId}

We use the create transform API to create the transforms. This requires the following privileges:

cluster: manage_transform (the transform_admin built-in role grants this privilege)
source indices: read, view_index_metadata
destination index: read, create_index, index. If a retention_policy is configured, the delete privilege is also required.
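The Elasticsearch privileges listed so far (scripts, pipelines, indices, transforms) could be collected into one role body. This is a sketch only: grouping them into a single role is an assumption made for illustration, the function name is hypothetical, and the dict just follows the `cluster` / `indices` shape of an Elasticsearch role definition.

```python
import json


def risk_score_install_role(entity, space_id):
    """Assemble a role body from the privileges listed in this comment."""
    if entity not in ("host", "user"):
        raise ValueError("entity must be 'host' or 'user'")
    alerts = f".alerts-security.alerts-{space_id}"
    history = f"ml_{entity}_risk_score_{space_id}"
    latest = f"ml_{entity}_risk_score_latest_{space_id}"
    return {
        # Stored scripts and ingest pipelines: manage. Transforms: manage_transform.
        "cluster": ["manage", "manage_transform"],
        "indices": [
            # Source of the pivot transform: read-only access to alerts.
            {"names": [alerts], "privileges": ["read", "view_index_metadata"]},
            # The history index is a destination (pivot) and a source (latest),
            # so both risk score indices get the union of the two privilege sets.
            # Add "delete" here if a retention_policy is configured.
            {
                "names": [history, latest],
                "privileges": ["read", "view_index_metadata", "create_index", "index"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(risk_score_install_role("host", "default"), indent=2))
```

Saved object access is a Kibana privilege and is not covered by this body.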

Saved objects cannot be installed:

We import saved objects at the end of the installation.

To access Saved Objects, you must have the required Saved Objects Management Kibana privilege.
To add the privilege, open the main menu, and then click Stack Management > Roles.

It creates a tag {Host | User} Risk Score - {spaceId} and links all the relevant saved objects to it.

Hosts risk score creates:

Users risk score creates:

Will the host and user risk score docs be linked to in any newly developed error messages? If yes, which ones and which doc sections will be linked in the code?

No. All the doc links point to
https://www.elastic.co/guide/en/security/current/host-risk-score.html
https://www.elastic.co/guide/en/security/current/user-risk-score.html

It'd be great if we could put an error message reference somewhere obvious on the page, or please let me know if we have a particular page for it.

angorayc · 2022-09-22T15:27:00Z

No risk score data available to display - installation finished without error, but the transforms haven't picked up data.

Please check that your ingested data and alert data are available:

We rely on transforms to generate data for host / user risk score.

The transforms we installed for host risk score are listed below (please check /app/management/data/transform and confirm that the relevant transforms are displayed as started):

ml_hostriskscore_pivot_transform_{space_id}, its source index is .alerts-security.alerts-{{space_id}}, and its destination is ml_host_risk_score_{space_id}
ml_hostriskscore_latest_transform_{space_id}, its source index is ml_host_risk_score_{{space_id}}, and its destination is ml_host_risk_score_latest_{space_id}

The transforms we installed for user risk score are listed below (please check /app/management/data/transform and confirm that the relevant transforms are displayed as started):

ml_userriskscore_pivot_transform_{space_id}, its source index is .alerts-security.alerts-{{space_id}}, its destination is ml_user_risk_score_{space_id}
ml_userriskscore_latest_transform_{space_id}, its source index is ml_user_risk_score_{{space_id}}, its destination is ml_user_risk_score_latest_{space_id}

To know if there's data generated for host risk score, we can do some queries over these indices:

GET ml_host_risk_score_{{space_id}}/_search
GET ml_host_risk_score_latest_{{space_id}}/_search

To know if there's data generated for user risk score, we can do some queries over these indices:

GET ml_user_risk_score_{{space_id}}/_search
GET ml_user_risk_score_latest_{{space_id}}/_search

If no data returns from ml_host_risk_score_latest_{{space_id}} for host risk score, or from ml_user_risk_score_latest_{{space_id}} for user risk score, that explains why we see the no data detected prompt in the UI.

When we have no data in ml_host_risk_score_latest_{{space_id}} and ml_user_risk_score_latest_{{space_id}}, there is usually also no data in their source indices, ml_host_risk_score_{{space_id}} and ml_user_risk_score_{{space_id}}.

The source index of both ml_host_risk_score_{{space_id}} and ml_user_risk_score_{{space_id}} is .alerts-security.alerts-{{space_id}}, so we might want to check whether there's alert data for ml_hostriskscore_pivot_transform_{space_id} and ml_userriskscore_pivot_transform_{space_id} to pick up.

If you are checking for ml_hostriskscore_pivot_transform_{space_id}:

GET _transform/ml_hostriskscore_pivot_transform_{space_id}/_stats?human=true

If you are checking for ml_userriskscore_pivot_transform_{space_id}:

GET _transform/ml_userriskscore_pivot_transform_{space_id}/_stats?human=true

You might have a response like:

{
  "count": 1,
  "transforms": [
    {
      "id": "ml_hostriskscore_pivot_transform_default",
      "state": "started",
      "node": {
        "id": "H1tlwfTyRkWls-C0sarmHw",
        "name": "instance-0000000000",
        "ephemeral_id": "SBqlp5ywRuuop2gtcdCljA",
        "transport_address": "10.43.255.164:19635",
        "attributes": {}
      },
      "stats": {
        "pages_processed": 29,
        "documents_processed": 11805,
        "documents_indexed": 8,
        "documents_deleted": 0,
        "trigger_count": 9,
        "index_time_in_ms": 52,
        "index_total": 7,
        "index_failures": 0,
        "search_time_in_ms": 201,
        "search_total": 29,
        "search_failures": 0,
        "processing_time_in_ms": 14,
        "processing_total": 29,
        "delete_time_in_ms": 0,
        "exponential_avg_checkpoint_duration_ms": 59.02353261024906,
        "exponential_avg_documents_indexed": 0.8762710605864747,
        "exponential_avg_documents_processed": 1664.7724779548555
      },
      "checkpointing": {
        "last": {
          "checkpoint": 8,
          "timestamp": "2022-10-17T14:49:50.315Z",
          "timestamp_millis": 1666018190315,
          "time_upper_bound": "2022-10-17T14:47:50.315Z",
          "time_upper_bound_millis": 1666018070315
        },
        "operations_behind": 380,
        "changes_last_detected_at_string": "2022-10-17T14:49:50.113Z",
        "changes_last_detected_at": 1666018190113,
        "last_search_time_string": "2022-10-17T14:49:50.113Z",
        "last_search_time": 1666018190113
      }
    }
  ]
}

Then use the value of time_upper_bound_millis from the response in a range query against the alerts index:

GET .alerts-security.alerts-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": <time_upper_bound_millis>
      }
    }
  }
}

for example:

GET .alerts-security.alerts-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": 1666018070315
      }
    }
  }
}

If the query returns no results, please check whether alerts are being generated properly. If data exists, please restart the transforms.
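The lookup above can also be scripted once the stats response has been saved as JSON. The following is a minimal Python sketch with a hypothetical function name. It only builds the query body and assumes the response has the shape shown in the sample above.

```python
def alerts_query_from_transform_stats(stats_response, transform_id):
    """Build the alerts range query from a _transform/<id>/_stats response.

    Raises KeyError if the transform is missing from the response or has no
    time_upper_bound_millis in its last checkpoint.
    """
    for transform in stats_response.get("transforms", []):
        if transform.get("id") == transform_id:
            upper = transform["checkpointing"]["last"]["time_upper_bound_millis"]
            return {"query": {"range": {"@timestamp": {"lt": upper}}}}
    raise KeyError(f"transform {transform_id!r} not found in stats response")


if __name__ == "__main__":
    # Trimmed copy of the sample response above.
    sample = {
        "count": 1,
        "transforms": [{
            "id": "ml_hostriskscore_pivot_transform_default",
            "state": "started",
            "checkpointing": {"last": {"checkpoint": 8, "time_upper_bound_millis": 1666018070315}},
        }],
    }
    print(alerts_query_from_transform_stats(sample, "ml_hostriskscore_pivot_transform_default"))
```

The returned dict is the body for GET .alerts-security.alerts-{{space_id}}/_search.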

angorayc · 2022-09-22T15:32:14Z

During the installation / upgrade process, if any of the error messages below appear:

Ingest pipeline already exists
Transform already exists
Saved object already exists

Please manually delete the module and enable it again:

Manually delete the module:

Remove saved objects: in /app/management/kibana/objects delete the created tag {{Host|User}} Risk Score - {{spaceId}} and the linked saved objects.

Remove transforms in /app/management/data/transform (ml_{{host|user}}riskscore_latest_transform_{{spaceId}} and ml_{{host|user}}riskscore_pivot_transform_{{spaceId}}). Stop them first, then delete them, and also delete the destination indices they created.

You can also delete transforms via dev tools:

# Stop and delete the latest transform
POST _transform/ml_{{host|user}}riskscore_latest_transform_{{space_id}}/_stop
DELETE _transform/ml_{{host|user}}riskscore_latest_transform_{{space_id}}

# Stop and delete the pivot transform
POST _transform/ml_{{host|user}}riskscore_pivot_transform_{{space_id}}/_stop
DELETE _transform/ml_{{host|user}}riskscore_pivot_transform_{{space_id}}

Remove ingest pipeline ml_{{host|user}}riskscore_ingest_pipeline_{spaceId} in /app/management/ingest/ingest_pipelines.

You can also delete the ingest pipeline via dev tools:

DELETE /_ingest/pipeline/ml_{{host|user}}riskscore_ingest_pipeline_{spaceId}

Remove stored scripts via dev tools with delete stored script api:

Hosts risk score: ml_hostriskscore_levels_script_{spaceId}, ml_hostriskscore_init_script_{spaceId}, ml_hostriskscore_map_script_{spaceId}, ml_hostriskscore_reduce_script_{spaceId}

DELETE _scripts/ml_hostriskscore_levels_script_{spaceId}
DELETE _scripts/ml_hostriskscore_init_script_{spaceId}
DELETE _scripts/ml_hostriskscore_map_script_{spaceId}
DELETE _scripts/ml_hostriskscore_reduce_script_{spaceId}

Users risk score: ml_userriskscore_levels_script_{spaceId}, ml_userriskscore_map_script_{spaceId}, ml_userriskscore_reduce_script_{spaceId}

DELETE _scripts/ml_userriskscore_levels_script_{spaceId}
DELETE _scripts/ml_userriskscore_map_script_{spaceId}
DELETE _scripts/ml_userriskscore_reduce_script_{spaceId}
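To avoid hand-editing the placeholders, the Console requests above can be generated per entity and space. This is a minimal Python sketch with hypothetical names. It covers only the transforms, ingest pipeline and stored scripts, so the saved objects, tag and destination indices still need to be removed separately.

```python
# Script kinds per entity, as listed above (user risk score has no init script).
SCRIPT_KINDS = {
    "host": ("levels", "init", "map", "reduce"),
    "user": ("levels", "map", "reduce"),
}


def cleanup_requests(entity, space_id):
    """Return Console request lines that remove the module's Elasticsearch resources."""
    if entity not in SCRIPT_KINDS:
        raise ValueError("entity must be 'host' or 'user'")
    prefix = f"ml_{entity}riskscore"
    requests = []
    for kind in ("latest", "pivot"):
        transform = f"{prefix}_{kind}_transform_{space_id}"
        # A transform has to be stopped before it is deleted.
        requests.append(f"POST _transform/{transform}/_stop")
        requests.append(f"DELETE _transform/{transform}")
    requests.append(f"DELETE _ingest/pipeline/{prefix}_ingest_pipeline_{space_id}")
    for kind in SCRIPT_KINDS[entity]:
        requests.append(f"DELETE _scripts/{prefix}_{kind}_script_{space_id}")
    return requests


if __name__ == "__main__":
    print("\n".join(cleanup_requests("host", "default")))
```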

ajosh0504 · 2022-09-26T14:39:58Z

@nastasha-solomon Is there an open PR for this yet? We also have this old, and now outdated documentation for this feature in detection-rules.

We would like to make the official documentation as exhaustive as possible so users don't have to look in multiple places. Hopefully we can include everything Angela has detailed above in the main docs?

SourinPaul · 2022-10-03T21:21:27Z

@jmikell821 per our conversation linking the documentation task for 8.5 breaking changes
here for traceability https://github.com/elastic/security-team/issues/4898

angorayc · 2022-10-06T17:10:32Z

Known issue: @jmikell821 please include the known issue to the doc if it's ok, thank you!
The feature works in 8.5; it just takes more time to complete. Please notify users to allow some time to install / upgrade. The performance is improved in elastic/kibana#142434, and here's the reason why 8.5 is slower and 8.6 is faster.

In 8.5, all the actions are done by client-side requests, so it takes a while to enable / upgrade the module.
For example, after clicking the enable button, you can see lots of client-side requests being sent (from the first create to the last hostRiskScoreDashboards). These requests create the indices, (an) ingest pipeline, transforms, and saved objects to enable the risk scores.

We have a fix in 8.6 that moves most of the actions to the server side, which reduces the time for the enable / upgrade process. The main differences are: all the actions for creating indices, (an) ingest pipeline, and transforms are done in a single API call, which is displayed as risk_score in the network panel (the 1st one), and creating saved objects remains in hostRiskScoreDashboards.

elastic/kibana#142434

nastasha-solomon · 2022-10-06T18:45:41Z

Thank you, @angorayc! We'll make sure this gets doc'd in the 8.5 release notes and will keep an eye on it when 8.6 rolls around.

cc: @benironside

angorayc · 2022-10-12T10:42:02Z

https://github.com/elastic/detection-rules/blob/main/docs/experimental-machine-learning/host-risk-score.md and https://github.com/elastic/detection-rules/blob/main/docs/experimental-machine-learning/user-risk-score.md - we might want to clarify that these two pages are for pre-v8.5 only.

The content on those GitHub pages is no longer accurate in 8.5, so it's good to always point users to the Elastic docs. In case users land on those two GitHub pages somehow, could we add a notice on each page saying that it's deprecated in 8.5 and pointing them to https://www.elastic.co/guide/en/security/master/host-risk-score.html or https://www.elastic.co/guide/en/security/master/user-risk-score.html

ajosh0504 · 2022-10-18T20:58:46Z

@angorayc Any reason why we're asking users to delete everything in case of a conflict? I'm also unable to think of a situation when this would occur. Could you please elaborate on what you're thinking here?

angorayc · 2022-10-19T12:23:21Z

> @angorayc Any reason why we're asking users to delete everything in case of a conflict? I'm also unable to think of a situation when this would occur. Could you please elaborate on what you're thinking here?

The current process continues even if an error occurs in the middle of it. For example, the script continues to the end even if an ingest pipeline conflict happens partway through. Indices and transforms will still be created, so if users don't delete everything, next time they might see indices already exists and transforms already exists errors. Therefore, deleting everything should be the safest way to avoid that.

nastasha-solomon added Team: Docs Team: Threat Hunting Formerly Data Visibility v8.5.0 labels Sep 20, 2022

nkhristinin mentioned this issue Sep 21, 2022

[DOCS] Host and user risk score alert enrichments #2480

Closed

angorayc mentioned this issue Sep 26, 2022

[SecuritySolution] Onboard hosts and users risk score module elastic/kibana#140377

Merged

2 tasks

jmikell821 self-assigned this Sep 28, 2022

jmikell821 added Feature: Host Risk Score Feature: User Risk Score meta labels Sep 28, 2022

jmikell821 changed the title ~~[DOCS] Changes to host and user risk score features~~ [META] Changes to host and user risk score features Sep 28, 2022

This was referenced Sep 28, 2022

[DOCS] Add Entity analytics dashboard #2517

Closed

[DOCS] Alert counts added to Entity pages #2525

Closed

angorayc mentioned this issue Oct 4, 2022

Risk score installation refactory elastic/kibana#142434

Merged

1 task

This was referenced Oct 16, 2022

[DOCS] Update Overview dashboard #2578

Closed

[DOCS] Update risk score read.me modules #2579

Closed

[DOCS] Risk score enhancements #2580

Merged

angorayc mentioned this issue Nov 9, 2022

[SecuritySolution] Failed to upgrade host risk score elastic/kibana#144916

Closed

jmikell821 closed this as completed Jan 27, 2023

nastasha-solomon mentioned this issue Oct 4, 2023

[BUG] Docs for user/host risk score missing steps for manually delete the risk score module #4006

Closed
