[META] Changes to host and user risk score features #2477

Closed
nastasha-solomon opened this issue Sep 20, 2022 · 13 comments

Comments

@nastasha-solomon
Contributor

nastasha-solomon commented Sep 20, 2022

Description

In 8.5, improvements were made to the onboarding workflow for the host and user risk score features. These changes streamline previously complicated steps that required some technical knowledge and sometimes took users outside of the Security app. Now, users can enable both features and generate risk scores with a single click.

In addition, users can upgrade host and user risk score features in a single click. The Upgrade button will be displayed anywhere host and user risk scores are available. This includes the:

  • Host Risk Scores and User Risk Scores sections on the Entity analytics dashboard
  • Host risk tab on the Hosts page
  • User risk tab on the Users page
  • Host risk tab on a host's details page
  • User risk tab on a user's details page

The following sections outline changes to the host and user risk score features in addition to caveats that may need to be doc'd.

Host risk score

Changes introduced in 8.5:

  • Host risk score card removed - The host risk score card will no longer display on the Overview dashboard. This will require multiple text and screenshot updates within the Host risk score topic.
  • Simpler way to enable host risk score - In 8.5, users only need to click the Enable Host Risk Score button in the "Host Risk Scores" section of the Entity Analytics dashboard to turn on the feature. When they click this button, host risk scores are generated as well if alert data is present in the environment. To account for this change, we'll need to revise the docs for deploying host risk score and viewing host risk score data.
  • New way to upgrade the host risk score feature: If the user had enabled the host risk score feature in 8.4 or earlier, they will see an Upgrade button within the "Host Risk Scores" section instead of the Enable Host Risk Score button. When they click Upgrade, their “old” host risk scores are deleted and new ones are created. The “old” data is not preserved as that data type is no longer supported. This will likely need to be doc'd as a breaking change in the 8.5 release notes.
    • IMPORTANT: If the user wants to keep their “old” host risk scores, they need to reindex the data before upgrading the host risk score feature. They can use the [Elasticsearch Reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html) to reindex their "old" data. Their data will then be stored in the ml_{host|user}_risk_score_{spaceId} and ml_{host|user}_risk_score_latest_{spaceId} indices. Here is an example reindexing request:
POST _reindex
{
  "source": {
    "index": "ml_{host|user}_risk_score_{spaceId}"
  },
  "dest": {
    "index": "my_new_ml_{host|user}_risk_score_{spaceId}"
  }
}
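A minimal sketch of how the reindex request bodies for both "old" indices could be generated for one entity type and one space. The helper name `reindex_bodies` and the `my_new_` destination prefix are illustrative assumptions (the prefix simply follows the example above); the function only builds request bodies and does not contact Elasticsearch.

```python
import json


def reindex_bodies(entity, space_id, dest_prefix="my_new_"):
    """Build _reindex bodies for the "old" risk score indices.

    entity is "host" or "user". dest_prefix is a hypothetical naming
    choice taken from the example request, not a required convention.
    """
    if entity not in ("host", "user"):
        raise ValueError("entity must be 'host' or 'user'")
    sources = [
        f"ml_{entity}_risk_score_{space_id}",
        f"ml_{entity}_risk_score_latest_{space_id}",
    ]
    return [
        {"source": {"index": src}, "dest": {"index": dest_prefix + src}}
        for src in sources
    ]


if __name__ == "__main__":
    # Print requests in a form that can be pasted into the dev tools console.
    for body in reindex_bodies("host", "default"):
        print("POST _reindex")
        print(json.dumps(body, indent=2))
```

Running it for each entity type and each space in use gives one request per index to keep.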

User risk score

Changes introduced in 8.5:

  • Simpler way to enable user risk score - Similar to the host risk score, in 8.5, users only need to click the Enable User Risk Score button in the "User Risk Scores" section of the Entity Analytics dashboard to turn on the feature. We might need to refresh the docs for deploying the user risk score package to account for these changes.
  • New way to upgrade the user risk score feature: If the user had enabled the user risk score feature in 8.4 or earlier, they will see an Upgrade button within the "User Risk Scores" section instead of the Enable User Risk Score button. When they click Upgrade, their “old” user risk scores are deleted and new ones are created. The “old” data is not preserved as that data type is no longer supported. This will likely need to be doc'd as a breaking change in the 8.5 release notes.
    • IMPORTANT: Users will need to reindex "old" user risk scores to keep them. The steps are the same as those described for host risk score.

Related issues/PRs

Additional notes/questions:

  • Should we be directing users to the Entity Analytics dashboard in order to enable/upgrade the host and user risk score features? Are there any other places users can do this?
  • Do users still need to enable feature flags for the user and host risk score features to turn them on?
  • Outside of reindexing "old" risk scores, do users need to perform extra steps before upgrading the host or user risk score features?
  • In what situations would users encounter a transform error (i.e., the transform already exists) when they install the host or user risk score feature?
  • Will the host and user risk score docs be linked to in any newly developed error messages? If yes, which ones and which doc sections will be linked in the code?

The deployment has three spaces for a quick view and we can add new spaces. Please note that we've been asked to not add data, enable or upgrade any of the spaces. (Please create a new space if you'd like to try it out)

  • Default - No data
  • Installation panel
  • Legacy data - upgrade panel
@peluja1012
Contributor

Hi @nastasha-solomon, we are also adding a feature called "alert enrichments" where we store and display the entity (host/user) risk score. Here is the PR. I was wondering if that should be tracked as part of this github issue or if a separate one should be created.

@nastasha-solomon
Contributor Author

nastasha-solomon commented Sep 20, 2022

@peluja1012 thanks for pinging us about this! I think a separate doc issue is better for organization purposes. This doc issue is mainly for describing what changed in the host and user risk score onboarding/upgrading workflows. Correct me if I'm wrong, but it seems as though the docs for the new alert enrichments feature should describe the type of host/user risk data that's displayed on the Alert details flyout after the host/user risk score features are enabled.

@peluja1012
Contributor

Correct me if I'm wrong, but it seems as though the docs for the elastic/kibana#139478 should describe the type of host/user risk data that's displayed on the Alert details flyout after the host/user risk score features are enabled.

Hey @nastasha-solomon, yes that's correct.

@angorayc
Contributor

angorayc commented Sep 22, 2022

  • Should we be directing users to the Entity Analytics dashboard in order to enable/upgrade the host and user risk score features? Are there any other places users can do this?

Users can enable both risk scores from the Entity Analytics page
Screenshot 2022-09-22 at 11 04 27

Users can enable host risk score from the Hosts page or a host's details page
Screenshot 2022-09-22 at 11 04 52

Users can enable host risk score via dev tools by visiting the following URL. The content is appended below any existing content in the console, so scroll to the bottom to find it:
{{kibanaUrl}}/s/{spaceId}/app/dev_tools#/console?load_from={{kibanaUrl}}/s/{spaceId}/internal/risk_score/prebuilt_content/dev_tool/enable_host_risk_score

e.g.

http://localhost:5601/s/default/app/dev_tools#/console?load_from=http://localhost:5601/s/default/internal/risk_score/prebuilt_content/dev_tool/enable_host_risk_score

Screenshot 2022-09-22 at 11 10 54

Users can enable user risk score from the Users page or a user's details page
Screenshot 2022-09-22 at 11 11 56

Screenshot 2022-09-22 at 11 12 12

Users can enable user risk score via dev tools by visiting the following URL. The content is appended below any existing content in the console, so scroll to the bottom to find it:
{{kibanaUrl}}/s/{spaceId}/app/dev_tools#/console?load_from={{kibanaUrl}}/s/{spaceId}/internal/risk_score/prebuilt_content/dev_tool/enable_user_risk_score

e.g.

http://localhost:5601/s/default/app/dev_tools#/console?load_from=http://localhost:5601/s/default/internal/risk_score/prebuilt_content/dev_tool/enable_user_risk_score
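As a rough illustration of the URL templates above, the following sketch assembles the dev tools link for a given Kibana URL, space, and entity type. The function name `enable_risk_score_console_url` is hypothetical; the path segments are copied from the templates in this comment.

```python
def enable_risk_score_console_url(kibana_url, space_id, entity):
    """Build the dev tools URL that loads the prebuilt enable_*_risk_score content.

    entity is "host" or "user". Only string formatting is done here;
    nothing is requested from Kibana.
    """
    if entity not in ("host", "user"):
        raise ValueError("entity must be 'host' or 'user'")
    base = f"{kibana_url.rstrip('/')}/s/{space_id}"
    content = (
        f"{base}/internal/risk_score/prebuilt_content/dev_tool/"
        f"enable_{entity}_risk_score"
    )
    return f"{base}/app/dev_tools#/console?load_from={content}"


if __name__ == "__main__":
    print(enable_risk_score_console_url("http://localhost:5601", "default", "host"))
```

With `http://localhost:5601`, `default`, and `host`, this reproduces the host example URL shown earlier.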
  • Do users still need to enable feature flags for the user and host risk score features to turn them on?

Feature flags are removed in this release. As long as users have a Platinum (or higher) or trial license, they can access the feature.

IMPORTANT: If users set the Kibana configuration below before 8.5, they must remove it before upgrading to 8.5, or the upgrade will fail.

Users must remove xpack.securitySolution.enableExperimental: ['riskyUsersEnabled','riskyHostsEnabled'] to upgrade to 8.5
4_add_flag

https://github.com/elastic/security-team/issues/4935
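Before upgrading, a quick check for the two removed flags (riskyUsersEnabled and riskyHostsEnabled) could look like the sketch below. It is a plain-text scan under simplifying assumptions: it ignores whole-line comments and does not parse YAML, and the function name `find_removed_flags` is hypothetical.

```python
REMOVED_FLAGS = ("riskyUsersEnabled", "riskyHostsEnabled")


def find_removed_flags(kibana_yml_text):
    """Return the removed experimental flags still mentioned in the config text.

    Assumption: a simple text scan is good enough for a pre-upgrade check.
    Lines that start with '#' are treated as comments and skipped.
    """
    active_lines = [
        line
        for line in kibana_yml_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    active_text = "\n".join(active_lines)
    return [flag for flag in REMOVED_FLAGS if flag in active_text]


if __name__ == "__main__":
    sample = (
        "server.port: 5601\n"
        "xpack.securitySolution.enableExperimental: "
        "['riskyUsersEnabled', 'riskyHostsEnabled']\n"
    )
    print(find_removed_flags(sample))
```

An empty list means neither flag was found in the active (uncommented) lines.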

  • Outside of reindexing "old" risk scores, do users need to perform extra steps before upgrading the host or user risk score features?

Before this release, all the scripts and ingest pipelines were shared across spaces. If users installed the module in more than one space and want to keep the old data, they must reindex it for all spaces before starting the upgrade in any single space.

  • In what situations would users encounter a transform error (i.e., the transform already exists) when they install the host or user risk score feature?

Common errors during installation:

  1. Stored scripts cannot be installed:

Host risk score creates these four scripts: ml_hostriskscore_levels_script_{spaceId}, ml_hostriskscore_init_script_{spaceId}, ml_hostriskscore_map_script_{spaceId}, ml_hostriskscore_reduce_script_{spaceId}

User risk score creates these three scripts: ml_userriskscore_levels_script_{spaceId}, ml_userriskscore_map_script_{spaceId}, ml_userriskscore_reduce_script_{spaceId}

We use the create or update stored script API to create the scripts. Users must have the manage cluster privilege to use this API.

Note: If users already had stored scripts with the same IDs as above before installation, they will be updated (without showing an error).

  2. Ingest pipeline cannot be installed:

Host risk score creates this ingest pipeline: ml_hostriskscore_ingest_pipeline_{spaceId}
User risk score creates this ingest pipeline: ml_userriskscore_ingest_pipeline_{spaceId}

We use the create or update pipeline API to create the ingest pipelines. Users must have the manage cluster privilege to use this API.

  3. Indices cannot be installed:

Host risk score creates these indices: ml_host_risk_score_{spaceId}, ml_host_risk_score_latest_{spaceId}
User risk score creates these indices: ml_user_risk_score_{spaceId}, ml_user_risk_score_latest_{spaceId}

We use the create index API to create the indices. Users must have the create_index or manage index privilege for the target index.

  4. Transforms cannot be installed:

Host risk score creates these transforms: ml_hostriskscore_pivot_transform_{spaceId}, ml_hostriskscore_latest_transform_{spaceId}
User risk score creates these transforms: ml_userriskscore_pivot_transform_{spaceId}, ml_userriskscore_latest_transform_{spaceId}

We use the create transform API to create the transforms. This requires the following privileges:

cluster: manage_transform (the transform_admin built-in role grants this privilege)
source indices: read, view_index_metadata
destination index: read, create_index, index. If a retention_policy is configured, the delete privilege is also required.

  5. Saved objects cannot be installed:

We import saved objects at the end of the installation.

To access Saved Objects, you must have the required Saved Objects Management Kibana privilege.
To add the privilege, open the main menu, and then click Stack Management > Roles.

The installation creates a tag {Host | User} Risk Score - {spaceId} and links all the relevant saved objects to it.

Hosts risk score creates:

Screenshot 2022-09-22 at 16 16 55

Users risk score creates:

Screenshot 2022-09-22 at 16 20 57

  • Will the host and user risk score docs be linked to in any newly developed error messages? If yes, which ones and which doc sections will be linked in the code?

No. All the doc links point to
https://www.elastic.co/guide/en/security/current/host-risk-score.html
https://www.elastic.co/guide/en/security/current/user-risk-score.html

It'd be great if we could put a reference for the error messages somewhere obvious on those pages. Or please let me know if we have a dedicated page for it.

@angorayc
Contributor

angorayc commented Sep 22, 2022

No risk score data available to display - the installation finished without error, but the transforms haven't picked up data.

Please check that your ingested data and alerts data are available:
Screenshot 2022-10-06 at 09 45 46

We rely on transforms to generate data for host / user risk score.

The transforms we install for host risk score are listed below (please check /app/management/data/transform and confirm that the relevant transforms are displayed as started):

  1. ml_hostriskscore_pivot_transform_{space_id}, its source index is .alerts-security.alerts-{{space_id}}, and its destination is ml_host_risk_score_{space_id}
  2. ml_hostriskscore_latest_transform_{space_id}, its source index is ml_host_risk_score_{{space_id}}, and its destination is ml_host_risk_score_latest_{space_id}

The transforms we install for user risk score are listed below (please check /app/management/data/transform and confirm that the relevant transforms are displayed as started):

  1. ml_userriskscore_pivot_transform_{space_id}, its source index is .alerts-security.alerts-{{space_id}}, its destination is ml_user_risk_score_{space_id}
  2. ml_userriskscore_latest_transform_{space_id}, its source index is ml_user_risk_score_{{space_id}}, its destination is ml_user_risk_score_latest_{space_id}
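The two lists above describe a small chain: alerts feed the pivot transform, whose destination feeds the latest transform. A minimal sketch of that chain as data, assuming a hypothetical helper named `transform_chain` (the index and transform names are the ones listed above):

```python
def transform_chain(entity, space_id):
    """Describe the pivot and latest transforms with their source and destination.

    entity is "host" or "user". The result is ordered: the pivot transform
    runs on alerts, and the latest transform runs on the pivot's output.
    """
    if entity not in ("host", "user"):
        raise ValueError("entity must be 'host' or 'user'")
    alerts = f".alerts-security.alerts-{space_id}"
    history = f"ml_{entity}_risk_score_{space_id}"
    latest = f"ml_{entity}_risk_score_latest_{space_id}"
    return [
        {
            "transform": f"ml_{entity}riskscore_pivot_transform_{space_id}",
            "source": alerts,
            "dest": history,
        },
        {
            "transform": f"ml_{entity}riskscore_latest_transform_{space_id}",
            "source": history,
            "dest": latest,
        },
    ]


if __name__ == "__main__":
    for step in transform_chain("host", "default"):
        print(f"{step['source']} -> {step['transform']} -> {step['dest']}")
```

Walking the chain from the last destination back to the alerts index is one way to find where data stops flowing.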

To check whether data has been generated for host risk score, query these indices:

GET ml_host_risk_score_{{space_id}}/_search
GET ml_host_risk_score_latest_{{space_id}}/_search

To check whether data has been generated for user risk score, query these indices:

GET ml_user_risk_score_{{space_id}}/_search
GET ml_user_risk_score_latest_{{space_id}}/_search

If no data is returned from ml_host_risk_score_latest_{{space_id}} for host risk score, or from ml_user_risk_score_latest_{{space_id}} for user risk score, that explains why we see the no data detected prompt in the UI.

When we have no data in ml_host_risk_score_latest_{{space_id}} and ml_user_risk_score_latest_{{space_id}}, usually there is also no data in their source indices, ml_host_risk_score_{{space_id}} and ml_user_risk_score_{{space_id}}.

The source index of both ml_host_risk_score_{{space_id}} and ml_user_risk_score_{{space_id}} is .alerts-security.alerts-{{space_id}}, so we might want to check whether it contained data at the time ml_hostriskscore_pivot_transform_{space_id} and ml_userriskscore_pivot_transform_{space_id} tried to pick it up.

If you are checking for ml_hostriskscore_pivot_transform_{space_id}:

GET _transform/ml_hostriskscore_pivot_transform_{space_id}/_stats?human=true

If you are checking for ml_userriskscore_pivot_transform_{space_id}:

GET _transform/ml_userriskscore_pivot_transform_{space_id}/_stats?human=true

You might have a response like:

{
  "count": 1,
  "transforms": [
    {
      "id": "ml_hostriskscore_pivot_transform_default",
      "state": "started",
      "node": {
        "id": "H1tlwfTyRkWls-C0sarmHw",
        "name": "instance-0000000000",
        "ephemeral_id": "SBqlp5ywRuuop2gtcdCljA",
        "transport_address": "10.43.255.164:19635",
        "attributes": {}
      },
      "stats": {
        "pages_processed": 29,
        "documents_processed": 11805,
        "documents_indexed": 8,
        "documents_deleted": 0,
        "trigger_count": 9,
        "index_time_in_ms": 52,
        "index_total": 7,
        "index_failures": 0,
        "search_time_in_ms": 201,
        "search_total": 29,
        "search_failures": 0,
        "processing_time_in_ms": 14,
        "processing_total": 29,
        "delete_time_in_ms": 0,
        "exponential_avg_checkpoint_duration_ms": 59.02353261024906,
        "exponential_avg_documents_indexed": 0.8762710605864747,
        "exponential_avg_documents_processed": 1664.7724779548555
      },
      "checkpointing": {
        "last": {
          "checkpoint": 8,
          "timestamp": "2022-10-17T14:49:50.315Z",
          "timestamp_millis": 1666018190315,
          "time_upper_bound": "2022-10-17T14:47:50.315Z",
          "time_upper_bound_millis": 1666018070315
        },
        "operations_behind": 380,
        "changes_last_detected_at_string": "2022-10-17T14:49:50.113Z",
        "changes_last_detected_at": 1666018190113,
        "last_search_time_string": "2022-10-17T14:49:50.113Z",
        "last_search_time": 1666018190113
      }
    }
  ]
}

Then use the value of time_upper_bound_millis from the response in a range query against the alerts index:

GET .alerts-security.alerts-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": <time_upper_bound_millis>
      }
    }
  }
}

For example:

GET .alerts-security.alerts-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": 1666018070315
      }
    }
  }
}

If the query returns no results, please check that alerts are being generated properly. If data exists, please restart the transforms.
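The step from the stats response to the range query can be sketched as follows. This is only an illustration: the function name `alerts_query_from_stats` is hypothetical, and it assumes the response has the shape shown above, with a single transform and a completed checkpoint.

```python
def alerts_query_from_stats(stats_response):
    """Build the alerts range query from a transform _stats response.

    Reads checkpointing.last.time_upper_bound_millis of the first transform
    and uses it as the "lt" bound on @timestamp.
    """
    transforms = stats_response.get("transforms", [])
    if not transforms:
        raise ValueError("no transforms in stats response")
    last = transforms[0].get("checkpointing", {}).get("last", {})
    upper = last.get("time_upper_bound_millis")
    if upper is None:
        raise ValueError("no time_upper_bound_millis in the last checkpoint")
    return {"query": {"range": {"@timestamp": {"lt": upper}}}}


if __name__ == "__main__":
    # Trimmed copy of the example response above.
    stats = {
        "count": 1,
        "transforms": [
            {
                "id": "ml_hostriskscore_pivot_transform_default",
                "checkpointing": {
                    "last": {
                        "checkpoint": 8,
                        "time_upper_bound_millis": 1666018070315,
                    }
                },
            }
        ],
    }
    print(alerts_query_from_stats(stats))
    # {'query': {'range': {'@timestamp': {'lt': 1666018070315}}}}
```

The returned dictionary is the body for GET .alerts-security.alerts-{space_id}/_search.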

@angorayc
Contributor

angorayc commented Sep 22, 2022

If any of the error messages below appears during the installation / upgrade process:

  • Ingest pipeline already exists
  • Transform already exists
  • Saved object already exists

Please manually delete the module and enable it again:

Manually delete the module:

  1. Remove saved objects: in /app/management/kibana/objects, delete the created tag {{Host|User}} Risk Score - {{spaceId}} and the saved objects linked to it.

Screenshot 2022-09-22 at 16 16 55

Screenshot 2022-09-22 at 16 20 57

  2. Remove the transforms in /app/management/data/transform (ml_{{host|user}}riskscore_latest_transform_{{spaceId}} and ml_{{host|user}}riskscore_pivot_transform_{{spaceId}}). Stop them first, then delete them, and also delete the destination indices they created.

Screenshot 2022-09-22 at 16 45 35

You can also delete the transforms via dev tools:

# Stop and delete the latest transform
POST _transform/ml_{{host|user}}riskscore_latest_transform_{{space_id}}/_stop
DELETE _transform/ml_{{host|user}}riskscore_latest_transform_{{space_id}}

# Stop and delete the pivot transform
POST _transform/ml_{{host|user}}riskscore_pivot_transform_{{space_id}}/_stop
DELETE _transform/ml_{{host|user}}riskscore_pivot_transform_{{space_id}}

  3. Remove the ingest pipeline ml_{{host|user}}riskscore_ingest_pipeline_{spaceId} in /app/management/ingest/ingest_pipelines.

You can also delete the ingest pipeline via dev tools:

DELETE /_ingest/pipeline/ml_{{host|user}}riskscore_ingest_pipeline_{spaceId}

  4. Remove the stored scripts via dev tools with the delete stored script API:

Hosts risk score: ml_hostriskscore_levels_script_{spaceId}, ml_hostriskscore_init_script_{spaceId}, ml_hostriskscore_map_script_{spaceId}, ml_hostriskscore_reduce_script_{spaceId}

DELETE _scripts/ml_hostriskscore_levels_script_{spaceId}
DELETE _scripts/ml_hostriskscore_init_script_{spaceId}
DELETE _scripts/ml_hostriskscore_map_script_{spaceId}
DELETE _scripts/ml_hostriskscore_reduce_script_{spaceId}

Users risk score: ml_userriskscore_levels_script_{spaceId}, ml_userriskscore_map_script_{spaceId}, ml_userriskscore_reduce_script_{spaceId}

DELETE _scripts/ml_userriskscore_levels_script_{spaceId}
DELETE _scripts/ml_userriskscore_map_script_{spaceId}
DELETE _scripts/ml_userriskscore_reduce_script_{spaceId}
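To avoid typos when filling in the placeholders, the dev tools requests from the manual steps above could be generated with a small helper. This is a sketch, not an official cleanup tool: the name `cleanup_requests` is hypothetical, and it deliberately leaves out the saved objects and the destination indices, which still have to be removed as described above.

```python
# Script kinds per entity, as listed in the steps above
# (host risk score has an extra "init" script).
SCRIPT_KINDS = {
    "host": ("levels", "init", "map", "reduce"),
    "user": ("levels", "map", "reduce"),
}


def cleanup_requests(entity, space_id):
    """Return the console requests that remove transforms, pipeline, and scripts."""
    if entity not in SCRIPT_KINDS:
        raise ValueError("entity must be 'host' or 'user'")
    prefix = f"ml_{entity}riskscore"
    requests = []
    # Transforms must be stopped before they can be deleted.
    for kind in ("latest", "pivot"):
        name = f"{prefix}_{kind}_transform_{space_id}"
        requests.append(f"POST _transform/{name}/_stop")
        requests.append(f"DELETE _transform/{name}")
    requests.append(f"DELETE /_ingest/pipeline/{prefix}_ingest_pipeline_{space_id}")
    for kind in SCRIPT_KINDS[entity]:
        requests.append(f"DELETE _scripts/{prefix}_{kind}_script_{space_id}")
    return requests


if __name__ == "__main__":
    print("\n".join(cleanup_requests("user", "default")))
```

The output can be pasted into the dev tools console and run one request at a time.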

@ajosh0504
Contributor

@nastasha-solomon Is there an open PR for this yet? We also have this old, and now outdated documentation for this feature in detection-rules.

We would like to make the official documentation as exhaustive as possible so users don't have to look in multiple places. Hopefully we can include everything Angela has detailed above in the main docs?

@jmikell821 jmikell821 self-assigned this Sep 28, 2022
@jmikell821 jmikell821 changed the title [DOCS] Changes to host and user risk score features [META] Changes to host and user risk score features Sep 28, 2022
@SourinPaul

@jmikell821 per our conversation linking the documentation task for 8.5 breaking changes
here for traceability https://github.com/elastic/security-team/issues/4898

@angorayc
Contributor

angorayc commented Oct 6, 2022

Known issue: @jmikell821 please include this known issue in the docs if that's OK, thank you!
The feature works in 8.5, but it takes more time to complete. Please tell users to allow some time for the install / upgrade. Performance is improved in elastic/kibana#142434; here's why 8.5 is slower and 8.6 is faster.

In 8.5, all the actions are performed through client-side requests, so it takes a while to enable / upgrade the module.
For example, after clicking the enable button, you can see many client-side requests being sent (from the first create to the last hostRiskScoreDashboards). These requests create the indices, an ingest pipeline, transforms, and saved objects to enable the risk scores.
Screenshot 2022-10-06 at 21 10 32

We have a fix in 8.6 that moves most of the actions to the server side, which reduces the time the enable / upgrade process takes. The main difference: creating the indices, ingest pipeline, and transforms is handled by a single API, displayed as risk_score in the network panel (the first request), while creating the saved objects remains in hostRiskScoreDashboards.

Screenshot 2022-10-06 at 21 15 52

elastic/kibana#142434

@nastasha-solomon
Contributor Author

Thank you, @angorayc! We'll make sure this gets doc'd in the 8.5 release notes and will keep an eye on it when 8.6 rolls around.

cc: @benironside

@angorayc
Contributor

angorayc commented Oct 12, 2022

https://github.com/elastic/detection-rules/blob/main/docs/experimental-machine-learning/host-risk-score.md and https://github.com/elastic/detection-rules/blob/main/docs/experimental-machine-learning/user-risk-score.md: we might want to clarify that these two pages apply to pre-8.5 versions only.

The content on those GitHub pages is no longer accurate in 8.5, so it's best to always point users to the Elastic docs. In case users land on those two GitHub pages anyway, could we add a notice there saying that the content is deprecated in 8.5 and pointing them to https://www.elastic.co/guide/en/security/master/host-risk-score.html or https://www.elastic.co/guide/en/security/master/user-risk-score.html

@ajosh0504
Contributor

@angorayc Any reason why we're asking users to delete everything in case of a conflict? I'm also unable to think of a situation when this would occur. Could you please elaborate on what you're thinking here?

@angorayc
Contributor

@angorayc Any reason why we're asking users to delete everything in case of a conflict? I'm also unable to think of a situation when this would occur. Could you please elaborate on what you're thinking here?

The current process continues even if an error occurs partway through. For example, the script runs to the end even if an ingest pipeline conflict happens in the middle of the process. Indices and transforms will still be created, so if users don't delete everything, the next time they might see errors saying that the indices and transforms already exist. Deleting everything is therefore the safest way to avoid that.
