[Logs UI] Create ML module for log analysis #42872

Merged

Conversation

weltenwort
Member

@weltenwort weltenwort commented Aug 7, 2019

Summary

This adds an ML module called logs_ui_analysis, which contains a job definition to detect anomalies in the log rate.

⚠️ Some fields are designed to be overridden at setup time using #42946:

  • in the job log-entry-rate:
    • analysis_config.bucket_span
    • data_description.time_field
  • in the datafeed datafeed-log-entry-rate:
    • aggregations.buckets.date_histogram.field
    • aggregations.buckets.date_histogram.fixed_interval
    • aggregations.buckets.aggregations[timestampField]
    • aggregations.buckets.aggregations[timestampField].max.field
    • aggregations.buckets.aggregations.doc_count_per_minute.bucket_script.script.params.bucket_span_in_ms
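Because the bucket span and the timestamp field each appear in several of the fields listed above, it can help to derive all overrides from two inputs. The following Python sketch does that; the helper name `build_overrides` and its parameters are hypothetical and not part of this PR, and the structure simply mirrors the override paths in the list.

```python
def build_overrides(timestamp_field: str, bucket_span_ms: int) -> dict:
    """Derive every setup-time override from two inputs (hypothetical helper)."""
    span = f"{bucket_span_ms}ms"
    return {
        "jobOverrides": [
            {
                "job_id": "log-entry-rate",
                "analysis_config": {"bucket_span": span},
                "data_description": {"time_field": timestamp_field},
            }
        ],
        "datafeedOverrides": [
            {
                "job_id": "log-entry-rate",
                "aggregations": {
                    "buckets": {
                        "date_histogram": {
                            "field": timestamp_field,
                            "fixed_interval": span,
                        },
                        "aggregations": {
                            # The max aggregation is named after the time field itself.
                            timestamp_field: {"max": {"field": timestamp_field}},
                            "doc_count_per_minute": {
                                "bucket_script": {
                                    "script": {
                                        "params": {"bucket_span_in_ms": bucket_span_ms}
                                    }
                                }
                            },
                        },
                    }
                },
            }
        ],
    }


overrides = build_overrides("@timestamp", 900_000)
```

Keeping the bucket span in one variable avoids a mismatch between `analysis_config.bucket_span`, `fixed_interval`, and `bucket_span_in_ms`.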

closes #42593

Implementation notes

  • The datafeed scales the doc_count to one minute using a bucket_script aggregation called doc_count_per_minute. That aggregation is used as the summary_count_field_name in the job configuration. The scaling has several advantages:
    • A denominator of 1 in the unit reduces the risk of misinterpretation (doc_count / minute compared to doc_count / (15 minutes)).
    • Future changes of bucket span sizes are simpler because the result's unit is not as tightly coupled to it.
  • The max timestamp aggregation is not included in the datafeed, because its name depends on the time field name. Overrides are merged additively, so a predefined aggregation could never be removed or renamed later. The max aggregation therefore has to be specified solely as an override at setup time.
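Both notes can be illustrated with a small Python sketch. It rests on two assumptions: the per-minute scaling is taken to be the doc count divided by the bucket span in minutes (the actual `bucket_script` source is not shown here), and `merge_overrides` is a hypothetical stand-in for the recursive merge, not the code Kibana uses.

```python
from copy import deepcopy


def doc_count_per_minute(doc_count: int, bucket_span_in_ms: int) -> float:
    """Scale a bucket's doc count to a per-minute rate (assumed formula)."""
    return doc_count / (bucket_span_in_ms / 60_000)


def merge_overrides(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`.

    Keys are only added or replaced, never removed, which is what
    makes the merge additive.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical module default that predefines a max agg named "@timestamp".
predefined = {"aggregations": {"@timestamp": {"max": {"field": "@timestamp"}}}}
# A setup-time override for a differently named time field.
override = {"aggregations": {"event.created": {"max": {"field": "event.created"}}}}

merged = merge_overrides(predefined, override)
# The stale "@timestamp" agg survives next to the new one.
print(sorted(merged["aggregations"]))        # ['@timestamp', 'event.created']
print(doc_count_per_minute(1500, 900_000))   # 100.0
```

With a predefined aggregation the merged datafeed would keep an aggregation for a field that may not exist in the user's indices, which is why the module leaves it out.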

Testing hints

The module can be deployed via a call to Kibana's ML module setup API at /api/ml/modules/setup/logs_ui_analysis. The POST request's body needs to specify a few parameters, e.g.:

{
  "indexPatternName": "filebeat-*",
  "prefix": "kibana-logs-ui-testspace-default-",
  "startDatafeed": true,
  "jobOverrides": [
    {
      "job_id": "log-entry-rate",
      "analysis_config": {
        "bucket_span": "900000ms"
      },
      "data_description": {
        "time_field": "@timestamp"
      }
    }
  ],
  "datafeedOverrides": [
    {
      "job_id": "log-entry-rate",
      "aggregations": {
        "buckets": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "900000ms"
          },
          "aggregations": {
            "@timestamp": {
              "max": {
                "field": "@timestamp"
              }
            },
            "doc_count_per_minute": {
              "bucket_script": {
                "script": {
                  "params": {
                    "bucket_span_in_ms": 900000
                  }
                }
              }
            }
          }
        }
      }
    }
  ]
}
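A request like the one above could be assembled with Python's standard library as sketched below. The base URL `http://localhost:5601` is an assumption for a local development Kibana, authentication is omitted, and the override lists are left empty here for brevity. The `kbn-xsrf` header is included because Kibana rejects non-GET API requests without it.

```python
import json
import urllib.request

KIBANA_URL = "http://localhost:5601"  # assumption: local development instance


def build_setup_request(payload: dict, base_url: str = KIBANA_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the module setup endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/ml/modules/setup/logs_ui_analysis",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "true",  # required by Kibana for write requests
        },
    )


payload = {
    "indexPatternName": "filebeat-*",
    "prefix": "kibana-logs-ui-testspace-default-",
    "startDatafeed": True,
    "jobOverrides": [],       # fill in as in the request body above
    "datafeedOverrides": [],  # fill in as in the request body above
}

request = build_setup_request(payload)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running Kibana instance; credentials would need to be added for a secured cluster.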

@weltenwort weltenwort added the v8.0.0, Feature:Logs UI, Team:Infra Monitoring UI - DEPRECATED, release_note:skip, and v7.4.0 labels Aug 7, 2019
@weltenwort weltenwort self-assigned this Aug 7, 2019
@elasticmachine
Contributor

Pinging @elastic/infra-logs-ui

@elasticmachine
Contributor

💚 Build Succeeded

@weltenwort weltenwort marked this pull request as ready for review August 9, 2019 20:36
@weltenwort weltenwort requested a review from a team as a code owner August 9, 2019 20:36
@jasonrhodes
Member

@elastic/ml-ui any feedback you all have or anything else you need from us on this? We are beginning to build the UI that connects to this endpoint and are hoping to test it soon. Thanks!

Contributor

@walterra walterra left a comment

Using the testing hints I was able to create a job based on filebeat data, so the setup in general LGTM.

Looks good in Single Metric Viewer:

[image: screenshot of the job in the Single Metric Viewer]

(I'm not familiar with the use case so I cannot really assess the quality of the job/datafeed config, but code LGTM)

jasonrhodes added a commit that referenced this pull request Aug 14, 2019
* Add ml module with hard-coded timestamp field

* Fix data_recognizer test

* Parameterize the bucket span normalization

* Remove max agg which will be specified during setup

The overrides are recursively merged and therefore additive. Therefore
we can't specify the timestamp agg here, because it could not be
overridden later with a different field and agg name. It needs to be
solely specified at setup time.