[ML] Updates APM Module to Work with Service Maps #70361

Merged 4 commits on Jul 2, 2020
Changes from 2 commits
@@ -1,29 +1,29 @@
{
"id": "apm_transaction",
"title": "APM",
-"description": "Detect anomalies in high mean of transaction duration (ECS).",
+"description": "Detect anomalies in transactions from your APM services",
blaklaybul marked this conversation as resolved.
"type": "Transaction data",
"logoFile": "logo.json",
-"defaultIndexPattern": "apm-*",
+"defaultIndexPattern": "apm-*-transaction",
Contributor

Just checking that this only has the * wildcard in the middle of the pattern, and not at the end too, as we have apm-*-transaction-* in the example provided to the setup endpoint.
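A minimal TypeScript sketch of the wildcard question, assuming simple glob semantics where * matches any run of characters. The index names are hypothetical examples chosen to show the difference between apm-*-transaction and apm-*-transaction-*; real Elasticsearch patterns also support comma-separated lists and exclusions, which this sketch ignores.

```typescript
// Minimal sketch: treat "*" in an index pattern as "any run of characters".
// Comma-separated lists and "-" exclusions are NOT handled here.
function patternToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("*")
    // Escape regex metacharacters in the literal parts of the pattern.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // Anchor both ends so the pattern must match the whole index name.
  return new RegExp("^" + body + "$");
}

function matches(pattern: string, indexName: string): boolean {
  return patternToRegExp(pattern).test(indexName);
}

// Hypothetical index names, for illustration only.
const indexNames = [
  "apm-7.8.0-transaction",
  "apm-7.8.0-transaction-000001",
  "apm-7.8.0-span-000001",
];

for (const pattern of ["apm-*", "apm-*-transaction", "apm-*-transaction-*"]) {
  const hits = indexNames.filter((indexName) => matches(pattern, indexName));
  console.log(pattern, "matches", hits);
}
```

With these sample names, apm-*-transaction matches only the unsuffixed index and apm-*-transaction-* matches only the suffixed one, which is why the trailing wildcard matters.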

Contributor Author

There are two index patterns in the test data provided by @dgieselaar: apm-* and apm-*-transaction. I went with the latter since the apm_transaction module focuses only on transaction data. @dgieselaar, will apm-*-transaction reliably exist? If not, @peteharverson, are there any potential consequences of having a nonexistent index pattern here?

Member

@dgieselaar, Jul 1, 2020

@blaklaybul It's configurable, so there are no guarantees (for either apm-* or apm-*-transaction). I'm assuming we set this when we create the job (I have to check, but can't find the code right now).

Contributor

@blaklaybul the defaultIndexPattern supplied in the module manifest.json is just used as a fallback by our module endpoints if no indexPatternName is supplied to the endpoint. So for the common use case, where the index pattern is supplied to the setup endpoint to create the jobs, this value from the manifest won't be used.
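The fallback described in this comment can be sketched in TypeScript as follows. The interfaces and the resolveIndexPattern helper are hypothetical; only the defaultIndexPattern and indexPatternName field names come from this thread, and the real endpoint may treat edge cases such as an empty string differently.

```typescript
// Hypothetical, simplified shapes. Only defaultIndexPattern and
// indexPatternName are taken from the discussion.
interface ModuleManifest {
  id: string;
  defaultIndexPattern: string;
}

interface SetupRequestBody {
  indexPatternName?: string;
}

// The pattern supplied to the setup endpoint wins; the manifest value is a
// fallback used only when the request does not provide one.
function resolveIndexPattern(manifest: ModuleManifest, body: SetupRequestBody): string {
  return body.indexPatternName !== undefined ? body.indexPatternName : manifest.defaultIndexPattern;
}

const apmManifest: ModuleManifest = { id: "apm_transaction", defaultIndexPattern: "apm-*-transaction" };

console.log(resolveIndexPattern(apmManifest, { indexPatternName: "apm-*-transaction-*" })); // apm-*-transaction-*
console.log(resolveIndexPattern(apmManifest, {})); // apm-*-transaction
```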

"query": {
"bool": {
"filter": [
{ "term": { "processor.event": "transaction" } },
-{ "term": { "transaction.type": "request" } }
Member

Was the previous job tied to transaction.type=request? 🤔

Member

That change happened in #30820. Does that mean ML did not work for other transaction types for the past 1.5 years?

Contributor Author

Yes. If the jobs were created via the ML job creation process, the datafeed filtered on "transaction.type": "request".
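To illustrate the effect, here is a small sketch comparing the old and new datafeed filters against hypothetical documents. The transaction types are examples only, and the Elasticsearch field names are flattened into TypeScript properties for brevity.

```typescript
// Hypothetical APM documents reduced to the fields the two filters look at.
interface ApmDoc {
  processorEvent: string;         // processor.event
  transactionType?: string;       // transaction.type
  transactionDurationUs?: number; // transaction.duration.us
}

const sampleDocs: ApmDoc[] = [
  { processorEvent: "transaction", transactionType: "request", transactionDurationUs: 1200 },
  { processorEvent: "transaction", transactionType: "page-load", transactionDurationUs: 3400 },
  { processorEvent: "transaction", transactionType: "messaging", transactionDurationUs: 800 },
  { processorEvent: "span" },
];

// Old datafeed filter: only "request" transactions reach the job.
function oldFilter(doc: ApmDoc): boolean {
  return doc.processorEvent === "transaction" && doc.transactionType === "request";
}

// New datafeed filter: any transaction that has a duration reaches the job.
function newFilter(doc: ApmDoc): boolean {
  return doc.processorEvent === "transaction" && doc.transactionDurationUs !== undefined;
}

const typesOf = (docs: ApmDoc[]): (string | undefined)[] => docs.map((d) => d.transactionType);

console.log("old:", typesOf(sampleDocs.filter(oldFilter)));
console.log("new:", typesOf(sampleDocs.filter(newFilter)));
```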

+{ "exists": { "field": "transaction.duration" } }
]
}
},
"jobs": [
{
-"id": "high_mean_response_time",
-"file": "high_mean_response_time.json"
+"id": "high_mean_transaction_duration",
+"file": "high_mean_transaction_duration.json"
}
],
"datafeeds": [
{
-"id": "datafeed-high_mean_response_time",
-"file": "datafeed_high_mean_response_time.json",
-"job_id": "high_mean_response_time"
+"id": "datafeed-high_mean_transaction_duration",
+"file": "datafeed_high_mean_transaction_duration.json",
+"job_id": "high_mean_transaction_duration"
}
]
}
@@ -7,7 +7,7 @@
"bool": {
"filter": [
{ "term": { "processor.event": "transaction" } },
-{ "term": { "transaction.type": "request" } }
+{ "exists": { "field": "transaction.duration.us" } }
Member

@sorenlouv, Jul 1, 2020

Given the following request to create a new ML job:

POST /api/ml/modules/setup/apm_transaction
{
  // tagging the job with service.environment
  "custom_settings": {
    "service.environment": "production"
  },
  // tagging the job with app name
  "groups": ["apm"],
  // specifying the indices to query
  "indexPatternName": "apm-*-transaction-*",
  // create job and start immediately
  "startDatafeed": true,
  // limit job to specific environment
  "query": {
    "bool": {
      "filter": [{ "term": { "service.environment": "production" } }]
    }
  }
}

Will the query specified by APM over the API and the query in the config file be merged? Or should we change the query to:

  "query": {
    "bool": {
      "filter": [
        { "term": { "processor.event": "transaction" } },
        { "exists": { "field": "transaction.duration.us" } },
        { "term": { "service.environment": "production" } }
      ]
    }
  }

?

Member

Btw, does the request look correct overall?

Member

@jgowdyelastic, Jul 1, 2020

The query will not get merged; it will overwrite the query in each datafeed in the module.

Here’s a suggestion for the setup request.
I’ve moved the custom settings into jobOverrides and added a job_tags property. I’m not sold on that name, so better suggestions are welcome.
I’ve also removed the groups override because it’s not needed; all APM modules already have this group set.

POST /api/ml/modules/setup/apm_transaction
{
  // tagging the job with service.environment
  "jobOverrides": {
    "custom_settings": {
      "job_tags": {
        "service.environment": "production"
      }
    }
  },
  // specifying the indices to query
  "indexPatternName": "apm-*-transaction-*",
  // create job and start immediately
  "startDatafeed": true,
  // limit job to specific environment
  "query": {
    "bool": {
      "filter": [
        { "term": { "processor.event": "transaction" } },
        { "exists": { "field": "transaction.duration.us" } },
        { "term": { "service.environment": "production" } }
      ]
    }
  }
}
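A sketch of the overwrite semantics described in this comment, using simplified, hypothetical types. applyQueryOverride is an illustrative helper, not a Kibana function; it shows why the suggested request repeats the module's processor.event and transaction.duration.us filters next to the environment filter.

```typescript
// Simplified, hypothetical types for a datafeed query.
type QueryFilter = Record<string, Record<string, string>>;

interface BoolQuery {
  bool: { filter: QueryFilter[] };
}

interface DatafeedConfig {
  datafeed_id: string;
  query: BoolQuery;
}

// A query passed to the setup endpoint replaces every datafeed's query
// outright; nothing is merged.
function applyQueryOverride(datafeeds: DatafeedConfig[], query?: BoolQuery): DatafeedConfig[] {
  if (query === undefined) {
    return datafeeds; // no override: keep the module's own queries
  }
  return datafeeds.map((df) => ({ ...df, query }));
}

// The filters the module's datafeed already uses.
const moduleFilters: QueryFilter[] = [
  { term: { "processor.event": "transaction" } },
  { exists: { field: "transaction.duration.us" } },
];

const moduleDatafeeds: DatafeedConfig[] = [
  { datafeed_id: "datafeed-high_mean_transaction_duration", query: { bool: { filter: moduleFilters } } },
];

// Because of the overwrite, the caller repeats the module filters and appends
// its own environment filter.
const envQuery: BoolQuery = {
  bool: { filter: [...moduleFilters, { term: { "service.environment": "production" } }] },
};

const overridden = applyQueryOverride(moduleDatafeeds, envQuery);
console.log(JSON.stringify(overridden[0].query, null, 2));
```

If the environment filter were sent on its own, the module's event and duration filters would be lost from the datafeed.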

Member

Thanks @jgowdyelastic!

Is jobOverrides required? I thought this would work:

POST /api/ml/modules/setup/apm_transaction
{
  "custom_settings": {
    "job_tags": {
      "service.environment": "production"
    }
  }
  //...
}

Member

Yes. With it, you can override any part of the job.
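A rough sketch of what overriding any part of the job could look like, assuming jobOverrides is deep-merged into each job config in the module. The merge strategy is an assumption not stated in this thread, and deepMerge is a hypothetical helper; under that assumption, job_tags is added without dropping the module's created_by setting.

```typescript
// ASSUMPTION: jobOverrides is deep-merged into the module's job config.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isJsonObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Nested objects are merged key by key; arrays and scalars in the override
// replace the base value.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const merged: JsonObject = { ...base };
  for (const key of Object.keys(override)) {
    const baseValue = merged[key];
    const overrideValue = override[key];
    merged[key] =
      isJsonObject(baseValue) && isJsonObject(overrideValue)
        ? deepMerge(baseValue, overrideValue)
        : overrideValue;
  }
  return merged;
}

// A trimmed-down version of the module's job config.
const moduleJob: JsonObject = {
  job_type: "anomaly_detector",
  groups: ["apm"],
  custom_settings: { created_by: "ml-module-apm-transaction" },
};

const jobOverrides: JsonObject = {
  custom_settings: { job_tags: { "service.environment": "production" } },
};

console.log(JSON.stringify(deepMerge(moduleJob, jobOverrides), null, 2));
```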

Member

@sorenlouv, Jul 2, 2020

Okay, so do we then also need it for the other attributes like query?

{
  "jobOverrides": {
    "query": {
      "bool": {
        "filter": [
          { "term": { "processor.event": "transaction" } },
          { "exists": { "field": "transaction.duration.us" } },
          { "term": { "service.environment": "production" } }
        ]
      }
    }
  }
}

Member

No, only for custom_settings.
jobOverrides and datafeedOverrides can be thought of as additional, or advanced, overrides for the job.
#42946
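The placement rules described in this reply can be summarised as a TypeScript type. SetupModuleRequest and buildApmSetupRequest are hypothetical and list only the fields mentioned in this thread; they are not the authoritative schema of the setup endpoint.

```typescript
// Illustrative shape of the setup request body, assembled only from the
// fields discussed in this thread; NOT the authoritative API schema.
type QueryFilter = Record<string, Record<string, string>>;

interface SetupModuleRequest {
  indexPatternName?: string;
  startDatafeed?: boolean;
  // Top-level overrides, kept where they are for backwards compatibility.
  groups?: string[];
  query?: { bool: { filter: QueryFilter[] } };
  // Advanced overrides; custom_settings has to go through jobOverrides.
  jobOverrides?: Record<string, unknown>;
  datafeedOverrides?: Record<string, unknown>;
}

// Hypothetical helper that builds the request discussed above for one
// service.environment value.
function buildApmSetupRequest(environment: string): SetupModuleRequest {
  return {
    indexPatternName: "apm-*-transaction-*",
    startDatafeed: true,
    query: {
      bool: {
        filter: [
          { term: { "processor.event": "transaction" } },
          { exists: { field: "transaction.duration.us" } },
          { term: { "service.environment": environment } },
        ],
      },
    },
    jobOverrides: {
      custom_settings: { job_tags: { "service.environment": environment } },
    },
  };
}

console.log(JSON.stringify(buildApmSetupRequest("production"), null, 2));
```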

Member

Okay - that sounds a bit inconsistent to me. Is the intention to align this? Is it documented somewhere which fields need to be wrapped in jobOverrides and which don't?

Member

query and groups are the only overrides that could potentially be moved into jobOverrides and datafeedOverrides. For the sake of backwards compatibility, I'm happy for them to stay where they are, unless others agree that they are confusing and should be moved.

]
}
}

This file was deleted.

@@ -0,0 +1,35 @@
{
"job_type": "anomaly_detector",
"groups": [
"apm"
],
"description": "Detect transaction duration anomalies across transaction types for your APM services",
"analysis_config": {
"bucket_span": "15m",
"detectors": [
{
"detector_description": "high_mean('transaction.duration.us') by 'transaction.type' partitionfield='service.name'",
peteharverson marked this conversation as resolved.
"function": "high_mean",
"field_name": "transaction.duration.us",
"by_field_name": "transaction.type",
"partition_field_name": "service.name"
}
],
"influencers": [
"transaction.type",
"service.name"
]
},
"analysis_limits": {
"model_memory_limit": "32mb"
},
"data_description": {
"time_field": "@timestamp"
},
"model_plot_config": {
"enabled": true
},
"custom_settings": {
"created_by": "ml-module-apm-transaction"
}
}
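A rough sketch of the series this detector analyses: the mean of transaction.duration.us per transaction.type within each service.name partition, in 15-minute buckets. It assumes buckets aligned to multiples of the bucket span, uses made-up service names, and covers only the bucketing step; the anomaly modelling and scoring done by ML are not shown.

```typescript
// Hypothetical, simplified transactions: only the fields the detector reads.
interface Txn {
  timestampMs: number;
  serviceName: string;     // partition_field_name: service.name
  transactionType: string; // by_field_name: transaction.type
  durationUs: number;      // field_name: transaction.duration.us
}

const BUCKET_SPAN_MS = 15 * 60 * 1000; // bucket_span: "15m"

// Mean duration per (service.name, transaction.type, bucket start).
function bucketMeans(txns: Txn[]): Record<string, number> {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const t of txns) {
    const bucketStart = Math.floor(t.timestampMs / BUCKET_SPAN_MS) * BUCKET_SPAN_MS;
    const key = t.serviceName + "|" + t.transactionType + "|" + bucketStart;
    const entry = sums[key] || { total: 0, count: 0 };
    entry.total += t.durationUs;
    entry.count += 1;
    sums[key] = entry;
  }
  const means: Record<string, number> = {};
  for (const key of Object.keys(sums)) {
    means[key] = sums[key].total / sums[key].count;
  }
  return means;
}

// Made-up sample data: two services, two transaction types, two buckets.
const sampleTxns: Txn[] = [
  { timestampMs: 0, serviceName: "opbeans-java", transactionType: "request", durationUs: 1000 },
  { timestampMs: 60000, serviceName: "opbeans-java", transactionType: "request", durationUs: 3000 },
  { timestampMs: 120000, serviceName: "opbeans-java", transactionType: "messaging", durationUs: 500 },
  { timestampMs: 960000, serviceName: "opbeans-node", transactionType: "request", durationUs: 2000 },
];

console.log(bucketMeans(sampleTxns));
```

Splitting by transaction.type inside each service.name partition keeps, for example, slow page loads from being compared against fast API requests of the same service.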
@@ -218,7 +218,7 @@ export default ({ getService }: FtrProviderContext) => {
responseCode: 200,
jobs: [
{
-jobId: 'pf5_high_mean_response_time',
+jobId: 'pf5_high_mean_transaction_duration',
jobState: JOB_STATE.CLOSED,
datafeedState: DATAFEED_STATE.STOPPED,
modelMemoryLimit: '11mb',