adds apm instrumentation to task manager's task runner #55356

Merged: pmuellr merged 2 commits into elastic:master on Feb 11, 2020

Member

@pmuellr pmuellr commented Jan 20, 2020

Summary

This adds APM support to task manager by instrumenting its task runner with APM transactions.

To use in Kibana with yarn start, create a file config/apm.dev.js with the following contents:

module.exports = {
  active: true,
  secretToken: '<token>',
  serverUrl: '<url>',
  // transactionSampleRate: 1.0,
  // metricsInterval: '5s',
};

You'll then need a running APM server. With Kibana running off master, I was able to use a production cloud 7.5.1 deployment with APM enabled (the default), along with the token and URL provided by that deployment. Start your local Kibana via yarn start, and you'll start seeing numbers in your APM instance.

Here's an example of what a pretty simple instrumentation (the first commit) can provide:

image

@pmuellr added the Feature:Alerting and release_note:skip labels on Jan 20, 2020

const trans = apm.startTransaction(
  `taskManager run ${this.instance.taskType}`,
  'taskManager',
  this.instance.taskType,
Member Author

@pmuellr pmuellr Feb 10, 2020

The name and type (first two params ^^^) of apm.startTransaction() seem useful, given how the UI ends up using them. I'm less sure about the subtype and action (last two params), since I'm not sure they show up directly in the UI. Since these end up being other plugins' tasks running, it seems unlikely we can consistently get more specific per-task info. And alerting and actions already have alertType- and actionType-specific task types, which makes for nice grouping.

Thinking of just removing the last two.
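
A minimal sketch of the name-and-type-only form described above. To stay runnable without an APM server it does not import elastic-apm-node; TransactionStarter is a hypothetical stand-in for the one agent call used, and startTaskTransaction, recordingAgent and the 'alerting:example' task type are illustrative names, not code from this PR.

```typescript
// Hypothetical stand-in for the slice of the APM agent used here; the real
// agent's startTransaction() accepts more arguments than this sketch models.
interface TransactionStarter {
  startTransaction(name: string, type: string): unknown;
}

// Assumed helper (not from the PR): start a transaction with only a name and
// a type. The task type is part of the name, so runs group per task type.
function startTaskTransaction(agent: TransactionStarter, taskType: string): unknown {
  return agent.startTransaction(`taskManager run ${taskType}`, 'taskManager');
}

// Recording stub so the sketch runs without a real agent.
const startedTransactions: Array<{ name: string; type: string }> = [];
const recordingAgent: TransactionStarter = {
  startTransaction(name, type) {
    startedTransactions.push({ name, type });
    return null;
  },
};

// 'alerting:example' is a made-up task type used only for illustration.
startTaskTransaction(recordingAgent, 'alerting:example');
console.log(startedTransactions);
```

Because the task type is already embedded in the transaction name, dropping subtype and action should not lose the per-task-type grouping.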

Member Author

resolved

      return this.processResult(validatedResult);
    } catch (err) {
      this.logger.error(`Task ${this} failed: ${err}`);
      // in error scenario, we can not get the RunResult
      // re-use modifiedContext's state, which is correct as of beforeRun
      if (trans) trans.end();
Member Author

@pmuellr pmuellr Feb 10, 2020

We probably want to call trans.end('success') on the success path, with a corresponding failure result on the error path as well.
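
A rough sketch of that idea, assuming the transaction's end() accepts a result string. EndableTransaction is a hypothetical stand-in rather than the agent's real type, runWithOutcome is an illustrative helper that is not in this PR, and the 'failure' result string is an assumption. Unlike the task runner above, which logs the error and carries on, this sketch rethrows to keep the example short.

```typescript
// Hypothetical stand-in for an APM transaction; only end(result) is modeled.
interface EndableTransaction {
  end(result?: string): void;
}

// Assumed helper (not from the PR): run a task and end the transaction with a
// result describing the outcome. The transaction may be null, mirroring the
// `if (trans)` guard in the snippet under review.
async function runWithOutcome<T>(
  trans: EndableTransaction | null,
  run: () => Promise<T>
): Promise<T> {
  try {
    const value = await run();
    if (trans) trans.end('success');
    return value;
  } catch (err) {
    if (trans) trans.end('failure');
    throw err;
  }
}

// Demo with a recording stub: one task that resolves, one that rejects.
async function demoOutcomes(): Promise<string[]> {
  const recorded: string[] = [];
  const stub = (): EndableTransaction => ({
    end: (result) => {
      recorded.push(result ?? '(none)');
    },
  });
  await runWithOutcome(stub(), async () => 'ok');
  await runWithOutcome(stub(), async () => {
    throw new Error('boom');
  }).catch(() => undefined);
  return recorded;
}

demoOutcomes().then((results) => console.log(results));
```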

Member Author

resolved

@@ -252,6 +253,7 @@ export class TaskStore {
)
);

const trans = apm.startTransaction(`taskManager markAvailableTasksAsClaimed`, 'taskManager');
Member Author

I think for this call, we should look into using transaction.setLabel() to add the queue size. Hopefully it will show up somewhere in the UI, but ideally I'd like some sort of specialized UI for this anyway, so while it may not be immediately useful, at least it's getting written somewhere.

Member Author

label values must be strings, so not as useful as I thought

Member Author

also, we'd need to actually get the queue size, but I wouldn't want the cost of getting that to be included here, so we'll have to figure out some other story for the queue size ...
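
One possible arrangement for the two concerns above, sketched under assumptions and not something this PR settled on: read the queue size before the transaction starts, so the lookup is not counted in the transaction's duration, and stringify it for the label. LabelingAgent and LabelableTransaction are hypothetical stand-ins for the agent, and claimWithQueueLabel, getQueueSize and the 'queueSize' label name are illustrative.

```typescript
// Hypothetical stand-ins; only the calls used below are modeled.
interface LabelableTransaction {
  setLabel(name: string, value: string): void;
  end(result?: string): void;
}
interface LabelingAgent {
  startTransaction(name: string, type: string): LabelableTransaction | null;
}

// Assumed helper (not from the PR). getQueueSize is hypothetical: however the
// size ends up being obtained, it is awaited before the transaction starts.
async function claimWithQueueLabel<T>(
  agent: LabelingAgent,
  getQueueSize: () => Promise<number>,
  claim: () => Promise<T>
): Promise<T> {
  const queueSize = await getQueueSize();
  const trans = agent.startTransaction('taskManager markAvailableTasksAsClaimed', 'taskManager');
  // The label value is passed as a string, per the comment above.
  if (trans) trans.setLabel('queueSize', String(queueSize));
  try {
    return await claim();
  } finally {
    if (trans) trans.end();
  }
}

// Demo with a stub agent that records labels in memory.
async function demoQueueLabel(): Promise<Record<string, string>> {
  const labels: Record<string, string> = {};
  const stubAgent: LabelingAgent = {
    startTransaction: () => ({
      setLabel: (name, value) => {
        labels[name] = value;
      },
      end: () => undefined,
    }),
  };
  await claimWithQueueLabel(stubAgent, async () => 42, async () => ['task-1']);
  return labels;
}

demoQueueLabel().then((labels) => console.log(labels));
```

This keeps the lookup out of the measured span, but it still adds latency to each claim cycle, so it may not be the right story for the queue size.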

Adds some apm transaction boundaries for parts of task manager, so that
they will show up in APM as new types of transactions.  Should provide
some visibility into the ES calls made by task manager for alerting and
actions, especially under stress testing scenarios.
@pmuellr force-pushed the taskManager/apm-ize branch from bd09593 to 76451dd on February 11, 2020 13:57
@pmuellr added the v7.7.0, v8.0.0, and Team:ResponseOps labels on Feb 11, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr pmuellr marked this pull request as ready for review February 11, 2020 13:59
@pmuellr pmuellr requested a review from a team as a code owner February 11, 2020 13:59
@dgieselaar dgieselaar requested a review from a team February 11, 2020 14:12
Member

@sorenlouv sorenlouv left a comment

This is super exciting to see!

Contributor

@gmmorris gmmorris left a comment

LGTM

@pmuellr
Member Author

pmuellr commented Feb 11, 2020

@elasticmachine merge upstream

@kibanamachine
Contributor

💛 Build succeeded, but was flaky


Test Failures

Kibana Pipeline / kibana-xpack-agent / Chrome X-Pack UI Functional Tests.x-pack/test/functional/apps/advanced_settings/feature_controls/advanced_settings_spaces·ts.Advanced Settings spaces feature controls space with Advanced Settings disabled redirects to management home

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches

[00:00:00]       │
[00:00:00]         └-: Advanced Settings
[00:00:00]           └-> "before all" hook
[00:03:03]           └-: spaces feature controls
[00:03:03]             └-> "before all" hook
[00:03:03]             └-> "before all" hook
[00:03:03]               │ info [logstash_functional] Loading "mappings.json"
[00:03:03]               │ info [logstash_functional] Loading "data.json.gz"
[00:03:04]               │ info [o.e.c.m.MetaDataCreateIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [logstash-2015.09.22] creating index, cause [api], templates [], shards [1]/[0], mappings [_doc]
[00:03:04]               │ info [o.e.c.r.a.AllocationService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2015.09.22][0]]]).
[00:03:04]               │ info [logstash_functional] Created index "logstash-2015.09.22"
[00:03:04]               │ debg [logstash_functional] "logstash-2015.09.22" settings {"index":{"analysis":{"analyzer":{"url":{"max_token_length":"1000","tokenizer":"uax_url_email","type":"standard"}}},"number_of_replicas":"0","number_of_shards":"1"}}
[00:03:04]               │ info [o.e.c.m.MetaDataCreateIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [logstash-2015.09.20] creating index, cause [api], templates [], shards [1]/[0], mappings [_doc]
[00:03:04]               │ info [o.e.c.r.a.AllocationService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2015.09.20][0]]]).
[00:03:04]               │ info [logstash_functional] Created index "logstash-2015.09.20"
[00:03:04]               │ debg [logstash_functional] "logstash-2015.09.20" settings {"index":{"analysis":{"analyzer":{"url":{"max_token_length":"1000","tokenizer":"uax_url_email","type":"standard"}}},"number_of_replicas":"0","number_of_shards":"1"}}
[00:03:04]               │ info [o.e.c.m.MetaDataCreateIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [logstash-2015.09.21] creating index, cause [api], templates [], shards [1]/[0], mappings [_doc]
[00:03:04]               │ info [o.e.c.r.a.AllocationService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2015.09.21][0]]]).
[00:03:04]               │ info [logstash_functional] Created index "logstash-2015.09.21"
[00:03:04]               │ debg [logstash_functional] "logstash-2015.09.21" settings {"index":{"analysis":{"analyzer":{"url":{"max_token_length":"1000","tokenizer":"uax_url_email","type":"standard"}}},"number_of_replicas":"0","number_of_shards":"1"}}
[00:03:13]               │ info progress: 7522
[00:03:20]               │ info [logstash_functional] Indexed 4633 docs into "logstash-2015.09.22"
[00:03:20]               │ info [logstash_functional] Indexed 4757 docs into "logstash-2015.09.20"
[00:03:20]               │ info [logstash_functional] Indexed 4614 docs into "logstash-2015.09.21"
[00:04:24]             └-: space with Advanced Settings disabled
[00:04:24]               └-> "before all" hook
[00:04:24]               └-> "before all" hook
[00:04:24]                 │ info [empty_kibana] Loading "mappings.json"
[00:04:24]                 │ info [empty_kibana] Loading "data.json.gz"
[00:04:24]                 │ info [o.e.c.m.MetaDataDeleteIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_1/Jlpc8ny3TfCk48ViE8n4Ww] deleting index
[00:04:24]                 │ info [o.e.c.m.MetaDataDeleteIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_2/JoBT2WN_Sui36PR3T_zCGQ] deleting index
[00:04:24]                 │ info [empty_kibana] Deleted existing index [".kibana_2",".kibana_1"]
[00:04:24]                 │ info [o.e.c.m.MetaDataCreateIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [_doc]
[00:04:24]                 │ info [empty_kibana] Created index ".kibana"
[00:04:24]                 │ debg [empty_kibana] ".kibana" settings {"index":{"number_of_replicas":"1","number_of_shards":"1"}}
[00:04:24]                 │ info [empty_kibana] Indexed 2 docs into ".kibana"
[00:04:26]                 │ info Creating index .kibana_2.
[00:04:26]                 │ info [o.e.c.m.MetaDataCreateIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_2] creating index, cause [api], templates [], shards [1]/[1], mappings [_doc]
[00:04:26]                 │ info [o.e.c.r.a.AllocationService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] updating number_of_replicas to [0] for indices [.kibana_2]
[00:04:26]                 │ info Reindexing .kibana to .kibana_1
[00:04:26]                 │ info [o.e.c.m.MetaDataCreateIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_1] creating index, cause [api], templates [], shards [1]/[1], mappings [_doc]
[00:04:26]                 │ info [o.e.c.r.a.AllocationService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] updating number_of_replicas to [0] for indices [.kibana_1]
[00:04:26]                 │ info [o.e.t.LoggingTaskListener] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] 5343 finished with response BulkByScrollResponse[took=31.9ms,timed_out=false,sliceId=null,updated=0,created=2,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[00:04:26]                 │ info [o.e.c.m.MetaDataDeleteIndexService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana/5E7N9jCqSFeA1qQxS-wLWw] deleting index
[00:04:26]                 │ info Migrating .kibana_1 saved objects to .kibana_2
[00:04:26]                 │ debg Migrating saved objects config:6.0.0-alpha1, space:default
[00:04:26]                 │ info [o.e.c.m.MetaDataMappingService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_2/VrBXepTcQ1yfTuU7dQVQ-A] update_mapping [_doc]
[00:04:26]                 │ info [o.e.c.m.MetaDataMappingService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_2/VrBXepTcQ1yfTuU7dQVQ-A] update_mapping [_doc]
[00:04:26]                 │ info Pointing alias .kibana to .kibana_2.
[00:04:26]                 │ info Finished in 631ms.
[00:04:26]                 │ debg applying update to kibana config: {"accessibility:disableAnimations":true,"dateFormat:tz":"UTC"}
[00:04:27]                 │ info [o.e.c.m.MetaDataMappingService] [kibana-ci-immutable-oraclelinux-tests-xl-1581457584437083876] [.kibana_2/VrBXepTcQ1yfTuU7dQVQ-A] update_mapping [_doc]
[00:04:28]                 │ debg creating space
[00:04:29]                 │ debg created space
[00:04:29]               └-> redirects to management home
[00:04:29]                 └-> "before each" hook: global before each
[00:04:29]                 │ debg navigateToActualUrl http://localhost:6131/s/custom_space/app/kibana#management/kibana/settings
[00:04:30]                 │ debg browser[INFO] http://localhost:6131/s/custom_space/app/kibana?_t=1581459755373#management/kibana/settings 350 Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-P5polb1UreUSOe5V/Pv7tc+yeZuJXiOi/3fqhGsU7BE='), or a nonce ('nonce-...') is required to enable inline execution.
[00:04:30]                 │
[00:04:30]                 │ debg browser[INFO] http://localhost:6131/s/custom_space/bundles/app/kibana/bootstrap.js 8:19 "^ A single error about an inline script not firing due to content security policy is expected!"
[00:04:30]                 │ debg TestSubjects.exists(managementHome)
[00:04:30]                 │ debg Find.existsByDisplayedByCssSelector('[data-test-subj="managementHome"]') with timeout=10000
[00:04:33]                 │ debg --- retry.tryForTime error: [data-test-subj="managementHome"] is not displayed
[00:04:37]                 │ debg browser[INFO] http://localhost:6131/built_assets/dlls/vendors_3.bundle.dll.js 181:139970 "INFO: 2020-02-11T22:22:40Z
[00:04:37]                 │        Adding connection to http://localhost:6131/s/custom_space/elasticsearch
[00:04:37]                 │
[00:04:37]                 │      "
[00:04:37]                 │ debg browser[INFO] http://localhost:6131/s/custom_space/app/kibana#/management 350 Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-P5polb1UreUSOe5V/Pv7tc+yeZuJXiOi/3fqhGsU7BE='), or a nonce ('nonce-...') is required to enable inline execution.
[00:04:37]                 │
[00:04:37]                 │ debg browser[INFO] http://localhost:6131/s/custom_space/bundles/app/kibana/bootstrap.js 8:19 "^ A single error about an inline script not firing due to content security policy is expected!"
[00:04:37]                 │ debg --- retry.tryForTime failed again with the same message...
[00:04:40]                 │ debg --- retry.tryForTime failed again with the same message...
[00:04:41]                 │ info Taking screenshot "/dev/shm/workspace/kibana/x-pack/test/functional/screenshots/failure/Advanced Settings spaces feature controls space with Advanced Settings disabled redirects to management home.png"
[00:04:41]                 │ debg browser[INFO] http://localhost:6131/built_assets/dlls/vendors_3.bundle.dll.js 181:139970 "INFO: 2020-02-11T22:22:47Z
[00:04:41]                 │        Adding connection to http://localhost:6131/s/custom_space/elasticsearch
[00:04:41]                 │
[00:04:41]                 │      "
[00:04:44]                 │ info Current URL is: http://localhost:6131/s/custom_space/app/kibana#/management?_g=()
[00:04:44]                 │ info Saving page source to: /dev/shm/workspace/kibana/x-pack/test/functional/failure_debug/html/Advanced Settings spaces feature controls space with Advanced Settings disabled redirects to management home.html
[00:04:44]                 └- ✖ fail: "Advanced Settings spaces feature controls space with Advanced Settings disabled redirects to management home"
[00:04:44]                 │

Stack Trace

Error: expected testSubject(managementHome) to exist
    at TestSubjects.existOrFail (/dev/shm/workspace/kibana/test/functional/services/test_subjects.ts:60:15)

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pmuellr pmuellr merged commit f304f88 into elastic:master Feb 11, 2020
pmuellr added a commit to pmuellr/kibana that referenced this pull request Feb 11, 2020
pmuellr added a commit that referenced this pull request Feb 12, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Feb 12, 2020
* master:
  [Canvas] Move sample data and feature registration to canvas np plugin (elastic#56564)
  instrument task manager with apm transactions (elastic#55356)
  displays Alert Instance state on Alert Details page (elastic#56842)
  Adding the Accessibility Statement to docs (elastic#57153)
  [Uptime] Remove redundant adapter function (elastic#56980)
  [SIEM][Detection Engine] Backend end-to-end tests
  [Uptime] Added tests for pages (elastic#56736)
  Updating to kind-of@6.0.3 (elastic#57367)
@mikecote added the release_note:enhancement label and removed the release_note:skip label on Apr 15, 2020
Labels
backported, Feature:Alerting, release_note:enhancement, Team:ResponseOps, v7.7.0, v8.0.0

6 participants