forked from elastic/stack-docs
Adds page about recovering from a failed job to anomaly detection docs (
elastic#1667) (elastic#1673) Co-authored-by: Lisa Cawley <lcawley@elastic.co>
1 parent
e66885c
commit efcbae2
Showing 3 changed files with 50 additions and 0 deletions.
47 changes: 47 additions & 0 deletions
docs/en/stack/ml/anomaly-detection/ml-restart-failed-jobs.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
[role="xpack"] | ||
[[ml-restart-failed-jobs]] | ||
= Restart failed {anomaly-jobs} | ||
|
||
If an {anomaly-job} fails, try to restart the job by following the procedure | ||
described below. If the restarted job runs as expected, then the problem that | ||
caused the job to fail was transient and no further investigation is needed. If | ||
the job quickly fails after the restart, then the problem is persistent and | ||
needs further investigation. In this case, find out which node the failed job | ||
was running on by checking the job stats on the **Job management** pane in | ||
{kib}. Then get the logs for that node and look for exceptions and errors | ||
whose messages contain the ID of the {anomaly-job} to better understand the | ||
issue. | ||
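The log search described above can be sketched in a few lines of Python. This is only an illustration: the helper name `find_job_log_lines`, the `ERROR` and `Exception` markers, and the sample log lines are all assumptions made up for this example, not real {es} log output.

```python
def find_job_log_lines(log_lines, job_id, markers=("ERROR", "Exception")):
    """Return the lines that mention job_id and contain at least one marker.

    The default markers are an assumption; adjust them to the log format
    of the node you are inspecting.
    """
    return [
        line
        for line in log_lines
        if job_id in line and any(marker in line for marker in markers)
    ]


# Invented sample lines, used only to exercise the helper.
sample = [
    "[2021-05-01T10:00:00] [INFO ] [my_job] job opened",
    "[2021-05-01T10:05:00] [ERROR] [my_job] process stopped unexpectedly",
    "[2021-05-01T10:05:01] [ERROR] [other_job] unrelated failure",
]

for line in find_job_log_lines(sample, "my_job"):
    print(line)
```

Filtering on the job ID first keeps errors from unrelated jobs on the same node out of the result.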
|
||
If an {anomaly-job} has failed, do the following to recover from the `failed` state: | ||
|
||
. _Force_ stop the corresponding {dfeed} by using the | ||
{ref}/ml-stop-datafeed.html[Stop {dfeed} API] with the `force` parameter set to | ||
`true`. For example, the following request force stops the `my_datafeed` | ||
{dfeed}: | ||
+ | ||
-- | ||
[source,console] | ||
-------------------------------------------------- | ||
POST _ml/datafeeds/my_datafeed/_stop | ||
{ | ||
"force": true | ||
} | ||
-------------------------------------------------- | ||
// TEST[skip] | ||
-- | ||
|
||
. _Force_ close the {anomaly-job} by using the | ||
{ref}/ml-close-job.html[Close {anomaly-job} API] with the `force` parameter | ||
set to `true`. For example, the following request force closes the `my_job` | ||
{anomaly-job}: | ||
+ | ||
-- | ||
[source,console] | ||
-------------------------------------------------- | ||
POST _ml/anomaly_detectors/my_job/_close?force=true | ||
-------------------------------------------------- | ||
// TEST[skip] | ||
-- | ||
|
||
. Restart the {anomaly-job} on the **Job management** pane in {kib}. | ||
|
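As a rough sketch, the recovery sequence can be written down as an ordered list of REST requests. The first two entries mirror the force stop and force close requests shown in the procedure. The last two are an assumption: they use the open {anomaly-job} and start {dfeed} APIs as a scripted stand-in for restarting the job in {kib}, which is the only restart method the page itself describes. The helper name `recovery_requests` is hypothetical, and nothing is sent to a cluster; the code only builds and prints the requests.

```python
import json
from urllib.parse import quote


def recovery_requests(job_id, datafeed_id):
    """Build (method, path, body) tuples for recovering a failed job."""
    job = quote(job_id, safe="")
    feed = quote(datafeed_id, safe="")
    return [
        # Step 1: force stop the datafeed.
        ("POST", f"_ml/datafeeds/{feed}/_stop", {"force": True}),
        # Step 2: force close the anomaly detection job.
        ("POST", f"_ml/anomaly_detectors/{job}/_close?force=true", None),
        # Step 3 (assumed API equivalent of the Kibana restart):
        # reopen the job, then start its datafeed again.
        ("POST", f"_ml/anomaly_detectors/{job}/_open", None),
        ("POST", f"_ml/datafeeds/{feed}/_start", None),
    ]


for method, path, body in recovery_requests("my_job", "my_datafeed"):
    print(method, path)
    if body is not None:
        print(json.dumps(body))
```

The order matters: the {dfeed} is stopped before the job is closed, and the job is reopened before the {dfeed} is started again.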