[ML] Deleting open job's results index causes recreation with write alias name #57645

droberts195 · 2020-06-04T08:14:54Z

Deleting the results index of an anomaly detection job while the job is running is neither expected nor supported. If it happens anyway, the consequences are:

1. Because results are written via the write alias, a concrete index named after the write alias is auto-created.
2. Because results are read via the read alias, subsequent reads for the job find no results.

So if a job appears to be running but has no results, it is worth checking whether deletion of the results index is the cause. The simplest place to look is the output of _cat/indices: the concrete index named after the job's write alias will look out of place next to the other ML indices listed.
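As a rough illustration of that check, here is a minimal Python sketch. It assumes the rows come from `GET _cat/indices?format=json` (a list of objects, each with an `index` field) and that the caller already knows the write alias names to look for. The function name and the alias and index names in the usage are hypothetical, not the real ML naming scheme.

```python
from typing import Dict, Iterable, List


def indices_named_like_aliases(
    cat_indices_rows: Iterable[Dict[str, str]],
    write_aliases: Iterable[str],
) -> List[str]:
    """Return concrete index names that collide with expected write alias names.

    A non-empty result suggests that something wrote through the alias name
    after the alias had disappeared, so an index was auto-created under it.
    """
    expected = set(write_aliases)
    return sorted(
        row["index"] for row in cat_indices_rows if row.get("index") in expected
    )


# Hypothetical names, for illustration only.
rows = [{"index": "shared-results"}, {"index": "job-a-results-write"}]
suspicious = indices_named_like_aliases(rows, ["job-a-results-write"])
```

Any name returned here is a concrete index where only an alias should exist.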

The next question is whether we could do anything to fail fast in this situation.

One idea that has been previously suggested is to add a ?alias_required argument to index requests that would fail the request if the write was not being made via an alias. This would be very helpful for ML.

Another possibility, which doesn't require any core changes, is to check the responses immediately after indexing anomaly results. The response to an index or bulk request reports the concrete index each document was written to, even if the request specified an alias. Since we know we are supposed to be indexing via an alias, we could fail the job if the index named in the response is identical to the name we supplied in the index or bulk request.
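A minimal sketch of that response check, written in Python for brevity rather than as the actual ML code. It assumes the standard bulk response shape, where each entry in `items` is keyed by the action type and carries `_index` and `_id`. The function name and the alias and index names in the usage are hypothetical.

```python
from typing import Any, Dict, List


def items_written_to_alias_name(
    bulk_response: Dict[str, Any], write_alias: str
) -> List[str]:
    """Return the IDs of bulk items whose reported index equals the alias name.

    When a write goes through a real alias, the response names the concrete
    index behind it. If the response echoes the alias name back, the alias
    no longer exists and an index was auto-created under that name.
    """
    suspicious = []
    for item in bulk_response.get("items", []):
        # Each item is a single-key dict such as {"index": {...}}.
        for result in item.values():
            if result.get("_index") == write_alias:
                suspicious.append(result.get("_id"))
    return suspicious


# Hypothetical response: one healthy write, one that hit an auto-created index.
response = {
    "errors": False,
    "items": [
        {"index": {"_index": "shared-results", "_id": "doc-1", "status": 201}},
        {"index": {"_index": "job-a-results-write", "_id": "doc-2", "status": 201}},
    ],
}
bad_ids = items_written_to_alias_name(response, "job-a-results-write")
```

A non-empty result would be the signal to fail the job. Note that this only detects the problem after the stray index has already been created.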

A third possibility would be to intercept delete index requests using a filter client and reject any for indices whose names begin with .ml unless they came from the _xpack user. This could be dangerous, however, as it would make it hard to recover from corruption caused by future bugs. So this is probably the least desirable of the three options.

elasticmachine · 2020-06-04T08:14:57Z

Pinging @elastic/ml-core (:ml)

droberts195 · 2020-06-09T12:18:42Z

I just realised this is closely related to #55267, which suggests an alternative solution.

droberts195 · 2020-07-17T12:54:32Z

#58917 provides the building block to fix this problem. Now we can move on to add the require_alias=true flag to all our state and results writes.
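For illustration, a small Python sketch of what a bulk request path with this flag looks like. `require_alias` is a query parameter on the index and bulk APIs; the helper name is hypothetical, and this is not the ML plugin's code, which sets the flag on its Java requests.

```python
from urllib.parse import quote, urlencode


def bulk_path(write_alias: str) -> str:
    """Build a bulk endpoint path that insists the target is an alias.

    With require_alias=true the request is rejected when the target name is
    not an alias, instead of auto-creating a concrete index with that name.
    """
    query = urlencode({"require_alias": "true"})
    return f"/{quote(write_alias, safe='')}/_bulk?{query}"


path = bulk_path(".ml-state-write")
```

For `.ml-state-write` this yields `/.ml-state-write/_bulk?require_alias=true`.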

droberts195 · 2020-08-04T09:55:20Z

Fixed by #60315

droberts195 added the :ml Machine learning label Jun 4, 2020

hendrikmuhs mentioned this issue Jun 24, 2020: [ML] Job fail to start with "Invalid alias name [.ml-state-write] ..." #58482 (Closed)

droberts195 closed this as completed Aug 4, 2020
