[ML] Update index mappings on process start, not job open #37607

droberts195 · 2019-01-18T13:18:08Z

At present we update index mappings for the state and results indices in TransportOpenJobAction. This dates back to the 5.4 to 5.5 upgrade, when we knew 5.4 jobs would not run in 5.5 because there were special checks to prevent it.

Unfortunately, updating the index mappings when a job is opened is not sufficient to ensure the mappings are correct by the time documents that require them are indexed. When a rolling upgrade is done with ML jobs open, documents can be indexed before the mappings are correct. These documents cause dynamic mappings to be created, and then a subsequent job open (possibly for a different job) fails because the mappings cannot be updated.

The solution is to update the mappings on process start, not on job open. This is similar to the change made in e194d8e on #37483. (Thankfully with that one we noticed the problem in the initial review phase.)

Although the problem has existed since 5.6, version 6.5 is more likely to suffer from it because (a) the validation for enabled=false has been tightened up in #33933 and (b) in 6.5 we introduced the multi_bucket_impact field with mapping type double.
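To make the clash concrete, here is a minimal sketch in Python of how one might compare the desired field types against what an index actually contains. It is not ML plugin code: the `find_mapping_clashes` helper and the two dictionaries are hypothetical, and the `float` type assumes Elasticsearch's default dynamic mapping for a JSON floating-point value. Only the field name `multi_bucket_impact` and its desired type `double` come from this issue.

```python
def find_mapping_clashes(desired, actual):
    """Return {field: (desired_type, actual_type)} for fields whose types differ.

    Both arguments are the "properties" part of an index mapping.
    Hypothetical helper for illustration only.
    """
    clashes = {}
    for name, spec in desired.items():
        actual_spec = actual.get(name)
        if actual_spec is None:
            # Field not mapped yet, so the desired mapping can still be added.
            continue
        if actual_spec.get("type") != spec.get("type"):
            clashes[name] = (spec.get("type"), actual_spec.get("type"))
    return clashes


# Desired mapping, as described in this issue.
desired = {"multi_bucket_impact": {"type": "double"}}

# Assumed result of dynamic mapping when a floating-point value is indexed
# before the desired mapping has been applied.
dynamically_created = {"multi_bucket_impact": {"type": "float"}}

print(find_mapping_clashes(desired, dynamically_created))
# {'multi_bucket_impact': ('double', 'float')}
```

Once a field has a type, that type cannot be changed in place, which is why the later mapping update fails rather than silently correcting the index.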

The only workaround to recover from dynamic mappings that clash with the desired mappings is to reindex the affected index while preserving all aliases, and this is hard. Therefore we should fix this as a priority for 6.6.1.

The fix will only stop the mappings inconsistency being created in the future. It will not help anyone who has already suffered from mappings inconsistency. I will paste the steps to recover by reindexing into this issue once they are validated.


elasticmachine · 2019-01-18T13:18:10Z

Pinging @elastic/ml-core

This change moves the update to the results index mappings from the open job action to the code that starts the autodetect process. When a rolling upgrade is performed we need to update the mappings for already-open jobs that are reassigned from an old version node to a new version node, but the open job action is not called in this case. Closes elastic#37607
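The difference between the two hook points can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: `ToyCluster`, its methods, and the integer "versions" are all hypothetical and do not correspond to classes in the Elasticsearch code base. It models just one fact from the description above: a job reassigned during a rolling upgrade gets a new process started without the open job action being called.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy stand-in for a cluster; not Elasticsearch code."""
    update_on: str              # "job_open" or "process_start"
    mappings_version: int = 5   # hypothetical version of the results mappings
    events: list = field(default_factory=list)

    def _update_mappings(self, node_version):
        # Bring the mappings up to the version the node expects.
        if self.mappings_version < node_version:
            self.mappings_version = node_version
            self.events.append(f"mappings updated to v{node_version}")

    def open_job(self, job_id, node_version):
        # Explicit open job action, followed by a process start.
        if self.update_on == "job_open":
            self._update_mappings(node_version)
        self.start_process(job_id, node_version)

    def start_process(self, job_id, node_version):
        # Runs for explicit opens and for reassignment to another node.
        if self.update_on == "process_start":
            self._update_mappings(node_version)
        self.events.append(
            f"{job_id}: process on v{node_version} node, "
            f"mappings v{self.mappings_version}"
        )


def rolling_upgrade(update_on):
    """Open a job on an old node, then reassign it to an upgraded node."""
    cluster = ToyCluster(update_on=update_on)
    cluster.open_job("job-a", node_version=5)
    # Reassignment: the process starts on the new node, open_job is not called.
    cluster.start_process("job-a", node_version=6)
    return cluster.mappings_version


print(rolling_upgrade("job_open"))       # 5
print(rolling_upgrade("process_start"))  # 6
```

In the toy model, hooking the update to the job open leaves the mappings at the old version after reassignment, while hooking it to the process start brings them up to date.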

smalenfant · 2019-02-12T19:52:32Z

I would love to see the manual steps if possible. I have a few ML jobs stuck in close due to this issue.

droberts195 · 2019-02-12T21:08:11Z

Here are the steps needed to fix the issue (as run from the dev tools console):

1. Stop all datafeeds and close all jobs, and tell other users not to open any more until this process is complete. It is worth noting which jobs were open so that they can be reopened at the end of this process.

POST _xpack/ml/datafeeds/*/_stop
POST _xpack/ml/anomaly_detectors/*/_close

2. Get the aliases of the ML results indices.

GET .ml-anomalies-*/_alias

Store the response in a file (let's call it ml_results_alias_response.txt).

3. Reindex all ML results indices into temporary indices. This has to be done for each index whose name starts with .ml-anomalies-.

POST _reindex
{
  "source": {
    "index": "{index_name}"
  },
  "dest": {
    "index": "tmp-{index_name}"
  }
}

4. Delete the original indices. This has to be done for each index whose name starts with .ml-anomalies-.

DELETE {index_name}

5. Reindex the temporary indices back to their original names.

POST _reindex
{
  "source": {
    "index": "tmp-{index_name}"
  },
  "dest": {
    "index": "{index_name}"
  }
}

6. Restore the aliases. The gen_post_aliases_body.sh script generates the body for the POST _aliases request. It is a bash script, but it depends on jq being available. Run it on the file from step 2.

./gen_post_aliases_body.sh < ml_results_alias_response.txt

Then copy the output and use it as the body in the following request:

POST _aliases
{BODY}

7. Delete the temporary indices created in step 3.

8. Reopen any jobs that were closed in step 1.

This process will ensure the results indices have the correct mappings after the upgrade to 6.5.x.
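For anyone who cannot run the bash script, the alias restoration step can be sketched in Python. This is not the gen_post_aliases_body.sh script and may differ from it in detail. It assumes the GET alias response has the usual shape of index name, then "aliases", then alias name, then alias properties, and that those properties (for example "filter") can be passed unchanged to an "add" action of POST _aliases. The function name `build_aliases_body` and the sample job name `my_job` are hypothetical.

```python
import json


def build_aliases_body(alias_response):
    """Turn a GET <index>/_alias response into a POST _aliases request body."""
    actions = []
    for index_name in sorted(alias_response):
        aliases = alias_response[index_name].get("aliases", {})
        for alias_name in sorted(aliases):
            add = {"index": index_name, "alias": alias_name}
            # Carry over alias properties such as "filter" as they are.
            add.update(aliases[alias_name])
            actions.append({"add": add})
    return {"actions": actions}


# Hypothetical sample of what ml_results_alias_response.txt might contain.
sample = {
    ".ml-anomalies-shared": {
        "aliases": {
            ".ml-anomalies-my_job": {
                "filter": {"term": {"job_id": {"value": "my_job", "boost": 1.0}}}
            },
            ".ml-anomalies-.write-my_job": {},
        }
    }
}

print(json.dumps(build_aliases_body(sample), indent=2))
```

To use it with the file saved in step 2, load the file with `json.load` and pass the result to `build_aliases_body`, then paste the printed JSON as the body of the POST _aliases request. Check the generated body against the saved response before sending it.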

smalenfant · 2019-02-12T23:23:37Z

@droberts195 Thanks for this info. One thing missing is the script; it looks like I don't have access to that repo (404).

droberts195 · 2019-02-13T09:21:02Z

Sorry about that @smalenfant. I edited the big comment above to make the script an attachment of this issue. I had to rename it with a .txt extension to do this, so after downloading rename the file to remove the .txt extension and chmod +x it.

smalenfant · 2019-02-22T22:18:06Z

@droberts195 We didn't have time to go through the process of re-indexing although we did upgrade to 6.6.1. The mapping problem went away, although my jobs can't seem to open at all now. Might be a totally different issue.

POST _xpack/ml/anomaly_detectors/ttms/_open
{
 "statusCode": 504,
 "error": "Gateway Time-out",
 "message": "Client request timeout"
}

droberts195 · 2019-02-25T09:32:08Z

@smalenfant like you say, your new problem could be something completely different. If you have a support contract please open a support case for it. Then our support team can lead you through the process of gathering enough information to diagnose what's wrong. If you don't have a support contract please ask on the Discuss forum. Tag your question with the machine-learning tag so we don't miss it.

droberts195 added >bug :ml Machine learning labels Jan 18, 2019

droberts195 self-assigned this Jan 18, 2019

droberts195 mentioned this issue Jan 22, 2019

[ML] Update ML results mappings on process start #37706

Merged

droberts195 closed this as completed in #37706 Jan 23, 2019

droberts195 mentioned this issue Aug 14, 2020

[ML] Disable dynamic mappings for all ML indices #61083

Open
