[ML] Incorrect mappings on .ml-config index after upgrading to 7.9.0 #61157

Closed
droberts195 opened this issue Aug 14, 2020 · 5 comments · Fixed by #61064
Labels
>bug :ml Machine learning

Comments

@droberts195

After upgrading to 7.9.0, the .ml-config index can end up with incorrect mappings. When this happens, the following exception is seen:

{
  "reason" : "mapper [model_plot_config.annotations_enabled] cannot be changed from type [keyword] to [boolean]",
  "type" : "illegal_argument_exception",
  "root_cause" : [ {
    "reason" : "mapper [model_plot_config.annotations_enabled] cannot be changed from type [keyword] to [boolean]",
    "type" : "illegal_argument_exception"
  } ]
}

In the UI it looks like this:

[image: screenshot of the error in the UI]

The mappings are supposed to be upgraded automatically when the cluster is upgraded, but there is a loophole that means this doesn't always happen.

  • If, after upgrading, you open an anomaly detection job before creating or updating a job, then you suffer the problem
  • If, after upgrading, you create or update a job before opening an anomaly detection job, then you don't suffer the problem

The second scenario can also happen programmatically: when a job persists model state, this causes an update that sets the model snapshot ID on the job config.

So if you upgrade your cluster with ML jobs running and leave them running for 3-4 hours after the upgrade, the .ml-config index mappings get upgraded as required.

@droberts195 droberts195 added >bug :ml Machine learning labels Aug 14, 2020
@elasticmachine

Pinging @elastic/ml-core (:ml)

@droberts195

droberts195 commented Aug 14, 2020

One way to avoid this problem entirely is to manually add the mappings that 7.9.0 requires before upgrading to 7.9.0 (it doesn't hurt to apply them to an earlier version):

PUT .ml-config/_mapping
{
  "properties": {
    "analysis_config": {
      "properties": {
        "per_partition_categorization" : {
          "properties" : {
            "enabled" : {
              "type" : "boolean"
            },
            "stop_on_warn" : {
              "type" : "boolean"
            }
          }
        }
      }
    },
    "max_num_threads" : {
      "type" : "integer"
    },
    "model_plot_config" : {
      "properties" : {
        "annotations_enabled" : {
          "type" : "boolean"
        }
      }
    }
  }
}

You'll need to be a superuser to do that, as no other built-in role lets you modify .ml-config.

However, even this may fail, as the underlying problem has existed for a long time. When upgrading to 7.8, it's possible that dynamic mappings were created that weren't catastrophically bad but also weren't what's in the ML template. Once that has happened, it is impossible to ever apply the mappings from the ML template to that index. The only solution is reindexing, which I will describe in the next comment.
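
To tell in advance whether the mapping update is likely to be rejected, one option is to compare the current mappings against the expected types. The following is a minimal sketch, not part of Elasticsearch or any client library: `conflicting_fields` and `field_type` are hypothetical helper names, and the input is assumed to be the parsed JSON body returned by `GET .ml-config/_mapping` on a 7.x cluster. The expected types are copied from the `PUT .ml-config/_mapping` request above.

```python
# Expected types, copied from the PUT .ml-config/_mapping request above.
EXPECTED_TYPES = {
    "analysis_config.per_partition_categorization.enabled": "boolean",
    "analysis_config.per_partition_categorization.stop_on_warn": "boolean",
    "max_num_threads": "integer",
    "model_plot_config.annotations_enabled": "boolean",
}


def field_type(properties, dotted_path):
    """Return the mapped type of a dotted field path, or None if it is unmapped."""
    node = {"properties": properties}
    for part in dotted_path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node.get("type")


def conflicting_fields(mapping_response, index=".ml-config"):
    """Return {field: (actual_type, expected_type)} for fields mapped with the wrong type.

    Fields that are not mapped yet are skipped, because they can still be added.
    """
    properties = mapping_response[index]["mappings"].get("properties", {})
    conflicts = {}
    for path, expected in EXPECTED_TYPES.items():
        actual = field_type(properties, path)
        if actual is not None and actual != expected:
            conflicts[path] = (actual, expected)
    return conflicts


if __name__ == "__main__":
    # Hand-written sample resembling the failure reported in this issue.
    sample = {
        ".ml-config": {
            "mappings": {
                "properties": {
                    "max_num_threads": {"type": "integer"},
                    "model_plot_config": {
                        "properties": {"annotations_enabled": {"type": "keyword"}}
                    },
                }
            }
        }
    }
    print(conflicting_fields(sample))
```

An empty result suggests the `PUT .ml-config/_mapping` request can still succeed for these fields. Any entry in the result means that field already has a conflicting type, so reindexing is probably needed. The sketch only checks the four leaf fields listed and does not detect a parent object that was itself mapped as a non-object type.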

@droberts195

If you end up with a .ml-config index that has the wrong mappings and is causing failures, the recovery process is to reindex. However, because the ML code currently only works with a single concrete .ml-config index, you need to reindex twice, once to a temporary index and then back to a recreated .ml-config index:

  1. Enable ML upgrade mode
  2. Create a new index, say temp_ml_config. The settings and mappings of this index don't matter, so defaults are fine
  3. Reindex .ml-config into temp_ml_config
  4. Delete the .ml-config index
  5. Recreate the .ml-config index with the setting auto_expand_replicas set to 0-1. The mappings will get picked up from the latest ML template, so don't set any mappings in the creation request
  6. Reindex temp_ml_config into .ml-config
  7. Disable ML upgrade mode
  8. Delete the temp_ml_config index

You'll need to be a superuser to do this, as no other built-in role lets you modify .ml-config.

@wwang500

Here are the matching commands for the reindexing steps, which can be executed in the Kibana Dev Tools console:

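# Step 1: enable ML upgrade mode
POST _ml/set_upgrade_mode?enabled=true&timeout=10m

# Step 2: create the temporary index (default settings and mappings)
PUT temp_ml_config

# Step 3: copy .ml-config into the temporary index
POST _reindex
{
  "source": { "index": ".ml-config" },
  "dest": { "index": "temp_ml_config" }
}

# Step 4: delete the index that has the wrong mappings
DELETE .ml-config

# Step 5: recreate .ml-config; mappings are picked up from the ML template
PUT .ml-config
{
  "settings": { "auto_expand_replicas": "0-1" }
}

# Step 6: copy the documents back into .ml-config
POST _reindex
{
  "source": { "index": "temp_ml_config" },
  "dest": { "index": ".ml-config" }
}

# Step 7: disable ML upgrade mode
POST _ml/set_upgrade_mode?enabled=false&timeout=10m

# Step 8: delete the temporary index
DELETE temp_ml_config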
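
Because step 4 deletes .ml-config, it may be worth checking the response of the first reindex before going any further. The following is a minimal sketch, assuming the parsed JSON body of a `POST _reindex` response with its usual `total`, `created`, `updated`, `timed_out` and `failures` fields. `reindex_looks_complete` is a hypothetical helper, not part of any Elasticsearch client.

```python
def reindex_looks_complete(response):
    """Return True if a parsed _reindex response reports every document as copied.

    This is a conservative check: any failure, a timeout, or a missing
    'total' field is treated as "not complete".
    """
    if "total" not in response:
        return False
    if response.get("timed_out") or response.get("failures"):
        return False
    copied = response.get("created", 0) + response.get("updated", 0)
    return copied == response["total"]


if __name__ == "__main__":
    # Hand-written sample responses, not captured from a real cluster.
    ok = {"total": 12, "created": 12, "updated": 0, "timed_out": False, "failures": []}
    partial = {"total": 12, "created": 9, "updated": 0, "timed_out": False, "failures": []}
    print(reindex_looks_complete(ok))
    print(reindex_looks_complete(partial))
```

If the check fails after step 3, the original .ml-config index is still intact, so the safest course is to stop before step 4 and investigate. The same check can be applied to the response of step 6 before deleting temp_ml_config in step 8.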

@droberts195

The workarounds for 7.9.0 are now publicly documented in https://www.elastic.co/guide/en/machine-learning/7.9/ml-troubleshooting.html#ml-troubleshooting-mappings. Anyone who skips 7.9.0 and upgrades to 7.9.1 or above should not suffer the issue, thanks to the fix in #61064.

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Aug 19, 2020
The ML mappings upgrade test had become useless as it was
checking a field that has been the same since 6.5. This
commit switches to a field that was changed in 7.9.

Additionally, the test only used to check the results index
mappings.  This commit also adds checking for the config
index.

Relates elastic#61157
droberts195 added a commit that referenced this issue Sep 2, 2020