[ML] Fix race condition when creating multiple jobs #40049

droberts195 · 2019-03-14T14:52:18Z

If multiple jobs are created together and the anomaly
results index does not exist then some of the jobs could
fail to update the mappings of the results index. This
would lead them to fail to write their results correctly later.

Although this scenario sounds rare, it is exactly what
happens if the user creates their first jobs using the
Nginx module in the ML UI.

This change fixes the problem by updating the mappings
of the results index if it is found to exist during a
creation attempt.
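The shape of the fix can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ResultsIndexSketch`, `createOnly`, `createOrUpdate` and the field names are hypothetical and are not the code in `JobResultsProvider` or any Elasticsearch API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy in-memory model of a cluster. All names here are hypothetical;
// none of this is the Elasticsearch client API.
final class ResultsIndexSketch {
    // index name -> names of the fields that currently have a mapping
    private final Map<String, Set<String>> indices = new ConcurrentHashMap<>();

    // Behaviour before the fix: a job that loses the creation race
    // never gets its term fields mapped.
    void createOnly(String indexName, Set<String> termFields) {
        indices.putIfAbsent(indexName, copyOf(termFields));
    }

    // Behaviour after the fix: finding that the index already exists
    // falls through to a mapping update.
    void createOrUpdate(String indexName, Set<String> termFields) {
        Set<String> existing = indices.putIfAbsent(indexName, copyOf(termFields));
        if (existing != null) {
            existing.addAll(termFields);
        }
    }

    boolean isMapped(String indexName, String field) {
        Set<String> fields = indices.get(indexName);
        return fields != null && fields.contains(field);
    }

    private static Set<String> copyOf(Set<String> fields) {
        Set<String> copy = ConcurrentHashMap.newKeySet();
        copy.addAll(fields);
        return copy;
    }

    public static void main(String[] args) {
        ResultsIndexSketch before = new ResultsIndexSketch();
        before.createOnly("results", Set.of("job_a_term"));
        before.createOnly("results", Set.of("job_b_term"));
        System.out.println(before.isMapped("results", "job_b_term")); // false

        ResultsIndexSketch after = new ResultsIndexSketch();
        after.createOrUpdate("results", Set.of("job_a_term"));
        after.createOrUpdate("results", Set.of("job_b_term"));
        System.out.println(after.isMapped("results", "job_b_term")); // true
    }
}
```

In the toy model the second job's term field is only mapped when the "already exists" outcome is treated as a cue to update the mappings rather than as the end of the operation.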

Fixes #38785


elasticmachine · 2019-03-14T14:52:20Z

Pinging @elastic/ml-core

benwtrent · 2019-03-14T15:22:35Z

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/persistence/JobResultsProvider.java

+                                            ImmutableOpenMap<String, MappingMetaData> indexMappings =
+                                                response.getMappings().iterator().next().value;
+                                            MappingMetaData typeMappings = indexMappings.iterator().next().value;
+                                            addTermsAndAliases(typeMappings, indexName, termFields, createAliasListener);


It seems to me that a race condition around mappings still exists, though it is an equally rare one.

3 calls to create the results index

It gets created by one of the calls (with that job's term mappings)

The other two fail and fall into this clause, pull that single job's mapping, update it with their own fields, and put the whole mapping again

The first of the two "late" jobs would lose its terms due to the second of the "late" jobs replacing the mapping with a PutMappingRequest

The first of the two "late" jobs would lose its terms due to the second of the "late" jobs replacing the mapping with a PutMappingRequest

No, because if you put mappings they always add to the current mappings, they don't replace them. This is how Elasticsearch mappings have always worked and is why we get away with only putting mappings for the new job terms. We don't put our core result field mappings except when the index is initially created and they don't get wiped out.
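To make the additive behaviour concrete, here is a minimal sketch that models a mapping as a plain map from field name to type. It is a toy, not the Elasticsearch API: `AdditiveMappingSketch`, `putMapping`, `wholeMapping` and the term field names are hypothetical, `timestamp` stands in for the core result fields, and the model ignores the case where an update conflicts with an existing field's type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: a mapping is just field name -> field type.
final class AdditiveMappingSketch {
    // Merge semantics as described above: fields in the update are added,
    // fields already in the index are kept. (Type conflicts are not modelled.)
    static Map<String, String> putMapping(Map<String, String> current, Map<String, String> update) {
        Map<String, String> merged = new LinkedHashMap<>(current);
        merged.putAll(update);
        return merged;
    }

    // A "whole mapping" for one job: the core result fields plus that job's terms.
    static Map<String, String> wholeMapping(String... termFields) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("timestamp", "date"); // placeholder for the core result fields
        for (String field : termFields) {
            mapping.put(field, "keyword");
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Job A wins the creation race; jobs B and C each put their own whole mapping.
        Map<String, String> index = wholeMapping("airline");
        index = putMapping(index, wholeMapping("host"));
        index = putMapping(index, wholeMapping("status"));
        System.out.println(index.keySet()); // [timestamp, airline, host, status]
    }
}
```

Under merge semantics the order in which the two "late" puts arrive does not matter: neither removes the other's term fields.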

You are correct there is still a race condition, but it's on the number of fields in the mappings. If you create 3 jobs simultaneously each with 400 new term fields then the one that creates the index will add the first 400. The other two will see that their 400 plus the 400 already in the index is 800, which is OK, so will both try to put mappings. And whichever of these requests arrives second will fail because then the index would have too many field mappings: 1200 plus however many core fields we have. However, this has always been a bug as it could also happen in the code path where the index already exists. The end result of this is that you'd get an unfriendly error message about mapping limits instead of a friendly one telling you you need to use a custom results index for the job, so it's not that bad considering the tiny chance of it happening.
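The arithmetic in that scenario can be replayed with a small deterministic sketch. Assumptions: the limit is taken to be 1000 (the default value of `index.mapping.total_fields.limit`), core fields are left out so the numbers match the 800 and 1200 above, and `FieldLimitRaceSketch`, `wouldFit` and `putMapping` are hypothetical names rather than the real ML or Elasticsearch code.

```java
// Toy model of the check-then-act race on the number of mapped fields.
final class FieldLimitRaceSketch {
    // Assumed limit: the default of index.mapping.total_fields.limit.
    static final int FIELD_LIMIT = 1000;

    private int mappedFields;

    FieldLimitRaceSketch(int initiallyMapped) {
        this.mappedFields = initiallyMapped;
    }

    // The friendly pre-check a job makes before putting its mappings.
    boolean wouldFit(int newFields) {
        return mappedFields + newFields <= FIELD_LIMIT;
    }

    // The index-side enforcement when the mapping update actually arrives.
    void putMapping(int newFields) {
        if (mappedFields + newFields > FIELD_LIMIT) {
            throw new IllegalStateException(
                "field limit exceeded: " + (mappedFields + newFields) + " > " + FIELD_LIMIT);
        }
        mappedFields += newFields;
    }

    public static void main(String[] args) {
        // The job that created the index already added its 400 term fields.
        FieldLimitRaceSketch index = new FieldLimitRaceSketch(400);

        // Both late jobs run the pre-check before either has written anything.
        System.out.println(index.wouldFit(400)); // true (400 + 400 = 800)
        System.out.println(index.wouldFit(400)); // true (400 + 400 = 800)

        index.putMapping(400); // first late job succeeds, index now has 800
        try {
            index.putMapping(400); // second late job would make it 1200
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // field limit exceeded: 1200 > 1000
        }
    }
}
```

The pre-check passes for both late jobs because each sees only the state before the other's write, so the second write fails at the index rather than at the friendlier check.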

@droberts195 I read

// Put the whole mapping, not just the term fields, otherwise we'll wipe the _meta section of the mapping
try (XContentBuilder termFieldsMapping = ElasticsearchMappings.resultsMapping(mappingType, termFields)) {

incorrectly. I keep forgetting that an UPDATE to a mapping is done via a PUT and not a POST...

The number of fields is still a race condition, but only when it comes to failing early, which is acceptable.

droberts195 · 2019-03-14T21:30:50Z

Jenkins run elasticsearch-ci/packaging-sample


David Roberts added 2 commits March 14, 2019 12:56

Add a test

341ca20

droberts195 added >bug v7.0.0 :ml Machine learning v6.7.0 v8.0.0 v7.2.0 labels Mar 14, 2019

benwtrent reviewed Mar 14, 2019

View reviewed changes

benwtrent approved these changes Mar 14, 2019

View reviewed changes

Merge branch 'master' into fix_multi_job_create_race

d9e36c7

droberts195 merged commit be7ee7d into elastic:master Mar 15, 2019

droberts195 deleted the fix_multi_job_create_race branch March 15, 2019 09:25

michaelbaamonde added v7.0.0-rc1 and removed v7.0.0 labels Mar 25, 2019

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
