
@droberts195

If multiple jobs are created together and the anomaly
results index does not exist, then some of the jobs could
fail to update the mappings of the results index. This
led them to fail to write their results correctly later.

Although this scenario sounds rare, it is exactly what
happens if the user creates their first jobs using the
Nginx module in the ML UI.

This change fixes the problem by updating the mappings
of the results index if it is found to exist during a
creation attempt.

Fixes #38785
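The create-or-update pattern described above can be sketched in Java. This is a minimal, hypothetical model and not the Elasticsearch client API: `ResultsIndexSketch`, `createIndex`, `putMapping` and `createJob` are illustrative names, the index is an in-memory set of field names, and `createIndex` returning `false` stands in for the real "index already exists" failure. Three jobs race to create the same index; with `updateOnExists` enabled, the jobs that lose the race fall back to an additive mapping update.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical in-memory model of a shared results index. NOT the
 * Elasticsearch API: it only mimics "create index" failing when the index
 * already exists and "put mapping" adding fields to what is already there.
 */
public class ResultsIndexSketch {

    // index name -> names of mapped fields
    private final Map<String, Set<String>> indices = new ConcurrentHashMap<>();

    /** Atomically creates the index with the given fields; false if it already existed. */
    private boolean createIndex(String index, Set<String> fields) {
        Set<String> mapping = ConcurrentHashMap.newKeySet();
        mapping.addAll(fields);
        return indices.putIfAbsent(index, mapping) == null;
    }

    /** Additive mapping update: new fields are added, existing ones are kept. */
    private void putMapping(String index, Set<String> fields) {
        indices.get(index).addAll(fields);
    }

    /** One job's setup step; updateOnExists toggles the behaviour the fix adds. */
    private void createJob(String index, Set<String> termFields, boolean updateOnExists) {
        boolean created = createIndex(index, termFields);
        if (!created && updateOnExists) {
            // Another job won the creation race, so our term fields are not mapped yet.
            putMapping(index, termFields);
        }
        // Without the fix, a job that lost the race is left with unmapped term fields.
    }

    /** Creates three jobs concurrently against an index that does not exist yet. */
    static Set<String> run(boolean updateOnExists) {
        ResultsIndexSketch store = new ResultsIndexSketch();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (String job : new String[] {"job1", "job2", "job3"}) {
            pool.submit(() -> store.createJob("results-shared", Set.of(job + ".term"), updateOnExists));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new TreeSet<>(store.indices.get("results-shared"));
    }

    public static void main(String[] args) {
        System.out.println("with fix:    " + run(true));
        System.out.println("without fix: " + run(false));
    }
}
```

With the fix, all three terms end up mapped (`[job1.term, job2.term, job3.term]`). Without it, only the terms of whichever job happened to create the index are mapped, which is the missing-mapping problem this change addresses.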

David Roberts added 2 commits March 14, 2019 12:56
If multiple jobs are created together and the anomaly
results index does not exist then some of the jobs could
fail to update the mappings of the results index. This
led them to fail to write their results correctly later.

This change fixes the problem by updating the mappings
of the results index if it is found to exist during a
creation attempt.
@elasticmachine
Collaborator

Pinging @elastic/ml-core

ImmutableOpenMap<String, MappingMetaData> indexMappings =
response.getMappings().iterator().next().value;
MappingMetaData typeMappings = indexMappings.iterator().next().value;
addTermsAndAliases(typeMappings, indexName, termFields, createAliasListener);
Member

It seems to me that a race condition around mappings still exists, though it is an equally rare one.

  • 3 calls to create the results index
  • It gets created by one of the calls (with those term mappings)
  • The other two fail and fall into this clause, pull that single job's mapping, update it with their own fields, and put the whole mapping again
  • The first of the two "late" jobs would lose its terms due to the second of the "late" jobs replacing the mapping with PutMappingRequest

Author

  • The first of the two "late" jobs would lose its terms due to the second of the "late" jobs replacing the mapping with PutMappingRequest

No, because putting mappings always adds to the current mappings rather than replacing them. This is how Elasticsearch mappings have always worked, and it is why we get away with only putting mappings for the new job terms. We don't put our core result field mappings except when the index is initially created, and they don't get wiped out.
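The difference between additive and replacing mapping updates can be illustrated with a simplified Java model of the review scenario. Assumptions: a mapping is modelled as a map from field name to type, the field names (`timestamp`, `jobA.term` and so on) are hypothetical, and `putMappingMerge` only approximates Elasticsearch's real merge rules (it adds new fields, keeps existing ones and rejects type changes). `putMappingReplace` models the replace behaviour the review was concerned about, which is not how put mapping behaves.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/**
 * Hypothetical model of a mapping as field name -> field type. Field names
 * and types are illustrative, not the real ML results mapping.
 */
public class MappingMergeSketch {

    /** Additive put-mapping: adds new fields, keeps existing ones, rejects type changes. */
    static void putMappingMerge(Map<String, String> current, Map<String, String> update) {
        for (Map.Entry<String, String> e : update.entrySet()) {
            String existing = current.get(e.getKey());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException("cannot change type of [" + e.getKey() + "]");
            }
            current.put(e.getKey(), e.getValue());
        }
    }

    /** The replace semantics the review worried about (not how Elasticsearch behaves). */
    static void putMappingReplace(Map<String, String> current, Map<String, String> update) {
        current.clear();
        current.putAll(update);
    }

    /** Review scenario: two late jobs read the same snapshot, add their term, and put it back. */
    static Set<String> lateJobs(BiConsumer<Map<String, String>, Map<String, String>> putMapping) {
        Map<String, String> index = new TreeMap<>();
        index.put("timestamp", "date");    // hypothetical core field
        index.put("jobA.term", "keyword"); // terms from the job that created the index

        Map<String, String> snapshot = new TreeMap<>(index); // both late jobs read this
        Map<String, String> fromB = new TreeMap<>(snapshot);
        fromB.put("jobB.term", "keyword");
        Map<String, String> fromC = new TreeMap<>(snapshot);
        fromC.put("jobC.term", "keyword");

        putMapping.accept(index, fromB);
        putMapping.accept(index, fromC); // arrives second
        return index.keySet();
    }

    public static void main(String[] args) {
        // merge:   [jobA.term, jobB.term, jobC.term, timestamp]
        System.out.println("merge:   " + lateJobs(MappingMergeSketch::putMappingMerge));
        // replace: [jobA.term, jobC.term, timestamp]
        System.out.println("replace: " + lateJobs(MappingMergeSketch::putMappingReplace));
    }
}
```

With merge semantics `jobB.term` survives even though job C's request was built from a stale snapshot; only under the hypothetical replace semantics would it be lost.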

You are correct that there is still a race condition, but it's on the number of fields in the mappings. If you create 3 jobs simultaneously, each with 400 new term fields, then the one that creates the index will add the first 400. The other two will see that their 400 plus the 400 already in the index is 800, which is OK, so both will try to put mappings. Whichever of these requests arrives second will fail, because the index would then have too many field mappings: 1200 plus however many core fields we have. However, this has always been a bug, as it could also happen in the code path where the index already exists. The end result is that you'd get an unfriendly error message about mapping limits instead of a friendly one telling you that you need to use a custom results index for the job, so it's not that bad considering the tiny chance of it happening.
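The check-then-act race on the field count can be walked through deterministically. This sketch assumes the Elasticsearch default `index.mapping.total_fields.limit` of 1000 and a hypothetical `CORE_FIELDS` value of 50 (the real number of core results fields is not stated here); `FieldLimitRaceSketch`, `fitsUnderLimit` and `putMapping` are illustrative names. Both late jobs pass their pre-check against the same field count, and the limit enforced at put-mapping time then rejects whichever request arrives second.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the field-count race; not the real ML or Elasticsearch code. */
public class FieldLimitRaceSketch {

    static final int TOTAL_FIELDS_LIMIT = 1000; // Elasticsearch default for index.mapping.total_fields.limit
    static final int CORE_FIELDS = 50;          // hypothetical number of core results fields

    private int mappedFields;

    FieldLimitRaceSketch(int initialFields) {
        this.mappedFields = initialFields;
    }

    /** The job-side pre-check ("check"), done before putting mappings. */
    boolean fitsUnderLimit(int newFields) {
        return mappedFields + newFields <= TOTAL_FIELDS_LIMIT;
    }

    /** The limit enforced when the mapping is actually updated ("act"). */
    boolean putMapping(int newFields) {
        if (mappedFields + newFields > TOTAL_FIELDS_LIMIT) {
            return false; // stands in for the unfriendly mapping-limit error
        }
        mappedFields += newFields;
        return true;
    }

    static List<Boolean> simulate() {
        // Job A created the index: core fields plus its 400 term fields.
        FieldLimitRaceSketch index = new FieldLimitRaceSketch(CORE_FIELDS + 400);
        List<Boolean> results = new ArrayList<>();
        // Jobs B and C both run the pre-check before either has put its mappings.
        results.add(index.fitsUnderLimit(400)); // B: 850 <= 1000
        results.add(index.fitsUnderLimit(400)); // C: 850 <= 1000
        // Both then put mappings; the second one pushes the total past the limit.
        results.add(index.putMapping(400));     // B: total becomes 850
        results.add(index.putMapping(400));     // C: 1250 > 1000, rejected
        return results;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // [true, true, true, false]
    }
}
```

The final `false` corresponds to the mapping-limit error: with the assumed 50 core fields, 50 + 1200 = 1250 fields exceeds the 1000 limit, even though each job's own pre-check passed.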

Member

@droberts195 I read

// Put the whole mapping, not just the term fields, otherwise we'll wipe the _meta section of the mapping
try (XContentBuilder termFieldsMapping = ElasticsearchMappings.resultsMapping(mappingType, termFields)) {

incorrectly. I keep forgetting that an UPDATE to a mapping is done via a PUT and not a POST...

The number of fields is still a race condition, but only when it comes to failing early, which is acceptable.

@droberts195
Author

Jenkins run elasticsearch-ci/packaging-sample

droberts195 merged commit be7ee7d into elastic:master Mar 15, 2019
droberts195 deleted the fix_multi_job_create_race branch March 15, 2019 09:25
droberts195 pushed a commit that referenced this pull request Mar 15, 2019
droberts195 pushed a commit that referenced this pull request Mar 15, 2019
droberts195 pushed a commit that referenced this pull request Mar 15, 2019

Development

Successfully merging this pull request may close these issues.

[ML] Errors when running nginx module jobs