Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Refactor ML mappings and templates into JSON resources #51765

Conversation

dimitris-athanasiou
Copy link
Contributor

ML mappings and index templates have so far been created
programmatically. While this had its merits due to static typing,
there is consensus it would be clear to maintain those in json files.
In addition, we are going to adding ILM policies to these indices
and the component for a plugin to register ILM policies is
IndexTemplateRegistry. It expects the templates to be in resource
json files.

For the above reasons this commit refactors ML mappings and index
templates into json resource files that are registered via
MlIndexTemplateRegistry.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

try {
BytesReference source = load(resource);
source = replaceVariables(source, version, versionProperty, variables);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gwbrown Here are the changes I made to allow variable replacement on template loading. Could you take a look please when you have some time?

@@ -887,101 +881,6 @@ protected Clock getClock() {
public UnaryOperator<Map<String, IndexTemplateMetaData>> getIndexTemplateMetaDataUpgrader() {
return templates -> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not replaced this with returning an operator that does nothing yet because it is unclear whether IndexTemplateRegistry fully addresses upgrading issues. I'm waiting to hear on this, until then the PR is WIP.

private static final String CLUSTER_NAME = "myCluster";

@SuppressWarnings("unchecked")
public void testCreateJobResultsIndex() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are unfortunately the result of extreme mocking, something we tried in the past but doesn't seem to bring benefits. I have removed them and replaced them with integration tests in JobResultsProviderIT.

@@ -1031,6 +855,31 @@ public void testDatafeedTimingStats_NotFound() throws IOException {
verifyNoMoreInteractions(client);
}

@SuppressWarnings("unchecked")
// public void testTermFieldMapping() throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this one, I forgot to address it. I'll take care of it before removing the WIP label.

@@ -77,14 +56,11 @@
private ClusterAdminClient clusterAdminClient;
private IndicesAdminClient indicesAdminClient;

private IndicesAliasesRequestBuilder aliasesRequestBuilder;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight I'm not sure this mock client class was useful. I remove all unused methods.


private static final IndexTemplateConfig CONFIG_TEMPLATE = configTemplate();

private static final IndexTemplateConfig INFERENCE_TEMPLATE = new IndexTemplateConfig(InferenceIndexConstants.LATEST_INDEX_NAME,
Copy link
Contributor Author

@dimitris-athanasiou dimitris-athanasiou Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This template's name is formed by the inference prefix plus the latest version. This would mean that once we change version we'll create a new template. I think that's probably not a good idea. I left it as is for now but we can address in this PR. In fact, our template naming is a bit inconsistent but I'm not sure we can iron it easily. I'll make sure we discuss this.

Copy link
Contributor

@gwbrown gwbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple comments about escaping in the changes to TemplateUtils/IndexTemplateConfig, but I think once those are addressed the additional variable support will be valuable.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a few nits plus there's a question of what we should do when the plugin is disabled.

Let me know if the test failures are showing up any unforeseen edge cases that need talking through.

createIndexRequest.settings(Settings.builder()
.put(IndexMetaData.SETTING_AUTO_EXPAND_REPLICAS, "0-1")
.put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, "1")
.put(UnassignedInfo.INDEX_DELAYED_NODE_LEFT_TIMEOUT_SETTING.getKey(), delayedNodeTimeOutSetting));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the templates we decided to leave this out in the future. Can it be omitted here too now we're explicitly setting it in the affected tests? (Plus the code above that looks for whether we're using the black hole autodetect process.)

@@ -0,0 +1,47 @@
{
"order" : 0,
"version" : 8000099,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a variable?


private static final IndexTemplateConfig ANOMALY_DETECTION_RESULTS_TEMPLATE = anomalyDetectionResultsTemplate();

private static final IndexTemplateConfig ANOMALY_DETECTION_STATE_TEMPLATE = new IndexTemplateConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using the same state index for data frame analytics state too, so maybe drop the ANOMALY_DETECTION_ bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to move all state-related logic in AnomalyDetectorsIndex into a MlStateIndex class. But I'd rather do that in a follow-up PR.

@Override
protected List<IndexTemplateConfig> getTemplateConfigs() {
if (mlEnabled == false) {
return Collections.emptyList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a difference compared to what we did with the old getIndexTemplateMetaDataUpgrader() functionality. In that we used to return our templates even when ML was disabled on the node. Returning nothing when ML is disabled isn't necessarily wrong, but it's a difference that needs thinking about.

There may be some subtle effect in Cloud for example. I think at one time there was a proposal that when a Cloud cluster gets switched to different hardware and a full cluster snapshot from the old hardware is restored into brand new empty cluster running on different hardware that all the X-Pack plugins are disabled during that restore. I am not sure if this is still being proposed or not. But it's a scenario where the ML plugin would be disabled in a cluster that contained ML indices.

Let's consult more widely on this offline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was wrong to return an empty list. A number of tests was failing because of that. As long as the plugin is loaded, the templates should be there as some nodes may become ML nodes.

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added 2 comments.

validate(source);

return filter(source, version, versionProperty);
return source.utf8ToString();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a lot of conversions and deep copies: utf8->utf16->utf8->utf16. It does not look to hard to implement replaceVariables and validate with strings instead of utf8 byte buffers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gwbrown
Copy link
Contributor

gwbrown commented Feb 4, 2020

The changes in b765370 LGTM. Thank you!

@dimitris-athanasiou
Copy link
Contributor Author

I believe the failing tests in MlConfigIndexMappingsFullClusterRestartIT are now because of the lack of template upgrading. This is worked on in #51456. Once that is merged I should rebase this PR and then I think it'll be ready.

ML mappings and index templates have so far been created
programmatically. While this had its merits due to static typing,
there is consensus it would be clear to maintain those in json files.
In addition, we are going to adding ILM policies to these indices
and the component for a plugin to register ILM policies is
`IndexTemplateRegistry`. It expects the templates to be in resource
json files.

For the above reasons this commit refactors ML mappings and index
templates into json resource files that are registered via
`MlIndexTemplateRegistry`.
@dimitris-athanasiou dimitris-athanasiou force-pushed the refactor-ml-mappings-into-json-files branch from 3b553d8 to 7ace93c Compare February 12, 2020 10:49
@dimitris-athanasiou
Copy link
Contributor Author

This has now been rebased to pick #51456. Let's see if it passes all the tests.

@gwbrown
Copy link
Contributor

gwbrown commented Feb 12, 2020

Changes in 0aba76e also LGTM. Thanks again!

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (source == null) {
throw new ElasticsearchParseException("Template must not be null");
}
if (Strings.isEmpty(source)) {
throw new ElasticsearchParseException("Template must not be empty");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder about that: this exception type is used if a REST request is malformed. In this context this makes no sense and we consider the input immutable, right? IMHO this is more of a RTE kind of thing.

Anyway, this is out of scope for this PR, you just add another exception and follow the existing style.

@@ -0,0 +1,36 @@
{
Copy link

@hendrikmuhs hendrikmuhs Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put comments in such files? I think it would be good to leave some explanation to the reader of this. I would also add a placeholder for a changelog, in case we change the schema.

^ applies to all json files introduced

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no support for comments in JSON files as it stands. We could in theory support them by removing them comment lines when such files are loaded. As for a changelog, I think one of the benefits for those being files, is that we treat them similarly to code files. Thus, git serves as a change log. We can git-blame them and see the commit where each line was introduced/modified. Not sure we need more. But let's see how it goes and we add them if it appears to be needed?

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Since the CI is green and this change is blocking two other pieces of work from starting I would be happy to merge this as-is and do any additional minor tweaks in followup PRs.

@dimitris-athanasiou dimitris-athanasiou merged commit 8fcbd1d into elastic:master Feb 13, 2020
@dimitris-athanasiou dimitris-athanasiou deleted the refactor-ml-mappings-into-json-files branch February 13, 2020 09:51
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 14, 2020
…stic#51765)

ML mappings and index templates have so far been created
programmatically. While this had its merits due to static typing,
there is consensus it would be clear to maintain those in json files.
In addition, we are going to adding ILM policies to these indices
and the component for a plugin to register ILM policies is
`IndexTemplateRegistry`. It expects the templates to be in resource
json files.

For the above reasons this commit refactors ML mappings and index
templates into json resource files that are registered via
`MlIndexTemplateRegistry`.

Backport of elastic#51765
dimitris-athanasiou added a commit that referenced this pull request Feb 14, 2020
#52353)

ML mappings and index templates have so far been created
programmatically. While this had its merits due to static typing,
there is consensus it would be clear to maintain those in json files.
In addition, we are going to adding ILM policies to these indices
and the component for a plugin to register ILM policies is
`IndexTemplateRegistry`. It expects the templates to be in resource
json files.

For the above reasons this commit refactors ML mappings and index
templates into json resource files that are registered via
`MlIndexTemplateRegistry`.

Backport of #51765
tvernum added a commit to tvernum/elasticsearch that referenced this pull request Feb 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants