Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Add option to stop datafeed that finds no data #47922

Merged
merged 7 commits into from
Oct 14, 2019

Conversation

droberts195
Copy link
Contributor

@droberts195 droberts195 commented Oct 11, 2019

Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Adds a new datafeed config option that tells a datafeed
that has never found any data to stop itself and close
its associated job after a certain number of real-time
searches have returned no data.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@droberts195 droberts195 marked this pull request as ready for review October 12, 2019 19:51
@droberts195 droberts195 removed the WIP label Oct 12, 2019
Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

any initial training period) then it will automatically stop itself and
close its associated job after this many real-time searches that return no
documents. In other words, it will stop after `frequency` times
`stop_after_empty_search_responses` of real-time operation. If not set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the old name, right? Please update it.

during any initial training period) then it will automatically stop itself
and close its associated job after this many real-time searches that return
no documents. In other words, it will stop after `frequency` times
`stop_after_empty_search_responses` of real-time operation. If not set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here.

@@ -676,6 +701,18 @@ public void setDelayedDataCheckConfig(DelayedDataCheckConfig delayedDataCheckCon
this.delayedDataCheckConfig = delayedDataCheckConfig;
}

public void setMaxEmptySearches(int maxEmptySearches) {
if (maxEmptySearches == -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to encode missing setting as "-1"?
Since the field is nullable we can pass "Integer" to this method.
Alternatively, we could pass "Optional" or introduce another method "clearMaxEmptySearches".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the field is nullable we can pass "Integer" to this method.

This would be true for initial creation of the config, but there is a complication with updates, which is also related to your other question.

The update needs a way to distinguish "I don't want to update this setting" from "I want to update this setting to 'never'". This needs to be a way that will work via a JSON REST request, not just a Java API.

Alternatively, we could pass "Optional" or introduce another method "clearMaxEmptySearches".

If the interface was purely Java then these would be better. But since the REST request for updates will have to have something like -1 to mean "I want to update this setting to 'never'" I thought it was better to duplicate that here rather than (for example) translate -1 to an empty optional in the update REST action, then take an optional here. Doing that would force future maintainers to look through more source files to see how the REST level documentation mapped to the Java code.

@@ -140,6 +144,12 @@ public DatafeedUpdate(StreamInput in) throws IOException {
this.scrollSize = in.readOptionalVInt();
this.chunkingConfig = in.readOptionalWriteable(ChunkingConfig::new);
delayedDataCheckConfig = in.readOptionalWriteable(DelayedDataCheckConfig::new);
// TODO: change version in backport
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
maxEmptySearches = in.readOptionalInt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use readOptionalInt here and readOptionalVInt in DatafeedConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A VInt cannot be negative. The datafeed config object can store "never stop this datafeed due to lack of data" as null, so can map an input of -1 to a stored null. So what is stored in the config is either null or a positive integer. But the update needs to distinguish "I don't want to update this setting" from "I want to update this setting to 'never'". I'm using null for "I don't want to update this setting" and -1 for "I want to update this setting to 'never'". But this means that the -1 needs to be stored within the update and potentially transported to a different node, so it cannot be a VInt.

I suppose I could have used 0 to encode "I want to update this setting to 'never'" as 0 is not a valid setting. But elsewhere in Elasticsearch -1 is a more common way of saying "never". And I could have made the internal representation 0 even though the external configuration is -1, but the detrimental effect on tracing user-input values through the code seems to outweigh the slight benefit in number of bytes transmitted over the network.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand, thanks for detailed explanation.

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -140,6 +144,12 @@ public DatafeedUpdate(StreamInput in) throws IOException {
this.scrollSize = in.readOptionalVInt();
this.chunkingConfig = in.readOptionalWriteable(ChunkingConfig::new);
delayedDataCheckConfig = in.readOptionalWriteable(DelayedDataCheckConfig::new);
// TODO: change version in backport
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
maxEmptySearches = in.readOptionalInt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand, thanks for detailed explanation.

@droberts195 droberts195 merged commit d308095 into elastic:master Oct 14, 2019
@droberts195 droberts195 deleted the stop_df_that_finds_no_data branch October 14, 2019 12:26
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Oct 14, 2019
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Oct 14, 2019
The BWC version for auto-stopping datafeeds
should be 7.5.0 now elastic#47922 is backported.
droberts195 added a commit that referenced this pull request Oct 14, 2019
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.

Backport of #47922
droberts195 added a commit that referenced this pull request Oct 14, 2019
The BWC version for auto-stopping datafeeds
should be 7.5.0 now #47922 is backported.
droberts195 added a commit to elastic/kibana that referenced this pull request Oct 15, 2019
This change augments the SIEM jobs and datafeeds that were
added in #47848 with the allow_lazy_open and max_empty_searches
options that were added in elastic/elasticsearch#47726 and
elastic/elasticsearch#47922 respectively.
droberts195 added a commit to elastic/kibana that referenced this pull request Oct 16, 2019
…#48372)

This change augments the SIEM jobs and datafeeds that were
added in #47848 with the allow_lazy_open and max_empty_searches
options that were added in elastic/elasticsearch#47726 and
elastic/elasticsearch#47922 respectively.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants