-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Add option to stop datafeed that finds no data #47922
[ML] Add option to stop datafeed that finds no data #47922
Conversation
Adds a new datafeed config option that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data.
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
any initial training period) then it will automatically stop itself and | ||
close its associated job after this many real-time searches that return no | ||
documents. In other words, it will stop after `frequency` times | ||
`stop_after_empty_search_responses` of real-time operation. If not set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the old name, right? Please update it.
during any initial training period) then it will automatically stop itself | ||
and close its associated job after this many real-time searches that return | ||
no documents. In other words, it will stop after `frequency` times | ||
`stop_after_empty_search_responses` of real-time operation. If not set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here.
@@ -676,6 +701,18 @@ public void setDelayedDataCheckConfig(DelayedDataCheckConfig delayedDataCheckCon | |||
this.delayedDataCheckConfig = delayedDataCheckConfig; | |||
} | |||
|
|||
public void setMaxEmptySearches(int maxEmptySearches) { | |||
if (maxEmptySearches == -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to encode missing setting as "-1"?
Since the field is nullable we can pass "Integer" to this method.
Alternatively, we could pass "Optional" or introduce another method "clearMaxEmptySearches".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the field is nullable we can pass "Integer" to this method.
This would be true for initial creation of the config, but there is a complication with updates, which is also related to your other question.
The update needs a way to distinguish "I don't want to update this setting" from "I want to update this setting to 'never'". This needs to be a way that will work via a JSON REST request, not just a Java API.
Alternatively, we could pass "Optional" or introduce another method "clearMaxEmptySearches".
If the interface was purely Java then these would be better. But since the REST request for updates will have to have something like -1
to mean "I want to update this setting to 'never'" I thought it was better to duplicate that here rather than (for example) translate -1
to an empty optional in the update REST action, then take an optional here. Doing that would force future maintainers to look through more source files to see how the REST level documentation mapped to the Java code.
@@ -140,6 +144,12 @@ public DatafeedUpdate(StreamInput in) throws IOException { | |||
this.scrollSize = in.readOptionalVInt(); | |||
this.chunkingConfig = in.readOptionalWriteable(ChunkingConfig::new); | |||
delayedDataCheckConfig = in.readOptionalWriteable(DelayedDataCheckConfig::new); | |||
// TODO: change version in backport | |||
if (in.getVersion().onOrAfter(Version.V_8_0_0)) { | |||
maxEmptySearches = in.readOptionalInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use readOptionalInt
here and readOptionalVInt
in DatafeedConfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A VInt
cannot be negative. The datafeed config object can store "never stop this datafeed due to lack of data" as null
, so can map an input of -1
to a stored null
. So what is stored in the config is either null
or a positive integer. But the update needs to distinguish "I don't want to update this setting" from "I want to update this setting to 'never'". I'm using null
for "I don't want to update this setting" and -1
for "I want to update this setting to 'never'". But this means that the -1
needs to be stored within the update and potentially transported to a different node, so it cannot be a VInt
.
I suppose I could have used 0
to encode "I want to update this setting to 'never'" as 0
is not a valid setting. But elsewhere in Elasticsearch -1
is a more common way of saying "never". And I could have made the internal representation 0
even though the external configuration is -1
, but the detrimental effect on tracing user-input values through the code seems to outweigh the slight benefit in number of bytes transmitted over the network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand, thanks for detailed explanation.
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/ProblemTracker.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -140,6 +144,12 @@ public DatafeedUpdate(StreamInput in) throws IOException { | |||
this.scrollSize = in.readOptionalVInt(); | |||
this.chunkingConfig = in.readOptionalWriteable(ChunkingConfig::new); | |||
delayedDataCheckConfig = in.readOptionalWriteable(DelayedDataCheckConfig::new); | |||
// TODO: change version in backport | |||
if (in.getVersion().onOrAfter(Version.V_8_0_0)) { | |||
maxEmptySearches = in.readOptionalInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand, thanks for detailed explanation.
Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data.
The BWC version for auto-stopping datafeeds should be 7.5.0 now elastic#47922 is backported.
Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data. Backport of #47922
The BWC version for auto-stopping datafeeds should be 7.5.0 now #47922 is backported.
This change augments the SIEM jobs and datafeeds that were added in #47848 with the allow_lazy_open and max_empty_searches options that were added in elastic/elasticsearch#47726 and elastic/elasticsearch#47922 respectively.
…#48372) This change augments the SIEM jobs and datafeeds that were added in #47848 with the allow_lazy_open and max_empty_searches options that were added in elastic/elasticsearch#47726 and elastic/elasticsearch#47922 respectively.
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.