[ML] Datafeed fails on missing indices, even with allow_no_indices set to true #62404

blaklaybul · 2020-09-15T17:01:42Z

When starting a datafeed that points to multiple index patterns, e.g.:

"indices": [
  "metricbeat-*",
  "metrics-*"
]

but one of the index patterns matches no indices (in this case, metrics-*), and the index pattern does not exist in Kibana, I get this error:

server    log   [10:39:47.697] [error][data][elasticsearch] [status_exception]: No node found to start datafeed [datafeed-kibana-metrics-ui-default-default-hosts_network_out], allocation explanation [cannot start datafeed [datafeed-kibana-metrics-ui-default-default-hosts_network_out] because index [metrics-*] does not exist, is closed, or is still initializing.]

However, we have:
"indices_options": {
  "allow_no_indices": true
}

set on the datafeed.

elasticmachine · 2020-09-15T17:01:44Z

Pinging @elastic/ml-core (:ml)

dimitris-athanasiou · 2020-09-16T06:41:44Z

I think we should take this issue as an opportunity to explore whether the datafeed could start even if no concrete index matches at all.

The tricky question is should we advance time or not when no data is found?

As @droberts195 suggested, perhaps if no data has ever been found we can avoid advancing time. That would allow setting up a job and datafeed and starting them before the data comes in.
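
One way to read this suggestion is sketched below in Python: the job receives no buckets until the first document has ever been seen, and from then on every bucket is emitted, including empty ones. The function `emitted_buckets`, the helper `count_in`, and the millisecond values are hypothetical and are not taken from the Elasticsearch datafeed code.

```python
from typing import Callable, List, Tuple


def count_in(timestamps: List[int]) -> Callable[[int, int], int]:
    """Hypothetical stand-in for a search: counts docs with start <= ts < end."""
    return lambda start, end: sum(1 for ts in timestamps if start <= ts < end)


def emitted_buckets(
    start_ms: int,
    end_ms: int,
    bucket_span_ms: int,
    count_docs: Callable[[int, int], int],
) -> List[Tuple[int, int]]:
    """Return (bucket_start, doc_count) for each bucket the job would see.

    Assumption: "not advancing time" means empty buckets are withheld
    from the job until data has been found at least once.
    """
    emitted: List[Tuple[int, int]] = []
    has_seen_data = False
    bucket_start = start_ms
    while bucket_start + bucket_span_ms <= end_ms:
        bucket_end = bucket_start + bucket_span_ms
        doc_count = count_docs(bucket_start, bucket_end)
        if doc_count > 0:
            has_seen_data = True
        if has_seen_data:
            # After the first data, empty buckets are reported too,
            # so a gap in the input remains visible to the job.
            emitted.append((bucket_start, doc_count))
        bucket_start = bucket_end
    return emitted


# Two leading empty buckets are withheld; later empty buckets are kept.
print(emitted_buckets(0, 600, 100, count_in([250, 260, 450])))
# [(200, 2), (300, 0), (400, 1), (500, 0)]
```

With no documents at all, the function returns an empty list, which corresponds to a datafeed that has been started before its data exists.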

sophiec20 · 2020-09-17T17:45:37Z

When we lazy load a job it is waiting for a node to have available memory. This has a small footprint as it is just 2 persistent tasks that are not yet allocated to a node.

However, if we were to wait for the first data to come in (without advancing time), how would we manage memory? A job that is waiting for data should not register memory in use (according to the calculation needed to ascertain which node jobs should be placed on), but the job is in an open state, so it has already picked a node and already registered its max memory requirement. This is a bit chicken and egg.

If we support this scenario, then I think we need to assume that multiple jobs could all be waiting for data to arrive, so we would need to figure out how to account for a job's memory after it has started processing buckets, rather than while it is waiting to start.
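
The chicken-and-egg problem can be illustrated with a small Python model. It is only a sketch under stated assumptions: the `Job` and `Node` classes, their fields, and the memory figures are hypothetical and do not reflect how Elasticsearch tracks ML memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    job_id: str
    max_memory_mb: int
    waiting_for_data: bool = True  # open, but no bucket processed yet


@dataclass
class Node:
    capacity_mb: int
    jobs: List[Job] = field(default_factory=list)

    def accounted_mb(self) -> int:
        """Memory counted for placement: only jobs already processing."""
        return sum(j.max_memory_mb for j in self.jobs if not j.waiting_for_data)

    def committed_mb(self) -> int:
        """Memory needed if every assigned job were processing at once."""
        return sum(j.max_memory_mb for j in self.jobs)

    def can_start_processing(self, job: Job) -> bool:
        """Check made when a waiting job receives its first data."""
        return self.accounted_mb() + job.max_memory_mb <= self.capacity_mb


node = Node(capacity_mb=1000, jobs=[Job("a", 400), Job("b", 400), Job("c", 400)])
# While all three jobs wait, the node looks empty even though it is
# over-committed (1200 MB against a capacity of 1000 MB).
print(node.accounted_mb(), node.committed_mb())
```

In this model the third job to receive data would fail the `can_start_processing` check, which is the situation the accounting would have to handle if memory were registered only once a job starts processing buckets.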

benwtrent · 2020-09-22T17:27:34Z

@dimitris-athanasiou @droberts195

What are we thinking here?

Do we want to effectively make datafeeds "lazy", waiting for their indices to become available?

Or do we want extractors and node assignment to handle the case when no data is found?

I am thinking the latter, given my response to "should we advance time". I think we treat the datafeed as a "real-time" datafeed (if they have it configured as such) that sees no data. We continue to move the scroll.

"The tricky question is should we advance time or not when no data is found?"

I am not sure. I think we should advance time. The reason is that we are reading a time series. Assuming there was no backlog to read from (i.e. no data), we then turn to real-time scrolling. Consequently, we will read only new data as it comes in.

Setting the option

"indices_options": {
   "allow_no_indices": true
}

is an advanced configuration. The user should have a good reason for setting it. We can document the behavior, but I think the behavior of least surprise is that we move the scroll forward as normal and treat the datafeed as real-time, just not seeing any data.

benwtrent · 2020-09-22T18:11:38Z

Well, since we need to validate how to actually extract the data, I am not sure how we would want to move forward with this.

If there is an index pattern that has NO concrete indices, how can we validate that the time field is aggregatable? It seems that we have two options:

1. Require a concrete index to exist (to create extracted fields).
2. Don't assign the datafeed to a node until at least one concrete index is available.

I am not sure we want to assign the datafeed, allow it to run, etc. when there is not a single concrete index. This seems error prone.
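
The second option might look roughly like the following Python sketch: the assignment decision is deferred while no concrete index matches, and the time field is validated only once there is something to validate against. Everything here is an assumption for illustration, including `try_assign`, the shape of the `mappings` dictionary, and the returned strings; index patterns are matched with shell-style wildcards from the standard library rather than Elasticsearch's own resolution.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def concrete_indices(patterns: List[str], existing: List[str]) -> List[str]:
    """Existing index names matched by at least one pattern."""
    return sorted(
        name for name in existing
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )


def try_assign(
    patterns: List[str],
    time_field: str,
    mappings: Dict[str, Dict[str, bool]],
) -> str:
    """Decide whether a datafeed can be assigned to a node.

    `mappings` is a hypothetical structure:
    index name -> {field name -> is the field aggregatable?}
    """
    matched = concrete_indices(patterns, list(mappings))
    if not matched:
        # No concrete index yet: stay unassigned and retry later.
        return "unassigned: waiting for a concrete index"
    bad = [i for i in matched if not mappings[i].get(time_field, False)]
    if bad:
        return f"failed: time field [{time_field}] is not aggregatable in {bad}"
    return "assigned"


patterns = ["metricbeat-*", "metrics-*"]
print(try_assign(patterns, "@timestamp", {}))
print(try_assign(patterns, "@timestamp", {"metricbeat-7.9.1": {"@timestamp": True}}))
```

The design choice in this sketch is that validation never runs against an empty set of indices, so a datafeed without any concrete index is neither rejected nor started.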

dimitris-athanasiou · 2020-09-23T07:07:26Z

I was also thinking that there is no point assigning the datafeed task at all if there are no concrete indices. Plus I think that'd be the simplest solution and it achieves what we need.

If we do find data, I think from then on time should keep advancing. Lack of input data could be an anomaly the user is looking for.

panbalag · 2020-09-24T14:50:31Z

This issue is being addressed by PR #62827.

benwtrent · 2020-09-24T14:53:25Z

I think this is only being partially addressed.

PR #62827 only handles the case where there is at least one concrete index across all the patterns.

It still stands that there needs to be at least ONE concrete index for a datafeed to start.
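
The difference between the behavior reported in this issue and the rule described above can be modeled in a few lines of Python. This is a sketch of the two rules as stated in the thread, not of the PR's actual implementation; the function names are hypothetical and the wildcard matching uses the standard library's `fnmatchcase`.

```python
from fnmatch import fnmatchcase
from typing import List


def unmatched_patterns(patterns: List[str], existing: List[str]) -> List[str]:
    """Patterns that resolve to no existing index."""
    return [
        p for p in patterns
        if not any(fnmatchcase(name, p) for name in existing)
    ]


def can_start_strict(patterns: List[str], existing: List[str]) -> bool:
    """Reported behavior: every pattern must resolve to an index."""
    return not unmatched_patterns(patterns, existing)


def can_start_relaxed(patterns: List[str], existing: List[str]) -> bool:
    """Rule described above: one concrete index across all patterns is enough."""
    return any(fnmatchcase(name, p) for p in patterns for name in existing)


patterns = ["metricbeat-*", "metrics-*"]
existing = ["metricbeat-7.9.1"]
print(unmatched_patterns(patterns, existing))  # ['metrics-*']
print(can_start_strict(patterns, existing))    # False
print(can_start_relaxed(patterns, existing))   # True
print(can_start_relaxed(patterns, []))         # False
```

The last line corresponds to the remaining limitation: with no concrete index at all, the relaxed rule still refuses to start the datafeed.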

elasticsearchmachine · 2023-03-16T15:51:35Z

Pinging @elastic/ml-core (Team:ML)

blaklaybul added the >bug, :ml, and needs:triage labels · Sep 15, 2020

dimitris-athanasiou added the team-discuss label and removed the needs:triage label · Sep 15, 2020

phillipb mentioned this issue in elastic/kibana#76787, "[Metrics UI] Anomaly Detection setup flow for Metrics" (merged) · Sep 22, 2020

benwtrent self-assigned this · Sep 22, 2020

benwtrent removed their assignment · Mar 16, 2023

elasticsearchmachine added the Team:ML label · Mar 16, 2023
