[ML] Datafeed fails on missing indices, even with allow_no_indices set to true #62404
Comments
Pinging @elastic/ml-core (:ml) |
I think we should take this issue as an opportunity to explore whether the datafeed could start even if no concrete index matches at all. The tricky question is whether we should advance time when no data is found. As @droberts195 suggested, perhaps if no data has ever been found we can avoid advancing time. That would allow setting up a job and datafeed and starting them before the data comes in. |
When we lazy load a job, it waits for a node to have available memory. This has a small footprint, as it is just 2 persistent tasks that are not yet allocated to a node. However, if we were to wait for the first data to come in (without advancing time), how would we manage memory? A job that is waiting for data should not register memory in use (according to the calculation needed to ascertain which node jobs should be placed on), but the job is in an open state, so it has already picked a node and already registered its maximum memory requirement. This is a bit of a chicken-and-egg problem. If we support this scenario, then I think we need to assume that multiple jobs could all be waiting for data to arrive, so we would need to figure out how to account for a job's memory after it has started processing buckets, rather than while it is waiting to start. |
@dimitris-athanasiou @droberts195 What are we thinking here? Do we want to effectively make datafeeds "lazy", waiting for their indices to become available? Or do we want extractors and node assignment to handle the case when no data is found? I am thinking the latter, given my response to "should we advance time". I think we treat the datafeed as a "real-time" one (if it is configured as such) that sees no data. We continue to move the scroll.
I am not sure. I think we should advance time, because we are reading a time series. Assuming there was no backlog to read from (i.e. no data), we then turn to real-time scrolling. Consequently, we will read only new data as it comes in. Setting the option
Is an advanced configuration. The user should have a good reason for setting it. We can document the behavior but I think the behavior of least surprise is that we move the scroll forward, as normal, and treat it as real time just not seeing any data. |
Well, since we need to validate how to actually extract the data, I am not sure how we would want to move forward with this. If there is an index pattern that has NO concrete indices, how can we validate that the timefield is aggregatable? It seems that we have two options:
I am not sure we want to assign the datafeed, allow it to run, etc. when there is not a single concrete index. This seems error-prone. |
I was also thinking that there is no point assigning the datafeed task at all if there are no concrete indices. Plus I think that'd be the simplest solution and it achieves what we need. If we do find data, I think from then on time should keep advancing. Lack of input data could be an anomaly the user is looking for. |
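To make the option discussed above concrete, here is a minimal Python sketch of one possible policy: leave the datafeed unassigned while no concrete index exists, hold time still until data has been seen at least once, and keep advancing from then on. This is not the Elasticsearch implementation; `DatafeedState`, `next_step`, and all field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatafeedState:
    """Hypothetical bookkeeping for a single datafeed (illustration only)."""
    has_concrete_index: bool = False
    has_seen_data: bool = False
    search_start_ms: int = 0


def next_step(state: DatafeedState, docs_found: int, interval_end_ms: int) -> str:
    """Decide what to do after one search interval.

    Returns "unassigned", "waiting", or "advanced".
    """
    # No concrete index at all: do not assign the datafeed task.
    if not state.has_concrete_index:
        return "unassigned"

    if docs_found > 0:
        state.has_seen_data = True

    # Once data has been seen, time keeps advancing even through empty
    # intervals, because missing input may itself be an anomaly.
    if state.has_seen_data:
        state.search_start_ms = interval_end_ms
        return "advanced"

    # Indices exist but no data has ever arrived: hold time still.
    return "waiting"
```

Under this sketch an empty interval is treated differently before and after the first document arrives, which is the distinction the comments above are debating.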
This issue is being addressed by PR #62827 |
I do think this is being partially addressed. PR #62827 only handles the case for when there is at least one concrete index across all the patterns. It still stands that there needs to be at least ONE concrete index for a datafeed to start. |
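The start condition described here can be sketched roughly as follows. This is an illustrative Python approximation, not the actual Elasticsearch code: `resolve_patterns` and `can_start_datafeed` are hypothetical helpers, wildcard resolution is approximated with `fnmatch`, and the `logs-*` pattern in the usage lines is made up (only `metrics-*` comes from the report).

```python
from fnmatch import fnmatchcase


def resolve_patterns(patterns, existing_indices):
    """Map each index pattern to the concrete indices it matches."""
    return {
        pattern: sorted(i for i in existing_indices if fnmatchcase(i, pattern))
        for pattern in patterns
    }


def can_start_datafeed(patterns, existing_indices, allow_no_indices):
    """Return True if the datafeed is allowed to start under this sketch."""
    resolved = resolve_patterns(patterns, existing_indices)
    if not allow_no_indices:
        # Strict mode: every pattern must resolve to at least one index.
        return all(resolved.values())
    # Lenient mode: individual patterns may be empty, but at least one
    # concrete index must exist across all patterns.
    return any(resolved.values())


# Hypothetical cluster state, for illustration only.
existing = ["logs-2020.09", "logs-2020.10"]
print(can_start_datafeed(["logs-*", "metrics-*"], existing, allow_no_indices=True))
print(can_start_datafeed(["metrics-*"], existing, allow_no_indices=True))
```

In this sketch, the first call is allowed to start because `logs-*` resolves to concrete indices, while the second is refused because nothing matches at all, which is the remaining limitation noted above.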
Pinging @elastic/ml-core (Team:ML) |
When starting a datafeed that points to multiple index patterns - i.e.:
but one of the index patterns matches no indices (in this case, metrics-*), and the index pattern does not exist in Kibana, I'm getting an error:
However, we have:
"indices_options": {
"allow_no_indices": true
}
set on the datafeed.