[CI] upgraded_cluster/30_ml_jobs_crud/Test open old jobs failed with .ml-state-write not an alias #59011
Comments
Pinging @elastic/ml-core (:ml)
Interestingly, in
That strikes me as very dodgy, because it implies that our production code couldn't cope with possible race conditions with the creation of the More recently this won't even help, which probably explains why these tests sometimes fail now. The index we are actually using is Theory for what happened:
We also have quite a few open issues for "all shards failed" against our internal state or stats indices: #54887, #55221, #55807, #57102, and #58841. I am thinking that
There have been a few test failures that are likely caused by tests performing actions that use ML indices immediately after the actions that create those ML indices. Currently this can result in attempts to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices that have been affected by this problem (state and stats) wait for the shards to be initialized before returning.

Fixes elastic#54887
Fixes elastic#55221
Fixes elastic#55807
Fixes elastic#57102
Fixes elastic#58841
Fixes elastic#59011
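The "create, then wait for shards" idea in the commit message above can be sketched roughly as follows. This is not the code from the actual change (which lives in the Elasticsearch Java codebase); it is a minimal illustration, assuming a hypothetical `transport(method, path, params)` callable that performs an HTTP request and returns the parsed JSON body. It relies on the cluster health API reporting a `status` of `yellow` or `green` once all primary shards of the target index are active. The function name, parameters, and defaults are all assumptions made for the example.

```python
import time
from typing import Callable, Dict

# Hypothetical transport signature: (method, path, params) -> parsed JSON body.
Transport = Callable[[str, str, Dict[str, str]], dict]


def create_index_and_wait(
    transport: Transport,
    index: str,
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Create `index`, then block until its primary shards are active.

    Returns the final health status ("yellow" or "green").
    Raises TimeoutError if the shards are still not active at the deadline.
    """
    # Step 1: create the index.
    transport("PUT", "/" + index, {})

    # Step 2: poll the cluster health of that index until primaries are active.
    deadline = clock() + timeout_s
    while True:
        health = transport("GET", "/_cluster/health/" + index, {})
        status = health.get("status")
        if status in ("yellow", "green"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"shards of {index} not active after {timeout_s}s"
            )
        sleep(poll_interval_s)
```

A caller that searches the index only after `create_index_and_wait` returns should not hit the "all shards failed" window described above, at least under the assumption that a non-red health status is a sufficient readiness signal for the search in question.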
I've added this
Oh, I see. Yes, I guess it does make sense in the case where the old cluster is on a version that will create
…#59027)

There have been a few test failures that are likely caused by tests performing actions that use ML indices immediately after the actions that create those ML indices. Currently this can result in attempts to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices that have been affected by this problem (state and stats) wait for the shards to be initialized before returning.

Fixes #54887
Fixes #55221
Fixes #55807
Fixes #57102
Fixes #58841
Fixes #59011
Build scan: https://gradle-enterprise.elastic.co/s/svde5ts33nfki
Repro line:
Reproduces locally?: No
Applicable branches: 7.8 (upgraded from 7.6)
Failure history:
https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:'2020-03-03T13:01:09.914Z',mode:absolute,to:'2020-07-03T12:01:15.600Z'))&_a=(columns:!(_source),index:b646ed00-7efc-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:'%22Test%20open%20old%20jobs%22'),sort:!(process.time-start,desc))
This test has failed many times over the years, but for many completely different reasons. The earliest failure I could find with this particular problem was on 5th March 2020.
Failure excerpt:
The relevant bit of the YAML file is:
We have seen the problem of `.ml-state-write` being a concrete index rather than an alias in clusters outside of CI. It seems that, very occasionally, it can happen in CI too.
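For anyone checking a cluster for this condition, the distinction can be sketched as below. This is only an illustration, assuming you already have the parsed JSON body of a get-index request for the name (for example `GET /.ml-state-write`), which is keyed by concrete index name and lists each index's aliases. The helper `classify_name` and the sample backing index name are hypothetical and not taken from the Elasticsearch codebase.

```python
def classify_name(get_index_response: dict, name: str) -> str:
    """Classify `name` from the parsed body of a get-index request for it.

    The body is keyed by concrete index names. If `name` itself is a key,
    it is a concrete index. If it only appears under the "aliases" of every
    returned index, it is an alias.
    """
    if name in get_index_response:
        return "concrete index"
    if get_index_response and all(
        name in body.get("aliases", {})
        for body in get_index_response.values()
    ):
        return "alias"
    return "unknown"


# Illustrative sample bodies (made up for this sketch, not real cluster output).
healthy = {"example-backing-index": {"aliases": {".ml-state-write": {}}}}
broken = {".ml-state-write": {"aliases": {}}}

print(classify_name(healthy, ".ml-state-write"))
print(classify_name(broken, ".ml-state-write"))
```

The second sample corresponds to the failure described in this issue, where `.ml-state-write` exists as a concrete index instead of an alias.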