Description
The failure has not been seen on the official CI yet, but it occurred on a private one:
2> REPRODUCE WITH: ./gradlew :x-pack:qa:ml-native-tests:integTestRunner -Dtests.seed=E6CA9B79CBDD62D7 -Dtests.class=org.elasticsearch.xpack.ml.integration.ForecastIT -Dtests.method="testOverflowToDisk" -Dtests.security.manager=true -Dtests.locale=agq -Dtests.timezone=Europe/Malta
FAILURE 50.8s | ForecastIT.testOverflowToDisk <<< FAILURES!
Throwable #1: java.lang.AssertionError:
Expected: <8000>
but: was <7103>
at __randomizedtesting.SeedInfo.seed([E6CA9B79CBDD62D7:A91A2523A2A36B67]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.elasticsearch.xpack.ml.integration.ForecastIT.testOverflowToDisk(ForecastIT.java:248)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at java.base/java.lang.Thread.run(Thread.java:844)
1> [2018-06-06T14:20:28,079][INFO ][o.e.x.m.i.ForecastIT ] [testNoData]: before test
1> [2018-06-06T14:20:28,080][INFO ][o.e.x.m.i.ForecastIT ] [ForecastIT#testNoData]: setting up test
1> [2018-06-06T14:20:28,280][INFO ][o.e.x.m.i.ForecastIT ] [ForecastIT#testNoData]: all set up test
1> [2018-06-06T14:20:29,247][INFO ][o.e.x.m.i.ForecastIT ] [ForecastIT#testNoData]: cleaning up after test
1> [2018-06-06T14:20:29,557][INFO ][o.e.x.m.i.ForecastIT ] [ForecastIT#testNoData]: cleaned up after test
1> [2018-06-06T14:20:29,558][INFO ][o.e.x.m.i.ForecastIT ] [testNoData]: after test
This is caused by an internal timing problem:
A document that marks the end of the forecast is indexed last. However, due to near-real-time search behavior, sharding, and concurrency, it can happen that not all forecast data points are searchable at the moment the status document becomes searchable.
The issue was introduced in #30969. Before that change, the test closed the job before running its assertions, and closing a job triggers an index refresh. The change moved closing the job to the end of the test, so the assertions now run without a preceding refresh. The test fix is therefore as simple as closing the job before the assertions again and reopening it for the second test.
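To illustrate the visibility race and why a refresh hides it, here is a minimal toy model in plain Java. It is a sketch under stated assumptions, not the Elasticsearch API or the actual test code: `NrtIndexSketch`, its methods, the two-shard layout, and the document names are all hypothetical, and the 7103/897 split across shards is chosen only to mirror the numbers in the failure above. It models just one way the race could arise (shards refreshing at different times).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of per-shard near-real-time visibility. Hypothetical; not the Elasticsearch API. */
final class NrtIndexSketch {
    /** One shard: documents are buffered on write and only become searchable after a refresh. */
    private static final class Shard {
        final List<String> buffered = new ArrayList<>();
        final List<String> searchable = new ArrayList<>();
    }

    private final List<Shard> shards = new ArrayList<>();

    NrtIndexSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new Shard());
        }
    }

    /** Writes a document to the given shard; it is not searchable yet. */
    void index(int shard, String doc) {
        shards.get(shard).buffered.add(doc);
    }

    /** Makes everything buffered on one shard searchable. */
    void refreshShard(int shard) {
        Shard s = shards.get(shard);
        s.searchable.addAll(s.buffered);
        s.buffered.clear();
    }

    /** Refreshes every shard, as an explicit index refresh would. */
    void refreshAll() {
        for (int i = 0; i < shards.size(); i++) {
            refreshShard(i);
        }
    }

    /** Counts searchable documents whose name starts with the given prefix. */
    long count(String prefix) {
        return shards.stream()
            .flatMap(s -> s.searchable.stream())
            .filter(doc -> doc.startsWith(prefix))
            .count();
    }

    public static void main(String[] args) {
        NrtIndexSketch index = new NrtIndexSketch(2);
        // Illustrative split: most data points land on shard 0, the rest on shard 1.
        for (int i = 0; i < 8000; i++) {
            index.index(i < 7103 ? 0 : 1, "forecast-" + i);
        }
        index.index(0, "status-finished"); // status document is written last

        index.refreshShard(0); // only shard 0 has refreshed so far
        System.out.println(index.count("status-"));   // prints 1
        System.out.println(index.count("forecast-")); // prints 7103

        index.refreshAll(); // stands in for the refresh triggered by closing the job
        System.out.println(index.count("forecast-")); // prints 8000
    }
}
```

In this model a reader that sees the status document and counts immediately gets 7103 data points; the same count after a refresh of all shards gets 8000, which is the effect closing the job had before the change.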
Apart from the test issue, this remains a real problem. The UI only closes the job if it was closed prior to running the forecast. Open jobs are kept open and can potentially hit the problem, although this is rather unlikely because the UI side is slowed down by network round trips. Furthermore, because aggregations are used for charting, a couple of missing data points are unlikely to produce a visual artifact.