
[ML] CI ForecastIT::testOverflowToDisk can fail due to index timing issues #31173

Closed
hendrikmuhs opened this issue Jun 7, 2018 · 5 comments · Fixed by #31187
Assignees
Labels
discuss · :ml Machine learning · >test Issues or PRs that are addressing/adding tests · v6.4.0 · v7.0.0-beta1

Comments


hendrikmuhs commented Jun 7, 2018

The failure has not been seen on the official CI yet, but on a private one:

 2> REPRODUCE WITH: ./gradlew :x-pack:qa:ml-native-tests:integTestRunner -Dtests.seed=E6CA9B79CBDD62D7 -Dtests.class=org.elasticsearch.xpack.ml.integration.ForecastIT -Dtests.method="testOverflowToDisk" -Dtests.security.manager=true -Dtests.locale=agq -Dtests.timezone=Europe/Malta
FAILURE 50.8s | ForecastIT.testOverflowToDisk <<< FAILURES!
Throwable #1: java.lang.AssertionError: 
Expected: <8000>
    but: was <7103>
	at __randomizedtesting.SeedInfo.seed([E6CA9B79CBDD62D7:A91A2523A2A36B67]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.elasticsearch.xpack.ml.integration.ForecastIT.testOverflowToDisk(ForecastIT.java:248)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at java.base/java.lang.Thread.run(Thread.java:844)
 1> [2018-06-06T14:20:28,079][INFO ][o.e.x.m.i.ForecastIT     ] [testNoData]: before test
 1> [2018-06-06T14:20:28,080][INFO ][o.e.x.m.i.ForecastIT     ] [ForecastIT#testNoData]: setting up test
 1> [2018-06-06T14:20:28,280][INFO ][o.e.x.m.i.ForecastIT     ] [ForecastIT#testNoData]: all set up test
 1> [2018-06-06T14:20:29,247][INFO ][o.e.x.m.i.ForecastIT     ] [ForecastIT#testNoData]: cleaning up after test
 1> [2018-06-06T14:20:29,557][INFO ][o.e.x.m.i.ForecastIT     ] [ForecastIT#testNoData]: cleaned up after test
 1> [2018-06-06T14:20:29,558][INFO ][o.e.x.m.i.ForecastIT     ] [testNoData]: after test

This is caused by an internal timing problem:

A document that marks the end of the forecast is indexed last, but due to near-real-time search behavior, sharding, and concurrency it can happen that not all forecast data points are searchable yet at the moment the status document becomes searchable.
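To make the race concrete, here is a minimal toy model (an illustration only, not Elasticsearch internals): each shard buffers writes and only makes them searchable on refresh, and shards refresh independently. If the forecast data points and the status document land on different shards, the status document can become searchable first.

```python
# Toy model of near-real-time visibility: writes become searchable only
# after a per-shard refresh, and shards refresh independently.
class Shard:
    def __init__(self):
        self._buffer = []       # indexed but not yet searchable
        self._searchable = []   # visible to searches

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self):
        return list(self._searchable)

# Hypothetical layout: forecast data points on shard 0, status doc on shard 1.
shards = [Shard(), Shard()]
for i in range(8000):
    shards[0].index({"type": "forecast_point", "n": i})
shards[1].index({"type": "forecast_status", "state": "finished"})

# Only the status shard happens to refresh before the test searches.
shards[1].refresh()

visible = shards[0].search() + shards[1].search()
status_visible = any(d["type"] == "forecast_status" for d in visible)
points_visible = sum(d["type"] == "forecast_point" for d in visible)
print(status_visible, points_visible)  # → True 0
```

The test waits for the status document and then counts data points, so in this state it sees fewer than the expected 8000.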

The issue was introduced in #30969. Before that change the test closed the job before its assertions, which triggers an index refresh; after that change, closing the job was moved to the end. Therefore the test fix is as easy as closing the job again and reopening it before the 2nd forecast.

Besides the test issue, this is still a real problem. The UI only closes the job if it was already closed before running the forecast. Jobs that were open are kept open and can potentially hit this problem, although that is rather unlikely, as the UI side is slowed down by network round trips. Furthermore, the UI uses aggregations for charting, so missing a couple of data points likely does not produce a visual artifact.

@hendrikmuhs added the >test, discuss, v7.0.0, :ml, and v6.4.0 labels on Jun 7, 2018
@elasticmachine
Collaborator

Pinging @elastic/ml-core


@hendrikmuhs hendrikmuhs self-assigned this Jun 7, 2018
@droberts195
Contributor

Therefore the test fix is as easy as closing the job again and opening it for the 2nd test.

Another alternative would be to call the flush endpoint on the job. Then the same C++ process would keep running for the second forecast.
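For reference, the flush call in question is the ML anomaly detection flush API. A sketch of the request (the exact path prefix depends on the Elasticsearch version; around 6.x it was under `_xpack/ml`, later `_ml`, and the `calc_interim` parameter is shown only as an illustration):

```
POST _xpack/ml/anomaly_detectors/<job_id>/_flush
{
  "calc_interim": true
}
```

Flushing forces buffered results to be written and searchable without shutting down the native process, which is why the same C++ process keeps running afterwards.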

@droberts195
Contributor

In fact, since forecasting requires admin rights, the UI could use the same solution - if the job is not going to be closed after a forecast completes because it was already opened then flush it instead.

@hendrikmuhs
Author

Another alternative would be to call the flush endpoint on the job. Then the same C++ process would keep running for the second forecast.

👍 That's actually better, as this is a regression test that should run two forecasts on the same running job/process. Closing the job fixes the issue but would defeat the purpose of the test.

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Jun 7, 2018
hendrikmuhs pushed a commit that referenced this issue Jun 8, 2018
flush ml job to ensure all results have been written

fixes #31173