
[ML] TransformIT » testStopWaitForCheckpoint test fails #67121

@benwtrent

Build scan:
https://gradle-enterprise.elastic.co/s/btdxhp7hfpb6y
Repro line:

./gradlew ':x-pack:plugin:transform:qa:multi-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.TransformIT.testStopWaitForCheckpoint" -Dtests.seed=E5346188868E92A8 -Dtests.security.manager=true -Dtests.locale=pt-BR -Dtests.timezone=America/Thule -Druntime.java=8

Reproduces locally?:
No
Applicable branches:
7.x (though probably master as well)
Failure history:

Failure excerpt:

org.elasticsearch.xpack.transform.integration.TransformIT > testStopWaitForCheckpoint FAILED
    ElasticsearchStatusException[Elasticsearch exception [type=status_exception, reason=Failed to update transform task [transform-wait-for-checkpoint] state value should_stop_at_checkpoint from [true] to [true]]]
        at __randomizedtesting.SeedInfo.seed([E5346188868E92A8:98E14DE34A1A789E]:0)
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1617)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1584)
        at org.elasticsearch.client.TransformClient.stopTransform(TransformClient.java:319)
        at org.elasticsearch.xpack.transform.integration.TransformIntegTestCase.stopTransform(TransformIntegTestCase.java:136)
        at org.elasticsearch.xpack.transform.integration.TransformIT.testStopWaitForCheckpoint(TransformIT.java:288)

This points to a race condition when setting should_stop_at_checkpoint. I am not sure whether the stop request was retried internally and the retry caused the failure, or whether two callers issued the stop call concurrently.
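The exception reports a write that would set should_stop_at_checkpoint to the value it already holds (`from [true] to [true]`). One way such an error can surface is a strict, version-checked update that rejects any write based on a stale snapshot, even when the value would not actually change. The sketch below is a hypothetical, self-contained Java model of that failure mode; the class, its `State` type, and the `strictUpdate`/`idempotentUpdate` methods are invented for illustration and are not the Elasticsearch transform code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a versioned task-state update; NOT the Elasticsearch implementation.
final class StopFlagRaceSketch {

    // Immutable snapshot: a version counter plus the should_stop_at_checkpoint flag.
    static final class State {
        final long version;
        final boolean shouldStopAtCheckpoint;

        State(long version, boolean shouldStopAtCheckpoint) {
            this.version = version;
            this.shouldStopAtCheckpoint = shouldStopAtCheckpoint;
        }
    }

    private final AtomicReference<State> state = new AtomicReference<>(new State(0, false));

    State read() {
        return state.get();
    }

    // Strict optimistic update: succeeds only if nobody wrote since `seen` was read,
    // regardless of whether the requested value equals the stored value.
    boolean strictUpdate(State seen, boolean newValue) {
        return state.compareAndSet(seen, new State(seen.version + 1, newValue));
    }

    // Idempotent update: requesting the value already stored is a no-op;
    // otherwise retry the compare-and-set against the latest snapshot.
    void idempotentUpdate(boolean newValue) {
        while (true) {
            State current = state.get();
            if (current.shouldStopAtCheckpoint == newValue) {
                return;
            }
            if (state.compareAndSet(current, new State(current.version + 1, newValue))) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        StopFlagRaceSketch task = new StopFlagRaceSketch();
        State seenByA = task.read();
        State seenByB = task.read(); // both callers read version 0 before either writes

        System.out.println("A strict: " + task.strictUpdate(seenByA, true)); // A strict: true
        System.out.println("B strict: " + task.strictUpdate(seenByB, true)); // B strict: false (stale snapshot)
        task.idempotentUpdate(true);                                          // no-op: flag already true
        System.out.println("version: " + task.read().version);                // version: 1
    }
}
```

The model does not settle which path (an internal retry or two concurrent stop calls) triggers the real failure; it only shows why, under strict versioning, either one can produce a conflict on a write whose target value is already stored.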

I am gathering logs to attach to this issue.
