[CI] TransformConfigurationIndexIT.testDeleteConfigurationLeftOver failures #54810

Closed
dimitris-athanasiou opened this issue Apr 6, 2020 · 10 comments · Fixed by #54939 or #55786
Labels
:ml/Transform Transform >test-failure Triaged test failures from CI

Comments

@dimitris-athanasiou
Contributor

Jenkins: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.7+matrix-java-periodic/ES_RUNTIME_JAVA=openjdk14,nodes=general-purpose/23/console

Build scan: https://gradle-enterprise.elastic.co/s/skpba3jxaypqk

Failure:

org.elasticsearch.client.ResponseException: 
method [GET], host [http://[::1]:46410], URI [.transform-notifications-000002/_search?ignore_unavailable=true], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]},"status":503}

at __randomizedtesting.SeedInfo.seed([61CEE881B62A7B54:3FFF0103521740CE]:0)
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:267)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.xpack.transform.integration.TransformRestTestCase.logAudits(TransformRestTestCase.java:506)
at org.elasticsearch.xpack.transform.integration.TransformRestTestCase.waitForTransform(TransformRestTestCase.java:383)

Reproduce with:

./gradlew ':x-pack:plugin:transform:qa:single-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.transform.integration.TransformConfigurationIndexIT.testDeleteConfigurationLeftOver" -Dtests.seed=61CEE881B62A7B54 -Dtests.security.manager=true -Dtests.locale=ar -Dtests.timezone=Pacific/Guam -Dcompiler.java=13

Cannot reproduce locally.

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@droberts195
Contributor

The server side log shows this:

[2020-04-06T08:42:07,553][INFO ][o.e.c.m.MetaDataCreateIndexService] [integTest-0] [.transform-notifications-000002] creating index, cause [auto(bulk api)], templates [.transform-notifications-000002], shards [1]/[1], mappings [_doc]
[2020-04-06T08:42:07,554][INFO ][o.e.c.r.a.AllocationService] [integTest-0] updating number_of_replicas to [0] for indices [.transform-notifications-000002]
[2020-04-06T08:42:07,555][DEBUG][o.e.c.c.PublicationTransportHandler] [integTest-0] received diff cluster state version [460] with uuid [P-5Y85MASAyLbz5Qv7ju3w], diff size [607]
[2020-04-06T08:42:07,602][DEBUG][o.e.c.c.C.CoordinatorPublication] [integTest-0] publication ended successfully: Publication{term=1, version=460}
[2020-04-06T08:42:07,609][WARN ][r.suppressed             ] [integTest-0] path: .transform-notifications-000002/_search, params: {ignore_unavailable=true, index=.transform-notifications-000002}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:580) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:68) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:245) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:402) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1139) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1248) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1222) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:60) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:56) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.search.SearchService.lambda$runAsync$0(SearchService.java:413) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2020-04-06T08:42:07,641][WARN ][r.suppressed             ] [integTest-0] path: .transform-notifications-000002/_search, params: {ignore_unavailable=true, index=.transform-notifications-000002}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:580) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:68) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:245) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:402) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1139) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1248) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1222) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:60) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:56) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.search.SearchService.lambda$runAsync$0(SearchService.java:413) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2020-04-06T08:42:07,648][INFO ][o.e.c.r.a.AllocationService] [integTest-0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.transform-notifications-000002][0]]]).

The most likely sequence seems to be:

  1. One thread in production code indexes the very first transforms notification
  2. This causes the index to be created
  3. Before the primary shard has been allocated, a test code thread coincidentally searches for notifications
  4. This test code thread gets the error that all shards failed

The test code is using the ignore_unavailable=true option, so it should not receive an error. So the problem may be due to ignore_unavailable=true not working for hidden indices that have just been created.
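As an illustration of how a test helper could tolerate the short window described above, here is a minimal Java sketch that retries an action when it fails with a transient error. It is not the code from the linked fixes: `TransientRetry`, `TransientSearchException`, and `retry` are hypothetical names, and the exception merely stands in for the 503 "all shards failed" response seen in the failure.

```java
import java.util.concurrent.Callable;

class TransientRetry {

    /** Hypothetical stand-in for a 503 "all shards failed" search response. */
    static class TransientSearchException extends RuntimeException {
        TransientSearchException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action, retrying only on TransientSearchException.
     * Any other exception is rethrown immediately.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientSearchException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (TransientSearchException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Give the primary shard time to be allocated before retrying.
                    Thread.sleep(delayMillis);
                }
            }
        }
        // All attempts failed with the transient error: surface the last one.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated search: fails twice while the shard is initializing, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new TransientSearchException("all shards failed");
            }
            return "hits";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only the specific transient exception keeps genuine failures visible, while bounding the number of attempts prevents the test from hanging if the shard never starts.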

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Apr 8, 2020
elasticmachine added a commit to hendrikmuhs/elasticsearch that referenced this issue Apr 8, 2020
hendrikmuhs pushed a commit that referenced this issue Apr 9, 2020
move no initializing shards check before dumping audit messages

fixes #54810
@not-napoleon
Member

@not-napoleon not-napoleon reopened this Apr 16, 2020
@iverase
Contributor

iverase commented Apr 21, 2020

Another failure in 7.x: https://gradle-enterprise.elastic.co/s/t3tpj3fsp2nmo

@original-brownbear
Member

Another one on 7.7 https://gradle-enterprise.elastic.co/s/efhyg2za3sj7q

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Apr 27, 2020
@przemekwitek
Contributor

Another one on 7.x:

Build 20200427134158-C6F180FA
Log https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+part2/4695/console
Build Scans https://gradle-enterprise.elastic.co/s/svojzy2pwyopa

Repro lines:

REPRODUCE WITH: ./gradlew ':x-pack:plugin:transform:qa:single-node-tests:integTestRunner' --tests "org.elasticsearch.xpack.transform.integration.TransformConfigurationIndexIT.testDeleteConfigurationLeftOver" \
  -Dtests.seed=A124F550373B72B8 \
  -Dtests.security.manager=true \
  -Dtests.locale=el-CY \
  -Dtests.timezone=CST6CDT \
  -Dcompiler.java=14 \
  -Druntime.java=8

hendrikmuhs pushed a commit that referenced this issue Apr 27, 2020
handles/retries temporary SearchPhaseExecutionErrors

fixes #54810
@andreidan
Contributor

@hendrikmuhs we had another 7.7 failure for this (build scan available here https://gradle-enterprise.elastic.co/s/sbywajk6g4chc )

I believe #55786 needs to be backported

@hendrikmuhs

hendrikmuhs commented May 7, 2020

> @hendrikmuhs we had another 7.7 failure for this (build scan available here gradle-enterprise.elastic.co/s/sbywajk6g4chc )
>
> I believe #55786 needs to be backported

@andreidan That's correct. The fix has the backport_pending label on it as a reminder. 7.7 is not yet tagged, and commits to the branch should be made only when really necessary; I think this is not the case for this fix.

I will backport as soon as 7.7 is open for non-critical commits.

@benwtrent
Member

Yet another failure for 7.7 :( https://gradle-enterprise.elastic.co/s/jna5v4vz7z654

hendrikmuhs pushed a commit that referenced this issue May 14, 2020
handles/retries temporary SearchPhaseExecutionErrors

fixes #54810
@hendrikmuhs

finally backported
