[CI] TransformContinuousIT.testContinousEvents failing #66410

Closed
mark-vieira opened this issue Dec 15, 2020 · 6 comments · Fixed by #66718
Labels
:ml/Transform Transform >test-failure Triaged test failures from CI

Comments

@mark-vieira
Contributor

This started failing often recently. Also, average runtime has increased in the same time period, so something is going on here.

Build scan:
https://gradle-enterprise.elastic.co/s/f42ebecbqgebo/tests/:x-pack:plugin:transform:qa:multi-node-tests:javaRestTest/org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT/testContinousEvents

Repro line:
./gradlew ':x-pack:plugin:transform:qa:multi-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.testContinousEvents" -Dtests.seed=CE925CC7CF87CC0F -Dtests.security.manager=true -Dtests.locale=ar-SD -Dtests.timezone=America/Cancun -Druntime.java=8

Reproduces locally?:
Didn't reproduce for me.

Applicable branches:
master and 7.x

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.relativeStartTime=P7D&search.timeZoneId=America/Los_Angeles&tests.container=org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT&tests.disabledDistributions=WyJvdXRjb21lOnBhc3NlZCJd&tests.sortField=FAILED&tests.test=testContinousEvents&tests.unstableOnly=true

Failure excerpt:


org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT > testContinousEvents FAILED
    java.lang.AssertionError: transform [continuous-histogram-pivot-test] does not progress, state: INDEXING, reason: null
    Expected: a value greater than <1608062383906L>
         but: <1608062360729L> was less than <1608062383906L>
        at __randomizedtesting.SeedInfo.seed([CE925CC7CF87CC0F:F5A42DDC45EC4E59]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.lambda$waitUntilTransformsReachedUpperBound$3(TransformContinuousIT.java:499)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1014)
        at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.waitUntilTransformsReachedUpperBound(TransformContinuousIT.java:497)
        at org.elasticsearch.xpack.transform.integration.continuous.TransformContinuousIT.testContinousEvents(TransformContinuousIT.java:275)
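
The trace above shows the assertion being retried inside `assertBusy` until it gives up. As a rough illustration of that pattern only, here is a minimal standalone sketch; `ProgressWait` and `waitUntilGreaterThan` are hypothetical names, and this is not the actual `ESTestCase.assertBusy` implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class ProgressWait {
    /**
     * Polls the supplier until it returns a value greater than bound,
     * backing off between attempts. Throws AssertionError with the last
     * observed value once timeoutMillis has elapsed.
     * Hypothetical helper, not part of the Elasticsearch test framework.
     */
    static long waitUntilGreaterThan(LongSupplier current, long bound, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long sleepMillis = 1;
        while (true) {
            long value = current.getAsLong();
            if (value > bound) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(
                    "does not progress, expected a value greater than <" + bound + "L> but was <" + value + "L>");
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for progress", e);
            }
            // Exponential backoff, capped so the loop keeps polling regularly.
            sleepMillis = Math.min(sleepMillis * 2, 100);
        }
    }
}
```

With this shape, a transform that never advances past the bound produces the same kind of "was less than" failure as in the excerpt, only after the full timeout has been spent.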
@mark-vieira mark-vieira added >test-failure Triaged test failures from CI :ml/Transform Transform labels Dec 15, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@przemekwitek przemekwitek self-assigned this Dec 16, 2020
@przemekwitek
Contributor

przemekwitek commented Dec 16, 2020

I'm seeing a number of errors in the server logs that relate to a missing ingest pipeline:

x-pack/plugin/transform/qa/multi-node-tests/build/testclusters/javaRestTest-1/logs/javaRestTest_server.json:3161:{"type": "server", "timestamp": "2020-12-16T05:03:03,358Z", "level": "DEBUG", "component": "o.e.x.t.t.ClientTransformIndexer", "cluster.name": "javaRestTest", "node.name": "javaRestTest-1", "message": "[continuous-histogram-pivot-test] Bulk index experienced [10] failures and at least 1 irrecoverable [pipeline with id [transform-ingest] does not exist].", "cluster.uuid": "8J1ypiXkSF2_EIxJXslouA", "node.id": "KW0_s_TuSB2Rfzlw-TodSQ"  }
x-pack/plugin/transform/qa/multi-node-tests/build/testclusters/javaRestTest-1/logs/javaRestTest_server.json:3163:"stacktrace": ["org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [10] failures and at least 1 irrecoverable [pipeline with id [transform-ingest] does not exist]. Other failures: ",
x-pack/plugin/transform/qa/multi-node-tests/build/testclusters/javaRestTest-1/logs/javaRestTest_server.json:3181:"Caused by: java.lang.IllegalArgumentException: pipeline with id [transform-ingest] does not exist",

Investigating...
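
For anyone triaging similar logs, a small scan like the following can tally which pipeline ids are reported missing. It is only a sketch: `MissingPipelineScan` is a hypothetical helper, and the pattern is taken from the message text in the log lines above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MissingPipelineScan {
    // Matches the message seen in the server log, capturing the pipeline id.
    private static final Pattern MISSING =
        Pattern.compile("pipeline with id \\[([^\\]]+)\\] does not exist");

    /** Counts, per pipeline id, the log lines that report the pipeline as missing. */
    static Map<String, Integer> countMissingPipelines(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = MISSING.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the lines of `javaRestTest_server.json` (for example via `Files.readAllLines`) gives one count per missing pipeline id.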

@hendrikmuhs

> Also, average runtime has increased in the same time period, so something is going on here.

Two days ago I merged test coverage for runtime fields; I think that explains the increased runtime. It seems likely that this change also caused the failures, but that is hard to say without inspecting the logs. We currently have a lot of commits (and breaks) coming in.

It would be good to improve error handling in the test, e.g. to log the configuration used (the test suite uses a randomized permutation of Elasticsearch features). In the past we have seen failures with a certain set of features (e.g. index sort), so this might help to find the common feature more quickly.
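
One possible shape for that, as a sketch only: derive the feature set from the seed and attach it to any assertion failure. The class name, the feature names, and `runWithConfigInFailure` are all hypothetical and do not reflect how `TransformContinuousIT` is actually structured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RandomizedConfigLogging {
    // Hypothetical feature names, for illustration only.
    static final List<String> FEATURES =
        Arrays.asList("index_sort", "ingest_pipeline", "runtime_fields", "date_nanos");

    /** Picks a reproducible subset of features from the given seed. */
    static List<String> pickFeatures(long seed) {
        Random random = new Random(seed);
        List<String> enabled = new ArrayList<>();
        for (String feature : FEATURES) {
            if (random.nextBoolean()) {
                enabled.add(feature);
            }
        }
        return enabled;
    }

    /** Runs the test body and prefixes any assertion failure with the configuration in use. */
    static void runWithConfigInFailure(long seed, Runnable testBody) {
        List<String> enabled = pickFeatures(seed);
        try {
            testBody.run();
        } catch (AssertionError e) {
            throw new AssertionError("seed=" + seed + ", features=" + enabled + ": " + e.getMessage(), e);
        }
    }
}
```

With the feature list in every failure message, comparing a handful of failed runs is enough to spot a feature they all share.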

@przemekwitek
Contributor

One more thing that may be related to the increased runtime of this test is that we moved it from single-node to multi-node a few days ago.
@hendrikmuhs, could you think of ways it could potentially affect correctness?

> It would be good to improve error handling in the test, e.g. to log the configuration used (the test suite uses a randomized permutation of Elasticsearch features). In the past we have seen failures with a certain set of features (e.g. index sort), so this might help to find the common feature more quickly.

I'm doing that as part of debugging. I'll raise a PR soon.

@hendrikmuhs

> One more thing that may be related to the increased runtime of this test is that we moved it from single-node to multi-node a few days ago.
> @hendrikmuhs, could you think of ways it could potentially affect correctness?

No, it should work the same way as before. I can only imagine it affecting the runtime.

This was referenced Dec 17, 2020
hendrikmuhs pushed a commit that referenced this issue Jan 20, 2021
Transforms report the last time changes were detected with changes_last_detected_at; however,
that doesn't tell a user whether it searched for changes. This PR adds a field, last_search_time, to report
when the transform last searched for changes.

fixes #66410
relates #66367
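
To illustrate the distinction the commit message draws, here is a minimal sketch of the two timestamps. The field names come from the commit message; the class, its methods, and the update logic are assumptions for illustration, not the actual transform code.

```java
public class CheckpointingInfoSketch {
    // -1 means "never"; both values are epoch milliseconds.
    private long changesLastDetectedAt = -1;
    private long lastSearchTime = -1;

    /** Records one search for changes; only a search that found changes moves changesLastDetectedAt. */
    void recordSearch(long nowMillis, boolean changesFound) {
        lastSearchTime = nowMillis;
        if (changesFound) {
            changesLastDetectedAt = nowMillis;
        }
    }

    /** True if the transform has searched for changes after the given bound, whether or not it found any. */
    boolean hasSearchedSince(long boundMillis) {
        return lastSearchTime > boundMillis;
    }

    long getChangesLastDetectedAt() {
        return changesLastDetectedAt;
    }

    long getLastSearchTime() {
        return lastSearchTime;
    }
}
```

Under this reading, a check based on `last_search_time` can tell an idle transform that found nothing new apart from one that never searched at all.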
hendrikmuhs pushed a commit that referenced this issue Jan 21, 2021
…67779)

Transforms report the last time changes were detected with changes_last_detected_at; however,
that doesn't tell a user whether it searched for changes. This PR adds a field, last_search_time, to report
when the transform last searched for changes.

fixes #66410
relates #66367
backport #66718
@danhermann
Contributor

Seeing a similar test failure here: https://gradle-enterprise.elastic.co/s/yenlhn77pan62
