make flaky test less sensitive #19976

Merged · 3 commits from msiega/fix-flaky-test into master on Dec 1, 2022

Conversation

@mfsiega-airbyte (Contributor) commented Dec 1, 2022

What

Make flaky test less sensitive.

How

Don't insist that the sync complete in a single attempt. Previously, if the sync needed a retry for any reason, the test would fail. That check is too sensitive, and the resulting flakiness is interfering with the normal dev process.
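
In test terms, the relaxation looks roughly like this (a sketch only, reusing the helpers already present in the acceptance test; the actual diff is in the review below):

// Still required: the sync must eventually succeed, even if the worker was
// scaled down and back up while it was running. waitForSuccessfulJob throws
// if the job never reaches a successful state, which fails the test.
waitForSuccessfulJob(apiClient.getJobsApi(), connectionSyncRead.getJob());

// No longer required: succeeding within exactly one attempt. The assertion
// below is removed, so an internal retry no longer fails the whole test.
// assertEquals(1, numAttempts);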

@supertopher (Contributor) commented:

This looks great to me, but I'm pretty short on context for this part of the codebase, so I'm shying away from approval. Thanks for working on this so quickly.

Michael and I had a DM conversation about this. I think my team can do more to set expectations around responsibility. I'll establish a clearer DRI and a better process going forward. Thanks for helping us with the first run.

Comment on lines -133 to -139
final long numAttempts = apiClient.getJobsApi()
    .getJobInfo(new JobIdRequestBody().id(connectionSyncRead.getJob().getId()))
    .getAttempts()
    .size();

// it should be able to accomplish the resume without an additional attempt!
assertEquals(1, numAttempts);
@lmossman (Contributor) commented Dec 1, 2022

I think the reason this assertion was being made was that we explicitly wanted to test that if the worker is scaled down and back up in the middle of a sync (e.g. deploying a new change to cloud prod), then those syncs will NOT be failed but will instead continue on as normal and succeed.

If we get rid of this assertion, a regression where scaling down a worker in the middle of a sync actually causes that sync to fail could go unnoticed.

@mfsiega-airbyte Do we have a sense of why these sync attempts are failing in CI? I feel like we should be able to rely on the fact that our acceptance test sync attempts are succeeding, otherwise other tests will likely be breaking as well.

@mfsiega-airbyte (Contributor, Author) replied:

Here's my understanding; bear in mind that I don't actually have any context on this test beyond looking at it today.

This test (even after this PR) still validates that the sync succeeds even if the worker is scaled down and back up. Note that on the preceding line we call waitForSuccessfulJob; if the sync fails, waitForSuccessfulJob will eventually throw and the test will fail.

What the assertion deleted by this PR verifies is that the sync succeeds within a single attempt. Is there a reason it's important that it succeed within a single attempt? And in other places where our acceptance tests look for successful syncs, do they also verify that the sync succeeds within a single attempt?

If not, it's hard to see why we're verifying that here (except I guess that it's a cool implementation detail).
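
For anyone without context on the helper: conceptually, waitForSuccessfulJob polls the job until it reaches a terminal state and fails the test unless that state is succeeded. A rough sketch of that behavior follows (illustrative only; the Airbyte API client types JobsApi, JobRead, JobStatus, and JobIdRequestBody are assumed to be imported, and the real helper in the acceptance-test code may differ):

import java.time.Duration;
import java.time.Instant;
import java.util.Set;

import static org.junit.jupiter.api.Assertions.assertEquals;

// Conceptual sketch of the helper's behavior, not the real implementation.
class SyncWaitSketch {

  // Poll the job until it reaches a terminal state; fail unless that state is
  // SUCCEEDED. This is what makes the remaining check sufficient for
  // "the sync survives a worker scale-down/up".
  static void waitForSuccessfulJob(final JobsApi jobsApi, final JobRead originalJob) throws Exception {
    final Set<JobStatus> terminal = Set.of(JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.CANCELLED);
    final Instant deadline = Instant.now().plus(Duration.ofMinutes(10)); // illustrative timeout
    JobRead job = originalJob;
    while (!terminal.contains(job.getStatus())) {
      if (Instant.now().isAfter(deadline)) {
        throw new IllegalStateException("sync did not reach a terminal state in time: " + job);
      }
      Thread.sleep(Duration.ofSeconds(5).toMillis());
      // Re-fetch the job; the job id is stable across attempts even if the worker restarts.
      job = jobsApi.getJobInfo(new JobIdRequestBody().id(originalJob.getId())).getJob();
    }
    assertEquals(JobStatus.SUCCEEDED, job.getStatus(), "sync failed after the worker was scaled down and back up");
  }
}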

@lmossman (Contributor) replied:

That's a good point about our other acceptance tests.

I think this was trying to ensure that scaling down and up doesn't actually result in any "failures" at all, whether at the attempt or the overall sync level (i.e. it should be invisible to the user that the scaling down/up happened at all). But since our other tests are not ensuring this, it probably doesn't make sense to do that here. And as long as the overall sync succeeds, that's the more important part.

Given that, I think I'm fine with this change. But it might be worthwhile to separately look into why these attempts are failing, since I think that is still unexpected.

@mfsiega-airbyte (Contributor, Author) replied:

Agree on all of it. I see what you mean about scaling being invisible.

I'll go ahead and merge this, but I agree we should understand why some attempts are failing; ideally we'd then be able to re-enable these checks.

@mfsiega-airbyte merged commit b4a334e into master on Dec 1, 2022
@mfsiega-airbyte deleted the msiega/fix-flaky-test branch on December 1, 2022 20:55