[testing] Reduce flaky tests by retrying git failures #9907

emmyoop · 2024-04-12T14:56:34Z

Housekeeping

I am a maintainer of dbt-core

Short description

We have a lot of tests that are failing because of Git connection issues. Sometimes tox fails to install all dependencies and that causes the entire test run to fail without actually running any tests. This makes our monitoring noisy.

Suggested approach: leverage something like the nick-fields/retry@v3 action (see the example, but applied to the tox invocation here)
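For anyone who wants to see the retry idea outside of a workflow file, here is a minimal sketch in Python of what wrapping a command in bounded retries could look like. It is not the retry action's implementation and not what dbt-core ships; `run_with_retry`, `max_attempts`, and `wait_seconds` are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_with_retry(
    command: Sequence[str],
    max_attempts: int = 3,
    wait_seconds: float = 5.0,
) -> int:
    """Run `command` until it exits 0 or attempts run out; return the last exit code.

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    returncode = 1
    for attempt in range(1, max_attempts + 1):
        # Each attempt is a fresh process, so a failed dependency install starts over.
        returncode = subprocess.run(list(command)).returncode
        if returncode == 0:
            return 0
        print(
            f"attempt {attempt}/{max_attempts} exited with code {returncode}",
            file=sys.stderr,
        )
        if attempt < max_attempts:
            # Fixed pause between attempts; a backoff schedule would also work.
            time.sleep(wait_seconds)
    return returncode
```

Calling `run_with_retry(["tox", "--", "--ddtrace"])` would rerun the whole tox invocation, including the dependency install step that fails in the log below. Note that this retries on any nonzero exit, so a genuine test failure would be retried as well.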

Acceptance criteria

Any time we use git when testing, we have retry logic

Suggested Tests

This task is specifically for tests

-- can force a test to fail in a commit & observe the retry works as expected at the integration group level

Impact to Other Teams

Adapters team won't be impacted but may be interested if we come up with a solution

Will backports be required?

backport as far as we can to reduce this noise

Context

log output from test failing on tox

Run tox -- --ddtrace
integration: install_deps> python -I -m pip install -r dev-requirements.txt -r editable-requirements.txt
  Running command git clone --filter=blob:none --quiet https://github.com/dbt-labs/dbt-adapters.git /tmp/pip-req-build-g9zkv3vu
  error: RPC failed; curl 16 Error in the HTTP2 framing layer
  fatal: expected 'packfile'
  fatal: could not fetch 22b2ad3f683cca452f28320c0aba8bb95933ca6e from promisor remote
Collecting git+https://github.com/dbt-labs/dbt-adapters.git@main (from -r dev-requirements.txt (line 1))
  Cloning https://github.com/dbt-labs/dbt-adapters.git (to revision main) to /tmp/pip-req-build-g9zkv3vu
integration: exit 1 (2.55 seconds) /home/runner/work/dbt-core/dbt-core> python -I -m pip install -r dev-requirements.txt -r editable-requirements.txt pid=1980
  warning: Clone succeeded, but checkout failed.
  You can inspect what was checked out with 'git status'
  and retry with 'git restore --source=HEAD :/'

  error: subprocess-exited-with-error
  
  × git clone --filter=blob:none --quiet https://github.com/dbt-labs/dbt-adapters.git /tmp/pip-req-build-g9zkv3vu did not run successfully.
  │ exit code: 128
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/dbt-labs/dbt-adapters.git /tmp/pip-req-build-g9zkv3vu did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
  integration: FAIL code 1 (5.44 seconds)
  evaluation failed :( (5.50 seconds)

Error: Process completed with exit code 1.

Below is a sample of tests marked as flaky that are likely just connection issues. There may not be a solution when there's a longer GitHub outage. Look through #7808 for other possible failures.

#9906

max retries exceeded
#9905
#9903

timeout
#9902
#9900

Note: integration tests are run with the workflow_dispatch trigger in scheduled testing here. Typically they would be run with the workflow_call trigger, but they aren't because this case is special (comment)

MichelleArk · 2024-04-16T18:28:11Z

From refinement:

At what level should the retry logic live? Options: GH workflow (all we can really do if we fail at the tox step), using existing retry/fallback code
Could consider marking all gh-sensitive tests into a group and running them on their own test worker
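One way to picture the "what level" question: a retry wrapper could inspect captured output and retry only when it matches a known network-related git error, so that genuine test failures are not retried. The sketch below is an assumption-laden illustration, not existing dbt-core code; `TRANSIENT_GIT_PATTERNS` and `looks_like_transient_git_failure` are hypothetical names, and the patterns are taken only from the log in the issue description.

```python
import re

# Signatures copied from the failing log in this issue.
# Illustrative only: this list is an assumption and is not exhaustive.
TRANSIENT_GIT_PATTERNS = (
    r"RPC failed",
    r"fatal: expected 'packfile'",
    r"could not fetch .* from promisor remote",
)


def looks_like_transient_git_failure(output: str) -> bool:
    """Return True if `output` contains one of the known network-related git errors."""
    return any(re.search(pattern, output) for pattern in TRANSIENT_GIT_PATTERNS)
```

A check like this only helps where the output can be captured, which may not be practical if the failure happens inside the tox dependency install step.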

emmyoop · 2024-04-16T19:52:35Z

Hit this again on 1.3 and 1.4 today

aranke · 2024-05-08T15:22:18Z

@emmyoop It looks like pip already retries network connections up to 5 times: https://pip.pypa.io/en/stable/cli/pip/#cmdoption-retries

Given this information, I'm not sure if adding retries to our test runner (tox in this case) would improve the situation.

Similar issue in a GCP repo: GoogleCloudPlatform/python-docs-samples#3485 (comment)

Thoughts?

FishtownBuildBot · 2024-05-14T11:57:44Z

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#5504

mirnawong1 · 2024-07-17T13:09:49Z

hey @aranke , it looks like this opened a docs issue -- can I double check what customer-facing changes are needed? from skimming this issue, it looks like this is more internal testing?

emmyoop added the user docs ([docs.getdbt.com] Needs better documentation) and tech_debt (Behind-the-scenes changes, with little direct impact on end-user functionality) labels Apr 12, 2024

MichelleArk mentioned this issue Apr 16, 2024

[CT-2656] Flaky Tests #7808

Closed

martynydbt assigned aranke Apr 25, 2024

aranke added a commit that referenced this issue May 13, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network fail…

6a0fe31

…ures

aranke mentioned this issue May 13, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network failures #10137

Merged

5 tasks

aranke closed this as completed in #10137 May 14, 2024

aranke added a commit that referenced this issue May 14, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network fail…

751139d

…ures (#10137)

github-actions bot pushed a commit that referenced this issue May 14, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network fail…

6965834

…ures (#10137) (cherry picked from commit 751139d)

github-actions bot pushed a commit that referenced this issue May 14, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network fail…

5662437

…ures (#10137) (cherry picked from commit 751139d)

aranke added a commit that referenced this issue May 20, 2024

[Backport 1.0.latest] Fix #9907: Add retry to tox to reduce flaky tes…

3656fd3

…ts due to network failures

aranke mentioned this issue May 20, 2024

[Backport 1.0.latest] Fix #9907: Add retry to tox to reduce flaky tests due to network failures #10178

Merged

5 tasks

aranke added a commit that referenced this issue May 20, 2024

[Backport 1.1.latest] Fix #9907: Add retry to tox to reduce flaky tes…

1a5e7e9

…ts due to network failures

aranke mentioned this issue May 20, 2024

[Backport 1.1.latest] Fix #9907: Add retry to tox to reduce flaky tests due to network failures #10179

Merged

5 tasks

aranke added a commit that referenced this issue May 20, 2024

[Backport 1.3.latest] Fix #9907: Add retry to tox to reduce flaky tes…

122edd8

…ts due to network failures

aranke added a commit that referenced this issue May 20, 2024

[Backport 1.2.latest] Fix #9907: Add retry to tox to reduce flaky tes…

48abe2a

…ts due to network failures

aranke added a commit that referenced this issue May 20, 2024

[Backport 1.4.latest] Fix #9907: Add retry to tox to reduce flaky tes…

249738d

…ts due to network failures

aranke added a commit that referenced this issue May 20, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network fail…

10ac032

…ures (#10137)

aranke added a commit that referenced this issue May 20, 2024

Fix #9907: Add retry to tox to reduce flaky tests due to network fail…

5c3f770

…ures (#10137)

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.2.latest] Fix #9907: Add retry to tox to reduce flaky tes…

71ce835

…ts due to network failures (#10182)

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.3.latest] Fix #9907: Add retry to tox to reduce flaky tes…

78901d6

…ts due to network failures (#10180)

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.4.latest] Fix #9907: Add retry to tox to reduce flaky tes…

ac9a93c

…ts due to network failures (#10183)

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.5.latest] Fix #9907: Add retry to tox to reduce flaky tes…

a6e46a6

…ts due to network failures (#10137) (#10184)

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.6.latest] Fix #9907: Add retry to tox to reduce flaky tes…

beab696

…ts due to network failures (#10137) (#10186)

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.7.latest] Fix #9907: Add retry to tox to reduce flaky tes…

d0ddeea

…ts due to network failures (#10143) (cherry picked from commit 751139d) Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>

aranke added a commit that referenced this issue May 21, 2024

[Backport 1.8.latest] Fix #9907: Add retry to tox to reduce flaky tes…

3887949

…ts due to network failures (#10142) (cherry picked from commit 751139d) Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>

aranke mentioned this issue Jul 12, 2024

test env vars casing 1 6 latest #10437

Draft

5 tasks

dbeatty10 mentioned this issue Oct 4, 2024

[Core] Reduce flaky tests by retrying git failures dbt-labs/docs.getdbt.com#5504

Closed
