connectors-ci: auto retry on DaggerError #29081

Merged
alafanechere merged 3 commits into master from augustin/connectors-ci/more-retries on Aug 4, 2023

Conversation

alafanechere
Contributor

What

Closes #28796

Our pipeline occasionally hits a transient DaggerError (e.g. a failure to reach DockerHub).
To avoid pipeline flakiness, we can retry a step run when this type of failure occurs.

How

  • Make the retry logic more straightforward
  • Add a max_dagger_error_retries attribute on Step which defaults to 3
  • Test it
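The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `DaggerError` is a local stand-in for the Dagger SDK exception so the snippet runs on its own, and `dagger_error_retry_count`, `_run`, and `FlakyPull` are hypothetical names. Only `max_retries`, `max_dagger_error_retries` and `retry_delay` come from the diff.

```python
import asyncio
from abc import ABC, abstractmethod
from datetime import timedelta
from typing import ClassVar


class DaggerError(Exception):
    """Local stand-in for the Dagger SDK error (assumption, keeps the sketch standalone)."""


class Step(ABC):
    max_retries: ClassVar[int] = 0
    max_dagger_error_retries: ClassVar[int] = 3
    retry_delay: ClassVar[timedelta] = timedelta(seconds=10)

    def __init__(self) -> None:
        # Hypothetical counter; the real attribute name may differ.
        self.dagger_error_retry_count = 0

    @abstractmethod
    async def _run(self) -> str:
        """Do the actual work of the step."""

    async def run(self) -> str:
        while True:
            try:
                return await self._run()
            except DaggerError:
                # Give up once the retry budget for transient errors is spent.
                if self.dagger_error_retry_count >= self.max_dagger_error_retries:
                    raise
                self.dagger_error_retry_count += 1
                await asyncio.sleep(self.retry_delay.total_seconds())


class FlakyPull(Step):
    """Hypothetical step that fails a fixed number of times before succeeding."""

    # Class-level override so the example does not actually wait between retries.
    retry_delay = timedelta(seconds=0)

    def __init__(self, failures: int) -> None:
        super().__init__()
        self.remaining_failures = failures

    async def _run(self) -> str:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise DaggerError("failed to reach DockerHub")
        return "success"
```

With the default budget of 3, `FlakyPull(failures=2)` succeeds after two retries, while `FlakyPull(failures=4)` re-raises the error once the third retry has failed. Because the limits are class attributes, a step that needs a different budget or delay can override them in its own class body.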

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Aug 4, 2023
@github-actions
Contributor

github-actions bot commented Aug 4, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii
Collaborator

source-openweather test report (commit daec730fa3) - ✅

⏲️ Total pipeline duration: 01mn31s

Step Result
Validate airbyte-integrations/connectors/source-openweather/metadata.yaml
Connector version semver check
QA checks
Code format checks
Connector package install
Build source-openweather docker image for platform linux/x86_64
Unit tests
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests only run on PRs that are ready for review. Set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-openweather test

@alafanechere
Contributor Author

Manual testing:

This reverts commit daec730.
@alafanechere alafanechere requested a review from a team August 4, 2023 12:32
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Aug 4, 2023
@alafanechere alafanechere enabled auto-merge (squash) August 4, 2023 12:33
@@ -91,13 +91,16 @@ class Step(ABC):

title: ClassVar[str]
max_retries: ClassVar[int] = 0
max_dagger_error_retries: ClassVar[int] = 3

3 retries - the classic starting place

should_log: ClassVar[bool] = True
success_exit_code: ClassVar[int] = 0
skipped_exit_code: ClassVar[int] = None
# The max duration of a step run. If the step run for more than this duration it will be considered as timed out.
# The default of 5 hours is arbitrary and can be changed if needed.
max_duration: ClassVar[timedelta] = timedelta(hours=5)

retry_delay = timedelta(seconds=10)

I like how we can configure this as needed

@alafanechere alafanechere merged commit 6bf3dd1 into master Aug 4, 2023
@alafanechere alafanechere deleted the augustin/connectors-ci/more-retries branch August 4, 2023 15:28
Development

Successfully merging this pull request may close these issues.

Transient build errors should be retried
3 participants