-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Fix corrupted bare Git repository recovery in DAG bundles #56206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix corrupted bare Git repository recovery in DAG bundles #56206
Conversation
prdai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required, but just a thought, we could use tenacity, here instead of a custom retry loop. It might make the retry/backoff logic easier to read/maintain. For example:
@retry(stop=stop_after_attempt(3), wait=wait_exponential(), reraise=True)
def clone_bare_repo(url, path, env=None):
return Repo.clone_from(url, path, bare=True, env=env)|
cc: @jedcunningham @ephraimbuddy @kaxil @jscheffl When using gitbundles with edge workers (EdgeExectutor) it could happen that the git connection could be unstable causing git clone/ git bare clone to fail. If the bare clone is broken, all subsequent tasks on the worker fail unless you manually ssh onto the machine to tidy up the bare repo. This is an attempt to self heal Im hoping to get this in the next wave of provider release. cc: @eladkal
|
good idea! |
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing. Changes: - Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required() - Implement cleanup and retry logic with shutil.rmtree() - Add comprehensive tests for both successful retry and retry failure scenarios - Ensure all existing tests continue to pass
8465619 to
1b238ca
Compare
|
@potiuk this looks like an unrelated failure? |
1b238ca to
03de448
Compare
jscheffl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this looks good. I would have preferred to catch other AirflowExceptions directly in the PR and not reverting these - breaking change doe not apply in my eyes also it is no Dag or user code touching these exceptions.... Now some other PR needs to clean this... but anyway LGTM.
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @ephraimbuddy ?
* Fix corrupted bare Git repository recovery in DAG bundles When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing. Changes: - Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required() - Implement cleanup and retry logic with shutil.rmtree() - Add comprehensive tests for both successful retry and retry failure scenarios - Ensure all existing tests continue to pass * Refactor git clone retry logic to use tenacity * Ephraims suggestions
* Fix corrupted bare Git repository recovery in DAG bundles When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing. Changes: - Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required() - Implement cleanup and retry logic with shutil.rmtree() - Add comprehensive tests for both successful retry and retry failure scenarios - Ensure all existing tests continue to pass * Refactor git clone retry logic to use tenacity * Ephraims suggestions
* Fix corrupted bare Git repository recovery in DAG bundles When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing. Changes: - Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required() - Implement cleanup and retry logic with shutil.rmtree() - Add comprehensive tests for both successful retry and retry failure scenarios - Ensure all existing tests continue to pass * Refactor git clone retry logic to use tenacity * Ephraims suggestions
* Fix corrupted bare Git repository recovery in DAG bundles When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing. Changes: - Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required() - Implement cleanup and retry logic with shutil.rmtree() - Add comprehensive tests for both successful retry and retry failure scenarios - Ensure all existing tests continue to pass * Refactor git clone retry logic to use tenacity * Ephraims suggestions

When using git DAG bundles, corrupted bare repositories can cause all tasks
landing on a host to fail with InvalidGitRepositoryError. This adds retry
logic that detects corrupted bare repositories, cleans them up, and attempts
to re-clone them once before failing.
Changes: