Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel and fail downloads if exception happened #2710

Merged

Conversation

artem-shelkovnikov
Copy link
Member

@artem-shelkovnikov artem-shelkovnikov commented Jul 16, 2024

Solving problems found during CI runs:

If something goes wrong when Extractor is extracting the documents, all the scheduled downloads are still finished. That could mean that if there's a queue of 1000 documents waiting to be processed, even if the job fails all 1000 documents will be downloaded and extracted.

Additionally, there's no error checking for failed downloads - if a download fails, connector keeps working. Even if all downloads fail, connector still keeps working.

We instead want to cancel the syncs in the queue.

This PR will be followed up by #2671: it'll be possible to set up error thresholds for this logic.

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)

Release Note

  • Terminating a sync will also terminate the pending content downloads.
  • Failing downloads will terminate running syncs.

@artem-shelkovnikov artem-shelkovnikov marked this pull request as ready for review July 17, 2024 10:06
@artem-shelkovnikov artem-shelkovnikov requested a review from a team July 17, 2024 10:06
Comment on lines +609 to 615
except Exception as ex:
self._logger.error(f"Extractor failed with an error: {ex}")
lazy_downloads.cancel()
raise
finally:
# wait for all downloads to be finished
await lazy_downloads.join()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a clear answer on this; is the finally block still called if an error is raised in the caught exception block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, see this for example:

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, the finally block is awaiting all of the downloads even if there's an error. But I guess lazy_downloads.cancel() is wiping everything, so that's okay?
(Asking because the PR title is Cancel and fail downloads if exception happened but it looks like it will still eventually await the downloads).

Copy link
Member Author

@artem-shelkovnikov artem-shelkovnikov Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's intended:

Cancellation does not mean that the task is immediately dead - we need to make sure that the task has an opportunity to gracefully cancel.

For example if the cancelled task catches asyncio.CancelledError and does something, then it will still need to be awaited.

We use similar logic here:

for task in pending:
task.cancel()
try:
await task
except asyncio.CancelledError:
logger.error("Service did not handle cancellation gracefully.")

task.cancelI() injects the throw of asyncio.CancelledError where it can (pretty much any await or async with statement). After that the code can catch it or not. The except part here is checking that task re-raised asyncio.CancelledError cause it could not handle it - there was no except asyncio.CancelledError inside task or this error was re-raised.

Hope the description makes sense, if not I can prepare an illustrative example of this :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really helpful, thanks!

@artem-shelkovnikov artem-shelkovnikov enabled auto-merge (squash) July 17, 2024 14:06
@artem-shelkovnikov artem-shelkovnikov merged commit 0ccfeac into main Jul 17, 2024
2 checks passed
@artem-shelkovnikov artem-shelkovnikov deleted the artem/cancel-download-tasks-if-something-goes-wrong branch July 17, 2024 14:12
Copy link

💔 Failed to create backport PR(s)

The backport operation could not be completed due to the following error:
There are no branches to backport to. Aborting.

The backport PRs will be merged automatically after passing CI.

To backport manually run:
backport --pr 2710 --autoMerge --autoMergeMethod squash

Copy link

💚 Backport PR(s) successfully created

Status Branch Result
8.15 #2715

This backport PR will be merged automatically after passing CI.

Copy link

💔 Failed to create backport PR(s)

The backport operation could not be completed due to the following error:
There are no branches to backport to. Aborting.

The backport PRs will be merged automatically after passing CI.

To backport manually run:
backport --pr 2710 --autoMerge --autoMergeMethod squash

seanstory pushed a commit that referenced this pull request Jul 17, 2024
Co-authored-by: Artem Shelkovnikov <artem.shelkovnikov@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants