Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base task: attempt to log CheckpointLogger errors for on_timeout and on_failure #794

Closed
wants to merge 1 commit into from

Conversation

matt-codecov
Copy link
Contributor

@matt-codecov matt-codecov commented Oct 16, 2024

the UploadFlow reliability metrics rely on there being a checkpoint logged for every way the flow can end, whether it's a success or a failure. looking at the data, it appears 40-60% of flow endings aren't captured. this means we can't totally trust our calculated reliability rate

i am not totally clear on how celery calls on_failure() and on_timeout(), but this PR attempts to load any checkpoints data it can find in them and log an error event. ideally this will increase our share of captured endings and make the metrics more accurate

on_timeout() is a member on a celery Request which we've subclassed as BaseCodecovRequest. it doesn't receive kwargs as an argument, but according to docs it has a kwargs property which presumably holds the kwargs that the task was scheduled with. on_failure() is a member on the task and one of its arguments is kwargs. a non-retried UploadTask will not have any checkpoints data in its kwargs, but i think a retried UploadTask and future tasks should at least have a beginning checkpoint.

doing this creates a possibility that some endings will be double-counted:

  • a SoftTimeLimitException may result in both on_timeout() and on_failure() being called
  • a SoftTimeLimitException allows the task to clean up gracefully. if it takes too long, a hard TimeLimitException is raised? and it's possible if this happens that on_timeout() will be called twice
  • generally if we catch an error inside the celery task we still tell celery the task finished successfully and go to on_success(), but i don't know for sure that there are not cases where we log an error and then also wind up in on_failure() or on_retry()

@matt-codecov matt-codecov marked this pull request as draft October 16, 2024 22:18
Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

Project coverage is 98.00%. Comparing base (48ce427) to head (1cd2e55).
Report is 35 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
tasks/base.py 77.77% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #794      +/-   ##
==========================================
- Coverage   98.01%   98.00%   -0.01%     
==========================================
  Files         443      443              
  Lines       36615    36637      +22     
==========================================
+ Hits        35887    35905      +18     
- Misses        728      732       +4     
Flag Coverage Δ
integration 98.00% <82.60%> (-0.01%) ⬇️
unit 98.00% <82.60%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 95.85% <82.60%> (-0.03%) ⬇️
OutsideTasks 97.99% <100.00%> (+<0.01%) ⬆️
Files with missing lines Coverage Δ
helpers/checkpoint_logger/flows.py 100.00% <100.00%> (ø)
tasks/base.py 95.04% <77.77%> (-1.69%) ⬇️

@codecov-notifications
Copy link

codecov-notifications bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
tasks/base.py 77.77% 4 Missing ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #794      +/-   ##
==========================================
- Coverage   98.01%   98.00%   -0.01%     
==========================================
  Files         443      443              
  Lines       36615    36637      +22     
==========================================
+ Hits        35887    35905      +18     
- Misses        728      732       +4     
Flag Coverage Δ
integration 98.00% <82.60%> (-0.01%) ⬇️
unit 98.00% <82.60%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 95.85% <82.60%> (-0.03%) ⬇️
OutsideTasks 97.99% <100.00%> (+<0.01%) ⬆️
Files with missing lines Coverage Δ
helpers/checkpoint_logger/flows.py 100.00% <100.00%> (ø)
tasks/base.py 95.04% <77.77%> (-1.69%) ⬇️

@codecov-qa
Copy link

codecov-qa bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

Project coverage is 98.00%. Comparing base (48ce427) to head (1cd2e55).
Report is 35 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
tasks/base.py 77.77% 4 Missing ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #794      +/-   ##
==========================================
- Coverage   98.01%   98.00%   -0.01%     
==========================================
  Files         443      443              
  Lines       36615    36637      +22     
==========================================
+ Hits        35887    35905      +18     
- Misses        728      732       +4     
Flag Coverage Δ
integration 98.00% <82.60%> (-0.01%) ⬇️
unit 98.00% <82.60%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 95.85% <82.60%> (-0.03%) ⬇️
OutsideTasks 97.99% <100.00%> (+<0.01%) ⬆️
Files with missing lines Coverage Δ
helpers/checkpoint_logger/flows.py 100.00% <100.00%> (ø)
tasks/base.py 95.04% <77.77%> (-1.69%) ⬇️

Copy link

codecov-public-qa bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

Project coverage is 98.00%. Comparing base (48ce427) to head (1cd2e55).

✅ All tests successful. No failed tests found.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #794      +/-   ##
==========================================
- Coverage   98.01%   98.00%   -0.01%     
==========================================
  Files         443      443              
  Lines       36615    36637      +22     
==========================================
+ Hits        35887    35905      +18     
- Misses        728      732       +4     
Flag Coverage Δ
integration 98.00% <82.60%> (-0.01%) ⬇️
unit 98.00% <82.60%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 95.85% <82.60%> (-0.03%) ⬇️
OutsideTasks 97.99% <100.00%> (+<0.01%) ⬆️
Files Coverage Δ
helpers/checkpoint_logger/flows.py 100.00% <100.00%> (ø)
tasks/base.py 95.04% <77.77%> (-1.69%) ⬇️

@matt-codecov
Copy link
Contributor Author

#823

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant