Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error handling when there are no crashes to process #1518

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

johnclary
Copy link
Member

@johnclary johnclary commented Aug 21, 2024

Associated issues

This issue cropped up when processing ~30 daily CRIS extracts. It seems that occasionally, but not always, the crashReports directory is completely missing from the extracts with no crashes. The CSV files are always present.

This script makes the following changes:

  • Skips PDF processing entirely when (1) the script is invoked with both --csv and --pdf and (2) there were no crashes found in the CSV
  • Adds a final test at the end of processing to make sure that the number of PDFs processed matches the number of crashes processed via CSV (again, only when the script is invoked with both --csv and --pdf)
  • Raises an error if no extracts are retrieved from the S3 bucket. This was an earlier oversight.

Testing

I've left ~30 extracts in the dev inbox which you can use for testing.

  1. Start your local stack
  2. Run the cris import with the --s3-download command:
$ ./cris_import.py --csv --pdf --s3-download

Ship list

  • Check migrations for any conflicts with latest migrations in master branch
  • Confirm Hasura role permissions for necessary access
  • Code reviewed
  • Product manager approved

return
if cli_args.s3_download and not extracts_todo:
# always short circuit if we find nothing in S3
raise Exception("No extracts found in S3 bucket")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eeek—I thought we already had a check like this in place but it must have dropped off during one of the later refactors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also could have sworn we were raising an exception already


if not pdf_count:
raise IOError("No PDFs found in extract")
Copy link
Member Author

@johnclary johnclary Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this check anymore since we're handling this elsewhere in the script

Copy link
Member

@chiaberry chiaberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested, but didnt test it when there were no crashes to process.

Copy link
Member

@frankhereford frankhereford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull up them anchors and 🚢🚢🚢🚢

Comment on lines +53 to +58

no_crashes_found = (
True if cli_args.csv and records_processed["crashes"] == 0 else False
)

if cli_args.pdf and not no_crashes_found:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much more elegant than my hacky fix. Thank you!

@johnclary
Copy link
Member Author

Thank you for your reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants