Better error handling when there are no crashes to process #1518

johnclary · 2024-08-21T16:41:37Z

Associated issues

Update CRIS import to handle when there is no crashReports directory atd-data-tech#18719

This issue cropped up when processing ~30 daily CRIS extracts. It seems that occasionally, but not always, the crashReports directory is completely missing from the extracts with no crashes. The CSV files are always present.

This PR makes the following changes to the import script:

Skips PDF processing entirely when (1) the script is invoked with both --csv and --pdf and (2) there were no crashes found in the CSV
Adds a final test at the end of processing to make sure that the number of PDFs processed matches the number of crashes processed via CSV (again, only when the script is invoked with both --csv and --pdf)
Raises an error if no extracts are retrieved from the S3 bucket. This was an earlier oversight.
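
As a rough illustration of the final count check described above, here is a minimal sketch. The function name `check_pdf_count`, its parameters, and the choice of `ValueError` are assumptions made for this example, not the names used in `cris_import.py`.

```python
def check_pdf_count(csv_enabled, pdf_enabled, crashes_processed, pdfs_processed):
    """Raise if both steps ran and their counts disagree (hypothetical helper)."""
    # The comparison only makes sense when both --csv and --pdf were requested.
    if not (csv_enabled and pdf_enabled):
        return
    if crashes_processed != pdfs_processed:
        # Exception type is an assumption; the real script may raise something else.
        raise ValueError(
            f"Processed {crashes_processed} crashes via CSV "
            f"but {pdfs_processed} PDFs"
        )
```

With both flags set, an extract with zero crashes and zero PDFs passes this check, while three crashes and two PDFs raises.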

Testing

I've left ~30 extracts in the dev inbox which you can use for testing.

Start your local stack
Run the CRIS import with the --s3-download flag:

$ ./cris_import.py --csv --pdf --s3-download

Ship list

Check migrations for any conflicts with latest migrations in master branch
Confirm Hasura role permissions for necessary access
Code reviewed
Product manager approved

johnclary · 2024-08-21T16:43:10Z

atd-etl/cris_import/cris_import.py

-        return
+    if cli_args.s3_download and not extracts_todo:
+        # always short circuit if we find nothing in S3
+        raise Exception("No extracts found in S3 bucket")


Eeek—I thought we already had a check like this in place but it must have dropped off during one of the later refactors

I also could have sworn we were raising an exception already
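
For anyone who wants to exercise this guard without touching S3, here is a minimal sketch that isolates it. The wrapper `ensure_extracts_found` is hypothetical; only the condition and the error message come from the diff above.

```python
from argparse import Namespace


def ensure_extracts_found(cli_args, extracts_todo):
    """Raise when an S3 download was requested but nothing came back."""
    if cli_args.s3_download and not extracts_todo:
        # always short circuit if we find nothing in S3
        raise Exception("No extracts found in S3 bucket")
    return extracts_todo


# Example: a stand-in for the parsed CLI args, with an empty result from S3.
args = Namespace(s3_download=True)
try:
    ensure_extracts_found(args, [])
except Exception as err:
    print(f"import would fail: {err}")
```

Raising here, rather than returning silently, means a scheduled run with an empty inbox exits with an error instead of reporting success.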

johnclary · 2024-08-21T16:44:13Z

atd-etl/cris_import/utils/process_pdfs.py


-    if not pdf_count:
-        raise IOError("No PDFs found in extract")


we don't need this check anymore since we're handling this elsewhere in the script

chiaberry

I tested, but didn't test it when there were no crashes to process.

frankhereford

Pull up them anchors and 🚢🚢🚢🚢

frankhereford · 2024-08-21T19:39:44Z

atd-etl/cris_import/cris_import.py

+
+        no_crashes_found = (
+            True if cli_args.csv and records_processed["crashes"] == 0 else False
+        )
+
+        if cli_args.pdf and not no_crashes_found:


Much more elegant than my hacky fix. Thank you!
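
A small aside on the flag in the diff above: `True if X else False` can be written as `bool(X)`. The sketch below assumes `records_processed` is a dict of counts as in the diff; `should_process_pdfs` is a hypothetical helper used only to make the condition testable, not a function in the script.

```python
from argparse import Namespace


def should_process_pdfs(cli_args, records_processed):
    """Return True when the PDF step should run (hypothetical helper)."""
    # Crashes only count as "not found" if the CSV step actually ran.
    no_crashes_found = bool(cli_args.csv and records_processed["crashes"] == 0)
    return bool(cli_args.pdf and not no_crashes_found)


# Both flags set, CSV found nothing: skip PDFs.
print(should_process_pdfs(Namespace(csv=True, pdf=True), {"crashes": 0}))
# PDF-only run: the crash count is ignored and PDFs are processed.
print(should_process_pdfs(Namespace(csv=False, pdf=True), {"crashes": 0}))
```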

johnclary · 2024-08-22T15:23:44Z

Thank you for your reviews!

add error checking/handling based on number of crashes and pdfs

998900c

johnclary requested review from frankhereford, chiaberry, tillyw, mddilley and roseeichelmann August 21, 2024 16:51

johnclary mentioned this pull request Aug 21, 2024

DAG updates for new VZ data model cityofaustin/atd-airflow#240

Merged

3 tasks

chiaberry approved these changes Aug 21, 2024

frankhereford approved these changes Aug 21, 2024

frankhereford mentioned this pull request Aug 21, 2024

Add directory existence check to CRIS import script #1519

Closed

4 tasks

johnclary merged commit 51dd184 into master Aug 22, 2024
9 checks passed

johnclary deleted the 18719-no-crashreports-dir branch August 28, 2024 00:40

This was referenced Sep 9, 2024

The CRIS import ETL is failing because there is no crashReports directory in the extract cityofaustin/atd-data-tech#18952

Closed

Create ETL to check for missing crashes with missing CR3 pdfs cityofaustin/atd-data-tech#18954

Open
