ROW Inspector Prioritization ETL #4
Code looks good! I haven't been able to test this because I can't get the Python env working on my Mac, with either pip or docker.
`inspector_prioritization.py` "scores" permits on several metrics, ranking them by priority for ROW inspectors.

`python metrics/inspector_prioritization.py`
it would be good to document that this script requires that the S3-loading scripts have been run beforehand to ensure it's using fresh data
Then, provide the environment variables described in env_template to the docker image:

```diff
- $ docker run -it --env-file env_file dts-right-of-way-reporting:production /bin/bash
+ docker run -it --env-file env_file dts-right-of-way-reporting:production /bin/bash
```
this command is missing the `atddocker` repo name, so it should be:

`docker run -it --env-file env_file atddocker/dts-right-of-way-reporting:production /bin/bash`
ooh and it would be nice for us local devvies to have the volume mount added in 😇

`docker run -it --env-file env_file -v "$(pwd):/app" atddocker/dts-right-of-way-reporting:production /bin/bash`
```python
import boto3
import pandas as pd
import numpy as np
from arcgis.gis import GIS
```
edit: i was not able to get this running :/ ignore this 👇

i had to install `arcgis` in order to get this running, looks like it's missing from the requirements. annoyingly, i could only install it after pulling the amd64 version of the image. so this:

`docker run -it --env-file env_file --platform linux/amd64 -v "$(pwd):/app" atddocker/dts-right-of-way-reporting:production /bin/bash`

then this:

`pip install arcgis`
@johnclary I was able to find a patch for the dockerfile to get it to build for me
```python
permits["count_segments"] = permits["count_segments"].fillna(0)

# 10 points for permits with more than 1 segment, 5 points otherwise:
permits["count_segment_scoring"] = np.where(permits["count_segments"] > 1, 10, 5)
```
i don't know if this is really practical to implement, but it could be nice to collect all these magic numbers used for scoring into named vars at the top of the script, or a dict. that would make life easier the next time we need to tweak the scoring parameters.
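for example, a minimal sketch of that idea using the one rule quoted above; the dict name, keys, and grouping here are illustrative, not from the actual script:

```python
import numpy as np

# Hypothetical: gather the scoring weights in one place so future tweaks
# only touch this dict. The values mirror the snippet quoted above.
SCORING_PARAMS = {
    "multi_segment_points": 10,  # permits touching more than one segment
    "single_segment_points": 5,  # everything else
}

permits["count_segment_scoring"] = np.where(
    permits["count_segments"] > 1,
    SCORING_PARAMS["multi_segment_points"],
    SCORING_PARAMS["single_segment_points"],
)
```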
```python
        yield data[i : i + batch_size]


def retrieve_road_segment_data(segments, gis):
```
would be nice to have a little more documentation here. looks like you're grabbing the geometry of every street segment that has a permit so that you can assign it ROW inspector zones.
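for instance, a docstring along these lines would capture that; the wording below is inferred from this thread, not the author's own description:

```python
def retrieve_road_segment_data(segments, gis):
    """Fetch geometry for every street segment that appears on a permit.

    The geometries are used downstream to spatially match each segment
    against the ROW inspector zone layer in AGOL, so each permit can be
    assigned an inspector zone.

    Args:
        segments: street segment IDs collected from the permit data.
        gis: an authenticated arcgis.gis.GIS connection.
    """
```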
```python
if response.features:
    # It is possible a segment could intersect multiple zones, but we only take the first one.
    return response.features[0].attributes["ROW_INSPECTOR_ZONE_ID"]
return None
```
this all seems to be working fine! if this needed to run at a high frequency, or if the AGOL API were giving us trouble, it would be cool to write an ETL that makes a big ole lookup table of all the segments by inspector zone and dapcz intersection. this ETL could pull from the lookup table, since it probably would not need to be updated very often.
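a rough sketch of that lookup-table ETL, assuming hypothetical helpers `get_inspector_zone` and `is_in_dapcz` standing in for the per-segment AGOL queries the script already makes:

```python
import pandas as pd

# Hypothetical stand-ins for the per-segment AGOL queries in this PR;
# the real implementations live in the script under review.
def get_inspector_zone(segment_id, gis): ...
def is_in_dapcz(segment_id, gis): ...

def build_segment_lookup(segment_ids, gis):
    """One-off ETL sketch: precompute segment -> zone/DAPCZ rows so the
    nightly job can join against this table instead of hitting AGOL on
    every run."""
    rows = [
        {
            "segment_id": seg,
            "row_inspector_zone_id": get_inspector_zone(seg, gis),
            "in_dapcz": is_in_dapcz(seg, gis),
        }
        for seg in segment_ids
    ]
    return pd.DataFrame(rows)
```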
I'm by no means qualified to review a PR, so I don't have anything of import to say. It's cool to see this side of the dev world though, and y'all should invite me more often.
Closes cityofaustin/atd-data-tech#19680
This PR adds a new metric script which generates a priority score for ROW permits based on several performance measures. This was originally piloted by @johnclary 5 years ago over in atd-amanda. This (I believe) faithfully copies every performance measure from that original work and includes the rest of the ETL to retrieve data from AMANDA (the original work was done "manually" with CSV extracts from AMANDA).

Testing
If you want to test the code, I have supplied an env_template; this airflow DAG should point you to where most of the secrets are stored in 1pass. The only new secrets are AGOL, where I used the `ATD_Publisher_Scripts` account.

To set up the environment, you can either use the docker commands in the readme or set up a python 3.10 venv or conda env and `pip install -r requirements.txt`.

Step 1. refresh permit and segment data in S3
(just now wondering if I should make it possible to pass a list of queries to `amanda_to_s3.py`?)

Step 2. run the ETL to load data into Socrata
This will replace data in this dataset.
The logging for `inspector_prioritization` should be pretty clear if it worked successfully. Something like this is the response from the Socrata API, to let you know data was successfully updated:

`2024-12-20 13:53:56,184 INFO: {'Errors': 0, 'Rows Created': 2948, 'Rows Updated': 0, 'Rows Deleted': 0}`
I also made some additional changes to refactor some of the other code to use more shared functions in `utils.py`. Let me know if you want to test everything, or you can trust me that all the other scripts should continue to work :)