ROW Inspector Prioritization ETL #4
Code looks good! I haven't been able to test this because I can't get the Python env working on my Mac, with either pip or docker.
`inspector_prioritization.py` "scores" permits on several metrics, ranking them by priority for ROW inspectors.

`python metrics/inspector_prioritization.py`
it would be good to document that this script requires that the S3-loading scripts have been run beforehand to ensure it's using fresh data
Then, provide the environment variables described in env_template to the docker image:

```diff
- $ docker run -it --env-file env_file dts-right-of-way-reporting:production /bin/bash
+ docker run -it --env-file env_file dts-right-of-way-reporting:production /bin/bash
```
this command is missing the `atddocker` repo name, so it should be:

`docker run -it --env-file env_file atddocker/dts-right-of-way-reporting:production /bin/bash`
ooh and it would be nice for us local devvies to have the volume mount added in 😇

`docker run -it --env-file env_file -v "$(pwd):/app" atddocker/dts-right-of-way-reporting:production /bin/bash`
```python
import boto3
import pandas as pd
import numpy as np
from arcgis.gis import GIS
```
edit: i was not able to get this running :/ ignore this 👇

i had to install `arcgis` in order to get this running, looks like it's missing from the requirements. annoyingly, i could only install it after pulling the amd64 version of the image. so this:

`docker run -it --env-file env_file --platform linux/amd64 -v "$(pwd):/app" atddocker/dts-right-of-way-reporting:production /bin/bash`

then this:

`pip install arcgis`
@johnclary I was able to find a patch for the dockerfile to get it to build for me
```python
permits["count_segments"] = permits["count_segments"].fillna(0)

# 10 points for permits with more than 1 segment, 5 points otherwise:
permits["count_segment_scoring"] = np.where(permits["count_segments"] > 1, 10, 5)
```
i don't know if this is really practical to implement, but it could be nice to collect all these magic numbers used for scoring into named vars at the top of the script, or a dict. that would make life easier the next time we need to tweak the scoring parameters.
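for example, a minimal sketch of that idea using the one rule quoted above; the dict name, keys, and grouping here are illustrative, not from the actual script:

```python
import numpy as np

# Hypothetical: gather the scoring weights in one place so future tweaks
# only touch this dict. The values mirror the snippet quoted above.
SCORING_PARAMS = {
    "multi_segment_points": 10,  # permits touching more than one segment
    "single_segment_points": 5,  # everything else
}

permits["count_segment_scoring"] = np.where(
    permits["count_segments"] > 1,
    SCORING_PARAMS["multi_segment_points"],
    SCORING_PARAMS["single_segment_points"],
)
```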
```python
        yield data[i : i + batch_size]


def retrieve_road_segment_data(segments, gis):
```
would be nice to have a little more documentation here. looks like you're grabbing the geometry of every street segment that has a permit so that you can assign it ROW inspector zones.
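for instance, a docstring along these lines would capture that; the wording below is inferred from this thread, not the author's own description:

```python
def retrieve_road_segment_data(segments, gis):
    """Fetch geometry for every street segment that appears on a permit.

    The geometries are used downstream to spatially match each segment
    against the ROW inspector zone layer in AGOL, so each permit can be
    assigned an inspector zone.

    Args:
        segments: street segment IDs collected from the permit data.
        gis: an authenticated arcgis.gis.GIS connection.
    """
```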
```python
if response.features:
    # It is possible a segment could intersect multiple zones, but we only take the first one.
    return response.features[0].attributes["ROW_INSPECTOR_ZONE_ID"]
return None
```
this all seems to be working fine! if this needed to run at a high frequency, or if the AGOL API were giving us trouble, it would be cool to write an ETL that makes a big ole lookup table of all the segments by inspector zone and dapcz intersection. this ETL could pull from the lookup table, since it probably would not need to be updated very often.
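a rough sketch of that lookup-table ETL, assuming hypothetical helpers `get_inspector_zone` and `is_in_dapcz` standing in for the per-segment AGOL queries the script already makes:

```python
import pandas as pd

# Hypothetical stand-ins for the per-segment AGOL queries in this PR;
# the real implementations live in the script under review.
def get_inspector_zone(segment_id, gis): ...
def is_in_dapcz(segment_id, gis): ...

def build_segment_lookup(segment_ids, gis):
    """One-off ETL sketch: precompute segment -> zone/DAPCZ rows so the
    nightly job can join against this table instead of hitting AGOL on
    every run."""
    rows = [
        {
            "segment_id": seg,
            "row_inspector_zone_id": get_inspector_zone(seg, gis),
            "in_dapcz": is_in_dapcz(seg, gis),
        }
        for seg in segment_ids
    ]
    return pd.DataFrame(rows)
```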
I'm by no means qualified to review a PR, so I don't have anything of import to say. It's cool to see this side of the dev world though, and y'all should invite me more often.
Closes cityofaustin/atd-data-tech#19680
This PR adds a new metric script which generates a priority score for ROW permits based on several performance measures. This was originally piloted by @johnclary 5 years ago over in atd-amanda. This (I believe) faithfully copies every performance measure from that original work and includes the rest of the ETL to retrieve data from AMANDA (the original work was done "manually" with CSV extracts from AMANDA).

Testing
If you want to test the code, I have supplied an env_template; this airflow DAG should point you to where most of the secrets are stored in 1pass. The only new secrets are AGOL, where I used the `ATD_Publisher_Scripts` account.

To set up the environment, you can either use the docker commands in the readme or set up a python 3.10 venv or conda env and `pip install -r requirements.txt`.

Step 1. refresh permit and segment data in S3
(just now wondering if I should make it possible to pass a list of queries to `amanda_to_s3.py`?)

Step 2. run the ETL to load data into Socrata
This will replace data in this dataset.
The logging for `inspector_prioritization` should be pretty clear if it worked successfully. Something like this is the response from the Socrata API, to let you know data was successfully updated:

`2024-12-20 13:53:56,184 INFO: {'Errors': 0, 'Rows Created': 2948, 'Rows Updated': 0, 'Rows Deleted': 0}`
I also made some additional changes to refactor some of the other code to use more shared functions in `utils.py`. Let me know if you want to test everything, or you can trust me that all the other scripts should continue to work :)