Update comparison tool #2056

lucasmbrown-usds · 2022-11-01T20:25:57Z

Change fields.

Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.
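A minimal sketch of what neighbor-based imputation could look like, going only by the related commit titles "Imputing income using geographic neighbors (#1559)" and "Do not impute income for null population tracts (#1993)". The function name, the dictionary-based inputs, and the use of a simple mean over neighbors are all assumptions for illustration, not the pipeline's actual code.

```python
from statistics import mean
from typing import Dict, List, Optional


def impute_income_from_neighbors(
    income_by_tract: Dict[str, Optional[float]],
    neighbors_by_tract: Dict[str, List[str]],
    population_by_tract: Dict[str, Optional[int]],
) -> Dict[str, Optional[float]]:
    """Fill missing income values with the mean of neighboring tracts' values.

    Hypothetical helper: tracts with a null population are left missing,
    and tracts whose neighbors all lack income data are left missing too.
    """
    imputed = dict(income_by_tract)
    for tract, income in income_by_tract.items():
        if income is not None:
            continue  # nothing to impute
        if population_by_tract.get(tract) is None:
            continue  # assumed rule: never impute for null-population tracts
        neighbor_values = [
            income_by_tract[n]
            for n in neighbors_by_tract.get(tract, [])
            if income_by_tract.get(n) is not None
        ]
        if neighbor_values:
            imputed[tract] = mean(neighbor_values)
    return imputed


# Made-up tract IDs and values, purely for illustration.
incomes = {"A": 0.25, "B": 0.75, "C": None, "D": None}
neighbors = {"C": ["A", "B"], "D": ["A"]}
populations = {"A": 100, "B": 200, "C": 50, "D": None}
result = impute_income_from_neighbors(incomes, neighbors, populations)
print(result["C"])  # 0.5
print(result["D"])  # None
```

The sketch reads neighbor values from the original mapping rather than from the partially imputed one, so an imputed value is never used to impute another tract.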

Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
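As a rough illustration of the rule described above, the check might be sketched as below. Only the 3.25 cutoff and the pairing with low income come from the description; the function name, the treatment of missing scores, and the direction of the comparison (at or above the cutoff) are assumptions.

```python
from typing import Optional

# Cutoff quoted in the PR description; the comparison direction is an assumption.
HOLC_SCORE_CUTOFF = 3.25


def meets_historic_redlining_criterion(
    holc_score: Optional[float], is_low_income: bool
) -> bool:
    """Return True when the redlining score is at or above the cutoff
    and the tract is also flagged as low income.

    Hypothetical helper: a missing score never qualifies.
    """
    if holc_score is None:
        return False
    return holc_score >= HOLC_SCORE_CUTOFF and is_low_income


print(meets_historic_redlining_criterion(3.5, True))   # True
print(meets_historic_redlining_criterion(3.5, False))  # False
print(meets_historic_redlining_criterion(None, True))  # False
```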

* Update PR threshold count to 10
  We now show 10 indicators for PR. See the discussion on the github issue for more info: #1621
* Do not use linguistic iso for Puerto Rico
  Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>

* Remove code that drops Guam and USVI from ETL
* Add back code for dropping rows by FIPS code
  We may want this functionality, so let's keep it and just make the constant an empty array for now.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
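The "keep the code, empty the constant" approach could look roughly like the sketch below. The constant name, the field name, and the sample tract IDs are hypothetical, and plain lists of dicts stand in for whatever data structure the ETL actually uses. With an empty list nothing is dropped, so Guam (FIPS 66) and USVI (FIPS 78) rows pass through.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical constant: state/territory FIPS prefixes to drop.
# Left empty so that no rows are dropped by default.
DROP_FIPS_CODES: List[str] = []


def drop_rows_by_fips(
    rows: Sequence[Dict[str, object]],
    geoid_field: str = "GEOID10_TRACT",  # assumed field name
    drop_fips_codes: Optional[Sequence[str]] = None,
) -> List[Dict[str, object]]:
    """Keep rows whose tract ID does not start with any listed FIPS prefix."""
    codes = DROP_FIPS_CODES if drop_fips_codes is None else drop_fips_codes
    prefixes = tuple(codes)
    if not prefixes:
        return list(rows)  # empty constant: the filter is a no-op
    return [
        row for row in rows if not str(row[geoid_field]).startswith(prefixes)
    ]


# Illustrative tract IDs only (state prefix 01, then Guam 66, then USVI 78).
sample_rows = [
    {"GEOID10_TRACT": "01001020100"},
    {"GEOID10_TRACT": "66010950100"},
    {"GEOID10_TRACT": "78010970100"},
]
print(len(drop_rows_by_fips(sample_rows)))                                # 3
print(len(drop_rows_by_fips(sample_rows, drop_fips_codes=["66", "78"])))  # 1
```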

Removing HOLC calculation from score narwhal.

Rescales linguistic isolation to drop puerto rico

adds leaky underground storage tanks

also includes merging / clean up of the release

* added tribalId for Supplemental dataset (#1804)
* Setting zoom levels for tribal map (#1810)
* NRI dataset and initial score YAML configuration (#1534)
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* removing source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docstrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordering
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
  Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Correct copy typo (#1809)
* Add basic test suite for COI (#1518)
* Update COI to use new yaml (#1518)
* Add tests for DOE energy burden (#1518)
* Add dataset config for energy burden (#1518)
* Refactor ETL to use datasets.yml (#1518)
* Add fake GEOIDs to COI tests (#1518)
* Refactor _setup_etl_instance_and_run_extract to base (#1518)
  For the three classes we've done so far, a generic _setup_etl_instance_and_run_extract will work fine. For the moment we can reuse the same setup method until we decide future classes need more flexibility, but they can also always subclass, so...
* Add output-path tests (#1518)
* Update YAML to match constant (#1518)
* Don't blindly set float format (#1518)
* Add defaults for extract (#1518)
* Run YAML load on all subclasses (#1518)
* Update description fields (#1518)
* Update YAML per final format (#1518)
* Update fixture tract IDs (#1518)
* Update base class refactor (#1518)
  Now that NRI is final I needed to make a small number of updates to my refactored code.
* Remove old comment (#1518)
* Fix type signature and return (#1518)
* Update per code review (#1518)

Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>

Yikes! Fixing merge messup!

* wip
* i believe this works -- let's see the pipeline
* updated fixtures

* updated tile data
* ensuring adjli_et in

* Add missing field to download (#1964)
* Remove pydantic since it's unused (#1964)
* Add percentile to CSV (#1964)
* Update downloadable pickle (#1964)

…n branch) (#1962)
* Configure and run `black` and other pre-commit hooks

Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>

* Change TA_PERC, change TA_COUNT (#1988, #1989)
  - Make TA_PERC_STR back into a nullable float following the rules requested in #1989
  - Move TA_COUNT to be TA_COUNT_AK, also add a null TA_COUNT_C for CONUS that we can fill in later.
* Fix typo comment (#1988)

* Add "Is a Tribal DAC" field (#1998)
* Add tribal DACs to score N final (#1998)
* Add new fields to downloads (#1998)
* Make a int a float (#1998)
* Update field names, apply feedback (#1998)

* Add assertion around codebook (#1505)
* Assert csv and excel have same cols (#1505)
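A check like "csv and excel have same cols" can be sketched as follows. The function name is hypothetical, and comparing the columns as sets (ignoring order) is an assumption; the real assertion may be stricter.

```python
from typing import Sequence


def assert_same_columns(
    csv_columns: Sequence[str], excel_columns: Sequence[str]
) -> None:
    """Raise AssertionError if the two downloads expose different column names.

    Hypothetical helper: column order is ignored in this sketch.
    """
    only_in_csv = sorted(set(csv_columns) - set(excel_columns))
    only_in_excel = sorted(set(excel_columns) - set(csv_columns))
    assert not only_in_csv and not only_in_excel, (
        f"Column mismatch: only in CSV {only_in_csv}, "
        f"only in Excel {only_in_excel}"
    )


# Passes silently: same names, different order.
assert_same_columns(["tract", "score"], ["score", "tract"])
```

Reporting which columns are missing on each side makes a failed pipeline run easier to diagnose than a bare equality check would.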

* data source location
* toml
* cdc_places
* cdc_svi_index
* url updates
* child oppy and dot travel
* up to hud_recap
* completed ticket
* cache bust
* hud_recap
* us_army_fuds

I did a rough, simple analysis of the variables we put in the tiles and grepped the frontend code to see (1) whether they're ever accessed and (2) whether they're actually used, even if they're read once. I removed everything I noticed was not accessed.

* Disable file size limits on tiles
* Remove print debugs
  I know.

* Update file name pattern (#2037)
* Remove ETL from generation (#2037)
  I looked more carefully, and this ETL step isn't used in the score, so there's no need to run it every time. Per previous steps, I removed it from constants, so the code is still there but it won't run by default.

vim-usds

@lucas

Sorry for the basic questions, but does the pipeline run this script at any point, whether in etl_score, etl_score_post, etl_score_geo, or generate_tiles?

Or is this something separate from our data pipeline?

If it's separate, I'm assuming someone would have to run it manually to use it?

lucasmbrown-usds · 2022-11-01T20:48:29Z

Yes, this has to be run manually. It's our "comparison tool", which is just a Python notebook. It's what I'll run to generate the final analysis for Narwhal.

github-actions · 2022-11-01T21:39:19Z

** Score Deployed! **
Find it here:

github-actions · 2022-11-01T22:01:32Z

** Map Deployed! **
Map with Staging Backend: https://screeningtool.geoplatform.gov/en?flags=stage_hash=2056/289024de25fb51e1bad1a3af5ec5bdf6d8ec31f2
Find tiles here: https://justice40-data.s3.amazonaws.com/data-pipeline-staging/2056/289024de25fb51e1bad1a3af5ec5bdf6d8ec31f2/data/score/tiles

vim-usds

LGTM!

vim-usds · 2022-11-01T22:32:21Z

@lucasmbrown-usds - I wonder if the Python notebooks should be exempt from the pipeline Data Checks in the GHA, since they're not part of the pipeline.

If so, I'm wondering what else we can skip when making changes to the ipython folder. Are all scripts there exempt from the classic BE checks that run in this PR?

emma-nechamkin and others added 30 commits August 10, 2022 12:07

Create deploy_be_staging.yml (#1575)

218fa48

Imputing income using geographic neighbors (#1559)

f680d86

Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.

Adding HOLC indicator (#1579)

3a96001

Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.

Update backend for Puerto Rico (#1686)

2e38aaa

* Update PR threshold count to 10
  We now show 10 indicators for PR. See the discussion on the github issue for more info: #1621
* Do not use linguistic iso for Puerto Rico
  Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>

updating

92d68ba

Emma nechamkin/holc patch (#1742)

002cddf

Removing HOLC calculation from score narwhal.

updating ejscreen data, try two (#1747)

e98282d

Rescaling linguistic isolation (#1750)

29419dd

Rescales linguistic isolation to drop puerto rico

adds UST indicator (#1786)

daf188c

adds leaky underground storage tanks

Changing LHE in tiles to a boolean (#1767)

bbb5bbc

also includes merging / clean up of the release

added indoor plumbing to chas

cac1e04

added indoor plumbing to score housing burden

19d3bde

added indoor plumbing to score housing burden

3aa03f1

first run through

ed9b717

Update etl_score_geo.py

d55b7c0

Yikes! Fixing merge messup!

Create deploy_be_staging.yml (#1575)

485a9a8

Imputing income using geographic neighbors (#1559)

f047ca9

Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.

Adding HOLC indicator (#1579)

1782d02

Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.

Update backend for Puerto Rico (#1686)

05748c9

* Update PR threshold count to 10
  We now show 10 indicators for PR. See the discussion on the github issue for more info: #1621
* Do not use linguistic iso for Puerto Rico
  Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>

updating

b41a287

Emma nechamkin/holc patch (#1742)

7559cf4

Removing HOLC calculation from score narwhal.

updating ejscreen data, try two (#1747)

2ab24c6

Rescaling linguistic isolation (#1750)

f6efdd4

Rescales linguistic isolation to drop puerto rico

adds UST indicator (#1786)

b0a7284

adds leaky underground storage tanks

Changing LHE in tiles to a boolean (#1767)

0d90ae5

also includes merging / clean up of the release

added indoor plumbing to chas

8c75190

added indoor plumbing to score housing burden

15450cf

emma-nechamkin and others added 21 commits October 3, 2022 13:07

updated with scoring comparison

d4ae16b

updated for narhwal -- leaving commented code in for now

7c8617d

pydantic upgrade

ecabe79

produce a string for the front end to ingest (#1963)

a438b44

* wip
* i believe this works -- let's see the pipeline
* updated fixtures

Adding ADJLI_ET (#1976)

71385a0

* updated tile data
* ensuring adjli_et in

Add back income percentile (#1977)

1334fcc

* Add missing field to download (#1964)
* Remove pydantic since it's unused (#1964)
* Add percentile to CSV (#1964)
* Update downloadable pickle (#1964)

Merge branch 'main' into emma-nechamkin/release/score-narwhal

baa34ec

Issue 105: Configure and run black and other pre-commit hooks (clea…

6e6223c

…n branch) (#1962)
* Configure and run `black` and other pre-commit hooks

Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>

Removing fixed python version for black (#1985)

6505d49

Issue 1992: Do not impute income for null population tracts (#1993)

e2641fe

Hotfix for DOT data source DNS issue (#1999)

d89c516

Make tribal overlap set score N (#2004)

8b611ed

* Add "Is a Tribal DAC" field (#1998)
* Add tribal DACs to score N final (#1998)
* Add new fields to downloads (#1998)
* Make a int a float (#1998)
* Update field names, apply feedback (#1998)

Add assertions around codebook (#2014)

743d3ce

* Add assertion around codebook (#1505)
* Assert csv and excel have same cols (#1505)

Remove suffixes from tribal lands (#1974) (#2008)

841a26d

Data source location (#2015)

d975118

* data source location
* toml
* cdc_places
* cdc_svi_index
* url updates
* child oppy and dot travel
* up to hud_recap
* completed ticket
* cache bust
* hud_recap
* us_army_fuds

Disable file size limits on tiles (#2031)

dbea349

* Disable file size limits on tiles
* Remove print debugs
  I know.

Merge branch 'main' into emma-nechamkin/release/score-narwhal

e51af9d

updating comparison tool fields

289024d

lucasmbrown-usds requested review from vim-usds and mattbowen-usds as code owners November 1, 2022 20:25

vim-usds reviewed Nov 1, 2022

vim-usds approved these changes Nov 1, 2022

Base automatically changed from emma-nechamkin/release/score-narwhal to main December 2, 2022 02:50
