Update comparison tool #2056
base: main
Imputes the income field with a light refactor. Needs more refactoring and more tests (I spot-checked). The next ticket will check and address this, but a lot of the "narwhal" architecture is here.
Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
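As a rough illustration of the rule described in this commit, the sketch below flags a tract when its redlining score meets the 3.25 cutoff and the tract is also low income. Only the 3.25 cutoff and the low-income condition come from the commit message; the field names `holc_score` and `is_low_income`, the function name, and the choice of `>=` rather than `>` are assumptions made for the example.

```python
# Cutoff taken from the commit message; everything else here is hypothetical.
HOLC_SCORE_CUTOFF = 3.25


def is_redlined_and_low_income(tract):
    """Return True when a tract meets the cutoff and is flagged low income.

    `tract` is a plain dict with hypothetical keys `holc_score` and
    `is_low_income`. Tracts without a score are never flagged.
    """
    score = tract.get("holc_score")
    if score is None:
        return False
    # Assumption: the cutoff is inclusive.
    return score >= HOLC_SCORE_CUTOFF and bool(tract.get("is_low_income"))
```

A tract with a missing score is treated as not flagged here; the real pipeline may handle missing data differently.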
* Update PR threshold count to 10. We now show 10 indicators for PR. See the discussion on the GitHub issue for more info: #1621
* Do not use linguistic iso for Puerto Rico. Closes 1350.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
* Remove code that drops Guam and USVI from ETL
* Add back code for dropping rows by FIPS code. We may want this functionality, so let's keep it and just make the constant currently be an empty array.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
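The "keep the code, empty the constant" approach from this commit can be sketched as follows. This is a minimal illustration, not the repository's code: the constant name `DROP_FIPS_CODES`, the function name, and the `geoid` key are all hypothetical.

```python
# Hypothetical constant: state/territory FIPS prefixes to drop.
# Left empty so that nothing is dropped by default.
DROP_FIPS_CODES = []


def drop_rows_by_fips(rows, fips_to_drop=DROP_FIPS_CODES):
    """Return the rows whose `geoid` does not start with a listed FIPS prefix.

    `rows` is a list of dicts with a hypothetical string key `geoid`.
    """
    prefixes = tuple(fips_to_drop)
    # str.startswith with an empty tuple is always False, so an empty
    # constant keeps every row.
    return [row for row in rows if not row["geoid"].startswith(prefixes)]
```

With the constant empty, Guam (FIPS 66) and USVI (FIPS 78) rows pass through; passing `["66", "78"]` explicitly would restore the earlier dropping behavior.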
Removing HOLC calculation from score narwhal.
Rescales linguistic isolation to drop Puerto Rico
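One possible reading of this rescaling is that percentile ranks are computed only over tracts outside Puerto Rico, so that Puerto Rico values no longer shift the distribution. The sketch below illustrates that reading under stated assumptions: the function name, the dict-of-GEOIDs input, and the "share of included values at or below this one" definition of percentile are all hypothetical, and the actual score code may differ.

```python
from bisect import bisect_right


def percentile_ranks_excluding(values_by_geoid, excluded_state_fips=("72",)):
    """Rank each tract against non-excluded tracts only.

    `values_by_geoid` maps a GEOID string to a number (or None).
    Excluded tracts (Puerto Rico, FIPS 72, by default) and missing
    values get None instead of a rank.
    """
    prefixes = tuple(excluded_state_fips)
    included = {
        geoid: value
        for geoid, value in values_by_geoid.items()
        if value is not None and not geoid.startswith(prefixes)
    }
    sorted_values = sorted(included.values())
    count = len(sorted_values)
    ranks = {}
    for geoid in values_by_geoid:
        if geoid in included:
            # Share of included values less than or equal to this value.
            ranks[geoid] = bisect_right(sorted_values, included[geoid]) / count
        else:
            ranks[geoid] = None
    return ranks
```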
adds leaky underground storage tanks
also includes merging / clean up of the release
* added tribalId for Supplemental dataset (#1804)
* Setting zoom levels for tribal map (#1810)
* NRI dataset and initial score YAML configuration (#1534)
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* removing source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docstrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordering
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Correct copy typo (#1809)
* Add basic test suite for COI (#1518)
* Update COI to use new yaml (#1518)
* Add tests for DOE energy burden (1518)
* Add dataset config for energy burden (1518)
* Refactor ETL to use datasets.yml (#1518)
* Add fake GEOIDs to COI tests (#1518)
* Refactor _setup_etl_instance_and_run_extract to base (#1518). For the three classes we've done so far, a generic _setup_etl_instance_and_run_extract will work fine; for the moment we can reuse the same setup method until we decide future classes need more flexibility --- but they can also always subclass so...
* Add output-path tests (#1518)
* Update YAML to match constant (#1518)
* Don't blindly set float format (#1518)
* Add defaults for extract (#1518)
* Run YAML load on all subclasses (#1518)
* Update description fields (#1518)
* Update YAML per final format (#1518)
* Update fixture tract IDs (#1518)
* Update base class refactor (#1518). Now that NRI is final I needed to make a small number of updates to my refactored code.
* Remove old comment (#1518)
* Fix type signature and return (#1518)
* Update per code review (#1518)
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
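The "stop execution of ETL if there's a YAML schema issue" item above describes a fail-fast pattern: validate the dataset configuration before any extract work starts. A minimal sketch of that pattern follows, assuming the YAML has already been loaded into a plain dict. The required keys, the exception class, and the function names are hypothetical and are not taken from the repository.

```python
class DatasetConfigError(ValueError):
    """Raised when a dataset configuration is missing required keys."""


# Hypothetical required keys, for illustration only.
REQUIRED_KEYS = ("module_name", "load_fields")


def validate_dataset_config(config):
    """Raise DatasetConfigError if any required key is absent."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise DatasetConfigError("missing keys: " + ", ".join(missing))


def run_etl(config, extract):
    """Validate first, then run the (hypothetical) extract callable."""
    validate_dataset_config(config)
    return extract(config)
```

Because validation runs before `extract`, a malformed configuration stops the run without downloading or transforming anything.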
Yikes! Fixing merge messup!
* wip
* i believe this works -- let's see the pipeline
* updated fixtures
* updated tile data
* ensuring adjli_et in
…n branch) (#1962)
* Configure and run `black` and other pre-commit hooks
Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>
* data source location
* toml
* cdc_places
* cdc_svi_index
* url updates
* child oppy and dot travel
* up to hud_recap
* completed ticket
* cache bust
* hud_recap
* us_army_fuds
I did a pretty rough and simple analysis of the variables we put in the tiles and grepped the frontend code to see (1) whether they're ever accessed and (2) whether they're used, even if they're read only once. I removed everything I noticed was not accessed.
* Disable file size limits on tiles
* Remove print debugs. I know.
* Update file name pattern (#2037)
* Remove ETL from generation (2037). I looked more carefully, and this ETL step isn't used in the score, so there's no need to run it every time. Per previous steps, I removed it from constants, so the code is still there but it won't run by default.
Sorry for the basic questions: does the pipeline run this script at any point, either in `etl_score`, `etl_score_post`, `etl_score_geo`, or `generate_tiles`?
Or is this something separate from our data pipeline?
If it's separate, I'm assuming someone would have to run it manually to use it?
Yes, this has to be run manually. It's our "comparison tool," which is just a Python notebook. It's what I'll run to generate the final analysis for Narwhal.
**Score Deployed!**
**Map Deployed!**
LGTM!
@lucasmbrown-usds - I wonder if the Python notebooks should be exempt from running the pipeline. If this is true, I'm wondering what else we can remove when making changes to the ipython folder. Are all scripts exempt from the classic BE checks that run in this PR?
Change fields.