Update comparison tool #2056

Open
wants to merge 124 commits into base: main

Conversation

lucasmbrown-usds
Contributor

Change fields.

emma-nechamkin and others added 30 commits August 10, 2022 12:07
Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.
Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
* Update PR threshold count to 10

We now show 10 indicators for PR. See the discussion on the github issue for more info: #1621

* Do not use linguistic iso for Puerto Rico

Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
* Remove code that drops Guam and USVI from ETL

* Add back code for dropping rows by FIPS code

We may want this functionality, so let's keep it and just make the constant currently be an empty array.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
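A minimal sketch of the idea in the commit above: keep the row-dropping code in place, but make the constant an empty list so nothing is dropped by default. The names `DROP_FIPS_CODES`, `drop_rows_by_fips`, and the `GEOID10_TRACT` field are assumptions for illustration, not necessarily what the ETL uses.

```python
# Hypothetical constant: FIPS state-code prefixes whose rows should be dropped.
# Left empty so the filtering code stays available but removes nothing.
DROP_FIPS_CODES = []


def drop_rows_by_fips(rows, geoid_field="GEOID10_TRACT", drop_codes=None):
    """Return the rows whose tract ID does not start with any code in drop_codes."""
    prefixes = tuple(DROP_FIPS_CODES if drop_codes is None else drop_codes)
    if not prefixes:
        # Empty constant: keep every row.
        return list(rows)
    return [row for row in rows if not str(row[geoid_field]).startswith(prefixes)]
```

With the empty default every row survives; passing `drop_codes=["66", "78"]` (the FIPS state codes for Guam and the U.S. Virgin Islands) would restore the old behavior.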
Removing HOLC calculation from score narwhal.
Rescales linguistic isolation to drop Puerto Rico
adds leaky underground storage tanks
also includes merging / clean up of the release
* added tribalId for Supplemental dataset (#1804)

* Setting zoom levels for tribal map (#1810)

* NRI dataset and initial score YAML configuration (#1534)

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* update be staging gha

* checkpoint

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* checkpoint

* PR Review

* removing source url

* tests

* stop execution of ETL if there's a YAML schema issue

* update be staging gha

* adding source url as class var again

* clean up

* force cache bust

* gha cache bust

* dynamically set score vars from YAML

* docstrings

* removing last updated year - optional reverse percentile

* passing tests

* sort order

* column ordering

* PR review

* class level vars

* Updating DatasetsConfig

* fix pylint errors

* moving metadata hint back to code

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>

* Correct copy typo (#1809)

* Add basic test suite for COI (#1518)

* Update COI to use new yaml (#1518)

* Add tests for DOE energy burden (#1518)

* Add dataset config for energy burden (#1518)

* Refactor ETL to use datasets.yml (#1518)

* Add fake GEOIDs to COI tests (#1518)

* Refactor _setup_etl_instance_and_run_extract to base (#1518)

For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract will work fine. For the moment we
can reuse the same setup method until we decide future classes need more
flexibility, but they can also always subclass, so...

* Add output-path tests (#1518)

* Update YAML to match constant (#1518)

* Don't blindly set float format (#1518)

* Add defaults for extract (#1518)

* Run YAML load on all subclasses (#1518)

* Update description fields (#1518)

* Update YAML per final format (#1518)

* Update fixture tract IDs (#1518)

* Update base class refactor (#1518)

Now that NRI is final I needed to make a small number of updates to my
refactored code.

* Remove old comment (#1518)

* Fix type signature and return (#1518)

* Update per code review (#1518)

Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
Yikes! Fixing merge messup!
Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.
Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
* Update PR threshold count to 10

We now show 10 indicators for PR. See the discussion on the github issue for more info: #1621

* Do not use linguistic iso for Puerto Rico

Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
* Remove code that drops Guam and USVI from ETL

* Add back code for dropping rows by FIPS code

We may want this functionality, so let's keep it and just make the constant currently be an empty array.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
Removing HOLC calculation from score narwhal.
Rescales linguistic isolation to drop Puerto Rico
adds leaky underground storage tanks
also includes merging / clean up of the release
emma-nechamkin and others added 21 commits October 3, 2022 13:07
* wip

* i believe this works -- let's see the pipeline

* updated fixtures
* updated tile data

* ensuring adjli_et in
* Add missing field to download (#1964)

* Remove pydantic since it's unused (#1964)

* Add percentile to CSV (#1964)

* Update downloadable pickle (#1964)
…n branch) (#1962)

* Configure and run `black` and other pre-commit hooks

Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>
* Change TA_PERC, change TA_COUNT (#1988, #1989)

- Make TA_PERC_STR back into a nullable float following the rules
  requested in #1989
- Move TA_COUNT to be TA_COUNT_AK, also add a null TA_COUNT_C for CONUS
  that we can fill in later.

* Fix typo comment (#1988)
* Add "Is a Tribal DAC" field (#1998)

* Add tribal DACs to score N final (#1998)

* Add new fields to downloads (#1998)

* Make an int a float (#1998)

* Update field names, apply feedback (#1998)
* Add assertion around codebook (#1505)

* Assert csv and excel have same cols (#1505)
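The kind of check these two commits describe could be sketched as follows. This is an assumption-laden illustration: `read_csv_header` and `assert_same_columns` are hypothetical helpers, the Excel columns are passed in as a plain list instead of being read from a workbook, and treating column order as significant is a guess.

```python
import csv
import io


def read_csv_header(csv_text):
    """Return the header row of CSV text as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def assert_same_columns(csv_columns, excel_columns):
    """Raise AssertionError if the two column lists differ in content or order."""
    only_in_csv = [c for c in csv_columns if c not in excel_columns]
    only_in_excel = [c for c in excel_columns if c not in csv_columns]
    assert not only_in_csv and not only_in_excel, (
        f"Only in CSV: {only_in_csv}; only in Excel: {only_in_excel}"
    )
    # Assumption: the downloads should also list columns in the same order.
    assert list(csv_columns) == list(excel_columns), "Same columns, different order"
```

Reporting which columns are missing on each side makes a failure easier to act on than a bare equality assertion.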
* data source location

* toml

* cdc_places

* cdc_svi_index

* url updates

* child oppy and dot travel

* up to hud_recap

* completed ticket

* cache bust

* hud_recap

* us_army_fuds
I did a pretty rough and simple analysis of the variables we put in the
tiles and grepped the frontend code to see whether (1) they're ever
accessed and (2) they're actually used, even if they're read only once.
I removed everything I noticed was not accessed.
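A rough analysis like the one described above could be approximated with a script instead of manual grepping. This is only a sketch: `find_unused_fields`, the file suffixes, and the plain substring match are assumptions, not the method actually used for this commit.

```python
from pathlib import Path


def find_unused_fields(field_names, source_dir, suffixes=(".ts", ".tsx", ".js", ".jsx")):
    """Return the field names that never appear in any source file under source_dir."""
    remaining = set(field_names)
    for path in Path(source_dir).rglob("*"):
        if not remaining:
            break  # every field has been seen at least once
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            remaining -= {name for name in remaining if name in text}
    return sorted(remaining)
```

Like grep, a substring match can report a field as used when it only appears in a comment or inside a longer name, so the result is a list of candidates for removal, not proof.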
* Disable file size limits on tiles

* Remove print debugs

I know.
* Update file name pattern (#2037)

* Remove ETL from generation (#2037)

I looked more carefully, and this ETL step isn't used in the score, so
there's no need to run it every time. Per previous steps, I removed it
from constants, so the code is still there but it won't run by default.
Contributor

@vim-usds vim-usds left a comment

@lucas

Sorry for the basic questions: does the pipeline run this script at any point, whether in etl_score, etl_score_post, etl_score_geo, or generate_tiles?

Or is this something separate from our data pipeline?

If it's separate, I'm assuming someone would have to run it manually to use it?

@lucasmbrown-usds
Contributor Author

> @lucas
>
> Sorry for the basic questions: does the pipeline run this script at any point, whether in etl_score, etl_score_post, etl_score_geo, or generate_tiles?
>
> Or is this something separate from our data pipeline?
>
> If it's separate, I'm assuming someone would have to run it manually to use it?

Yes, this has to be run manually. It's our "comparison tool," which is just a Python notebook. It's what I'll run to generate the final analysis for Narwhal.

Contributor

@vim-usds vim-usds left a comment

LGTM!

@vim-usds
Contributor

vim-usds commented Nov 1, 2022

@lucasmbrown-usds - I wonder if the Python notebooks should be exempt from the pipeline's Data Checks in the GHA, since they're not part of the pipeline.

If so, what else can we skip when making changes to the ipython folder? Are all scripts there exempt from the classic BE checks that run in this PR?

Base automatically changed from emma-nechamkin/release/score-narwhal to main December 2, 2022 02:50
6 participants