Adding NLCD data (#1826)
Adding NLCD's natural space indicator end to end to the score.
emma-nechamkin authored Aug 17, 2022
1 parent 49623e4 commit 7d89d41
Showing 18 changed files with 288 additions and 18 deletions.
21 changes: 21 additions & 0 deletions data/data-pipeline/data_pipeline/content/config/csv.yml
@@ -289,4 +289,25 @@ fields:
format: bool
- score_name: Greater than or equal to the 90th percentile for share of properties at risk of fire in 30 years
label: Greater than or equal to the 90th percentile for share of properties at risk of fire in 30 years
format: bool
- score_name: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent and is low income?
label: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent and is low income?
format: bool
- score_name: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent
label: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent
format: bool
- score_name: Share of the tract's land area that is covered by impervious surface or cropland as a percent
label: Share of the tract's land area that is covered by impervious surface or cropland as a percent
format: percentage
- score_name: Share of the tract's land area that is covered by impervious surface or cropland as a percent (percentile)
label: Share of the tract's land area that is covered by impervious surface or cropland as a percent (percentile)
format: percentage
- score_name: Share of properties at risk of flood in 30 years (percentile)
label: Share of properties at risk of flood in 30 years (percentile)
format: percentage
- score_name: Share of properties at risk of fire in 30 years (percentile)
label: Share of properties at risk of fire in 30 years (percentile)
format: percentage
- score_name: Does the tract have at least 35 acres in it?
label: Does the tract have at least 35 acres in it?
format: bool
26 changes: 24 additions & 2 deletions data/data-pipeline/data_pipeline/content/config/excel.yml
@@ -278,10 +278,10 @@ sheets:
format: bool
- score_name: Share of properties at risk of flood in 30 years
label: Share of properties at risk of flood in 30 years
format: float
format: percentage
- score_name: Share of properties at risk of fire in 30 years
label: Share of properties at risk of fire in 30 years
format: float
format: percentage
- score_name: Greater than or equal to the 90th percentile for share of properties at risk of flood in 30 years and is low income?
label: Greater than or equal to the 90th percentile for share of properties at risk of flood in 30 years and is low income?
format: bool
@@ -294,3 +294,25 @@ sheets:
- score_name: Greater than or equal to the 90th percentile for share of properties at risk of fire in 30 years
label: Greater than or equal to the 90th percentile for share of properties at risk of fire in 30 years
format: bool
- score_name: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent and is low income?
label: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent and is low income?
format: bool
- score_name: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent
label: Greater than or equal to the 90th percentile for share of the tract's land area that is covered by impervious surface or cropland as a percent
format: bool
- score_name: Share of the tract's land area that is covered by impervious surface or cropland as a percent
label: Share of the tract's land area that is covered by impervious surface or cropland as a percent
format: percentage
- score_name: Share of the tract's land area that is covered by impervious surface or cropland as a percent (percentile)
label: Share of the tract's land area that is covered by impervious surface or cropland as a percent (percentile)
format: percentage
- score_name: Share of properties at risk of flood in 30 years (percentile)
label: Share of properties at risk of flood in 30 years (percentile)
format: percentage
- score_name: Share of properties at risk of fire in 30 years (percentile)
label: Share of properties at risk of fire in 30 years (percentile)
format: percentage
- score_name: Does the tract have at least 35 acres in it?
label: Does the tract have at least 35 acres in it?
format: bool

6 changes: 6 additions & 0 deletions data/data-pipeline/data_pipeline/etl/constants.py
@@ -65,6 +65,12 @@
"class_name": "HudHousingETL",
"is_memory_intensive": False,
},
{
"name": "nlcd_nature_deprived",
"module_dir": "nlcd_nature_deprived",
"class_name": "NatureDeprivedETL",
"is_memory_intensive": False,
},
{
"name": "census_acs_median_income",
"module_dir": "census_acs_median_income",
39 changes: 33 additions & 6 deletions data/data-pipeline/data_pipeline/etl/score/config/datasets.yml
@@ -35,7 +35,6 @@ datasets:
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true

- short_name: "ex_ag_loss"
df_field_name: "EXPECTED_AGRICULTURE_LOSS_RATE_FIELD_NAME"
long_name: "Expected agricultural loss rate (Natural Hazards Risk Index)"
@@ -54,7 +53,6 @@ datasets:
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true

- short_name: "ex_bldg_loss"
df_field_name: "EXPECTED_BUILDING_LOSS_RATE_FIELD_NAME"
long_name: "Expected building loss rate (Natural Hazards Risk Index)"
@@ -72,7 +70,6 @@ datasets:
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true

- short_name: "has_ag_val"
df_field_name: "CONTAINS_AGRIVALUE"
long_name: "Contains agricultural value"
@@ -168,7 +165,6 @@ datasets:
field_type: float
include_in_tiles: true
include_in_downloadable_files: true

- long_name: "First Street Foundation Flood Risk"
short_name: "FSF Flood Risk"
module_name: fsf_flood_risk
@@ -209,7 +205,6 @@ datasets:
include_in_tiles: false
include_in_downloadable_files: true
create_percentile: true

- long_name: "First Street Foundation Wildfire Risk"
short_name: "FSF Wildfire Risk"
module_name: fsf_wildfire_risk
@@ -250,7 +245,6 @@ datasets:
include_in_tiles: false
include_in_downloadable_files: true
create_percentile: true

- long_name: "DOT Travel Disadvantage Index"
short_name: "DOT"
module_name: "travel_composite"
@@ -263,3 +257,36 @@ datasets:
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true
- long_name: "National Land Cover Database (NLCD) Lack of Green Space / Nature-Deprived Communities dataset, as compiled by TPL"
short_name: "nlcd_nature_deprived"
module_name: "nlcd_nature_deprived"
input_geoid_tract_field_name: "GEOID10_TRACT"
load_fields:
- short_name: "ncld_eligible"
df_field_name: "ELIGIBLE_FOR_NATURE_DEPRIVED_FIELD_NAME"
long_name: "Does the tract have at least 35 acres in it?"
field_type: bool
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: false
- short_name: "percent_impervious"
df_field_name: "TRACT_PERCENT_IMPERVIOUS_FIELD_NAME"
long_name: "Share of the tract's land area that is covered by impervious surface as a percent"
field_type: percentage
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true
- short_name: "percent_nonnatural"
df_field_name: "TRACT_PERCENT_NON_NATURAL_FIELD_NAME"
long_name: "Share of the tract's land area that is covered by impervious surface or cropland as a percent"
field_type: percentage
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true
- short_name: "percent_cropland"
df_field_name: "TRACT_PERCENT_CROPLAND_FIELD_NAME"
long_name: "Share of the tract's land area that is covered by cropland as a percent"
field_type: percentage
include_in_tiles: true
include_in_downloadable_files: true
create_percentile: true
5 changes: 5 additions & 0 deletions data/data-pipeline/data_pipeline/etl/score/constants.py
@@ -305,6 +305,9 @@
+ field_names.PERCENTILE_FIELD_SUFFIX: "WF_PFS",
field_names.HIGH_FUTURE_FLOOD_RISK_FIELD: "FLD_ET",
field_names.HIGH_FUTURE_WILDFIRE_RISK_FIELD: "WF_ET",
field_names.TRACT_PERCENT_NON_NATURAL_FIELD_NAME
+ field_names.PERCENTILE_FIELD_SUFFIX: "IS_PFS",
field_names.NON_NATURAL_LOW_INCOME_FIELD_NAME: "IS_ET",
## FPL 200 and low higher ed for all others should no longer be M_EBSI, but rather
## FPL_200 (there is no higher ed in narwhal)
}
@@ -361,4 +364,6 @@
field_names.FUTURE_FLOOD_RISK_FIELD + field_names.PERCENTILE_FIELD_SUFFIX,
field_names.FUTURE_WILDFIRE_RISK_FIELD
+ field_names.PERCENTILE_FIELD_SUFFIX,
field_names.TRACT_PERCENT_NON_NATURAL_FIELD_NAME
+ field_names.PERCENTILE_FIELD_SUFFIX,
]
17 changes: 14 additions & 3 deletions data/data-pipeline/data_pipeline/etl/score/etl_score.py
@@ -14,6 +14,7 @@
from data_pipeline.etl.sources.fsf_flood_risk.etl import (
FloodRiskETL,
)
from data_pipeline.etl.sources.nlcd_nature_deprived.etl import NatureDeprivedETL
from data_pipeline.etl.sources.fsf_wildfire_risk.etl import WildfireRiskETL
from data_pipeline.score.score_runner import ScoreRunner
from data_pipeline.score import field_names
@@ -47,6 +48,7 @@ def __init__(self):
self.dot_travel_disadvantage_df: pd.DataFrame
self.fsf_flood_df: pd.DataFrame
self.fsf_fire_df: pd.DataFrame
self.nature_deprived_df: pd.DataFrame

def extract(self) -> None:
logger.info("Loading data sets from disk.")
@@ -134,6 +136,9 @@ def extract(self) -> None:
# Load flood risk data
self.fsf_flood_df = FloodRiskETL.get_data_frame()

# Load NLCD Nature-Deprived Communities data
self.nature_deprived_df = NatureDeprivedETL.get_data_frame()

# Load GeoCorr Urban Rural Map
geocorr_urban_rural_csv = (
constants.DATA_PATH / "dataset" / "geocorr" / "usa.csv"
@@ -356,6 +361,7 @@ def _prepare_initial_df(self) -> pd.DataFrame:
self.dot_travel_disadvantage_df,
self.fsf_flood_df,
self.fsf_fire_df,
self.nature_deprived_df,
]

# Sanity check each data frame before merging.
@@ -439,16 +445,18 @@ def _prepare_initial_df(self) -> pd.DataFrame:
field_names.IMPENETRABLE_SURFACES_FIELD,
field_names.UST_FIELD,
field_names.DOT_TRAVEL_BURDEN_FIELD,
field_names.AGRICULTURAL_VALUE_BOOL_FIELD,
field_names.FUTURE_FLOOD_RISK_FIELD,
field_names.FUTURE_WILDFIRE_RISK_FIELD,
field_names.TRACT_PERCENT_NON_NATURAL_FIELD_NAME,
field_names.POVERTY_LESS_THAN_200_FPL_IMPUTED_FIELD,
]

non_numeric_columns = [
self.GEOID_TRACT_FIELD_NAME,
field_names.PERSISTENT_POVERTY_FIELD,
field_names.HISTORIC_REDLINING_SCORE_EXCEEDED,
field_names.TRACT_ELIGIBLE_FOR_NONNATURAL_THRESHOLD,
field_names.AGRICULTURAL_VALUE_BOOL_FIELD,
]

# For some columns, high values are "good", so we want to reverse the percentile
@@ -500,7 +508,7 @@ def _prepare_initial_df(self) -> pd.DataFrame:
df_copy[numeric_columns] = df_copy[numeric_columns].apply(pd.to_numeric)

# Convert all columns to numeric and do math
# Note that we have a few special conditions here, that we handle explicitly.
# Note that we have a few special conditions here and we handle them explicitly.
# For *Linguistic Isolation*, we do NOT want to include Puerto Rico in the percentile
# calculation. This is because linguistic isolation as a category doesn't make much sense
# in Puerto Rico, where Spanish is a recognized language. Thus, we construct a list
@@ -509,6 +517,10 @@ def _prepare_initial_df(self) -> pd.DataFrame:
# For *Expected Agricultural Loss*, we only want to include in the percentile tracts
# in which there is some agricultural value. This helps us adjust the data such that we have
# the ability to discern which tracts truly are at the 90th percentile, since many tracts have 0 value.
#
# For *Non-Natural Space*, we may only want to include tracts that have at least 35 acres, I think. This will
# get rid of tracts that we think are aberrations statistically. Right now, we have left this out
# pending ground-truthing.

for numeric_column in numeric_columns:
drop_tracts = []
@@ -524,7 +536,6 @@ def _prepare_initial_df(self) -> pd.DataFrame:
logger.info(
f"Dropping {len(drop_tracts)} tracts from Agricultural Value Loss"
)

elif numeric_column == field_names.LINGUISTIC_ISO_FIELD:
drop_tracts = df_copy[
# 72 is the FIPS code for Puerto Rico

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -48,7 +48,6 @@ def transform(self) -> None:
"""
logger.info("Transforming National Risk Index Data")

logger.info(self.COLUMNS_TO_KEEP)
# read in the unzipped csv data source then rename the
# Census Tract column for merging
df_fsf_flood_disagg: pd.DataFrame = pd.read_csv(
@@ -47,8 +47,6 @@ def transform(self) -> None:
- Calculates share of properties at risk, left-clipping number of properties at 250
"""
logger.info("Transforming National Risk Index Data")

logger.info(self.COLUMNS_TO_KEEP)
# read in the unzipped csv data source then rename the
# Census Tract column for merging
df_fsf_fire_disagg: pd.DataFrame = pd.read_csv(
@@ -0,0 +1,80 @@
# Nature deprived communities data

The following dataset was compiled by TPL (Trust for Public Land) using NLCD data. We define natural space as: AREA - [CROPLAND] - [IMPERVIOUS SURFACES].

## Codebook
- GEOID10 – Census tract ID
- SF – State Name
- CF – County Name
- P200_PFS – Percent of individuals below 200% Federal Poverty Line (from CEJST source data).
- CA_LT20 – Percent higher ed enrollment rate is less than 20% (from CEJST source data).
- TractAcres – Acres of tract calculated from ALAND10 field (land area in square meters) in 2010 census tracts.
  - CAVEAT: Some census tracts in the CEJST source file extend into open water. ALAND10 area was used to constrain percent calculations (e.g. cropland area) to land only.
- AcresCrops – Acres crops calculated by summing all cells in the NLCD Cropland Data Layer crop classes.
- PctCrops – Formula: AcresCrops/TractAcres*100.
- PctImperv – Mean imperviousness for each census tract.
  - CAVEAT: Where tracts extend into open water, mean imperviousness may be underestimated.
- __TO USE__ PctNatural – Formula: 100 – PctCrops – PctImperv.
- PctNat90 – Tract in or below 10th percentile for PctNatural. 1 = True, 0 = False.
  - PctNatural 10th percentile = 28.6439%
- ImpOrCrop – If tract >= 90th percentile for PctImperv OR PctCrops. 1 = True, 0 = False.
  - PctImperv 90th percentile = 67.4146%
  - PctCrops 90th percentile = 27.8116%
- LowInAndEd – If tract >= 65th percentile for P200_PFS AND CA_LT20.
  - P200_PFS 65th percentile = 64.0%
- NatureDep – ImpOrCrop = 1 AND LowInAndEd = 1.
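
To make the relationships between the derived codebook fields concrete, here is a minimal sketch that recomputes them for one tract. The function name and the returned dict layout are hypothetical, the cutoffs are the 90th-percentile values quoted in the codebook, and `LowInAndEd` is passed in as a precomputed flag because its inputs come from the CEJST source data. This is an illustration of the formulas, not the code TPL used.

```python
# Illustrative sketch only: the function name and input layout are assumptions,
# not part of the TPL dataset or the data pipeline.

# 90th-percentile cutoffs quoted in the codebook.
PCT_IMPERV_90TH = 67.4146
PCT_CROPS_90TH = 27.8116


def derive_nature_fields(
    tract_acres: float,
    acres_crops: float,
    pct_imperv: float,
    low_in_and_ed: int,
) -> dict:
    """Recompute the derived codebook columns for a single tract."""
    # PctCrops = AcresCrops / TractAcres * 100
    pct_crops = acres_crops / tract_acres * 100
    # PctNatural = 100 - PctCrops - PctImperv
    pct_natural = 100 - pct_crops - pct_imperv
    # ImpOrCrop: at or above the 90th percentile for either measure.
    imp_or_crop = int(
        pct_imperv >= PCT_IMPERV_90TH or pct_crops >= PCT_CROPS_90TH
    )
    # NatureDep: ImpOrCrop = 1 AND LowInAndEd = 1.
    nature_dep = int(imp_or_crop == 1 and low_in_and_ed == 1)
    return {
        "PctCrops": pct_crops,
        "PctNatural": pct_natural,
        "ImpOrCrop": imp_or_crop,
        "NatureDep": nature_dep,
    }
```

A tract with zero `TractAcres` would raise a division error here; how such tracts should be handled is not specified in the codebook.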

We added `GEOID10_TRACT` before converting shapefile to csv.

## Instructions to recreate

### Creating Impervious plus Cropland Attributes for Census Tracts

The Cropland Data Layer and NLCD Impervious layer were too big to put on our OneDrive, but you can download them here:
- CDL: https://www.nass.usda.gov/Research_and_Science/Cropland/Release/datasets/2021_30m_cdls.zip
- Impervious: https://s3-us-west-2.amazonaws.com/mrlc/nlcd_2019_impervious_l48_20210604.zip


#### Crops

1. Add an attribute called TractAcres (or similar) to the census tracts to hold a value representing acres covered by the census tract.
2. Calculate the TractAcres field for each census tract by using the Calculate Geometry tool (set the Property to Area (geodesic), and the Units to Acres).
3. From the Cropland Data Layer (CDL), extract only the pixels representing crops, using the Extract by Attributes tool in ArcGIS Spatial Analyst toolbox.
   a. The attribute table tells you the names of each type of land cover. Since the CDL also contains NLCD classes and empty classes, the actual crop classes must be extracted.
4. From the crops-only raster extracted from the CDL, run the Reclassify tool to create a binary layer where all crops = 1, and everything else is Null.
5. Run the Tabulate Area tool:
   a. Zone data = census tracts
   b. Input raster data = the binary crops layer
   c. This will produce a table with the square meters of crops in each census tract contained in an attribute called VALUE_1
6. Run the Join Field tool to join the table to the census tracts, with the VALUE_1 field as the Transfer Field, to transfer the VALUE_1 field (square meters of crops) to the census tracts.
7. Add a field to the census tracts called AcresCrops (or similar) to hold the acreage of crops in each census tract.
8. Calculate the AcresCrops field by multiplying the VALUE_1 field by 0.000247105 to produce acres of crops in each census tract.
   a. You can delete the VALUE_1 field.
9. Add a field called PctCrops (or similar) to hold the percent of each census tract occupied by crops.
10. Calculate the PctCrops field by dividing the AcresCrops field by the TractAcres field, and multiply by 100 to get the percent.
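
The arithmetic in the last few crop steps can be sketched outside ArcGIS as follows. This is only an illustration of the unit conversion and percentage; the function and argument names are hypothetical, and the input is assumed to be the VALUE_1 square-meter total produced by Tabulate Area.

```python
# Illustrative sketch only: names are assumptions, not ArcGIS or pipeline APIs.
from typing import Tuple

# Square meters to acres, the factor given in the steps above.
SQ_METERS_TO_ACRES = 0.000247105


def crops_percent(value_1_sq_m: float, tract_acres: float) -> Tuple[float, float]:
    """Return (AcresCrops, PctCrops) for one census tract."""
    # AcresCrops = VALUE_1 (square meters of crops) * 0.000247105
    acres_crops = value_1_sq_m * SQ_METERS_TO_ACRES
    # PctCrops = AcresCrops / TractAcres * 100
    pct_crops = acres_crops / tract_acres * 100
    return acres_crops, pct_crops
```

For example, `crops_percent(1_000_000, 494.21)` describes a tract of about 494 acres with one square kilometer of crops, which is roughly half the tract.
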
#### Impervious

1. Run the Zonal Statistics as Table tool:
   a. Zone data = census tracts
   b. Input raster data = impervious data raster layer
   c. Statistics type = Mean
   d. This will produce a table with the percent of each census tract occupied by impervious surfaces, contained in an attribute called MEAN
2. Run the Join Field tool to join the table to the census tracts, with the MEAN field as the Transfer Field, to transfer the MEAN field (percent impervious) to the census tracts.
3. Add a field called PctImperv (or similar) to hold the percent impervious value.
4. Calculate the PctImperv field by setting it equal to the MEAN field.
   a. You can delete the MEAN field.
#### Combine the Crops and Impervious Data

1. Open the census tracts attribute table and add a field called PctNatural (or similar). Calculate this field using this equation: 100 – PctCrops – PctImperv. This produces a value that tells you the percent of each census tract covered in natural land cover.
2. Define the census tracts that fall in the 90th percentile of non-natural land cover:
   a. Add a field called PctNat90 (or similar)
   b. Right-click on the PctNatural field, and click Sort Ascending (lowest PctNatural values on top)
   c. Select the top 10 percent of rows after the sort
   d. Click on Show Selected Records in the attribute table
   e. Calculate the PctNat90 field for the selected records = 1
   f. Clear the selection
   g. The rows that now have a value of 1 for PctNat90 are the most lacking for natural land cover, and can be symbolized accordingly in a map
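
The manual sort-and-select procedure for PctNat90 could be approximated in code along these lines. The function name and the dict-based input are hypothetical, and the handling of ties and of row counts that do not divide evenly (truncation here) is an assumption, since the steps above do not specify either.

```python
# Illustrative sketch only: the function name and input layout are assumptions.


def flag_lowest_natural(pct_natural_by_tract: dict, share: float = 0.10) -> dict:
    """Return {tract_id: 1 or 0}; 1 marks the lowest `share` of tracts by PctNatural."""
    # Sort ascending so the lowest PctNatural values come first.
    ranked = sorted(pct_natural_by_tract, key=pct_natural_by_tract.get)
    # Number of rows in the first 10 percent of the sorted table (truncated).
    n_selected = int(len(ranked) * share)
    selected = set(ranked[:n_selected])
    # PctNat90 = 1 for selected tracts, 0 for everything else.
    return {tract: int(tract in selected) for tract in pct_natural_by_tract}
```

With 20 tracts, this flags the 2 with the lowest PctNatural values.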
Empty file.