Issue 1803: percent of zips (WIP) #1916

Open · wants to merge 96 commits into base: main

Changes from all commits (96 commits)
218fa48
Create deploy_be_staging.yml (#1575)
emma-nechamkin Apr 18, 2022
f680d86
Imputing income using geographic neighbors (#1559)
emma-nechamkin Apr 27, 2022
3a96001
Adding HOLC indicator (#1579)
emma-nechamkin May 12, 2022
2e38aaa
Update backend for Puerto Rico (#1686)
switzersc-usds Jun 23, 2022
92d68ba
updating
emma-nechamkin Jul 13, 2022
f8a6567
Do not drop Guam and USVI from ETL (#1681)
switzersc-usds Jul 7, 2022
002cddf
Emma nechamkin/holc patch (#1742)
emma-nechamkin Jul 15, 2022
e98282d
updating ejscreen data, try two (#1747)
emma-nechamkin Jul 18, 2022
29419dd
Rescaling linguistic isolation (#1750)
emma-nechamkin Aug 2, 2022
daf188c
adds UST indicator (#1786)
emma-nechamkin Aug 3, 2022
bbb5bbc
Changing LHE in tiles to a boolean (#1767)
emma-nechamkin Aug 3, 2022
cac1e04
added indoor plumbing to chas
emma-nechamkin Aug 3, 2022
19d3bde
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
3aa03f1
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
ed9b717
first run through
emma-nechamkin Aug 3, 2022
9635ef5
Refactor DOE Energy Burden and COI to use YAML (#1796)
mattbowen-usds Aug 10, 2022
d55b7c0
Update etl_score_geo.py
emma-nechamkin Aug 11, 2022
485a9a8
Create deploy_be_staging.yml (#1575)
emma-nechamkin Apr 18, 2022
f047ca9
Imputing income using geographic neighbors (#1559)
emma-nechamkin Apr 27, 2022
1782d02
Adding HOLC indicator (#1579)
emma-nechamkin May 12, 2022
05748c9
Update backend for Puerto Rico (#1686)
switzersc-usds Jun 23, 2022
b41a287
updating
emma-nechamkin Jul 13, 2022
3071815
Do not drop Guam and USVI from ETL (#1681)
switzersc-usds Jul 7, 2022
7559cf4
Emma nechamkin/holc patch (#1742)
emma-nechamkin Jul 15, 2022
2ab24c6
updating ejscreen data, try two (#1747)
emma-nechamkin Jul 18, 2022
f6efdd4
Rescaling linguistic isolation (#1750)
emma-nechamkin Aug 2, 2022
b0a7284
adds UST indicator (#1786)
emma-nechamkin Aug 3, 2022
0d90ae5
Changing LHE in tiles to a boolean (#1767)
emma-nechamkin Aug 3, 2022
8c75190
added indoor plumbing to chas
emma-nechamkin Aug 3, 2022
15450cf
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
4f6a1b5
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
baa591a
first run through
emma-nechamkin Aug 3, 2022
97e1754
Refactor DOE Energy Burden and COI to use YAML (#1796)
mattbowen-usds Aug 10, 2022
94cdc47
Update etl_score_geo.py
emma-nechamkin Aug 11, 2022
dcda155
fixing rebase
emma-nechamkin Aug 11, 2022
481a2a0
updated to fix linting errors (#1818)
emma-nechamkin Aug 11, 2022
13e7908
Adding back MapComparison video
vim-usds Aug 16, 2022
d5fbb80
Add FUDS ETL (#1817)
mattbowen-usds Aug 16, 2022
d6c04b1
Disable markdown check for link
vim-usds Aug 16, 2022
9321798
Merge branch 'emma-nechamkin/release/score-narwhal' of https://github…
vim-usds Aug 16, 2022
ebac552
Adding DOT composite to travel score (#1820)
emma-nechamkin Aug 16, 2022
5e378ae
Adding first street foundation data (#1823)
emma-nechamkin Aug 17, 2022
981a36c
first run -- adding NCLD data to the ETL, but not yet to the score
emma-nechamkin Aug 17, 2022
49623e4
Add abandoned mine lands data (#1824)
mattbowen-usds Aug 17, 2022
2e05b1d
Merge branch 'emma-nechamkin/release/score-narwhal' of github.com:usd…
emma-nechamkin Aug 17, 2022
7d89d41
Adding NLCD data (#1826)
emma-nechamkin Aug 17, 2022
88dc2e5
updating to avoid conflicts
emma-nechamkin Aug 17, 2022
6e41e0d
Add donut hole calculation to score (#1828)
mattbowen-usds Aug 18, 2022
cb4866b
Adding eamlis and fuds data to legacy pollution in score (#1832)
emma-nechamkin Aug 18, 2022
3ba1c62
Update to use new FSF files (#1838)
emma-nechamkin Aug 18, 2022
1ee26bf
Quick fix to kitchen or plumbing indicator
emma-nechamkin Aug 18, 2022
d892bce
Fast flag update (#1844)
emma-nechamkin Aug 19, 2022
ad1ce2b
Tiles fix (#1845)
emma-nechamkin Aug 19, 2022
e6385c1
Update etl_score_geo.py
emma-nechamkin Aug 19, 2022
4bf7773
Issue 1827: Add demographics to tiles and download files (#1833)
lucasmbrown-usds Aug 22, 2022
6418335
Updates backend constants to N (#1854)
emma-nechamkin Aug 23, 2022
637b8c3
updated to show T/F/null vs T/F for AML and FUDS (#1866)
emma-nechamkin Aug 25, 2022
d3efcbd
fix markdown
esfoobar-usds Aug 25, 2022
e539db8
tuple type
esfoobar-usds Aug 26, 2022
1c4d3e4
Score tests (#1847)
emma-nechamkin Aug 26, 2022
b0b7ff0
just testing that the boolean is preserved on gha (#1867)
emma-nechamkin Aug 31, 2022
5201f9e
Adding tests to ensure proper calculations (#1871)
emma-nechamkin Aug 31, 2022
ccd72e2
tribal tiles fix (#1874)
esfoobar-usds Sep 1, 2022
9c0e199
Pipeline tile tests (#1864)
emma-nechamkin Sep 1, 2022
d41153d
Add tests to make sure each source makes it to the score correctly (#…
mattbowen-usds Sep 6, 2022
426328e
Updating traffic barriers to include low pop threshold (#1889)
emma-nechamkin Sep 7, 2022
fb4c484
Remove no land tracts from map (#1894)
emma-nechamkin Sep 8, 2022
6e9c44e
Issue 1831: missing life expectancy data from Maine and Wisconsin (#1…
lucasmbrown-usds Sep 10, 2022
60164c8
Removing low pop tracts from FEMA population loss (#1898)
emma-nechamkin Sep 12, 2022
4d02525
1831 Follow up (#1902)
lucasmbrown-usds Sep 15, 2022
876655d
Add tests for all non-census sources (#1899)
mattbowen-usds Sep 19, 2022
aca2261
Issue 1900: Tribal overlap with Census tracts (#1903)
lucasmbrown-usds Sep 20, 2022
f70f30d
Improve score test documentation based on Lucas's feedback (#1835) (#…
mattbowen-usds Sep 23, 2022
d8dd4cf
Cleanup source tests (#1912)
mattbowen-usds Sep 23, 2022
6e0ef33
Add tribal count notebook (#1917) (#1919)
mattbowen-usds Sep 23, 2022
9e85375
Add tribal overlap to downloads (#1907)
mattbowen-usds Sep 23, 2022
9fb9874
Issue 1910: Do not impute income for 0 population tracts (#1918)
lucasmbrown-usds Sep 26, 2022
15d946c
updating click
esfoobar-usds Sep 26, 2022
2f61900
updating click
esfoobar-usds Sep 26, 2022
48d961b
Bump just jupyterlab (#1930)
mattbowen-usds Sep 27, 2022
4da55a9
Fixing link checker (#1929)
lucasmbrown-usds Sep 27, 2022
0f0d6db
Update deps safety says are vulnerable (#1937) (#1938)
mattbowen-usds Sep 28, 2022
8e5ed5b
Add demos for island areas (#1932)
mattbowen-usds Sep 29, 2022
247db4a
Reorder download fields, add plumbing back (#1942)
mattbowen-usds Sep 29, 2022
f4adf17
refactoring tribal (#1960)
lucasmbrown-usds Sep 30, 2022
f284d75
renaming geocorr to geocorr_urban
lucasmbrown-usds Sep 21, 2022
d4d72c8
placeholder etl files
lucasmbrown-usds Sep 21, 2022
7ceab51
wip on ETL
lucasmbrown-usds Sep 21, 2022
a3ad7e0
fixing up validation
lucasmbrown-usds Sep 21, 2022
9f0918d
adding todos
lucasmbrown-usds Sep 22, 2022
ed364fb
updating to directly calculate overlay
lucasmbrown-usds Sep 28, 2022
a6ba9f6
fixing pylint error
lucasmbrown-usds Sep 28, 2022
a7a4df0
wip
lucasmbrown-usds Sep 28, 2022
f080464
renaming
lucasmbrown-usds Sep 28, 2022
bfb08e4
pynb
lucasmbrown-usds Sep 28, 2022
74bf497
updating with tract area
emma-nechamkin Oct 13, 2022
3 changes: 2 additions & 1 deletion .github/workflows/data-checks.yml
@@ -39,6 +39,7 @@ jobs:
run: poetry show -v
- name: Install dependencies
run: poetry install
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
# TODO: investigate why caching layer started failing.
# if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
- name: Run tox
run: poetry run tox
28 changes: 17 additions & 11 deletions .github/workflows/deploy_be_staging.yml
@@ -38,6 +38,12 @@ jobs:
uses: snok/install-poetry@v1
- name: Print Poetry settings
run: poetry show -v
- name: Install GDAL/ogr2ogr
run: |
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get -y install gdal-bin
ogrinfo --version
- name: Install dependencies
run: poetry add s4cmd && poetry install
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
@@ -47,12 +53,21 @@
aws-access-key-id: ${{ secrets.DATA_DEV_AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.DATA_DEV_AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Download census geo data for later use
run: |
poetry run python3 data_pipeline/application.py pull-census-data -s aws
- name: Generate Score
run: |
poetry run python3 data_pipeline/application.py score-full-run
- name: Generate Score Post
run: |
poetry run python3 data_pipeline/application.py generate-score-post -s aws
poetry run python3 data_pipeline/application.py generate-score-post
- name: Generate Score Geo
run: |
poetry run python3 data_pipeline/application.py geo-score
- name: Run Smoketests
run: |
poetry run pytest data_pipeline/ -m smoketest
- name: Deploy Score to Geoplatform AWS
run: |
poetry run s4cmd put ./data_pipeline/data/score/csv/ s3://justice40-data/data-pipeline-staging/${{env.PR_NUMBER}}/${{env.SHA_NUMBER}}/data/score/csv --recursive --force --API-ACL=public-read
@@ -71,12 +86,6 @@ jobs:
repo-token: ${{ secrets.GITHUB_TOKEN }}
repo-token-user-login: "github-actions[bot]"
allow-repeats: false
- name: Install GDAL/ogr2ogr
run: |
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get -y install gdal-bin
ogrinfo --version
- name: Set timezone for tippecanoe
uses: szenius/set-timezone@v1.0
with:
@@ -94,9 +103,6 @@
mkdir -p /usr/local/bin
cp tippecanoe /usr/local/bin/tippecanoe
tippecanoe -v
- name: Generate Score Geo
run: |
poetry run python3 data_pipeline/application.py geo-score
- name: Generate Tiles
run: |
poetry run python3 data_pipeline/application.py generate-map-tiles
@@ -111,7 +117,7 @@
# Deploy to S3 for the staging URL
message: |
** Map Deployed! **
Map with Staging Backend: https://screeningtool.geoplatform.gov/en/?flags=stage_hash=${{env.PR_NUMBER}}/${{env.SHA_NUMBER}}
Map with Staging Backend: https://screeningtool.geoplatform.gov/en?flags=stage_hash=${{env.PR_NUMBER}}/${{env.SHA_NUMBER}}
Find tiles here: https://justice40-data.s3.amazonaws.com/data-pipeline-staging/${{env.PR_NUMBER}}/${{env.SHA_NUMBER}}/data/score/tiles
repo-token: ${{ secrets.GITHUB_TOKEN }}
repo-token-user-login: "github-actions[bot]"
3 changes: 2 additions & 1 deletion data/data-pipeline/Dockerfile
@@ -9,7 +9,8 @@ RUN apt-get update && apt-get install -y \
unzip \
wget \
python3-dev \
python3-pip
python3-pip \
gdal-bin

# tippeanoe
ENV TZ=America/Los_Angeles
56 changes: 46 additions & 10 deletions data/data-pipeline/README.md
@@ -12,11 +12,14 @@
- [2. Extract-Transform-Load (ETL) the data](#2-extract-transform-load-etl-the-data)
- [3. Combined dataset](#3-combined-dataset)
- [4. Tileset](#4-tileset)
- [5. Shapefiles](#5-shapefiles)
- [Score generation and comparison workflow](#score-generation-and-comparison-workflow)
- [Workflow Diagram](#workflow-diagram)
- [Step 0: Set up your environment](#step-0-set-up-your-environment)
- [Step 1: Run the script to download census data or download from the Justice40 S3 URL](#step-1-run-the-script-to-download-census-data-or-download-from-the-justice40-s3-url)
- [Step 2: Run the ETL script for each data source](#step-2-run-the-etl-script-for-each-data-source)
- [Table of commands](#table-of-commands)
- [ETL steps](#etl-steps)
- [Step 3: Calculate the Justice40 score experiments](#step-3-calculate-the-justice40-score-experiments)
- [Step 4: Compare the Justice40 score experiments to other indices](#step-4-compare-the-justice40-score-experiments-to-other-indices)
- [Data Sources](#data-sources)
@@ -26,21 +29,27 @@
- [MacOS](#macos)
- [Windows Users](#windows-users)
- [Setting up Poetry](#setting-up-poetry)
- [Downloading Census Block Groups GeoJSON and Generating CBG CSVs](#downloading-census-block-groups-geojson-and-generating-cbg-csvs)
- [Running tox](#running-tox)
- [The Application entrypoint](#the-application-entrypoint)
- [Downloading Census Block Groups GeoJSON and Generating CBG CSVs (not normally required)](#downloading-census-block-groups-geojson-and-generating-cbg-csvs-not-normally-required)
- [Run all ETL, score and map generation processes](#run-all-etl-score-and-map-generation-processes)
- [Run both ETL and score generation processes](#run-both-etl-and-score-generation-processes)
- [Run all ETL processes](#run-all-etl-processes)
- [Generating Map Tiles](#generating-map-tiles)
- [Serve the map locally](#serve-the-map-locally)
- [Running Jupyter notebooks](#running-jupyter-notebooks)
- [Activating variable-enabled Markdown for Jupyter notebooks](#activating-variable-enabled-markdown-for-jupyter-notebooks)
- [Miscellaneous](#miscellaneous)
- [Testing](#testing)
- [Background](#background)
- [Configuration / Fixtures](#configuration--fixtures)
- [Score and post-processing tests](#score-and-post-processing-tests)
- [Updating Pickles](#updating-pickles)
- [Future Enchancements](#future-enchancements)
- [ETL Unit Tests](#etl-unit-tests)
- [Future Enhancements](#future-enhancements)
- [Fixtures used in ETL "snapshot tests"](#fixtures-used-in-etl-snapshot-tests)
- [Other ETL Unit Tests](#other-etl-unit-tests)
- [Extract Tests](#extract-tests)
- [Transform Tests](#transform-tests)
- [Load Tests](#load-tests)
- [Smoketests](#smoketests)

<!-- /TOC -->

@@ -196,7 +205,7 @@ Here's a list of commands:

## Local development

You can run the Python code locally without Docker to develop, using Poetry. However, to generate the census data you will need the [GDAL library](https://github.com/OSGeo/gdal) installed locally. Also to generate tiles for a local map, you will need [Mapbox tippecanoe](https://github.com/mapbox/tippecanoe). Please refer to the repos for specific instructions for your OS.
You can run the Python code locally without Docker to develop, using Poetry. However, to generate the census data you will need the [GDAL library](https://github.com/OSGeo/gdal) installed locally. For score generation, you will need [libspatialindex](https://libspatialindex.org/en/latest/). And to generate tiles for a local map, you will need [Mapbox tippecanoe](https://github.com/mapbox/tippecanoe). Please refer to the repos for specific instructions for your OS.

### VSCode

@@ -218,6 +227,7 @@ To install the above-named executables:

- gdal: `brew install gdal`
- Tippecanoe: `brew install tippecanoe`
- spatialindex: `brew install spatialindex`

Note: For MacOS Monterey or M1 Macs, [you might need to follow these steps](https://stackoverflow.com/a/70880741) to install Scipy.

@@ -229,10 +239,19 @@ If you want to run tile generation, please install TippeCanoe [following these i

- Start a terminal
- Change to this directory (`/data/data-pipeline/`)
- Make sure you have at least Python 3.7 installed: `python -V` or `python3 -V`
- Make sure you have at least Python 3.8 installed: `python -V` or `python3 -V`
- We use [Poetry](https://python-poetry.org/) for managing dependencies and building the application. Please follow the instructions on their site to download.
- Install Poetry requirements with `poetry install`

### Running tox

Our full test and check suite is run using tox; invoke it with `poetry run tox`.

Each run can take a while to build the whole environment. If you'd like to save time, you can reuse the previously built environment by running `poetry run tox -e lint`, which drastically speeds up the process.

### The Application entrypoint

After installing the poetry dependencies, you can see a list of commands with the following steps:
@@ -303,7 +322,11 @@ see [python-markdown docs](https://github.com/ipython-contrib/jupyter_contrib_nb

### Background

For this project, we make use of [pytest](https://docs.pytest.org/en/latest/) for testing purposes. To run tests, simply run `poetry run pytest` in this directory (i.e., `justice40-tool/data/data-pipeline`).
<!-- markdown-link-check-disable -->
For this project, we make use of [pytest](https://docs.pytest.org/en/latest/) for testing purposes.
<!-- markdown-link-check-enable-->

To run tests, simply run `poetry run pytest` in this directory (i.e., `justice40-tool/data/data-pipeline`).

Test data is configured via [fixtures](https://docs.pytest.org/en/latest/explanation/fixtures.html).
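As a concrete illustration of how such a fixture might look, here is a minimal sketch; the fixture name and column values are hypothetical, not taken from the repository:

```python
# conftest.py -- a minimal, hypothetical pytest fixture.
# Real fixtures in this repo load larger sample datasets.
import pandas as pd
import pytest


@pytest.fixture
def sample_score_df() -> pd.DataFrame:
    # A tiny stand-in frame with a tract GEOID and one indicator column.
    return pd.DataFrame(
        {
            "GEOID10_TRACT": ["01001020100", "01001020200"],
            "Total population": [1000, 2000],
        }
    )
```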

@@ -350,7 +373,8 @@ We have four pickle files that correspond to expected files:

To update the pickles, let's go one by one:

For the `score_transformed_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L58), before the `pdt.assert_frame_equal` and run:
For the `score_transformed_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L62), before the `pdt.assert_frame_equal` and run:
`pytest data_pipeline/etl/score/tests/test_score_post.py::test_transform_score`

Once on the breakpoint, capture the df to a pickle as follows:
@@ -378,7 +402,7 @@ score_data_actual.to_pickle(data_path / "data_pipeline" / "etl" / "score" / "tes

Then take out the breakpoint and re-run the test: `pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_score_data`

For the `tile_data_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L86), before the `pdt.assert_frame_equal` and run:
For the `tile_data_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L90), before the `pdt.assert_frame_equal` and run:
`pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_tile_data`

Once on the breakpoint, capture the df to a pickle as follows:
@@ -418,7 +442,9 @@ In the future, we could adopt any of the below strategies to work around this:

1. We could use [pytest-snapshot](https://pypi.org/project/pytest-snapshot/) to automatically store the output of each test as data changes. This would make it so that you could avoid having to generate a pickle for each method - instead, you would only need to call `generate` once, and only when the dataframe had changed.

<!-- markdown-link-check-disable -->
Additionally, you could use a pandas type schema annotation such as [pandera](https://pandera.readthedocs.io/en/stable/schema_models.html?highlight=inputschema#basic-usage) to annotate input/output schemas for given functions, and your unit tests could use these to validate explicitly. This could be of very high value for annotating expectations.
<!-- markdown-link-check-enable-->
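For instance, a schema along these lines could be used (a minimal sketch, assuming the `SchemaModel` API from the linked docs; the column names are illustrative, not the project's actual schema):

```python
# A hypothetical pandera schema annotating a score transform.
import pandera as pa
from pandera.typing import DataFrame, Series


class ScoreSchema(pa.SchemaModel):
    GEOID10_TRACT: Series[str] = pa.Field(unique=True)
    total_population: Series[float] = pa.Field(ge=0, nullable=True)


@pa.check_types
def transform_score(df: DataFrame[ScoreSchema]) -> DataFrame[ScoreSchema]:
    # The decorator validates both the input and output frames against the schema.
    return df
```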

Alternatively, or in conjunction, you could move toward using a more strictly-typed container format for read/writes such as SQL/SQLite, and use something like [SQLModel](https://github.com/tiangolo/sqlmodel) to handle more explicit type guarantees.

@@ -485,3 +511,13 @@ See above [Fixtures](#configuration--fixtures) section for information about whe
These make use of [tmp_path_factory](https://docs.pytest.org/en/latest/how-to/tmp_path.html) to create a file-system located under `temp_dir`, and validate whether the correct files are written to the correct locations.

Additional future modifications could include the use of Pandera and/or other schema validation tools, and or a more explicit test that the data written to file can be read back in and yield the same dataframe.
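A load test of that shape might look like this minimal sketch (the written file name and round-trip check are assumptions for illustration):

```python
# Hypothetical test that a load step writes its CSV to the expected location.
import pandas as pd


def test_load_writes_csv(tmp_path_factory):
    output_dir = tmp_path_factory.mktemp("temp_dir")
    df = pd.DataFrame({"GEOID10_TRACT": ["01001020100"]})

    # Stand-in for an ETL load() step writing to its data path.
    output_file = output_dir / "usa.csv"
    df.to_csv(output_file, index=False)

    assert output_file.exists()
    # Round-trip: reading the file back yields the same dataframe.
    pd.testing.assert_frame_equal(
        pd.read_csv(output_file, dtype={"GEOID10_TRACT": str}), df
    )
```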

### Smoketests

To ensure the score and tiles are generated correctly, there is a suite of "smoke tests" that can be run after the ETL and score steps have completed and outputs like the frontend GEOJSON have been created.
These tests are implemented as pytest tests but are skipped by default (see the sketch after this list). To run them:

1. Generate a full score with `poetry run python3 data_pipeline/application.py score-full-run`
2. Generate the tile data with `poetry run python3 data_pipeline/application.py generate-score-post`
3. Generate the frontend GEOJSON with `poetry run python3 data_pipeline/application.py geo-score`
4. Select the smoke tests for pytest with `poetry run pytest data_pipeline/tests -k smoketest`
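A marker-gated test of the kind described above might look like the following sketch; it assumes the `smoketest` marker is registered in the pytest configuration, and the output path is a guess based on the deploy step earlier in this PR:

```python
# Hypothetical smoke test, selected only when the smoketest filter is used.
from pathlib import Path

import pytest

DATA_PATH = Path("data_pipeline/data")  # assumed output root


@pytest.mark.smoketest
def test_full_score_csv_exists():
    # The full score run is expected to have written a combined CSV.
    assert (DATA_PATH / "score" / "csv" / "full" / "usa.csv").exists()
```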
18 changes: 18 additions & 0 deletions data/data-pipeline/data_pipeline/application.py
@@ -10,6 +10,7 @@
score_post,
)
from data_pipeline.etl.sources.census.etl_utils import (
check_census_data_source,
reset_data_directories as census_reset,
zip_census_data,
)
@@ -96,6 +97,23 @@ def census_data_download(zip_compress):
sys.exit()


@cli.command(help="Retrieve census data from source")
@click.option(
"-s",
"--data-source",
default="local",
required=False,
type=str,
help=dataset_cli_help,
)
def pull_census_data(data_source: str):
logger.info("Pulling census data from %s", data_source)
data_path = settings.APP_ROOT / "data" / "census"
check_census_data_source(data_path, data_source)
logger.info("Finished pulling census data")
sys.exit()


@cli.command(
help="Run all ETL processes or a specific one",
)
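The new `pull-census-data` command can be exercised the way the staging workflow above does (`poetry run python3 data_pipeline/application.py pull-census-data -s aws`), or in-process with click's test runner. A minimal sketch, assuming the click group is named `cli` as the decorators above suggest:

```python
# Invoking the new click command in-process with click's CliRunner.
from click.testing import CliRunner

from data_pipeline.application import cli

runner = CliRunner()
# The command ends with sys.exit(), which CliRunner captures as the exit code.
result = runner.invoke(cli, ["pull-census-data", "--data-source", "aws"])
print(result.exit_code, result.output)
```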
Binary file not shown.
@@ -40,7 +40,7 @@ def validate_new_data(
assert (
checking_df[score_col].nunique() <= 3
), f"Error: there are too many values possible in {score_col}"
assert (True in checking_df[score_col].unique()) & (
assert (True in checking_df[score_col].unique()) | (
False in checking_df[score_col].unique()
), f"Error: {score_col} should be a boolean"

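The switch from `&` to `|` matters because a valid boolean column may contain only `True` values (or only `False`): requiring both values to be present, as the old assertion did, rejects such columns. A small illustration with hypothetical data:

```python
# Why the old check was wrong: an all-True boolean column is legitimate.
import pandas as pd

checking_df = pd.DataFrame({"score_col": [True, True, True]})
vals = checking_df["score_col"].unique()

assert not ((True in vals) & (False in vals))  # old check would fail here
assert (True in vals) | (False in vals)        # new check passes
```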
3 changes: 2 additions & 1 deletion data/data-pipeline/data_pipeline/config.py
@@ -12,7 +12,8 @@

# set root dir
settings.APP_ROOT = pathlib.Path(data_pipeline.__file__).resolve().parent

settings.DATA_PATH = settings.APP_ROOT / "data"
settings.REQUESTS_DEFAULT_TIMOUT = 3600
# To set an environment use:
# Linux/OSX: export ENV_FOR_DYNACONF=staging
# Windows: set ENV_FOR_DYNACONF=staging
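Downstream code can then consume the new settings; a hypothetical example (the URL is a placeholder, and the constant's spelling, `REQUESTS_DEFAULT_TIMOUT`, is copied verbatim from the diff):

```python
# Hypothetical consumer of the new config values.
import requests

from data_pipeline.config import settings

census_dir = settings.DATA_PATH / "census"
response = requests.get(
    "https://example.com/data.zip",
    timeout=settings.REQUESTS_DEFAULT_TIMOUT,  # 3600-second default
)
```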