Codebase for the Health Equity Tracker, Satcher Health Leadership Institute, Morehouse School of Medicine
Prompted by the COVID-19 pandemic, the Health Equity Tracker was created in 2020 to aggregate up-to-date demographic data from the hardest-hit communities. The Health Equity Tracker aims to give a detailed view of health outcomes by race, ethnicity, sex, socioeconomic status, and other critical factors. Our hope is that it will help policymakers understand what resources and support affected communities need to be able to improve their outcomes.
-
In your browser, create a fork of the Health Equity Tracker repo: https://github.com/SatcherInstitute/health-equity-tracker/fork
-
In your terminal, clone your new forked repo down to your local development machine (replace placeholder with your github username):
git clone https://github.com/<your-github-username>/health-equity-tracker.git
-
Set the original repo to be "origin":
git remote set-url origin https://github.com/SatcherInstitute/health-equity-tracker.git
-
Set your forked repo to a memorable remote name:
git remote add <your-remote-name> <your-forked-git-url>
For example, Ben would do
git remote add ben https://github.com/benhammondmusic/health-equity-tracker.git
-
Confirm your remote and origin are set up as expected:
git remote -v
Example output for Ben:
ben https://github.com/benhammondmusic/health-equity-tracker.git (fetch)
ben https://github.com/benhammondmusic/health-equity-tracker.git (push)
origin https://github.com/SatcherInstitute/health-equity-tracker.git (fetch)
origin https://github.com/SatcherInstitute/health-equity-tracker.git (push)
Our repo requires Node and is configured to run code linting, formatting, and error-checking automatically on every commit. On Mac, the easiest way to set this up is to first ensure you have Homebrew installed, and then use it to install pre-commit. If you don't already have Node installed, you can install it via Homebrew as well.
brew install pre-commit
After it installs successfully, you need to use it to install the HET pre-commit hooks within your local .git
pre-commit install
Note: If you have existing git hooks (like from Husky) you need to force install:
pre-commit install -f
On machines without Homebrew you can use Python to install pre-commit:
-
Install Python: Make sure Python is installed on your system. You can download and install Python from the official website: https://www.python.org/downloads/.
-
Install pre-commit package: Open the command prompt and run the following command to install the pre-commit package using pip:
pip install pre-commit
-
Add Python Scripts directory to PATH: If the Python Scripts directory is not added to your PATH environment variable, you need to add it. The Python Scripts directory is usually located at C:\Python<version>\Scripts. You can add it to your PATH by following these steps:
- Right-click on "This PC" or "My Computer" and select "Properties".
- Click on "Advanced system settings" on the left side.
- In the System Properties window, click on the "Environment Variables" button.
- In the Environment Variables window, under "System variables", select the "Path" variable and click on "Edit".
- Click on "New" and add the path to the Python Scripts directory (e.g., C:\Python<version>\Scripts).
- Click "OK" on all windows to save the changes.
-
Verify installation: To verify that pre-commit is installed correctly, you can run the following command:
pre-commit --version
This should display the version of pre-commit installed on your system. Now pre-commit should be installed system-wide on your Windows machine.
-
Run pre-commit install to set up the git hook scripts:
pre-commit install
Your output should look something like this:
pre-commit installed at .git/hooks/pre-commit
-
In your terminal, change into the health-equity-tracker frontend directory:
cd health-equity-tracker/frontend
-
Duplicate the example environmental variables file into a new, automatically git-ignored local development file:
cp -i .env.example .env.development
-
Install the node modules:
npm i
Note: If you are using VSCode, ensure you install the recommended extensions including Biome, which we use for linting and formatting JavaScript-based files.
-
While still in the health-equity-tracker/frontend/ folder, run:
npm run dev
-
In your browser, visit http://localhost:3000
-
To run once:
npm run test
-
To run in watch mode, so saved changes to the codebase will trigger reruns of affected tests:
npm run test:watch
- These tests automatically run:
- against the dynamic Netlify deploy link on all PR updates
- against the <dev.healthequitytracker.org> staging site on PR merges to main
- against the <healthequitytracker.org> production site every night
- To manually run full suite of tests locally (ensure the localhost server is still running first):
npm run e2e
- To run subsets of the full test suite locally, just add the filename (without the path) or even a portion of a word after the command:
npm run e2e statins.nightly.spec.ts
runs the single file, and
npm run e2e hiv
runs all tests that include the string hiv in the filename
- To run the tests locally, but target either the production or staging deployments instead of localhost, use
npm run e2e-prod
and
npm run e2e-staging
respectively. Target specific test files the same way described above.
-
Ensure you assign yourself to the issue(s) that this PR will address. Create one if it doesn't exist, assigning the correct Milestones if needed.
-
Ensure your local main branch is up to date with the origin main branch:
git pull origin main
-
Ensure your forked repo's main branch is up to date:
-
first time to set the upstream for the main branch
git push -u <your-remote-name> main
-
ongoing, simply
git push
-
-
Create and switch to a local feature branch from main:
git checkout -b <new-feature-branch-name>
(we don't follow any particular conventions here or in commit messages, just make it easy to type and relevant)
-
Continuously ensure the branch is up to date with the origin main branch which is often updated several times a day:
git pull origin main
-
If you encounter merge conflicts, resolve them. Ben likes VSCode's new conflict resolution split screen feature, and also prefers setting VSCode as the default message editor rather than VIM:
git config --global core.editor "code --wait"
-
Make changes to the code base, save the files, add those changes to staging:
git add -p # yes/no your way through the chunks of changes
-
Commit those changes when you're ready:
git commit -m "adds new stuff"
-
Ensure the pre-commit checks pass. If not, make the fixes as required by the linters and type-checker, etc., and run the same commit command again (hit ⬆ key to cycle through your previously run terminal commands)
-
Push to your forked remote:
-
First time:
git push -u <your-remote-name> <new-feature-branch-name>
-
Ongoing code changes:
git push
-
-
CMD+Click (CTRL+Click for Windows) on the URL under this line in the logged message:
Create a pull request for 'new-feature-branch-name' on GitHub by visiting:
to launch the web UI for your new pull request
-
In the browser with the new PR open, edit the title to make it a meaningful description of what the PR actively does to the code.
-
Please fill in the templated sections as relevant, especially triggering auto-completion of issues if true using
closes #1234
or
fixes #1234
somewhere in the description text of the PR.
-
A preview link is generated automatically by Netlify and posted to the PR comments; check it out to manually confirm changes appeared to the frontend as you expected.
-
When ready, request a review. If you are unable to request a review, your username may need permissions first; please reach out to a team member.
-
Once your PR is approved (and you've ensured CI tests have passed), you can "Squash and Merge" your PR. Once complete, feel free to delete the branch from your remote fork (using the purple button).
-
Switch back to main:
git switch main
-
Delete the feature branch
git branch -D <new-feature-branch-name>
-
Pull those new updates from origin main into your local main:
git pull origin main
-
Push those new updates to your remote main:
git push
Congratulations!! Everything below is more detailed, advanced info that you probably won't need right away.
The frontend consists of:
-
health-equity-tracker/frontend/: A React app that contains all code and static resources needed in the browser (html, TS, CSS, images). This app was bootstrapped with Create React App and later migrated to Vite.
-
health-equity-tracker/frontend_server/: A lightweight server that serves the React app as static files and forwards data requests to the data server.
-
health-equity-tracker/data_server/: A data server that responds to data requests by serving data files that have been exported from the data pipeline.
You can force specific dataset files to be read from the /public/tmp directory by setting the VITE_FORCE_STATIC environment variable to a comma-separated list of filenames. For example, VITE_FORCE_STATIC=my_file1.json,my_file2.json would force my_file1.json and my_file2.json to be served from /public/tmp even if VITE_BASE_API_URL is set to a real server URL.
The VITE_BASE_API_URL can be changed for different setups:
- You can deploy the frontend server to your own GCP project
- You can run the frontend server locally (see below)
- You can run Docker locally (see below)
- You can set it to an empty string or remove it to make the frontend read files from the /public/tmp directory. This allows testing behavior by simply dropping local files into that directory.
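The interaction between VITE_BASE_API_URL and VITE_FORCE_STATIC described above can be sketched as a small decision function. This is an illustrative Python sketch only, not the frontend's actual TypeScript code; the function names and the whitespace trimming are assumptions made for the example.

```python
def parse_force_static(value):
    """Split a comma-separated VITE_FORCE_STATIC value into a set of filenames."""
    # Trimming whitespace around names is an assumption of this sketch.
    return {name.strip() for name in value.split(",") if name.strip()}


def dataset_source(filename, base_api_url="", force_static=""):
    """Return "static" if the file is read from /public/tmp, else "api"."""
    # An empty or missing VITE_BASE_API_URL means every file is read locally.
    if not base_api_url:
        return "static"
    # Files listed in VITE_FORCE_STATIC are read locally even with a real server URL.
    if filename in parse_force_static(force_static):
        return "static"
    return "api"


if __name__ == "__main__":
    force = "my_file1.json,my_file2.json"
    for name in ("my_file1.json", "other_file.json"):
        print(name, "->", dataset_source(name, "https://example.com", force))
```

The sketch returns a label rather than a URL on purpose, since the exact request paths are not described here.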
Note: Building manually is not required for development, but is helpful for debugging deployment issues, as this step is run during CI. To create a "production" development build, run npm run preview. For more fine-tuned control, run npm run build:${DEPLOY_CONTEXT}. This will use the frontend/.env.${DEPLOY_CONTEXT} file for environment variables and output bundled files in the frontend/build/ directory. These are the files that are used for hosting the app in production environments.
The backend consists of:
- health-equity-tracker/airflow/: Code that controls the DAGs which orchestrate the execution of these various microservices
- health-equity-tracker/config/: Terraform configuration for setting permissions and provisioning needed resources for cloud computing
- health-equity-tracker/data/: In-codebase "bucket" used to store manually downloaded data from outside sources where it isn't possible to fetch new data directly via an API endpoint or linkable file URL
- health-equity-tracker/e2e_tests/: Automated tests ensuring all services work together as expected; not to be confused with the Playwright E2E tests found in /frontend
- health-equity-tracker/exporter/: Code for the microservice responsible for taking HET-style data from HET BigQuery tables and storing them in buckets as .json files. NOTE: County-level files are broken up by state when exporting.
- health-equity-tracker/python/: Code for the Python modules responsible for fetching data from outside sources and wrangling into a HET-style table with rows for every combination of demographic group, geographic area, and optionally time period, and columns for each measured metric
- health-equity-tracker/requirements/: Packages required for the HET
- health-equity-tracker/run_gcs_to_bq/: Code for the microservice responsible for running datasource specific modules found in /python and ultimately exporting the produced dataframes to BigQuery
- health-equity-tracker/run_ingestion/: (PARTIALLY USED) Code for the microservice responsible for caching datasource data into GCP buckets, for later use by the run_gcs_to_bq operator. This service is only used by some of our older data sources, like acs_population, but often for newer datasources we simply load data directly from the run_gcs_to_bq microservice
- health-equity-tracker/aggregator/: DEPRECATED: Code for the microservice previously responsible for running SQL merges of Census data
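To make the "HET-style table" shape and the per-state split of county-level files more concrete, here is a minimal standard-library sketch. The column names, group labels, and metric column are hypothetical placeholders and are not taken from the real pipeline; the only outside fact relied on is that the first two digits of a 5-digit county FIPS code identify the state.

```python
from collections import defaultdict
from itertools import product

# Hypothetical example values; real HET tables use their own column names.
DEMOGRAPHIC_GROUPS = ["Group A", "Group B"]
COUNTY_FIPS = ["01001", "01003", "13121"]  # first 2 digits = state FIPS
TIME_PERIODS = ["2020", "2021"]


def build_rows(groups, fips_codes, time_periods):
    """One row per (group, geography, time period), with a placeholder metric column."""
    return [
        {"group": g, "county_fips": f, "time_period": t, "example_metric": None}
        for g, f, t in product(groups, fips_codes, time_periods)
    ]


def split_by_state(rows):
    """Group county-level rows by the state prefix of their FIPS code."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["county_fips"][:2]].append(row)
    return dict(by_state)


if __name__ == "__main__":
    rows = build_rows(DEMOGRAPHIC_GROUPS, COUNTY_FIPS, TIME_PERIODS)
    for state_fips, state_rows in sorted(split_by_state(rows).items()):
        print(state_fips, len(state_rows))
```

Each state's rows could then be written to its own .json file; how the exporter actually names and writes those files is not shown here.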
- (One-time) Ensure you have the right version of Python installed (as found in pyproject.toml). You can install the correct version using Homebrew (on Mac) with
brew install python@3.12
- (One-time) Create a virtual environment in your project directory, for example:
python3 -m venv .venv
- (Every time you develop on Python code) Activate the venv:
source .venv/bin/activate
- (One-time) Install pip-tools and other packages as needed:
pip install pip-tools
- (One-time) Install all dependencies across all Python services on your local machine:
./install-all-python.sh
Note: If you are using VSCode, ensure you install the recommended extensions, including Black which is used for linting/formatting.
- Follow the rest of the instructions below these steps for one-time configurations needed.
- Pull the latest changes from the official repo.
- Tip: If your official remote is named origin, run:
git pull origin main
- Create a local branch, make changes, and commit to your local branch. Repeat until changes are ready for review.
- From your local directory floor, change branches to the backend feature branch you want to test.
- Run
git push origin HEAD:infra-test -f
which will force push an exact copy of your local feature branch to the HET origin (not your fork) infra-test branch.
- This will trigger a build and deployment of backend images to the HET Infra TEST GCP project using the new backend code (and will also build and deploy the frontend to the dev site using the frontend code from the main branch)
- Once the deployBackendToInfraTest GitHub action completes successfully (ignoring the "(infra-test) Terraform / Airflow Configs Process completed with exit code 1." that unintentionally appears in the Annotations section), navigate to the test GCP project.
Note: if you run this command again too quickly before the first run has completed, you might encounter "Error acquiring the state lock" and the run will fail. If you are SURE that this occurred because of your 2nd run being too soon after the 1st (and not because another team member is using infra-test) then you can manually go into the Google Cloud Storage bucket that holds the terraform state, find the file named default.tflock and delete it or, less destructively, rename it by adding today's date to the file name.
- Navigate to Composer > Airflow and trigger the DAG that corresponds to your updated backend code
- Once DAG completes successfully, you should be able to view the updated data pipeline output in the test GCP project's BigQuery tables and also the exported .json files found in the GCP Buckets.
- Push your branch to your remote fork, use the github UI to open a pull request (PR), and add reviewer(s).
- When ready to merge, use the "Squash and merge" option
- Ensure all affected pipelines are run both after merging to main and after cutting a release to production.
Note: Pipeline updates should be non-breaking, ideally pushing additional data to the production codebase, followed by pushing updated frontend changes to ingest the new pipeline data, finally followed by removal of the older, now-unused data.
Note: All files in the airflow/dags directory will be uploaded to the test airflow environment. Please only put DAG files in this directory.
Unit tests run using pytest, which will recursively look for and execute test files (which contain the string test in the file name).
To install, ensure your venv is activated, and run: pip install pytest
To run pytest against your entire, updated backend code:
pip install python/data_server/ python/datasources/ python/ingestion/ && pytest python/tests/
To run a single test file, follow this pattern (the -s flag enables print() statements to log even on passing tests):
pip install python/datasources/ && pytest python/tests/datasources/test_cdc_hiv.py -s
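For readers new to pytest, the sketch below shows the general shape of a test file that pytest would discover: functions whose names start with test_ and that use plain assert statements. The helper calculate_per_100k and the file name are hypothetical and exist only for this example; in a real test file the function under test would be imported from one of the packages in /python.

```python
# Hypothetical file name: python/tests/datasources/test_example.py


def calculate_per_100k(count, population):
    """Hypothetical helper under test: rate per 100,000 people, or None if undefined."""
    if count is None or not population:
        return None
    return round(count / population * 100_000)


def test_per_100k_basic():
    assert calculate_per_100k(50, 100_000) == 50


def test_per_100k_missing_population():
    # With the -s flag, this print() is shown even though the test passes.
    print("checking missing population")
    assert calculate_per_100k(50, 0) is None
```

Running pytest with the path to a file like this executes both test functions and reports each as passed or failed.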
Much of the guidance in this readme is aimed towards ongoing development of the platform available at healthequitytracker.org; however, we highly encourage interested parties to leverage this open-sourced code base and the data access it provides to advance health equity in their own research and communities.
The following section is not required for regular maintenance of the Health Equity Tracker, but can be extremely helpful for local development and cloud deployment of similar, forked projects.
Copy frontend_server/.env.example into frontend_server/.env.development, and update DATA_SERVER_URL to point to a specific data server URL, similar to above.
To run the frontend server locally, navigate to the frontend_server/ directory and run:
node -r dotenv/config server.js dotenv_config_path=.env.development
This will start the server at http://localhost:8080. However, since it mostly serves static files from the build/ directory, you will either need to
- run the frontend server separately and set the VITE_BASE_API_URL url to http://localhost:8080 (see above), or
- go to the frontend/ directory and run npm run build:development. Then copy the frontend/build/ directory to frontend_server/build/
Similarly to the frontend React app, the frontend server can be configured for local development by changing environment variables in frontend_server/.env.development. Copy frontend_server/.env.example to get started.
If you need to test Dockerfile changes or run the frontend in a way that more closely mirrors the production environment, you can run it using Docker. This will build both the frontend React app and the frontend server.
Run the following commands from the root project directory:
- Build the frontend Docker image:
docker build -t <some-identifying-tag> -f frontend_server/Dockerfile . --build-arg="DEPLOY_CONTEXT=development"
- Run the frontend Docker image:
docker run -p 49160:8080 -d <some-identifying-tag>
- Navigate to http://localhost:49160.
When building with Docker, changes will not automatically be applied; you will need to rebuild the Docker image.
Refer to Deploying your own instance with terraform for instructions on deploying the frontend server to your own GCP project.
To test a Cloud Run service triggered by a Pub/Sub topic, run
gcloud pubsub topics publish projects/<project-id>/topics/<your_topic_name> --message "your_message" --attribute=KEY1=VAL1,KEY2=VAL2
See Documentation for details.
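When the same test message needs to be sent repeatedly, the gcloud command above can be assembled programmatically. The sketch below only builds the argument list, using the same flags shown above; the function name and the example project, topic, and attribute values are hypothetical.

```python
import shlex


def build_publish_command(project_id, topic, message, attributes):
    """Return the gcloud argument list for publishing one Pub/Sub message."""
    # Attributes are joined as KEY=VAL pairs separated by commas, as in the command above.
    # Values that themselves contain commas are not handled by this sketch.
    attr_string = ",".join(f"{key}={value}" for key, value in attributes.items())
    return [
        "gcloud", "pubsub", "topics", "publish",
        f"projects/{project_id}/topics/{topic}",
        "--message", message,
        f"--attribute={attr_string}",
    ]


if __name__ == "__main__":
    cmd = build_publish_command(
        "my-test-project", "my_topic", "your_message", {"KEY1": "VAL1", "KEY2": "VAL2"}
    )
    # Print a copy-pasteable shell command; to execute it instead,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell-quoting problems when the message contains spaces or quotes.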
Most Python code should go in the /python directory, which contains packages that can be installed into any service. Each sub-directory of /python is a package with an __init__.py file, a setup.py file, and a requirements.in file. Shared code should go in one of these packages. If a new sub-package is added:
-
Create a folder /python/<new_package>. Inside, add:
- An empty __init__.py file
- A setup.py file with options: name=<new_package>, package_dir={'<new_package>': ''}, and packages=['<new_package>']
- A requirements.in file with the necessary dependencies
-
For each service that depends on /python/<new_package>, follow instructions at Adding an internal dependency
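The three files for a new sub-package can also be created with a short script. The following is a minimal sketch, assuming the setup.py options listed above; the function name scaffold_package and the template text are illustrative and are not part of the repo.

```python
import tempfile
from pathlib import Path

# Template for setup.py using the three options named above.
SETUP_TEMPLATE = """from setuptools import setup

setup(
    name="{name}",
    package_dir={{"{name}": ""}},
    packages=["{name}"],
)
"""


def scaffold_package(python_dir, name):
    """Create <python_dir>/<name>/ containing __init__.py, setup.py, and requirements.in."""
    package_dir = Path(python_dir) / name
    # exist_ok=False so an existing package is never overwritten.
    package_dir.mkdir(parents=True, exist_ok=False)
    (package_dir / "__init__.py").touch()
    (package_dir / "setup.py").write_text(SETUP_TEMPLATE.format(name=name))
    # Left empty here; list the package's real dependencies before compiling.
    (package_dir / "requirements.in").touch()
    return package_dir


if __name__ == "__main__":
    # Demonstrated in a temporary directory so nothing in the repo is touched.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_package(Path(tmp) / "python", "new_package")
        print(sorted(p.name for p in created.iterdir()))
```

In real use, python_dir would be the repo's python/ directory and name the new package's name.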
To work with the code locally, run pip install ./python/<package> from the root project directory. If your IDE complains about imports after changing code in /python, re-run pip install ./python/<package>.
Note: generally this should only be done for a new service. Otherwise, please add Python code to the python/ directory.
When adding a new root-level Python directory, be sure to update .github/workflows/linter.yml to ensure the directory is linted and type-checked.
-
Add the dependency to the appropriate requirements.in file.
- If the dependency is used by /python/<package>, add it to the /python/<package>/requirements.in file.
- If the dependency is used directly by a service, add it to the <service_directory>/requirements.in file.
-
For each service that needs the dependency (for deps in /python/<package> this means every service that depends on /python/<package>):
- Run cd <service_directory>, then pip-compile requirements.in where <service_directory> is the root-level directory for the service. This will generate a requirements.txt file.
- Run pip install -r requirements.txt to ensure your local environment has the dependencies, or run pip install <new_dep> directly. Note, you'll first need to have followed the Python environment setup described above.
-
Update the requirements.txt for unit tests
pip-compile python/tests/requirements.in -o python/tests/requirements.txt
If a service adds a dependency on /python/<some_package>:
- Add -r ../python/<some_package>/requirements.in to the <service_directory>/requirements.in file. This will ensure that any deps needed for the package get installed for the service.
- Follow step 2 of Adding an external dependency to generate the relevant requirements.txt files.
- Add the line RUN pip install ./python/<some_package> to <service_directory>/Dockerfile
- Install Cloud SDK (Quickstart)
- Install Terraform (Getting started)
- Install Docker Desktop (Get Docker)
gcloud config set project <project-id>
- Install Docker
- Install Docker Compose
- Set environment variables
- PROJECT_ID
- GCP_KEY_PATH (See documentation on creating and downloading keys.)
- DATASET_NAME
- GCS_LANDING_BUCKET
- GCS_MANUAL_UPLOADS_BUCKET
- MANUAL_UPLOADS_DATASET
- MANUAL_UPLOADS_PROJECT
- EXPORT_BUCKET
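Before building the containers, it can help to confirm that every variable in the list above is actually set. The following is a small, optional sketch of such a check; the function name missing_vars is an assumption for this example and nothing like it is required by the repo.

```python
import os

# The variables listed above.
REQUIRED_VARS = [
    "PROJECT_ID",
    "GCP_KEY_PATH",
    "DATASET_NAME",
    "GCS_LANDING_BUCKET",
    "GCS_MANUAL_UPLOADS_BUCKET",
    "MANUAL_UPLOADS_DATASET",
    "MANUAL_UPLOADS_PROJECT",
    "EXPORT_BUCKET",
]


def missing_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

The optional environ argument makes the check easy to exercise with a plain dictionary instead of the real environment.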
From inside the airflow/dev/ directory:
-
Build the Docker containers
make build
-
Stand up the multi-container environment
make run
-
At the UI link below, you should see the list of DAGs pulled from the dags/ folder. These files will automatically update the Airflow webserver when changed.
-
To run them manually, select the desired DAG, toggle to On, and click Trigger Dag.
-
When finished, turn down the containers
make kill
More info on Apache Airflow in general.
To upload to BigQuery from your local development environment, use these setup directions with an experimental Cloud project. This may be useful when iterating quickly if your Cloud Run ingestion job isn’t able to upload to BigQuery for some reason such as JSON parsing errors.
Before deploying, make sure you have installed Terraform and a Docker client (e.g. Docker Desktop). See Set up above.
-
Edit the config/example.tfvars file and rename it to config/terraform.tfvars
-
Log in to gcloud
gcloud auth application-default login
- Log in to Docker
gcloud auth configure-docker
- Build and push docker images
./scripts/push_images
- Set up your cloud environment with terraform
pushd config
terraform apply --var-file digest.tfvars
popd
- Configure the airflow server
pushd airflow
./upload-dags.sh
./update-environment-variables.sh
popd
- To redeploy, e.g. after making changes to a Cloud Run service, repeat steps 4-5. Make sure you run the docker commands from your base project dir and the terraform commands from the config/ directory.
Terraform doesn't automatically diff the contents of Cloud Run services, so simply calling terraform apply after making code changes won't upload your new changes. This is why Steps 4 and 5 are needed above. Here is an alternative:
Use terraform taint to mark a resource as requiring redeploy, e.g. terraform taint google_cloud_run_service.ingestion_service.
You can then set the ingestion_image_name variable in your tfvars file to <your-ingestion-image-name> and gcs_to_bq_image_name to <your-gcs-to-bq-image-name>. Then replace Step 5 above with just terraform apply. Step 4 is still required.
-
Go to Cloud Console.
-
Search for Composer
-
A list of environments should be present. Look for data-ingestion-environment
-
Click into the details, and navigate to the environment configuration tab.
-
One of the properties listed is Airflow web UI link.