This repository contains data and code supporting a BuzzFeed News article on nursing home facilities, published April 21, 2020. See below for details.
All data analyzed here comes from the Centers for Medicare and Medicaid Services (CMS) Nursing Home Compare.
The following files are downloaded via the Socrata API and stored locally:
- data/health_deficiencies.csv: Health-related deficiencies recorded in the past three inspection cycles. Note: This file is too large to store on GitHub and must be downloaded via the notebooks/get-deficiencies.ipynb notebook.
- data/staffing.csv: Staffing levels recorded by CMS.
- data/survey.csv: The date of each standard inspection across three inspection cycles.
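Large Socrata datasets are usually fetched in pages. Below is a minimal sketch of a paginated CSV download, assuming the standard SODA resource endpoint and its `$limit`/`$offset` paging parameters. The base URL, dataset identifier and page size are hypothetical placeholders. This README does not say which values the get-*.ipynb notebooks actually use.

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholders: the real endpoint and dataset IDs used by the
# notebooks are not given in this README.
BASE_URL = "https://data.medicare.gov/resource/{dataset_id}.csv"
PAGE_SIZE = 50_000


def page_urls(dataset_id, total_rows, page_size=PAGE_SIZE):
    """Build one CSV request URL per page using SoQL $limit/$offset paging."""
    urls = []
    for offset in range(0, total_rows, page_size):
        # Ordering by the system :id field keeps page boundaries stable.
        params = urlencode({"$limit": page_size, "$offset": offset, "$order": ":id"})
        urls.append(BASE_URL.format(dataset_id=dataset_id) + "?" + params)
    return urls


def download(dataset_id, total_rows, path):
    """Fetch every page and write a single CSV, keeping only the first header row."""
    with open(path, "w", encoding="utf-8", newline="") as out:
        for i, url in enumerate(page_urls(dataset_id, total_rows)):
            with urllib.request.urlopen(url) as resp:
                lines = resp.read().decode("utf-8").splitlines(keepends=True)
            # Each page repeats the CSV header, so drop it after the first page.
            out.writelines(lines if i == 0 else lines[1:])
```

The caller has to supply `total_rows`. One way to get it is a separate `$select=count(*)` query against the same resource.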
The following file was downloaded manually from the Nursing Home Compare open data portal:
- data/NHCDownloadableDatabaseDictionaries: Data dictionaries for the above.
The following Jupyter notebooks, written in Python, fetch and analyze the data described above:
- notebooks/get-deficiencies.ipynb: Fetches the latest CMS nursing home health deficiencies dataset and saves it to the data directory.
- notebooks/get-staffing.ipynb: Fetches the latest CMS nursing home staffing data and saves it to the data directory.
- notebooks/get-survey.ipynb: Fetches the latest CMS survey data and saves it to the data directory.
- notebooks/analyze-deficiencies.ipynb: Analyzes the CMS nursing home health inspection data. Also makes use of the survey dataset.
- notebooks/analyze-staffing.ipynb: Analyzes the CMS nursing home staffing data.
The deficiencies dataset is organized as one deficiency per row. A facility may, and often does, incur more than one deficiency per inspection. A facility with no deficiencies will not appear in this dataset.
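Facilities with no deficiencies are absent from this dataset, so per-facility counts have to be filled in with zeros from a separate list of inspected facilities. Here is a plain-Python sketch, assuming each row is a dict (for example, from `csv.DictReader`) with a hypothetical `provider_id` column.

```python
from collections import Counter


def deficiencies_per_facility(deficiency_rows, inspected_ids):
    """Count deficiency rows per facility, with 0 for inspected facilities
    that never appear in the deficiencies data.

    Facilities present in deficiency_rows but missing from inspected_ids
    are left out of the result.
    """
    # One row per deficiency, so counting rows counts deficiencies.
    counts = Counter(row["provider_id"] for row in deficiency_rows)
    return {pid: counts.get(pid, 0) for pid in inspected_ids}
```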
CMS-regulated nursing homes are inspected in 15-month "cycles." The time-based analyses in this repository divide facilities by inspection cycle rather than by calendar year. To calculate the percent of facilities with certain deficiencies, we used all facilities inspected in a given inspection cycle as the denominator. Because facilities with no deficiencies do not appear in the deficiencies dataset, that number is derived from the separate survey dataset.
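The per-cycle percentage can be sketched as follows. This assumes hypothetical `provider_id` and `cycle` columns in both datasets and a hypothetical `tag` column in the deficiencies data. The survey rows supply the denominator. A facility counts once in the numerator no matter how many matching deficiencies it received.

```python
from collections import defaultdict


def percent_with_deficiency(deficiency_rows, survey_rows, tags):
    """Percent of inspected facilities, per cycle, cited at least once
    for any F tag in `tags`."""
    # Denominator: every facility inspected in each cycle (survey dataset).
    inspected = defaultdict(set)
    for row in survey_rows:
        inspected[row["cycle"]].add(row["provider_id"])

    # Numerator: distinct facilities with at least one matching deficiency.
    cited = defaultdict(set)
    for row in deficiency_rows:
        if row["tag"] in tags:
            cited[row["cycle"]].add(row["provider_id"])

    # Intersecting with the inspected set keeps the numerator a subset
    # of the denominator.
    return {
        cycle: 100 * len(cited[cycle] & ids) / len(ids)
        for cycle, ids in inspected.items()
    }
```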
Deficiencies are categorized by "F tags," which are defined in the supporting documents on data.medicare.gov. In 2017, CMS renumbered many F tags, so the same type of deficiency can appear under different tags depending on when it was cited. CMS provides a crosswalk linking the older tags to the newer ones. To count deficiencies across the full time range, we searched for both the old and the new F tags. Infection control deficiencies were identified by F tags 0880 and 0441. Serious medication errors were identified by F tags 0760 and 0333. Pressure sore deficiencies were identified by F tags 0686 and 0314.
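Searching on both old and new tags amounts to mapping each category to a set of tags. Here is a minimal sketch using the tag pairs listed above. The normalization step, which accepts forms such as "F441", "441" or "0441", is an assumption about how tags might be stored. This README does not document the storage format.

```python
# Old and new F tags for each category, as listed in this README.
CATEGORY_TAGS = {
    "infection_control": {"0880", "0441"},
    "medication_errors": {"0760", "0333"},
    "pressure_sores": {"0686", "0314"},
}


def normalize_tag(tag):
    """Coerce a tag such as "F441", "441" or 441 to the 4-digit form "0441".

    Assumption: the stored tag format may vary; adjust to the real data.
    """
    text = str(tag).strip().upper()
    if text.startswith("F"):
        text = text[1:]
    return text.zfill(4)


def categorize(tag):
    """Return the category whose old or new F tag matches, or None."""
    code = normalize_tag(tag)
    for category, tags in CATEGORY_TAGS.items():
        if code in tags:
            return category
    return None
```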
CMS assigns each violation a "scope and severity" level. The most severe are labeled "immediate jeopardy" deficiencies. These are violations in which one or more residents or employees are in immediate danger of being harmed, although a resident does not have to have been harmed at the time for the violation to be cited. According to the CMS Nursing Home Compare technical guide, immediate jeopardy violations fall into three categories that correspond to their scope: "isolated," "pattern" and "widespread." They are assigned the letter codes I, J and K, respectively. These codes are available in the scope_severity_code column of health_deficiencies.csv.
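Filtering for immediate jeopardy citations then comes down to checking scope_severity_code for the letters I, J and K. Here is a minimal sketch. Feeding it rows from `csv.DictReader` is one option, not necessarily how the notebooks do it.

```python
# Immediate jeopardy letter codes and their scope, per the CMS technical guide.
IMMEDIATE_JEOPARDY_SCOPE = {"I": "isolated", "J": "pattern", "K": "widespread"}


def immediate_jeopardy(rows):
    """Yield (row, scope) for rows whose scope_severity_code is I, J or K."""
    for row in rows:
        code = (row.get("scope_severity_code") or "").strip().upper()
        scope = IMMEDIATE_JEOPARDY_SCOPE.get(code)
        if scope is not None:
            yield row, scope
```

For example, passing `csv.DictReader` over an open data/health_deficiencies.csv yields only the I, J and K rows, each paired with its scope label.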
The analysis is written in Python 3 and requires the Python libraries specified in requirements.txt. To replicate the analysis, run make replicate from this repository's root directory.
All code in this repository is available under the MIT License. Files in the output/ directory are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Contact Scott Pham at scott.pham@buzzfeed.com.