Data on confirmed COVID-19 cases and tests as reported by Immigration and Customs Enforcement (ICE) from March 24, 2020 through present. This data originates from hourly downloads of https://ice.gov/coronavirus. Vera updates the data in this repository on a daily basis.
Vera's dashboard visualizing this data is available here: https://www.vera.org/tracking-covid-19-in-immigration-detention
On March 24, 2020, ICE launched a webpage that compiles information about COVID-19 in detention facilities, including the reported numbers of confirmed COVID-19 cases and tests for people detained by ICE. Researchers at the Vera Institute of Justice immediately began routinely archiving the page data, on a daily basis at first and then hourly as of April 17, 2020 at 5:00 PM EDT.
While ICE typically updates the numbers of cumulative and current cases for people in detention on this page about once per weekday, it replaces previously reported numbers rather than presenting them longitudinally, making it difficult to track how the numbers of confirmed COVID-19 cases and tests administered have changed over time. The datasets included in this repository present selected data from these archived page downloads over time in four domains:
- "Facility cases": The number of confirmed COVID-19 cases among people in detention per facility
- "National cases": The number of confirmed COVID-19 cases among people in detention nationally
- "National population": The current number of people in ICE detention nationally
- "National tests": The number of COVID-19 tests reportedly administered to detained people
These data all pertain to people detained by ICE; while the page previously listed case data for ICE employees at detention centers, these statistics were not regularly updated by ICE, do not include other facility staff, and have since been removed from the webpage following the V5 update.
As this dataset is intended to make transparent what ICE has reported on COVID-19 over time, the data are presented "as is" and reflect what ICE reported on the webpage for each given download, including any errors in what ICE has reported. For example, at Immigration Centers of America Farmville, a decrease in cumulative cases was reported: on the page archived 2020-07-13 23:01:00 (page_id: `9d206de59d1e1aad42e49ca5c7f6cc60`), the page listed 106 cumulative cases; on the page archived 2020-07-15 14:01:00 (page_id: `df376692b19409cacac5d990e742b44f`), 315 cases were reported; on the page archived 2020-07-25 01:03:00, this cumulative case count decreased to 289 (page_id: `3a805c09ad6c87fa5507ded71b3eaa0f`).
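Because reporting errors like this are preserved in the data, users may want to screen for them before analysis. The following is a minimal sketch, not Vera's own tooling: it assumes records are held as plain dictionaries keyed by the documented column names, and the function name `find_cumulative_decreases` is hypothetical. The sample rows reproduce the three Farmville values quoted above.

```python
from collections import defaultdict


def find_cumulative_decreases(records):
    """Return one tuple per drop in a facility's cumulative case count.

    Each tuple is (facility_name, earlier_page_id, later_page_id,
    earlier_count, later_count), comparing consecutive archived pages.
    """
    by_facility = defaultdict(list)
    for rec in records:
        by_facility[rec["facility_name"]].append(rec)

    decreases = []
    for facility, rows in by_facility.items():
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as text.
        rows.sort(key=lambda r: r["page_downloaded"])
        for prev, curr in zip(rows, rows[1:]):
            if curr["facility_cases_cumulative"] < prev["facility_cases_cumulative"]:
                decreases.append((
                    facility,
                    prev["page_id"],
                    curr["page_id"],
                    prev["facility_cases_cumulative"],
                    curr["facility_cases_cumulative"],
                ))
    return decreases


# Sample rows built from the Farmville example in the text above.
FARMVILLE = "Immigration Centers of America Farmville"
sample = [
    {"facility_name": FARMVILLE, "page_downloaded": "2020-07-13 23:01:00",
     "page_id": "9d206de59d1e1aad42e49ca5c7f6cc60", "facility_cases_cumulative": 106},
    {"facility_name": FARMVILLE, "page_downloaded": "2020-07-15 14:01:00",
     "page_id": "df376692b19409cacac5d990e742b44f", "facility_cases_cumulative": 315},
    {"facility_name": FARMVILLE, "page_downloaded": "2020-07-25 01:03:00",
     "page_id": "3a805c09ad6c87fa5507ded71b3eaa0f", "facility_cases_cumulative": 289},
]

decreases = find_cumulative_decreases(sample)
for row in decreases:
    print(row)
```

A flagged drop is not necessarily a mistake in this repository; it reflects what ICE published on the page at that time.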
While the content of the ICE webpage changes roughly once per weekday, the page layout has undergone five major revisions affecting the data available and how it is parsed, outlined below:
Version | First Appearance | Description | Example |
---|---|---|---|
V1 | 2020-03-24 22:47:00 | Original page layout. ICE included all information in a single "tab," reporting the national and facility-level number of confirmed cases. Test and population information not yet included. | Link |
V2 | 2020-04-22 17:01:00 | ICE began reporting the number of tests administered nationally within its webpage, reported in text as: "As of %m %d, ICE has administered {N} tests." Additionally, ICE partitions case data into a separate tab, "Confirmed Cases" | Link |
V3 | 2020-04-28 22:01:00 | ICE began reporting test information in its own container and began reporting the currently detained population in text. | Link |
V4 | 2020-06-01 21:01:00 | ICE created a separate tab, "ICE Detainee Statistics," for all COVID-19 summary statistics for people in detention. | Link |
V5 | 2020-11-23 18:02:00 | ICE removed an existing tab, "Employee Confirmed Cases," which ICE reportedly last updated on June 18, 2020. | Link |
V6 | 2021-01-13 00:02:00 | Current page layout. Page styling updated to reflect the USWDS standard. Tab layout was replaced by anchored navigation, but the COVID-19 statistics for people detained were relatively unchanged. | Link |
Vera cautions users against relying solely on the timestamps listed on a given archived page, due to the inconsistencies and issues observed. Instead, we recommend using the `page_downloaded` timestamp (that is, the date and time when Vera archived the webpage) alongside other timestamps provided by ICE.
The `ice_date_updated` timestamp, which on its face seems to be automatically generated, does not maintain fidelity to this expectation in all cases. For example, while the page data was known to have been updated between page_id `0d14fa23d8f6d79b9be212468c473164` (archived 2020-03-30 21:48) and page_id `bb9d58d22b788cd20ff41b5e15c1e55e` (archived 2020-04-02 01:24), this timestamp does not change.
The "as of" page timestamp has been subject to similar issues. For example, the page archived on 2020-06-30 18:01 (page_id: `8f0aba53265c8fd4012fdb8f19765ff8`) lists the "as of" date as "6/2/2020," an apparent mistyping of "6/29/2020."
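One way to surface suspect "as of" dates is to compare them with the archive time. The sketch below is an illustration only: the function name `as_of_looks_suspect` and the seven-day threshold are assumptions, not part of Vera's pipeline, and the comparison is approximate because it ignores time zones.

```python
from datetime import datetime, timedelta


def as_of_looks_suspect(page_downloaded, as_of_text, max_lag_days=7):
    """Flag an "as of" date that is in the future or far behind the archive time.

    page_downloaded: "YYYY-MM-DD HH:MM" string for when the page was archived.
    as_of_text: the date as printed by ICE, e.g. "6/29/2020".
    max_lag_days: assumed tolerance; tune it to your own needs.
    """
    downloaded = datetime.strptime(page_downloaded, "%Y-%m-%d %H:%M").date()
    as_of = datetime.strptime(as_of_text, "%m/%d/%Y").date()
    lag = downloaded - as_of
    return lag < timedelta(0) or lag > timedelta(days=max_lag_days)


# The example from the text: the listed date versus the apparently intended one.
listed_flag = as_of_looks_suspect("2020-06-30 18:01", "6/2/2020")
intended_flag = as_of_looks_suspect("2020-06-30 18:01", "6/29/2020")
print(listed_flag, intended_flag)
```

A flag only marks a record for manual review; it does not establish what the intended date was.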
All timestamps within these data files have been converted to UTC.
A pervasive issue encountered in the archived pages is an inconsistency in how ICE refers to unique detention facilities across webpage versions, an issue similarly found in other ICE datasets. The `facility_lookup` table shows how Vera standardizes facility strings from all archived webpages with consistent names, assigned unique identifiers, and relevant metadata (such as geolocation). The table contains all detention facilities that have appeared in each available version of ICE's published dedicated and non-dedicated facility lists since March 13, 2020, including those with no COVID-19 cases reported by ICE to date.
To standardize facility strings, Vera uses the "name" column from the dedicated and non-dedicated facility lists as a standardized `facility_name` where a match is found; otherwise, Vera uses the listed facility name. As new facilities appear on ICE's COVID-19 webpage, `facility_id` and geolocation data for these entries may not be immediately available, as Vera manually matches newly listed facilities to known existing facilities on an ongoing basis.
Vera uses the following sources to standardize facility strings and assign geolocation to facilities as reported by ICE on its COVID-19 webpage:
- Facility locations table from The Marshall Project: First, Vera attempts to match to the locations table produced by The Marshall Project, which draws from a November 2017 detention facilities dataset obtained by the Immigrant Legal Resource Center (ILRC) through the Freedom of Information Act (FOIA).
- ICE.gov facility list: Vera then attempts to match to ICE's list of detention centers and uses available location information to determine facility latitude and longitude.
- Dedicated and non-dedicated facility lists: Vera uses the latitude and longitude associated with the facility zipcode as listed in these facility lists.
- Manual geocoding: If no match is found within any of these sources, Vera manually assigns an identifier and determines location data via Google Maps.
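The ordered fallback across these sources can be pictured as a simple cascade. The sketch below is illustrative only: Vera's matching is manual, so the exact-match rule in `normalize`, the function `match_facility`, the source labels, and every facility name and coordinate shown are invented for the example.

```python
def normalize(name):
    """Uppercase, drop periods and commas, and collapse whitespace (an assumed rule)."""
    cleaned = name.upper().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def match_facility(listed_name, sources):
    """Try each source in priority order and return (source_label, record).

    Returns ("manual_geocoding", None) when no source has a match, signalling
    that the facility would need to be identified and located by hand.
    """
    key = normalize(listed_name)
    for label, lookup in sources:
        record = lookup.get(key)
        if record is not None:
            return label, record
    return "manual_geocoding", None


# Hypothetical lookups, listed in the priority order described above.
sources = [
    ("marshall_project", {"EXAMPLE COUNTY JAIL": {"lat": 40.0, "lng": -75.0}}),
    ("ice_gov_list", {"SAMPLE PROCESSING CENTER": {"lat": 30.0, "lng": -95.0}}),
    ("dedicated_lists", {}),
]

print(match_facility("Example County Jail", sources))
print(match_facility("Sample  Processing Center.", sources))
print(match_facility("Unlisted Facility", sources))
```

Keeping the source label with each match makes it possible to audit later which geolocation came from which source.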
ICE's COVID-19 webpage reports only on the number of currently detained people nationally. The agency does not report on the number of people detained cumulatively since the start of the pandemic, making it difficult to understand the relationship between the total number of tests administered, positive tests, and detained people at risk.
To fill this gap, Vera calculates the cumulative detained population by routinely archiving ICE's detention statistics reports, which are updated periodically and reported elsewhere on the ICE website. To compute the cumulative number of people detained between the start date and each available date thereafter, Vera uses the reported number of people currently detained as of March 14, 2020 and incrementally adds the number of people initially booked in to custody ("initial book-ins") between March 14, 2020 and each available date of subsequent archived reports.
Vera uses March 14, 2020 as the start date for the cumulative detained population, as it is the closest available report "as of" date relative to the date ICE confirmed the first COVID-19 case in detention (March 24, 2020), accounting for a seven-day lag between exposure to COVID-19 and the onset of symptoms. (See the technical appendix to Vera's report "The Hidden Curve" for further explanation of this assumption.)
The archived ICE detention statistics reports aggregate the number of initial book-ins by month. Where the archived report's "as of" date occurs before the last date of a given month, the initial book-ins figure for that month is a month-to-date count. For initial book-ins between March 14-30, 2020, Vera subtracts the total for March 2020 reported as of March 14, 2020 (i.e., initial book-ins between March 1-14, 2020) from the final reported value for March 2020 (i.e., initial book-ins between March 1-30, 2020).
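The arithmetic described above can be sketched as follows. This is a simplified illustration rather than Vera's code: the function name `cumulative_detained` and its parameter names are hypothetical, and the numbers in the example are made up and are not ICE figures.

```python
def cumulative_detained(baseline_population, start_month_book_ins_at_baseline,
                        monthly_book_ins):
    """Estimate the cumulative detained population for one archived report.

    baseline_population: people detained as of the start date (March 14, 2020).
    start_month_book_ins_at_baseline: initial book-ins already counted for the
        start month as of the start date (March 1-14, 2020).
    monthly_book_ins: initial book-ins per month from this report, from the
        start month through the report's "as of" month; the last entry may be
        a month-to-date figure.
    """
    # Remove book-ins that happened before the start date, since those people
    # are already included in the baseline population.
    book_ins_since_start = sum(monthly_book_ins) - start_month_book_ins_at_baseline
    return baseline_population + book_ins_since_start


# Made-up numbers for illustration only.
estimate = cumulative_detained(
    baseline_population=1000,
    start_month_book_ins_at_baseline=40,
    monthly_book_ins=[100, 50, 30],
)
print(estimate)  # 1000 + (100 - 40) + 50 + 30 = 1140
```

Repeating the calculation for each archived report yields one cumulative value per report "as of" date.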
We include in this repository data on the four domains above in two versions:
- "`_hourly`": These tables contain parsed data from all page archives available, and are made available as Apache Parquet files due to file size limitations in GitHub.
- "`_daily`": These tables are subsets of the corresponding `data_hourly/` tables. Rather than including records for all archived pages for a given day, these tables retain only the records associated with the latest `page_downloaded` timestamp for a given day (defined as 0:00 to 23:59 Eastern Time).
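The daily subsetting rule can be sketched as below. This is an approximation under stated assumptions, not the code used to build the published files: it assumes `page_downloaded` values are UTC strings, uses the `America/New_York` zone to define Eastern days, and the function name `latest_per_eastern_day` and the sample page_id values are hypothetical.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs system time zone data

EASTERN = ZoneInfo("America/New_York")


def latest_per_eastern_day(records):
    """Keep only the record with the latest page_downloaded per Eastern-time day."""
    latest = {}
    for rec in records:
        downloaded_utc = datetime.strptime(
            rec["page_downloaded"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
        # Convert to Eastern Time before taking the calendar day.
        day = downloaded_utc.astimezone(EASTERN).date()
        if day not in latest or downloaded_utc > latest[day][0]:
            latest[day] = (downloaded_utc, rec)
    return {day: rec for day, (_, rec) in sorted(latest.items())}


# Hypothetical archive times; 03:30 UTC falls on the previous Eastern day.
sample_pages = [
    {"page_id": "a", "page_downloaded": "2020-07-14 03:30:00"},
    {"page_id": "b", "page_downloaded": "2020-07-14 14:00:00"},
    {"page_id": "c", "page_downloaded": "2020-07-14 20:00:00"},
]

daily = latest_per_eastern_day(sample_pages)
for day, rec in daily.items():
    print(day, rec["page_id"])
```

Selecting at the page level, rather than per value, keeps all fields in a daily row tied to a single archived page.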
The directory tree below outlines the organization, followed by data dictionaries for each data file.
ice-detention-covid/
|-- README.md
|-- License.md
|-- data_hourly/
| |-- facility_cases_hourly.parquet
| |-- national_cases_hourly.parquet
| |-- national_population_hourly.parquet
| `-- national_tests_hourly.parquet
|-- data_daily/
| |-- facility_cases_daily.csv
| |-- national_cases_daily.csv
| |-- national_population_daily.csv
| `-- national_tests_daily.csv
|-- data_supplemental/
| `-- cumulative_detained.csv
`-- metadata/
    |-- facility_lookup.csv
    `-- parsing_metadata.csv
Reported COVID-19 cases, by facility
Variable | Type | Description |
---|---|---|
page_downloaded | datetime | The date and time when Vera archived the webpage |
page_id | string | A unique identifier for each archived page |
page_md5 | string | The md5 checksum of the target page content, for deduplication |
facility_listed | string | The facility name ICE listed on the archived webpage |
facility_id | string | A unique identifier that Vera manually assigned to each listed facility |
facility_name | string | The facility name standardized by Vera associated with each facility_id |
facility_cases_cumulative | numeric | The number ICE reported on the archived webpage for each detention facility as "total confirmed COVID-19 cases." |
facility_cases_current | numeric | Where available, the number ICE reported on the archived webpage for each detention facility as "confirmed cases currently under isolation or monitoring." |
facility_deaths | numeric | Where available, the number ICE reported on the archived webpage for each detention facility as "detainee deaths" (i.e., the total number of people per facility who died in ICE custody after testing positive for COVID-19). |
ice_date_updated | datetime | The italicized "Updated %mm/%dd/%yyyy" timestamp that appears below the page section corresponding to reports about people detained by ICE; for example, in page version V4, this timestamp is below the "COVID-19 ICE Detainee Statistics by Facility" table. |
ice_date_as_of_case | datetime | Where available, the "AS OF %m/%d/%yyyy" timestamp at the top of the "COVID-19 ICE Detainee Statistics by Facility" table on the archived webpage. |
Reported COVID-19 cases, nationally
Variable | Type | Description |
---|---|---|
page_downloaded | datetime | The date and time when Vera archived the webpage |
page_id | string | A unique identifier for each archived page |
page_md5 | string | The md5 checksum of the target page content, for deduplication |
cases_cumulative | numeric | The cumulative number of confirmed COVID-19 cases ICE reported nationally. |
cases_current_header | numeric | Where available, the number of current COVID-19 cases reported nationally in the page header |
cases_current_total_row | numeric | Where available, the number of current COVID-19 cases reported nationally in the "TOTAL" row of the facility table |
deaths_cumulative | numeric | Where available, the number of cumulative COVID-19 deaths reported nationally in the "TOTAL" row of the facility table |
ice_date_updated | datetime | The italicized "Updated %mm/%dd/%yyyy" timestamp at the bottom-right of the case section |
ice_date_as_of_case | datetime | Where available, the "AS OF %m/%d/%yyyy" timestamp at the top of the facility case table |
ice_date_as_of_case_current | datetime | Where available, the "AS OF %m/%d/%yyyy" timestamp found alongside the current case report in the page header |
The reported population of people detained by ICE
Variable | Type | Description |
---|---|---|
page_downloaded | datetime | The date and time when Vera archived the webpage |
page_id | string | A unique identifier for each archived page |
page_md5 | string | The md5 checksum of the target page content, for deduplication |
population_current | numeric | Where available, the reported population in ICE detention, nationally |
ice_date_as_of_pop | datetime | Where available, the "AS OF" timestamp reported alongside national population detained reports |
Reported COVID-19 tests administered, nationally
Variable | Type | Description |
---|---|---|
page_downloaded | datetime | The date and time when Vera archived the webpage |
page_id | string | A unique identifier for each archived page |
page_md5 | string | The md5 checksum of the target page content, for deduplication |
tests_cumulative | numeric | Where available, the number of total COVID-19 tests administered nationally |
ice_date_updated | datetime | The italicized "Updated %mm/%dd/%yyyy" timestamp at the bottom-right of the case section |
ice_date_as_of_test | datetime | Where available, the "AS OF" timestamp reported alongside test administration reports |
These tables are subsets of the corresponding `data_hourly/` tables. Rather than including records for all archived pages for a given day, these tables retain only the records associated with the latest `page_downloaded` timestamp for a given day (corresponding to 0:00 to 23:59 Eastern Time).
Note one exception where the `cases_current` value in `data_daily/national_cases_daily.csv` does not reflect that of the latest `page_downloaded` timestamp for a given day. On the page archived 2022-01-27 20:03:00, ICE reported "3,1292" current cases as of 2022-01-26. ICE did not correct this typo until the following morning (on the page archived 2022-01-28 13:03:00), when it updated the figure to 3,129 current cases as of 2022-01-26. Since the extra "2" in this typo incorrectly increased the number of current cases by an order of magnitude, Vera attributed the corrected value (3,129) to 2022-01-27 when summarizing daily numbers. The original values reported by ICE are documented in table `data_hourly/national_cases_hourly.parquet`.
Variable | Type | Description |
---|---|---|
facility_id | string | A unique identifier that Vera manually assigned to each listed facility |
page_downloaded_day | date | The page_downloaded field, floored to the day |
facility_cases_cumulative | numeric | The latest value assumed by the facility_cases_cumulative field in "_hourly" for this facility-day |
facility_cases_current | numeric | Where available, the latest value assumed by the facility_cases_current field in "_hourly" for this facility-day |
facility_deaths | numeric | Where available, the latest value assumed by the facility_deaths field in "_hourly" for this facility-day |
facility_name | string | The facility name standardized by Vera associated with each facility_id |
facility_listed | string | The latest value assumed by the facility_listed field in "_hourly" for a given facility_id on this day |
ice_date_updated | datetime | The ice_date_updated field, floored to the day |
page_id | string | The page_id for this record |
Variable | Type | Description |
---|---|---|
page_downloaded_day | date | The page_downloaded field, floored to the day |
cases_cumulative | numeric | The latest value assumed by the cases_cumulative field in "_hourly" for this day |
cases_current | numeric | The latest value assumed by max(cases_current_total_row, cases_current_header) in "_hourly" for this day |
deaths_cumulative | numeric | The latest value assumed by the deaths_cumulative field in "_hourly" for this day |
ice_date_updated | datetime | The latest value assumed by the ice_date_updated field for a given page_downloaded_day |
page_id | string | The page_id for this record |
Variable | Type | Description |
---|---|---|
page_downloaded_day | date | The page_downloaded field, floored to the day |
population_current | numeric | Where available, the latest value assumed by the population_current field in "_hourly" for this day |
ice_date_as_of_pop | date | The latest value assumed by the ice_date_as_of_pop field for a given page_downloaded_day |
page_id | string | The page_id for this record |
Variable | Type | Description |
---|---|---|
page_downloaded_day | date | The page_downloaded field, floored to the day |
tests_cumulative | numeric | Where available, the latest value assumed by the tests_cumulative field in the "_hourly" table for this day |
ice_date_as_of_test | datetime | The latest value assumed by the ice_date_as_of_test field for a given page_downloaded_day |
page_id | string | The page_id for this record |
Calculated cumulative detained population over time
Variable | Type | Description |
---|---|---|
as_of_date | date | The date on which the detention statistics resource was updated on ICE's website |
cumulative_detained_cbp_arrest | numeric | The number of people cumulatively detained from CBP arrests, up to date |
cumulative_detained_ice_arrest | numeric | The number of people cumulatively detained from ICE arrests, up to date |
cumulative_detained_total | numeric | The number of people cumulatively detained from either agency, up to date |
Variable | Type | Description |
---|---|---|
facility_id | string | A unique identifier that Vera manually assigned to each listed facility |
facility_first_case | date | For facilities with any reported positive COVID-19 cases, the page_downloaded_day on which this facility was first listed |
facility_cases_cumulative_max | numeric | For facilities with any reported positive COVID-19 cases, the latest value assumed by facility_cases_cumulative |
facility_cases_current_latest | numeric | For facilities with any reported positive COVID-19 cases, the latest value assumed by facility_cases_current |
facility_name | string | The facility name standardized by Vera associated with each facility_id |
city | string | Where available, the facility city |
state | string | Where available, the facility state |
zip | string | Where available, the facility 5-digit postal zipcode |
lat | numeric | Where available, the facility latitude, in degrees |
lng | numeric | Where available, the facility longitude, in degrees |
facility_listed | string | The facility name(s) as listed by ICE on the archived webpage |
Page timestamps and other associated metadata
Variable | Type | Description |
---|---|---|
page_downloaded | datetime | The date and time when Vera archived the webpage |
page_id | string | A unique identifier for each archived page |
page_md5 | string | The md5 checksum of the target page content, for deduplication |
ice_date_updated | datetime | The italicized "Updated %mm/%dd/%yyyy" timestamp that appears below the "COVID-19 ICE Detainee Statistics by Facility" table on the archived webpage. |
ice_date_as_of_case | datetime | Where available, the "AS OF %m/%d/%yyyy" timestamp at the top of the "COVID-19 ICE Detainee Statistics by Facility" table on the archived webpage. |
ice_date_as_of_case_current | datetime | Where available, the "AS OF %m/%d/%yyyy" timestamp found alongside the current case report in the page header. |
ice_date_as_of_test | datetime | Where available, the "AS OF" timestamp reported alongside test administration reports |
ice_date_as_of_pop | datetime | Where available, the "AS OF" timestamp reported alongside national population detained reports |
ice_page_version | string | The page version; see above for details |
By downloading the data, you hereby agree to all the terms specified in this license.
For questions or feedback about the data, please contact Noelle Smart at nsmart@vera.org.
In acknowledgement of the effort put forth to assemble this resource and in an effort to make this data available to all interested parties, please attribute this data to "researchers from the Center on Immigration and Justice at the Vera Institute of Justice" and link to this repository when using this resource. This work was a collaborative effort involving Adam Garcia, Noelle Smart, Zachary Lawrence, and Dennis Kuo. This data was first used in Vera's report "The Hidden Curve: Estimating the Spread of COVID-19 among People in ICE Detention."
The Vera Institute of Justice is a non-profit organization that works to build and improve justice systems that ensure fairness, promote safety, and strengthen communities.