Synthetic Data Guide
For development and testing purposes, the following claims data APIs:
- CMS Blue Button 2.0 (BB2.0)
- Beneficiary Claims Data API (BCDA)
- Data at the Point of Care (DPC)
- Medicare Claims Data to Part D Sponsors API (AB2D)
provide a set of public synthetic beneficiary and claims data, captured in synthetic `Patient`, `Coverage`, and `ExplanationOfBenefit` (EOB) resources. You can use this data as you explore and develop against the APIs in the sandbox and production environments. The synthetic data contains no personally identifiable information (PII) or protected health information (PHI).
Our synthetic data will remain unchanged so that you can rely on it for ongoing use. That is, you can write tests, demos, etc. against this data, and the underlying data should not change later and produce unexpected results.
A more detailed description of the synthetic data and what it contains can be found at the CMS synthetic data page.
For an even more detailed look into how the synthetic data is modeled, comparisons to production data spreads, and what it contains, see this white paper published by MITRE, the group that generates the synthetic data.
This data is intended for use in development and testing activities, e.g. building out an application prior to getting production access and having a safe set of data to use for ongoing development activities. It's also a great resource when demoing your application, giving you a safe way to do that without exposing any beneficiary PII or PHI.
The synthetic data will be available in each API's sandbox environment. Refer to specific API documentation for details.
- CMS Blue Button 2.0 (BB2.0)
- Beneficiary Claims Data API (BCDA) - Guide
- Data at the Point of Care (DPC)
- Medicare Claims Data to Part D Sponsors API (AB2D) - Understanding AB2D Data
It's easy to distinguish the synthetic data from production data: all synthetic records have negative `Patient.id` and `ExplanationOfBenefit.id` values, while all production records have positive values for those fields.
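The sign check can be sketched in a few lines of Python. This is a minimal illustration, not part of any CMS client library: the function name `is_synthetic_id` is hypothetical, and the handling of a claim-type prefix on `ExplanationOfBenefit.id` (for example `carrier--123`) is an assumption about how IDs may be formatted.

```python
import re

# Optional alphabetic prefix plus "-" (an assumed claim-type prefix),
# followed by the numeric ID, which may itself carry a minus sign.
_ID_PATTERN = re.compile(r"(?:[A-Za-z]+-)?(-?\d+)")


def is_synthetic_id(resource_id: str) -> bool:
    """Return True when the numeric part of the ID is negative."""
    match = _ID_PATTERN.fullmatch(resource_id)
    if match is None:
        raise ValueError(f"unrecognized ID format: {resource_id!r}")
    return int(match.group(1)) < 0


print(is_synthetic_id("-88888888888888"))  # True
print(is_synthetic_id("12345"))            # False
```

Raising on unrecognized formats, rather than guessing, keeps a malformed ID from being silently treated as production data.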
Please note that, while we are working to improve the coverage and quality of our synthetic data, it still differs from our production data. In general, you should expect that:
- Not all fields will be present in the synthetic data.
- Some of the fields that are present may not have realistic values.
- The various releases of synthetic data (see below) will have differences from each other.
Title | Beneficiary ID Ranges | Characteristics | Updates | Notes |
---|---|---|---|---|
Rolling Claims (10/23) | -10000010288392 to -10000010354108 | Beneficiary characteristics file | every Weds 7am EST until 11-12-2025 | 20k generation, 10k each to contracts Z1001 and Z1002 |
Rolling Claims (03/23) | -10000010254618 to -10000010288010 | Beneficiary characteristics file | every Weds 7am EST until 03-26-2025 | Beneficiary IDs starting with this set are sometimes non-consecutive to more accurately simulate production |
from Spring, 2022 | -10000000010000 to -10000000019999 | Beneficiary characteristics file | N/A | |
from October, 2021 | -10000000000000 to -10000000009999 | Beneficiary characteristics file | N/A | |
Initial load 2017 | -19990000000001 to -19990000010000 | N/A | | |
Initial load 2017 | -20000000000001 to -20000000010000 | N/A | | |
Initial load 2017 | -20140000000001 to -20140000010000 | N/A | | |
Golden Beneficiary | -88888888888888 | Should contain all fields | N/A | |
The beneficiary ID -88888888888888 is meant to contain most/all fields in use, so that various data elements can be checked at once.
Data to test this case:
- Bene id: -88888888888888
- MBI: 1S00ABBAA00
- MBI hash: 03aa83a37f9104254ac98ef38ce4590bf9270733d48bbf759bd09e3eeba9e7be
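When writing tests, it can help to know which synthetic release a beneficiary ID belongs to. The sketch below copies the ID ranges from the table above into a lookup; the names `SYNTHETIC_RANGES` and `release_for` are hypothetical, and the ranges would need updating whenever a new dataset is loaded.

```python
from typing import Optional

# (label, lowest ID, highest ID), copied from the ID range table.
SYNTHETIC_RANGES = [
    ("Rolling Claims (10/23)", -10000010354108, -10000010288392),
    ("Rolling Claims (03/23)", -10000010288010, -10000010254618),
    ("Spring 2022", -10000000019999, -10000000010000),
    ("October 2021", -10000000009999, -10000000000000),
    ("Initial load 2017", -19990000010000, -19990000000001),
    ("Initial load 2017", -20000000010000, -20000000000001),
    ("Initial load 2017", -20140000010000, -20140000000001),
    ("Golden Beneficiary", -88888888888888, -88888888888888),
]


def release_for(bene_id: int) -> Optional[str]:
    """Return the release label whose range contains bene_id, else None."""
    for label, low, high in SYNTHETIC_RANGES:
        if low <= bene_id <= high:
            return label
    return None


print(release_for(-88888888888888))  # Golden Beneficiary
print(release_for(-10000000000000))  # October 2021
```

IDs that fall outside every listed range, including all positive production IDs, return `None`.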
The beneficiary IDs -19990000008101 and -10000000000000 intentionally share the same MBI and MBI hash, and requesting them will cause an error. This mimics a rare production issue in which one beneficiary has the same MBI as another (for example, because of a data entry mistake, improperly merged bene IDs in CMS, or an upstream data issue). If you do not wish to test this case, please avoid these beneficiaries and their associated MBI/MBI hash.
Data to test this case:
- Bene ids: -19990000008101 or -10000000000000
- MBI: 1S00E00AA00
- MBI hash: 6d382a8ac7419ea4c01021d99b1ebffef035df5189cd15f10e685e1a57131853
The expected API response in this case will look similar to this when searching by mbi hash:
{"resourceType":"OperationOutcome","issue":[{"severity":"error","code":"processing","diagnostics":"By hash query found more than one distinct BENE_ID: -19990000008101, -10000000000000"}]}
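A client can detect this case by inspecting the `OperationOutcome` body. The following is a rough sketch only: the helper name `duplicate_bene_ids` is hypothetical, and because the guide says the response will look "similar to" the example, matching on the diagnostics wording is an assumption that may not hold for every response.

```python
import json
from typing import List

MARKER = "more than one distinct BENE_ID:"


def duplicate_bene_ids(body: str) -> List[str]:
    """Return the bene IDs named in a duplicate-MBI error, or [] otherwise."""
    outcome = json.loads(body)
    if outcome.get("resourceType") != "OperationOutcome":
        return []
    for issue in outcome.get("issue", []):
        diagnostics = issue.get("diagnostics", "")
        if MARKER in diagnostics:
            # Everything after the marker is a comma-separated ID list.
            tail = diagnostics.split(MARKER, 1)[1]
            return [part.strip() for part in tail.split(",") if part.strip()]
    return []


sample = (
    '{"resourceType":"OperationOutcome","issue":[{"severity":"error",'
    '"code":"processing","diagnostics":"By hash query found more than one '
    'distinct BENE_ID: -19990000008101, -10000000000000"}]}'
)
print(duplicate_bene_ids(sample))  # ['-19990000008101', '-10000000000000']
```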
The beneficiary ID -10000010288391 is set up to test the case where a patient's MBI has changed. Searching the Patient endpoint by the historical hashed MBI returns this patient along with their MBI history.
Data to test this case:
- Bene id: -10000010288391
- MBI: 1S00EU8FE91 or 9V99EU8XY91
- MBI hash: 64f988837f5d2b1115751596a5a7ec3d0f802df460cce3cbda751058022f2152
The expected API response in this case will contain a 'historical' MBI extension under identifiers in the response.
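One way to pick out historical identifiers from a returned Patient resource is sketched below. The exact shape of the extension is not described in this guide, so the code deliberately makes a loose assumption: an identifier counts as historical if any string nested inside it starts with "histor". The sample resource, its URL, and the placeholder values `NEW-MBI` and `OLD-MBI` are hypothetical and only illustrate the idea; check a real sandbox response before relying on this.

```python
from typing import Any, Dict, List


def _mentions_historical(node: Any) -> bool:
    """Recursively look for any string value starting with 'histor'."""
    if isinstance(node, dict):
        return any(_mentions_historical(value) for value in node.values())
    if isinstance(node, list):
        return any(_mentions_historical(item) for item in node)
    return isinstance(node, str) and node.lower().startswith("histor")


def historical_identifiers(patient: Dict[str, Any]) -> List[str]:
    """Return the values of identifiers flagged as historical."""
    return [
        identifier.get("value", "")
        for identifier in patient.get("identifier", [])
        if _mentions_historical(identifier)
    ]


# Hypothetical sample; a real response will carry more fields.
sample_patient = {
    "resourceType": "Patient",
    "identifier": [
        {"value": "NEW-MBI",
         "extension": [{"url": "https://example.org/currency",
                        "valueCoding": {"code": "current"}}]},
        {"value": "OLD-MBI",
         "extension": [{"url": "https://example.org/currency",
                        "valueCoding": {"code": "historic"}}]},
    ],
}
print(historical_identifiers(sample_patient))  # ['OLD-MBI']
```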
Starting in 2023, some synthetic data loads have claims added weekly to a subset of their beneficiaries, to simulate the way new claims arrive weekly in production. The Updates column in the table above indicates whether a dataset receives updates and how long the updates are planned to continue. Entries with N/A in the Updates column will not receive updates.
- Updates to datasets that have them are applied every Wednesday at 12:00 GMT (7am EST or 8am EDT)
- New claims will be applied to random beneficiaries in the dataset; the updated beneficiaries/claims should be obtainable using _lastUpdated, similar to production data
- The new claims will have CLM_THRU_DT (aka service-date in the API parameters) set to 1-2 weeks before the week in which they are added, to simulate the processing time claims require in prod before they reach the BFD database
- The first weekly update to a dataset happens on the second Wednesday after the initial data load, to simulate real claim processing time. This two-week delay occurs only once, when the new set is first loaded.
- Updates will vary in claim types added, number of claims added, and number of beneficiaries affected; some will be larger or smaller than others
- Certain rare changes to BFD (such as a database schema change) may cause existing update data to be invalidated. If this occurs:
  - It will be announced
  - Existing data will still be available, but will no longer be updated weekly
  - After the change goes live, a new dataset will be generated and loaded with weekly updates to ensure at least one set of data is updating (there will be new beneficiary ids/mbis for this new set)
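The schedule described above can be turned into a few date helpers for tests that poll for new claims. This is a sketch under stated assumptions: all function names are hypothetical, the update time is taken as Wednesday 12:00 UTC (the 12:00Z figure given in the release history), "second Wednesday after the dataset is loaded" is read as strictly after the load date, and the query string uses the standard FHIR `gt` search prefix with `_lastUpdated`.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

WEDNESDAY = 2  # date.weekday() counts Monday as 0


def first_update_date(load_date: date) -> date:
    """Second Wednesday strictly after the load date (assumed reading)."""
    days_to_wednesday = (WEDNESDAY - load_date.weekday() - 1) % 7 + 1
    return load_date + timedelta(days=days_to_wednesday + 7)


def latest_update_instant(now: datetime) -> datetime:
    """Most recent Wednesday 12:00 UTC at or before a timezone-aware `now`."""
    now = now.astimezone(timezone.utc)
    days_back = (now.weekday() - WEDNESDAY) % 7
    candidate = datetime.combine(
        now.date() - timedelta(days=days_back), time(12, 0), tzinfo=timezone.utc
    )
    if candidate > now:  # it is Wednesday, but before the update time
        candidate -= timedelta(days=7)
    return candidate


def expected_thru_window(update_day: date) -> Tuple[date, date]:
    """CLM_THRU_DT is expected to fall 1-2 weeks before the update day."""
    return update_day - timedelta(weeks=2), update_day - timedelta(weeks=1)


def last_updated_param(since: datetime) -> str:
    """Build a `_lastUpdated` search parameter for records newer than `since`."""
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"_lastUpdated=gt{stamp}"


print(first_update_date(date(2023, 6, 12)))  # 2023-06-21
```

For example, passing the result of `latest_update_instant(datetime.now(timezone.utc))` to `last_updated_param` yields a parameter that selects records changed since the most recent weekly update.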
Time | Description | Details |
---|---|---|
Generated 03/03/2023, loaded 06/12/23 | 10,000 beneficiaries with rolling updates and enhanced data accuracy | Every week a number of beneficiaries in this dataset are automatically updated, similar to how beneficiaries receive weekly updates in production. This allows downstream consumers to test beneficiary updates using this set. These updates occur every Wednesday at 7am EST (12:00Z) and continue until early 2025 or until the next schema change in BFD, whichever comes first. This dataset also contains MITRE adjustments that make the data more realistic, such as a wider range of ages represented in the Medicare data. |
Spring, 2022 | Additional 10,000 enhanced synthetic beneficiaries | This updated synthetic data set includes many enhancements and fixes described in the categories below. It now covers a nationwide population, as opposed to a single state previously. Populations in different states reflect the demographics in those states based on US Census data as well as geographic price adjustments (based on the 2019 CMS Geographic Variation Public Use File). Medicare eligibility determinations based on End Stage Renal Disease are more accurate. Additional beneficiary eligibility fields are now populated. Approximately 20% of beneficiaries change their current Part D plan each January, to simulate real rates of plan changes. Beneficiaries will also now take advantage of Indian Health Services (IHS) Health Centers and Health Stations, if they are eligible. Annual wellness encounter claims are now moved from the outpatient file to the carrier file, based on provider. Outpatient and Inpatient claims can now include encounters with coded "reasons for visit", even if no diagnoses or procedures are recorded for that encounter. Diagnoses that were "present on admission" are now properly identified, and external diagnosis codes are populated. Durable Medical Equipment (DME) claims now include supplies, not just implantable medical devices. This synthetic data set adds the following conditions and treatments: COVID-19 vaccinations, updates to Sepsis, Spina Bifida, Cerebral Palsy, and the prescribing of opioids for chronic pain and Opioid Use Disorder (OUD) treatment. This data set includes 143 additional claim fields. The count per claim file is below. Note that some fields repeat across claim types. This release fixes a tab-related formatting error, correctly handles "future" death dates, corrects the data types for Claim ID and Beneficiary ID (from Integer to Long), and now correctly groups claim line items together. |
October, 2021 | 10,000 Enhanced synthetic beneficiaries | Adding 10,000 more synthetic beneficiaries to the sandbox and production environments with more realistic and robust data in response to user requests. For example, this set will contain: |
Early 2021 | Added outpatient claims | Enhanced the initial set of beneficiaries to include outpatient claims, bringing the full list of EOB profiles to: |
2017 | Initial 30,000 Synthetic Beneficiaries | 30,000 synthetic beneficiaries and about 1,000,000 synthetic claims that covered the following Explanation of Benefits (EOB) profiles: |
For further questions on this synthetic data, please see the FAQ.