Warning
This repository is no longer maintained. Development has moved to the ISARIC 3.0 Pipeline repository at https://github.com/globaldothealth/isaric-pipeline. The pipeline repository will standardise information in FHIR format, instead of bespoke schemas such as the one in this repository.
ISARIC Clinical Data Model development. This repository has the schemas and parser specifications. For the parsing library that does the data transformation, see adtl.
Each table in the ISARIC schema has a corresponding JSON Schema specification in schemas. These schemas supersede the schemas.py file in previous versions of this repository, as well as the taxonomy files, which are now contained within the JSON schemas.
Schemas are versioned by the folder name (dev, v1, v2) under schemas. At present, ISARIC schemas are under development, so they are located under dev.
Once the schema is finalised, it will be renamed to v1, following which only additive changes will be performed on the schema. Breaking changes will require a new version to be assigned.
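As a rough illustration of the additive-only rule, the sketch below compares two JSON Schema fragments and reports changes that would likely count as breaking. The definition of "breaking" used here (a removed property, a changed type, or a newly required property) is an assumption, as are the field names; it is not a check that ships with this repository.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Return reasons why `new` is not a purely additive change to `old`.

    Assumed rules: properties may be added, but existing ones may not be
    removed, change type, or become newly required.
    """
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name in sorted(old_props):
        if name not in new_props:
            problems.append(f"removed property: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"changed type of property: {name}")
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"newly required property: {name}")
    return problems


# Hypothetical schema fragments, for illustration only
v1 = {
    "properties": {"subject_id": {"type": "string"}, "age": {"type": "number"}},
    "required": ["subject_id"],
}
additive = {
    "properties": {
        "subject_id": {"type": "string"},
        "age": {"type": "number"},
        "sex": {"type": "string"},
    },
    "required": ["subject_id"],
}
breaking = {
    "properties": {"subject_id": {"type": "string"}},
    "required": ["subject_id"],
}

print(breaking_changes(v1, additive))
print(breaking_changes(v1, breaking))
```

Under these assumed rules, the first comparison reports no problems and could stay in the same version folder, while the second would call for a new version.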
Parser specification files, such as isaric-ccpuk under parsers, describe the field mappings that are parsed by adtl. The parser TOML (or JSON) file follows the adtl specification.
To transform the input files (usually database snapshots from REDCap), install adtl. Use adtl --help to see the available options. As an example, to transform the REDCap data to the ISARIC schema for the CCPUK study:
adtl isaric/parsers/isaric-ccpuk.toml data.csv
This will create a file isaric-ccpuk-{table}.csv for each table specified in the specification file. The file prefix (isaric-ccpuk) can be changed by passing the -o (--output) flag.
If a schema is specified for a particular table in the parser file, then adtl uses it for validation. Validation status (true/false) and error messages are reported in the adtl_valid and adtl_error columns in the output respectively.
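A minimal sketch of how the validation columns might be inspected after a run. The adtl_valid and adtl_error column names come from the description above; the subject_id column, the sample rows, and the exact text used for true/false are assumptions, so the comparison is case-insensitive.

```python
import csv
import io


def invalid_rows(lines):
    """Yield (row_number, error_message) for rows not marked as valid.

    `lines` is any iterable of CSV lines, such as an open output file.
    """
    for number, row in enumerate(csv.DictReader(lines), start=1):
        # Assumption: valid rows are written as "True" or "true"
        if row.get("adtl_valid", "").strip().lower() != "true":
            yield number, row.get("adtl_error", "")


# Hypothetical output, standing in for a file such as isaric-ccpuk-{table}.csv
sample = io.StringIO(
    "subject_id,adtl_valid,adtl_error\n"
    "A012,True,\n"
    "A342,False,hypothetical error message\n"
)

errors = list(invalid_rows(sample))
for number, message in errors:
    print(f"row {number}: {message}")
```

To use this on real output, pass an open file handle in place of the in-memory sample.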
ISARIC source datasets have unique visit IDs, with every patient assigned a new ID on every visit. A separate table (RELSUB in SDTM) matches visit IDs that belong to the same subject. So if visits A012 and A342 refer to the same patient, the RELSUB table would contain an entry like A012,A342,SAME. For datasets that use RELSUB matching (ref = "relsub" present in the subject ID definition), the RELSUB matching definition must be generated first, before calling adtl with the RELSUB map. As an example, for the CCPUK RELSUB file (and its corresponding parser), this is the procedure to transform the source data with RELSUB mapping:
# Create the RELSUB mapping
python3 scripts/relsub.py CCPUK_RELSUB.csv -o isaric-ccpuk-relsub.json
adtl isaric/parsers/isaric-ccpuk.toml ../isaric-data/ccpuk.csv --include-def isaric-ccpuk-relsub.json
The RELSUB script expects the ID columns to be named USUBJID, RSUBJID; these can be changed via parameters, see python3 scripts/relsub.py --help.
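To make the idea of RELSUB matching concrete, here is a sketch that groups linked visit IDs and maps each one to a single canonical ID per subject. It is not the code in scripts/relsub.py, and the JSON layout that adtl expects from --include-def is not reproduced here. The SREL header, the sample rows, the choice of the smallest ID as canonical, and treating every row as a same-subject link are all assumptions.

```python
import csv
import io
import json


def relsub_map(lines, id_col="USUBJID", rel_col="RSUBJID"):
    """Map every visit ID to one canonical ID for the same subject.

    Linked IDs are grouped with a union-find structure; the canonical ID
    is the lexicographically smallest ID in each group (an assumption).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we walk up
            x = parent[x]
        return x

    for row in csv.DictReader(lines):
        a, b = find(row[id_col]), find(row[rel_col])
        if a != b:
            smaller, larger = sorted((a, b))
            parent[larger] = smaller  # keep the smallest ID as the root
    return {visit: find(visit) for visit in sorted(parent)}


# Hypothetical RELSUB rows; the third column is ignored in this sketch
sample = io.StringIO(
    "USUBJID,RSUBJID,SREL\n"
    "A012,A342,SAME\n"
    "A342,B777,SAME\n"
    "C001,C002,SAME\n"
)

mapping = relsub_map(sample)
print(json.dumps(mapping, indent=2))
```

Chained links are resolved transitively, so A012, A342 and B777 all end up under the same subject even though B777 is never paired with A012 directly.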
Install pre-commit and set up the pre-commit hooks (pre-commit install), which run linting checks before each commit.