Under Development: Electronic health record queries

meliao edited this page Jul 17, 2020 · 1 revision

NOTE: This feature is currently under development.

The feature has been added to the EHR_development branch, which is not currently stable.

Querying EHR with a YAML file

EHR queries work much like the YAML-enabled phenotype queries. To restrict the query to certain individuals, specify a samples_filters block just as in a phenotype query. In the query section, table is required; max_records, columns, and records_filters are optional. We will go through these tags individually and then see an example that puts them together.

  • table specifies the name of the database table to query. The tables are loaded into ukbREST with the same names given to them in the UK Biobank data showcase.
  • columns specifies which columns should be returned. If not specified, ukbREST defaults to returning all columns from the table.
  • records_filters is a list of filters for the rows of the EHR table. This acts much like the samples_filters section. Each entry is a condition on a row, such as - diag_icd9 = 'A123' or - dsource != 'HES'.
  • max_records can be specified to set a limit on the number of records returned by ukbREST. By default, there is no limit.
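
As a minimal sketch of the defaults described above, the snippet below writes a query file that sets only table and max_records, so all columns would be returned and no row or sample filters apply. The file name ehr_minimal.yaml is a hypothetical choice, and the sketch assumes the server accepts a query section without columns or records_filters, as the tag descriptions suggest.

```shell
# Hypothetical output path; override QUERY_FILE to write somewhere else.
QUERY_FILE="${QUERY_FILE:-$HOME/ehr_minimal.yaml}"

# Only `table` is set besides the record limit, so every column of
# gp_clinical is requested and no records_filters are applied.
cat > "$QUERY_FILE" <<'EOF'
ehr_query:
    table: gp_clinical
    max_records: 10
EOF
```

Keeping max_records small while testing avoids pulling a whole table by accident.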

So an EHR query located at ~/ehr_query.yaml may look like this:

$ cat ~/ehr_query.yaml
samples_filters:
    - eid not in (select eid from withdrawals)
    - c31_0_0 = 0

ehr_query:
    table: gp_clinical
    records_filters:
        - data_provider = 2
        - event_dt > '1999-12-31'
    columns:
        - event_dt
        - read_2
        - value_1
        - value_2
        - value_3
    max_records: 1000

This query requests records from gp_clinical, the table of clinical primary care events, and returns the columns event_dt, read_2, value_1, value_2, and value_3. Only records that come from data_provider 2 (a GB data provider; check the UK Biobank's primary care documentation for more details) and that occurred after December 31, 1999 are requested. The individuals are filtered to ensure that they have not withdrawn consent (eid not in (select eid from withdrawals)) and that they are female (c31_0_0 = 0). This will likely match a large number of records, so only the first 1000 will be returned.

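This query can be executed with curl, in the same way as the phenotype queries. The curl command specifies the YAML file and the section to use, and sends them to the resource at /ukbrest/api/v1.0/ehr. Because the file is given as a relative path, run the command from the directory that contains ehr_query.yaml (here, the home directory):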

curl -H accept:text/csv \
  "http://127.0.0.1:5000/ukbrest/api/v1.0/ehr" \
  -F file=@ehr_query.yaml \
  -F section=ehr_query \
  > my_data.csv
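
Once the command finishes, a quick sanity check on the output can confirm that the limit was applied. The sketch below assumes that my_data.csv starts with a single header line and that no field contains an embedded newline; count_records is a hypothetical helper, not part of ukbREST.

```shell
# Count data rows in a CSV file, skipping the first (header) line.
# Assumes one record per line, i.e. no quoted fields with line breaks.
count_records() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Preview the header and first records, then report the row count.
if [ -f my_data.csv ]; then
    head -n 5 my_data.csv
    echo "records: $(count_records my_data.csv)"
fi
```

With max_records: 1000 in the query, the reported count should not exceed 1000.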