Commit a6c6ea2

Add gbd config, remove nonalphanumerics from codes
Add a JSON config for bulk upload of the GBD ICD-10 codelists from IHME. These are supplied in dot-separated notation, which the ICD10 coding system on OpenCodelists does not expect, so strip the dots from the codes. Also remove blank rows, which cause issues too.
1 parent 9997f7f commit a6c6ea2

2 files changed, 26 additions and 0 deletions

codelists/scripts/bulk_import_codelists.py

Lines changed: 7 additions & 0 deletions
@@ -308,6 +308,13 @@ def update_name(name):
     for column in relevant_df_columns:
         codelist_df[column] = codelist_df[column].str.strip()
 
+    # Remove non-alphanumeric from code column
+    if config.get("strip_nonalphanumeric_from_codes", False):
+        codelist_df["code"] = codelist_df["code"].str.replace(
+            "[^a-zA-Z0-9]", "", regex=True
+        )
+        codelist_df = codelist_df.dropna(subset=["code"])
+
     return codelist_df
 
 
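The effect of the added step can be sketched without pandas. The snippet below is a minimal illustration, not the script's actual code: `normalise_codes` is a hypothetical helper, `None` stands in for a blank cell, and the sample codes are made up. It assumes the row-dropping step only runs when the flag is set.

```python
import re

# Same character class as in the diff: anything that is not an ASCII letter or digit.
NON_ALPHANUMERIC = re.compile(r"[^a-zA-Z0-9]")


def normalise_codes(codes, strip_nonalphanumeric=False):
    """Hypothetical stand-in for the pandas step in the diff.

    `codes` is a list of strings, with None marking a blank cell.
    """
    if not strip_nonalphanumeric:
        # Flag absent or false: leave the column untouched.
        return list(codes)
    # Skip missing values, mirroring dropna(subset=["code"]).
    present = [code for code in codes if code is not None]
    # Strip dots and any other non-alphanumeric characters.
    return [NON_ALPHANUMERIC.sub("", code) for code in present]


codes = ["A00.0", "B20.1", None, "C34"]
print(normalise_codes(codes, strip_nonalphanumeric=True))
# ['A000', 'B201', 'C34']
```

Note that in this sketch a code made up only of punctuation becomes an empty string rather than a missing value, so it is not dropped. Whether that case needs separate handling depends on the input data.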

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+{
+  "comment": "For loading of the IHME Global Burden of Disease ICD 10 bundles",
+  "organisation": "ihme",
+  "coding_systems": {
+    "icd10": {
+      "id": "icd10",
+      "release": "icd10_2019-covid-expanded_20190101"
+    }
+  },
+  "column_aliases": {
+    "codelist_name": "bundle_name",
+    "code": "cause_code",
+    "term": "cause_name",
+    "codelist_description": "bundle_id"
+  },
+  "description_template": "IHME GBD Bundle ID: %s",
+  "strip_nonalphanumeric_from_codes": true,
+  "sheet": "icd_10_mapping"
+}
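How the script consumes `column_aliases` and `description_template` is not shown in this commit. The sketch below is one plausible reading, under two assumptions: each alias maps a canonical column name to the column name in the source sheet, and the template is filled with `%` formatting. `apply_config` is a hypothetical helper and the row values are invented for illustration.

```python
import json

# Subset of the config added in this commit.
CONFIG_JSON = """
{
  "column_aliases": {
    "codelist_name": "bundle_name",
    "code": "cause_code",
    "term": "cause_name",
    "codelist_description": "bundle_id"
  },
  "description_template": "IHME GBD Bundle ID: %s"
}
"""


def apply_config(row, config):
    """Hypothetical helper: map a source row onto the canonical column names."""
    aliases = config["column_aliases"]
    # Assumed direction: canonical name -> column name in the source sheet.
    canonical = {name: row[source] for name, source in aliases.items()}
    template = config.get("description_template")
    if template:
        # Fill the template with the raw description value (here, the bundle ID).
        canonical["codelist_description"] = template % canonical["codelist_description"]
    return canonical


config = json.loads(CONFIG_JSON)
# Invented example row, not real GBD data.
row = {
    "bundle_name": "Example bundle",
    "cause_code": "A00.0",
    "cause_name": "Example cause",
    "bundle_id": 123,
}
print(apply_config(row, config))
```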
