MIMIC-IV FHIR Import Crash - Could not initialize class org.apache.commons.text.lookup.StringLookupFactory #1783

Open
cyrilzakka opened this issue Mar 18, 2024 · 2 comments

Comments

@cyrilzakka

Hello,

I'm following the instructions at https://github.com/kind-lab/mimic-fhir/blob/main/tutorial/mimic-fhir-tutorial-pathling.ipynb to import the MIMIC-IV FHIR NDJSON files into Pathling, using the Docker image at https://hub.docker.com/r/aehrc/pathling. When running the following script:

```python
from pathlib import Path
import requests
import json
import ndjson
import pandas as pd

import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams.update({'font.size': 20})

from fhirclient.models.parameters import Parameters, ParametersParameter
from py_mimic_fhir.lookup import MIMIC_FHIR_PROFILES

# Staging folder as seen from inside the Pathling container, and the server's FHIR base URL
import_folder = 'file:///usr/share/staging'
server = 'http://localhost:8080/fhir'


def generate_import_parameters(import_folder, profile, resource, mode):
    """Build the $import Parameters body: one 'source' with resourceType, url and mode parts."""
    param_resource = Parameters()

    param_resource_type = ParametersParameter()
    param_resource_type.name = 'resourceType'
    param_resource_type.valueCode = resource

    # The url part is built as a plain dict rather than a ParametersParameter
    param_url = {}
    param_url['name'] = 'url'
    param_url['valueUrl'] = f'{import_folder}/{profile}.ndjson'

    param_mode = ParametersParameter()
    param_mode.name = 'mode'
    param_mode.valueCode = mode

    param_source = ParametersParameter()
    param_source.name = 'source'
    param_source.part = [param_resource_type, param_url, param_mode]
    param_resource.parameter = [param_source]

    return param_resource.as_json()

def post_import_ndjson(server, param):
    """POST the Parameters body to the server's $import endpoint."""
    url = f'{server}/$import'

    resp = requests.post(url, json=param, headers={"Content-Type": "application/fhir+json"})
    return resp

# Use 'overwrite' for a fresh load; 'merge' is used here because several profiles
# share the Observation resource type, and overwriting would discard earlier loads
mode = 'merge'

for profile, item in MIMIC_FHIR_PROFILES.items():
    resource = item['resource']
    # Skip ObservationChartevents: it is too large and was crashing all Observation searches
    if profile != 'ObservationChartevents':
        param = generate_import_parameters(import_folder, profile, resource, mode)
        resp = post_import_ndjson(server, param)
        print(f"{profile}: {resp.json()['issue'][0]['diagnostics']}")
```
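For comparison, the request body that `generate_import_parameters` produces can also be built as a plain dictionary, which avoids mixing `ParametersParameter` objects with a raw dict for the `url` part. This is a minimal sketch, assuming Pathling's `$import` accepts the same `source` / `resourceType` / `url` / `mode` structure used in the tutorial; `build_import_parameters` and `first_diagnostics` are hypothetical helper names, not part of `fhirclient` or `py_mimic_fhir`. `first_diagnostics` simply avoids a `KeyError` when the response body is not an OperationOutcome with diagnostics.

```python
import json


def build_import_parameters(import_folder: str, profile: str, resource: str, mode: str) -> dict:
    """Build a FHIR Parameters resource for $import as a plain dict (hypothetical helper)."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {
                "name": "source",
                "part": [
                    {"name": "resourceType", "valueCode": resource},
                    {"name": "url", "valueUrl": f"{import_folder}/{profile}.ndjson"},
                    {"name": "mode", "valueCode": mode},
                ],
            }
        ],
    }


def first_diagnostics(outcome: dict) -> str:
    """Return the diagnostics of the first OperationOutcome issue, or a fallback string."""
    issues = outcome.get("issue") or []
    if issues and "diagnostics" in issues[0]:
        return issues[0]["diagnostics"]
    return "<no diagnostics>"


if __name__ == "__main__":
    params = build_import_parameters("file:///usr/share/staging", "Patient", "Patient", "merge")
    print(json.dumps(params, indent=2))
```

In the loop above, the `print` line could then become `print(f"{profile}: {first_diagnostics(resp.json())}")`, so that a failed import reports a fallback message instead of raising.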

I get the error:

```text
mimic-pathling-1  | 20:31:40.214 [Executor task launch worker for task 1.0 in stage 34.0 (TID 456)] [] ERROR org.apache.spark.executor.Executor - Exception in task 1.0 in stage 34.0 (TID 456): Could not initialize class org.apache.commons.text.lookup.StringLookupFactory
mimic-pathling-1  | 20:31:40.214 [Executor task launch worker for task 38.0 in stage 34.0 (TID 493)] [] ERROR o.a.s.s.e.d.FileFormatWriter - Job job_202403182028225607545436317964348_0034 aborted.
mimic-pathling-1  | 20:31:40.215 [Executor task launch worker for task 38.0 in stage 34.0 (TID 493)] [] ERROR org.apache.spark.util.Utils - Uncaught exception in thread Executor task launch worker for task 38.0 in stage 34.0 (TID 493)
mimic-pathling-1  | java.lang.NullPointerException: null
mimic-pathling-1  |     at org.apache.spark.scheduler.Task.$anonfun$run$3(Task.scala:144)
mimic-pathling-1  |     at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1509)
mimic-pathling-1  |     at org.apache.spark.scheduler.Task.run(Task.scala:142)
mimic-pathling-1  |     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
mimic-pathling-1  |     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
mimic-pathling-1  |     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
mimic-pathling-1  |     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
mimic-pathling-1  |     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
mimic-pathling-1  |     at java.base/java.lang.Thread.run(Thread.java:829)
mimic-pathling-1  | 20:31:40.215 [Executor task launch worker for task 38.0 in stage 34.0 (TID 493)] [] ERROR org.apache.spark.executor.Executor - Exception in task 38.0 in stage 34.0 (TID 493): Could not initialize class org.apache.commons.text.lookup.StringLookupFactory
```

Any help or guidance would be greatly appreciated.

@johngrimes
Member

Thanks for sending this through. Let me try to reproduce it and I will get back to you.

@johngrimes
Member

Hi @cyrilzakka,

I've had a go at loading the full MIMIC-IV data set to reproduce your problem.

I was unable to reproduce the exact error you described, but I did come across a couple of data quality issues that would make it impossible to import this data set in its current form.

I have documented them here:

I am also hanging out for these issues to be resolved so that I can start playing around with this wonderful data set!

I fixed the first issue using a script (attached to that issue), but the second issue is more involved to fix through post-processing, and I have not yet been able to produce a fully importable data set.
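The specific data quality problems are not described in this thread. As a generic first pass before calling `$import`, a sketch like the following can flag lines in an NDJSON file that do not parse as JSON, lack a `resourceType` or `id`, or repeat an `id` for the same resource type. This is an illustrative check under our own assumptions (`scan_ndjson` is a hypothetical helper), not the script attached to the issue, and it does not validate resources against the MIMIC-IV FHIR profiles.

```python
import json
from collections import Counter


def scan_ndjson(path):
    """Scan an NDJSON file for basic structural problems (hypothetical helper).

    Returns (counts per resourceType, list of (line_number, problem)).
    """
    counts = Counter()
    problems = []
    seen_ids = set()
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(obj, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            rtype = obj.get("resourceType")
            rid = obj.get("id")
            if not rtype:
                problems.append((lineno, "missing resourceType"))
            if not rid:
                problems.append((lineno, "missing id"))
            elif (rtype, rid) in seen_ids:
                # Same resourceType/id pair already seen earlier in this file
                problems.append((lineno, f"duplicate id {rtype}/{rid}"))
            else:
                seen_ids.add((rtype, rid))
            counts[rtype] += 1
    return counts, problems
```

Running it over each staged file (for example `scan_ndjson("/usr/share/staging/Patient.ndjson")`, path assumed) before posting to `$import` gives a quick list of line numbers to inspect, which is cheaper than discovering malformed input through a failed Spark job.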
