Significantly fewer matches for fluid administration (electrolytes) in new MIMIC-IV 3.1 data with anchor_year_group = "2020 - 2022" #1839

mlondschien opened this issue Dec 28, 2024 · 0 comments



Description

We observe significantly fewer fluid-administration records in the `inputevents` table of the new MIMIC-IV 3.1 data for stays with `anchor_year_group` = "2020 - 2022" than for the earlier year groups.

Attachment: fluid.csv (the fluid/electrolyte `itemid`s to match). Reproduction script:
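If the attachment is unavailable, a stand-in for `fluids` can be built from the `itemid`s that appear in the printed counts. This is a hypothetical substitute, not the attachment itself: it assumes the script only needs an `itemid` column, and since only matched IDs show up in the output, the real fluid.csv may list more items.

```python
import pandas as pd

# Hypothetical stand-in for the attached fluid.csv: only the itemids that
# appear in the printed counts below. The real attachment may contain more.
fluids = pd.DataFrame(
    {"itemid": [225158, 225159, 225161, 225943, 225944, 228341]}
)
print(fluids)
```

It can replace the `fluids = pd.read_csv("fluid.csv")` line; the rest of the script only uses `fluids["itemid"]`.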

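```python
from pathlib import Path

import pandas as pd

path_new = Path("/path/to/miiv/")  # root of the MIMIC-IV 3.1 download

# One row per ICU stay (icustays) and one row per patient (patients, which
# carries anchor_year_group). pandas infers gzip from the .gz suffix.
icustays_new = pd.read_csv(path_new / "icu" / "icustays.csv.gz")
patients_new = pd.read_csv(path_new / "hosp" / "patients.csv.gz")

merged_new = pd.merge(
    left=icustays_new,
    right=patients_new,
    on="subject_id",
    how="left",
    validate="m:1",  # many stays per patient, exactly one patient row each
)

# Fluid/electrolyte itemids to look up (attached fluid.csv).
fluids = pd.read_csv("fluid.csv")

# inputevents is large: read it in chunks and keep only the fluid itemids.
with pd.read_csv(
    path_new / "icu" / "inputevents.csv.gz",
    chunksize=10000,
    usecols=["stay_id", "itemid"],
) as reader:
    inputevents_new = pd.concat(
        [chunk[chunk["itemid"].isin(fluids["itemid"])] for chunk in reader]
    )

print(f"inputevents_new: {len(inputevents_new)}")

# Left join keeps stays without any fluid event; their itemid is NaN, which
# groupby drops. The NaNs also make itemid a float (hence "225158.0" below).
merged = pd.merge(
    left=merged_new,
    right=inputevents_new,
    on="stay_id",
    how="left",
    validate="1:m",  # stay_id is unique in icustays
)

print("\nraw counts")
sized = merged.groupby(["anchor_year_group", "itemid"]).size()
print(sized.reset_index().pivot(columns="anchor_year_group", index="itemid"))

print("\ncounts by los")
# Divide by the total ICU length of stay (los, in days) of each
# anchor_year_group, broadcasting over that level of the MultiIndex.
los_per_group = merged_new.groupby("anchor_year_group")["los"].sum()
sized = sized.div(los_per_group, level="anchor_year_group")
print(sized.reset_index().pivot(columns="anchor_year_group", index="itemid"))
```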

which prints:

```text
inputevents_new: 2395955

raw counts
                            0                                                
anchor_year_group 2008 - 2010 2011 - 2013 2014 - 2016 2017 - 2019 2020 - 2022
itemid                                                                       
225158.0               467118      302927      321691      338959      147961
225159.0                 5382        3104        1659         981         405
225161.0                 1218        1362        3494        3480        1289
225943.0               185124      137696      154092      157131       65449
225944.0                27006       16943       18161       22241       10697
228341.0                   25          64         143         105          48

counts by los
                            0                                                
anchor_year_group 2008 - 2010 2011 - 2013 2014 - 2016 2017 - 2019 2020 - 2022
itemid                                                                       
225158.0             4.608710    4.666950    4.915976    5.411347    3.051171
225159.0             0.053100    0.047821    0.025352    0.015661    0.008352
225161.0             0.012017    0.020983    0.053394    0.055557    0.026581
225943.0             1.826483    2.121370    2.354783    2.508535    1.349654
225944.0             0.266448    0.261027    0.277530    0.355069    0.220588
228341.0             0.000247    0.000986    0.002185    0.001676    0.000990
```

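Note that, compared with "2017 - 2019", anchor_year_group = "2020 - 2022" has less than half as many matches for every itemid in absolute numbers (e.g. 147,961 vs. 338,959 for itemid 225158), and still only about half as many after normalizing by total length of stay (e.g. 3.05 vs. 5.41 per ICU day for itemid 225158). The drop is therefore not explained by fewer or shorter stays alone.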
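The size of the drop can be made explicit as the ratio of "2020 - 2022" to "2017 - 2019" per itemid. A minimal sketch, using only the values copied by hand from the printed output (no MIMIC data is read):

```python
import pandas as pd

itemids = pd.Index([225158, 225159, 225161, 225943, 225944, 228341], name="itemid")

# Raw counts copied from the "raw counts" output.
raw = pd.DataFrame(
    {
        "2017 - 2019": [338959, 981, 3480, 157131, 22241, 105],
        "2020 - 2022": [147961, 405, 1289, 65449, 10697, 48],
    },
    index=itemids,
)

# Counts per ICU day copied from the "counts by los" output.
per_day = pd.DataFrame(
    {
        "2017 - 2019": [5.411347, 0.015661, 0.055557, 2.508535, 0.355069, 0.001676],
        "2020 - 2022": [3.051171, 0.008352, 0.026581, 1.349654, 0.220588, 0.000990],
    },
    index=itemids,
)

# Ratio "2020 - 2022" / "2017 - 2019"; 1.0 would mean no change.
ratio = pd.DataFrame(
    {
        "raw": raw["2020 - 2022"] / raw["2017 - 2019"],
        "per_icu_day": per_day["2020 - 2022"] / per_day["2017 - 2019"],
    }
).round(2)
print(ratio)
```

Every raw ratio lies between 0.37 and 0.48, and the per-ICU-day ratios lie between 0.48 and 0.62, so normalizing by length of stay shrinks the gap only slightly.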

Is this expected? Fluid management is a central part of ICU care, so we were surprised to see such a drop in the new MIMIC-IV 3.1 data.
