-
Notifications
You must be signed in to change notification settings - Fork 44
setup
- Access to the AmsterdamUMCdb csv files: request access from Amsterdam Medical Data Science.
- Operating system: any OS capable of running Python and PostgreSQL, including Windows, macOS and Linux.
- Internal memory: 8GB should suffice for basic analysis and running the Jupyter notebooks. However, the recommended memory specification to run both PostgreSQL and the Jupyter Notebooks on the same machine is 16-32 GB.
- Disk space: Downloading and extracting the database files will require 90 GB of hard disk space. In addition, creating the PostgreSQL database requires about 128 GB of hard disk space and and an additional 144 GB for creating the indices to improve query performance.
We strongly recommend installing Python using Anaconda, a popular distribution that includes many useful modules for data science out-of-the-box. Install the (latest) Python 3.7 version distribution from Anaconda's distribution page.
PostgreSQL is an open source database management system (DBMS), available for most operating systems, including Windows,
macOS and Linux. We recommend the installation of the most recent version of PostgreSQL (version 12) from the
PostgreSQL download page. Please note your password for the postgres
superuser, and if you did not chose postgres
as the password, you need to modify these settings in
the config.SAMPLE.ini
file in the root
of the repository. Save the file
as config.ini
.
To connect to your postgreSQL server from Python, the psycopg2 package needs to be installed from the Anaconda Prompt/Shell using conda:
conda install -c anaconda psycopg2
Clone or download the AmsterdamUMCdb repository from GitHub. Follow the instructions on GitHub's online step-by-step guide, if needed: https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository.
Download the AmsterdamUMCdb zip file from and extract the files from the zip file to the data
folder of the cloned
AmsterdamUMCdb repository.
Start Jupyter notebook server from the command line (using Command Prompt on Windows or Terminal on Mac/Linux) by running:
jupyter notebook
From the Jupyter file browser, open the setup-amsterdamumc.ipynb
file from the setup-amsterdamumc
folder in the
cloned repository. The code in the notebook assumes there is a default postgres installation with a dabase
named postgres
, user postgres
with password postgres
. You should change these settings in
the config.SAMPLE.ini
file in the root
of the repository and save the file as [config.ini
]. To create the tables in the database run this Jupyter notebook,
either cell by cell (amsterdamumc
schema will be created, and all tables will be
added to this schema.
After the notebook has been run completely, the postgres database should contain all tables with the same number of
records we released. The output should state Verification: PASSED
.
It's highly recommended to create some useful indices to improve performance for common queries on identifiers like admissionid, itemid and measured times.
While the indices are being created, the postgreSQL should be available for querying using the notebooks in
the tables
folder (with lower performance). We
use plotly (version >4) for interactive plots in some notebooks. Plotly can be installed by using conda:
conda install -c plotly plotly
AmsterdamUMCdb - Freely Accessible ICU Database
version 1.0.2 March 2020
Copyright © 2003-2022 Laboratory for Critical Care Computational Intelligence - Amsterdam UMC - Amsterdam Medical Data Science