Document GAE setup of PIXL (#177)
* Document GAE setup of PIXL

Used https://ucl-arc.slack.com/archives/C067480C989/p1701105627240909
as basis for documentation.

* Use consistent name for pixl directory

* Fix indentation

* Format the whole README while I'm at it

* Update README.md

Co-authored-by: Milan Malfait <m.malfait@ucl.ac.uk>

---------

Co-authored-by: Milan Malfait <m.malfait@ucl.ac.uk>
stefpiatek and milanmlft authored Dec 11, 2023
1 parent f69aea7 commit 2e8a496
Showing 1 changed file with 134 additions and 20 deletions.
154 changes: 134 additions & 20 deletions README.md
@@ -1,4 +1,5 @@
# PIXL

PIXL Image eXtraction Laboratory

`PIXL` is a system for extracting, linking and de-identifying DICOM imaging data, structured EHR data and free-text data from radiology reports at UCLH.
@@ -8,92 +9,205 @@ PIXL is intended run on one of the [GAE](https://github.com/UCLH-Foundry/Book-of
several services orchestrated by [Docker Compose](https://docs.docker.com/compose/).

## Services

### [PIXL CLI](./cli/README.md)

Primary interface to the PIXL system.

### [Hasher API](./hasher/README.md)

HTTP API to securely hash an identifier using a key stored in Azure Key Vault.

### [Orthanc Raw](./orthanc/orthanc-raw/README.md)

A DICOM node which receives images from the upstream hospital systems and acts as a cache for PIXL.

### [Orthanc Anon](./orthanc/orthanc-anon/README.md)

A DICOM node which wraps our de-identification and cloud transfer components.

### PostgreSQL

RDBMS which stores DICOM metadata, application data and anonymised patient record data.

### [Electronic Health Record Extractor](./pixl_ehr/README.md)

HTTP API to process messages from the `ehr` queue and populate raw and anon tables in the PIXL postgres instance.

### [PACS Image Extractor](./pixl_pacs/README.md)

HTTP API to process messages from the `pacs` queue and populate the raw orthanc instance with images from PACS/VNA.

## Setup

### 0. UCLH infrastructure setup

<details>
<summary>Install shared miniforge installation if it doesn't exist</summary>

Follow the suggestion for installing a central [miniforge](https://github.com/conda-forge/miniforge)
installation, which lets all users run a modern Python without needing admin permissions.

```shell
# Create directory with correct structure (only if it doesn't exist yet)
mkdir /gae/miniforge3
chgrp -R docker /gae/miniforge3
chmod -R g+rwxs /gae/miniforge3 # inherit group when new directories or files are created
setfacl -R -m d:g::rwX /gae/miniforge3
# Install miniforge
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh" -p /gae/miniforge3
conda update -n base -c conda-forge conda
# Quote the version spec so the shell does not try to expand the `*` as a glob
conda create -n pixl_dev "python=3.10.*"
```

The directory should now have these permissions:

```shell
> ls -lah /gae/miniforge3/
total 88K
drwxrws---+ 19 jstein01 docker 4.0K Nov 28 12:27 .
drwxrwx---. 18 root docker 4.0K Dec 1 19:35 ..
drwxrws---+ 2 jstein01 docker 8.0K Nov 28 12:27 bin
drwxrws---+ 2 jstein01 docker 30 Nov 28 11:49 compiler_compat
drwxrws---+ 2 jstein01 docker 32 Nov 28 11:49 condabin
drwxrws---+ 2 jstein01 docker 8.0K Nov 28 12:27 conda-meta
-rw-rws---. 1 jstein01 docker 24 Nov 28 11:49 .condarc
...
```
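
The listing above can also be checked without reading the mode bits by eye. The following is a minimal sketch, not part of PIXL: `check_shared_dir` is a hypothetical helper that reports whether a directory belongs to the expected group and has the setgid bit set. It assumes `ls -ld` prints the group name in the fourth column, which holds for group names without spaces.

```shell
# Hypothetical helper (not part of PIXL): check group ownership and the setgid bit.
check_shared_dir() {
    dir="$1"
    expected_group="$2"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # The fourth column of `ls -ld` is the group name
    actual_group="$(ls -ld "$dir" | awk '{print $4}')"
    if [ "$actual_group" != "$expected_group" ]; then
        echo "group is '$actual_group', expected '$expected_group'"
        return 1
    fi
    # `-g` is true when the setgid bit is set, so new files inherit the group
    if [ ! -g "$dir" ]; then
        echo "setgid bit is not set on $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

On the GAE this would be called as `check_shared_dir /gae/miniforge3 docker`.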

</details>
<details>

<summary>If you haven't just installed the miniforge yourself, update your configuration</summary>

Edit `~/.bash_profile` to add `/gae/miniforge3/bin` to the PATH, for example:

```shell
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/gae/miniforge3/bin
```
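
If the profile is sourced more than once, a plain assignment like the one above appends the same directory each time. One possible alternative, shown here as a sketch, is a small guard that only appends a directory when it is not already on the PATH. The `path_append` function name is hypothetical.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Usage in ~/.bash_profile:
#   path_append /gae/miniforge3/bin
#   export PATH
```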

Run the updated profile (or reconnect to the GAE) so that conda is in your PATH

```shell
source ~/.bash_profile
```

Initialise conda

```shell
conda init bash
```

Run the updated profile again (or reconnect to the GAE) so that the changes made by `conda init` take effect

```shell
source ~/.bash_profile
```

Activate an existing pixl environment

```shell
conda activate pixl_dev
```
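
Scripts that depend on the environment can confirm that the activation worked before doing anything else. The sketch below relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the `require_conda_env` function itself is a hypothetical helper, not something PIXL provides.

```shell
# Hypothetical helper: fail early when the expected conda environment is not active.
require_conda_env() {
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "active conda environment is '${CONDA_DEFAULT_ENV:-none}', expected '$expected'" >&2
        return 1
    fi
}
```

For example, `require_conda_env pixl_dev || exit 1` at the top of a script.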

</details>
<details>
<summary>Create an instance for the GAE if it doesn't already exist</summary>

Select a place for the deployment. On UCLH infrastructure this will be in `/gae`, so `/gae/pixl_dev` for example.

```shell
mkdir /gae/pixl_dev
chgrp -R docker /gae/pixl_dev
chmod -R g+rwxs /gae/pixl_dev # inherit group when new directories or files are created
setfacl -R -m d:g::rwX /gae/pixl_dev
# now clone the repository or copy an existing deployment
```

</details>

### 1. Choose deployment environment

This is one of `dev`, `test`, `staging` or `prod`, and is referred to as `<environment>` in the docs.
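
Because only four values are valid, a deployment script can reject typos up front. This is a minimal sketch with a hypothetical `validate_environment` function; PIXL itself may or may not perform such a check.

```shell
# Hypothetical helper: accept only the four documented environment names.
validate_environment() {
    case "$1" in
        dev|test|staging|prod)
            return 0
            ;;
        *)
            echo "unknown environment: '$1' (expected dev, test, staging or prod)" >&2
            return 1
            ;;
    esac
}
```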

### 2. Initialise environment configuration

Create a local `.env` and `pixl_config.yml` file in the _PIXL_ directory:

```bash
cp .env.sample .env && cp pixl_config.yml.sample pixl_config.yml
```

Add the missing configuration values to the new files:

#### Environment

Set `ENV` to `<environment>`.
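
Editing `.env` by hand is fine, but the change can also be scripted. The sketch below assumes that `.env` holds plain `KEY=VALUE` lines, which is not confirmed by this README, and uses a hypothetical `set_env_value` helper. It replaces an existing assignment in place, or appends one if the key is missing, and writes back through `cat` so the file keeps its existing permissions.

```shell
# Hypothetical helper: set KEY=VALUE in an env file (assumes plain KEY=VALUE lines).
set_env_value() {
    key="$1"
    value="$2"
    file="${3:-.env}"
    tmp="$(mktemp)" || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }  # replace existing line
        { print }                                                  # keep everything else
        END { if (!found) print k "=" v }                          # append if missing
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For a development deployment this would be `set_env_value ENV dev`, run from the _PIXL_ directory.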

#### Credentials

- `EMAP_DB_`*
UDS credentials are only required for `prod` or `staging` deployments, or when working on the EHR & report retriever component.
You can leave them blank for other dev work.
- `PIXL_DB_`*
These are credentials for the containerised PostgreSQL service and are set in the official PostgreSQL image.
Use a strong password for `prod` deployment but the only requirement for other environments is consistency as several services interact with the database.
- `PIXL_EHR_API_AZ_`*
These credentials are used for uploading a PIXL database to Azure blob storage. They should be for a service principal that has the `Storage Blob Data Contributor` role
on the target storage account. The storage account must also allow network access from the PIXL host machine.
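
To see which credentials still need filling in, the blank entries for a given prefix can be listed. This is a sketch only: `list_blank_vars` is a hypothetical helper, and it assumes `.env` contains plain `KEY=VALUE` lines with no quoting.

```shell
# Hypothetical helper: print variables with the given prefix whose value is empty.
list_blank_vars() {
    prefix="$1"
    file="${2:-.env}"
    # Match lines such as "PREFIX_SOMETHING=" with nothing (or only spaces) after "="
    grep -E "^${prefix}[A-Za-z0-9_]*=[[:space:]]*\$" "$file" | cut -d= -f1
}
```

Running `list_blank_vars PIXL_DB_` or `list_blank_vars PIXL_EHR_API_AZ_` from the _PIXL_ directory prints one variable name per line. Blank `EMAP_DB_` entries are expected for most dev work, as noted above.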

#### Ports

Most services need to expose ports that must be mapped to ports on the host. The host port is specified in `.env`.
Ports need to be configured such that they don't clash with any other application running on that GAE.
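
One clash that is easy to catch locally is two services in the same `.env` being given the same host port. The sketch below assumes that port variables end in `PORT` (as `ORTHANC_RAW_WEB_PORT` does) and hold a bare number; `duplicate_ports` is a hypothetical helper. It does not detect clashes with other applications on the GAE, which still need checking on the host itself.

```shell
# Hypothetical helper: print any port number assigned to more than one *PORT variable.
duplicate_ports() {
    file="${1:-.env}"
    grep -E '^[A-Za-z0-9_]*PORT=[0-9]+$' "$file" \
        | cut -d= -f2 \
        | sort \
        | uniq -d      # keep only values that occur more than once
}
```

Empty output means no port number is reused within the file.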

## Run

### Start

From the _PIXL_ directory:

```bash
bin/pixldc pixl_dev up
```

### Stop

From the _PIXL_ directory:

```bash
bin/pixldc pixl_dev down
```

## Analysis

The number of DICOM instances in the raw Orthanc instance can be accessed from
`http://<pixl_host>:<ORTHANC_RAW_WEB_PORT>/ui/app/#/settings` and similarly with
the Orthanc Anon instance, where `pixl_host` is the host of the PIXL services
and `ORTHANC_RAW_WEB_PORT` is defined in `.env`.
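
The URL can be assembled from `.env` rather than typed by hand. This is a minimal sketch with a hypothetical `orthanc_raw_settings_url` helper; it assumes `ORTHANC_RAW_WEB_PORT` appears in `.env` as a plain `KEY=VALUE` line, and the host name is supplied by the caller.

```shell
# Hypothetical helper: print the Orthanc Raw settings URL for a given host.
orthanc_raw_settings_url() {
    host="$1"
    file="${2:-.env}"
    # Take the last assignment if the variable appears more than once
    port="$(grep -E '^ORTHANC_RAW_WEB_PORT=' "$file" | tail -n 1 | cut -d= -f2)"
    if [ -z "$port" ]; then
        echo "ORTHANC_RAW_WEB_PORT is not set in $file" >&2
        return 1
    fi
    echo "http://${host}:${port}/ui/app/#/settings"
}
```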

The number of reports and EHR can be interrogated by connecting to the PIXL
database with a database client (e.g. [DBeaver](https://dbeaver.io/)), using
the connection parameters defined in `.env`. For example, to find the number of
non-null reports:

```sql
select count(*) from emap_data.ehr_anon where xray_report is not null;
```


## Develop

See each service's README for individual development and testing instructions.
For Python development we use [ruff](https://docs.astral.sh/ruff/) alongside [pytest](https://www.pytest.org/).
There is support (sometimes through plugins) for these tools in most IDEs & editors.
Before raising a PR, **run the full test suite** from the _PIXL_ directory with

```bash
bin/run-all-tests.sh
```

and not just the component you have been working on as this will help us catch unintentional regressions without spending GH actions minutes :-)

We run [pre-commit](https://pre-commit.com/) as part of the GitHub Actions CI. To install and run it locally, do:
