feat(docs): transfer lapis and silo tutorial #547
fengelniederhammer committed Jan 15, 2024
1 parent 877852f commit db4932d
Showing 5 changed files with 272 additions and 9 deletions.
8 changes: 6 additions & 2 deletions lapis2-docs/astro.config.mjs
@@ -58,6 +58,10 @@ export default defineConfig({
label: 'Database Config',
link: '/references/database-config/',
},
+ {
+ label: 'Configuration',
+ link: '/references/configuration/',
+ },
],
},
{
@@ -107,8 +111,8 @@ export default defineConfig({
label: 'Maintainer tutorials',
items: [
{
- label: 'Start your own instance of LAPIS and SILO',
- link: '/tutorials/maintainer-tutorials/start-your-own-instance-of-lapis-and-silo',
+ label: 'Start LAPIS and SILO',
+ link: '/tutorials/maintainer-tutorials/start-lapis-and-silo',
},
],
},
19 changes: 19 additions & 0 deletions lapis2-docs/src/content/docs/references/configuration.mdx
@@ -0,0 +1,19 @@
---
title: Configuration
description: Reference for how to configure LAPIS and SILO
---

TODO https://github.com/GenSpectrum/LAPIS/issues/561

:::caution
TODO: probably incomplete
:::

SILO currently supports the following metadata types:

- `int`
- `float`
- `string`: String columns support indexing (configured via `generateIndex: true`). SILO internally stores precomputed bitmaps for indexed columns to speed up queries. Generating an index makes most sense for columns in which many rows share the same value.
- `pango_lineage`: A systematic lineage classification with an inheritance structure that can be computed for some pathogens. Also see https://github.com/GenSpectrum/LAPIS/wiki/4.6-Pango-lineage-query.
- `date`: Values must be valid dates in the form `YYYY-MM-DD`.
- `insertion`: A comma-separated list of insertions. Each insertion has the form `<position>:<symbols>`. Example value: `123:CCG,501:AAAGGG`.
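
To make the `insertion` format concrete, here is a minimal shell sketch that checks whether a value looks like a non-empty, comma-separated list of `<position>:<symbols>` entries. The function name `is_valid_insertion_list` is hypothetical, and the assumption that symbols consist of letters or `*` is ours; SILO's actual validation may differ.

```shell
# Hypothetical helper, not part of SILO: succeeds if the argument looks like
# a comma-separated list of <position>:<symbols> entries.
# Assumption: positions are decimal numbers, symbols are letters or '*'.
is_valid_insertion_list() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+:[A-Za-z*]+(,[0-9]+:[A-Za-z*]+)*$'
}
```

Called with the example value `'123:CCG,501:AAAGGG'` the function succeeds, while a value such as `'CCG:123'` is rejected.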
@@ -0,0 +1,245 @@
---
title: Start LAPIS and SILO
description: Tutorial to start LAPIS and SILO with Docker
---

Every LAPIS instance needs to be backed by a SILO instance that acts as its data source.
SILO can also be operated stand-alone;
LAPIS is meant as a layer of convenience and abstraction around SILO.

We provide Docker images of SILO and LAPIS that are ready to use.
We recommend using those Docker images, so in this tutorial, we explain how to use them.
You will build a Docker Compose file step by step.

### Prerequisites

- You have [Docker](https://www.docker.com/) installed.
- You have some knowledge of how to use Docker and Docker Compose.
- Make sure you have the latest Docker images:

```shell
docker pull ghcr.io/genspectrum/lapis-v2
docker pull ghcr.io/genspectrum/lapis-silo
```

- Create a directory for the tutorial:

```shell
mkdir ~/lapisExample
cd ~/lapisExample
```
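
The later steps refer to the subdirectories `config/`, `data/` and `output/` inside this directory. A small sketch for creating them up front could look as follows; the function name `create_tutorial_layout` is only illustrative.

```shell
# Illustrative helper: creates the config/, data/ and output/ subdirectories
# that the later steps of this tutorial refer to.
create_tutorial_layout() {
    base_dir=$1
    for sub in config data output; do
        mkdir -p "$base_dir/$sub" || return 1
    done
}
```

For this tutorial the call would be `create_tutorial_layout ~/lapisExample`.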

### Writing Configuration

Both LAPIS and SILO need to know which metadata columns are available in the dataset.
Furthermore, you need to define which column acts as primary key
and which column should be used to generate partitions in SILO.
The example below also sets `opennessLevel: OPEN`, which configures LAPIS as an open instance, meaning that the underlying data requires no visibility restrictions.

```yaml
# ~/lapisExample/config/database_config.yaml
schema:
  instanceName: testInstance
  metadata:
    - name: primaryKey
      type: string
    - name: date
      type: date
    - name: region
      type: string
      generateIndex: true
    - name: country
      type: string
      generateIndex: true
    - name: division
      type: string
      generateIndex: true
    - name: pangoLineage
      type: pango_lineage
    - name: age
      type: int
    - name: qc_value
      type: float
  opennessLevel: OPEN
  primaryKey: primaryKey
  dateToSortBy: date
  partitionBy: pangoLineage
```
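
Since `primaryKey`, `dateToSortBy` and `partitionBy` each name a metadata column, a typo in one of them is easy to miss. The following sketch checks that the column referenced by such a key is declared under `metadata`. It is a hypothetical helper, not part of LAPIS or SILO, and it assumes the plain `key: value` and `- name: value` layout of the example config, without quotes or inline comments.

```shell
# Hypothetical sanity check, not part of LAPIS or SILO: succeeds if the column
# named by a schema key (e.g. primaryKey) is declared as "- name: <column>".
# Assumption: values are unquoted and lines carry no inline comments.
column_is_declared() {
    config_file=$1
    key=$2
    # Extract the value of "<key>: <value>" (first occurrence only).
    column=$(sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$config_file" | head -n 1)
    [ -n "$column" ] || return 1
    # Look for a metadata entry with exactly that name.
    grep -Eq "^[[:space:]]*-[[:space:]]*name:[[:space:]]*${column}[[:space:]]*\$" "$config_file"
}
```

For the config above, `column_is_declared ~/lapisExample/config/database_config.yaml partitionBy` would succeed because `pangoLineage` is listed under `metadata`.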

:::tip
See the [config reference](../../references/configuration) for a full specification.
:::

### Starting SILO Preprocessing

SILO contains an in-memory database.
Building this database from the raw input data is computationally intensive,
so it is done before starting SILO.
This step is called "preprocessing".
The result is a serialized version of the database that can be loaded into SILO in a much shorter time.

Download the example dataset from the [end-to-end tests](https://github.com/GenSpectrum/LAPIS/tree/main/siloLapisTests/testData):

- pangolineage_alias.json
- reference_genomes.json
- small_metadata_set.tsv
- all FASTA files for the sequences

SILO expects FASTA files (possibly compressed via zstandard or xz)
in the same directory as the other input files, named `nuc_<sequence_name>.fasta` for nucleotide sequences
or `gene_<sequence_name>.fasta` for amino acid sequences.
Each `<sequence_name>` has to match a name defined in `reference_genomes.json`.

Put those files into the folder `~/lapisExample/data/`.
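
The naming scheme described above can be checked with a few lines of shell. This is a sketch only: the function name `is_sequence_file_name` is hypothetical, and the `.zst` and `.xz` suffixes for compressed files are an assumption, since the text does not state which extensions SILO expects.

```shell
# Hypothetical helper: succeeds if a file name follows the
# nuc_<sequence_name>.fasta or gene_<sequence_name>.fasta scheme.
# Assumption: compressed files carry an additional .zst or .xz suffix.
is_sequence_file_name() {
    case $1 in
        nuc_?*.fasta | nuc_?*.fasta.zst | nuc_?*.fasta.xz) return 0 ;;
        gene_?*.fasta | gene_?*.fasta.zst | gene_?*.fasta.xz) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running it over the base names of the files in `~/lapisExample/data/` can reveal sequence files that do not follow the scheme; the JSON and TSV files are expected to fail the check.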

Now SILO needs to know where it can find those files.
You have to provide a "preprocessing config" for that.
Note that you need to provide the paths where the files will be stored in the Docker container.
Filenames are relative to the input directory.
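Since this config does not set the input directory explicitly, SILO falls back to its default, which in the Docker image is `/preprocessing/input`, the location where the Compose file in this tutorial mounts the data.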

```yaml
# ~/lapisExample/config/preprocessing_config.yaml
metadataFilename: 'small_metadata_set.tsv'
pangoLineageDefinitionFilename: 'pangolineage_alias.json'
referenceGenomeFilename: 'reference_genomes.json'
```
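
Because the filenames are resolved relative to the input directory, it can help to confirm on the host that every referenced file is present in `~/lapisExample/data/` before starting the container. The sketch below is a hypothetical helper, not a SILO feature, and it assumes that each reference is a top-level line of the form `<something>Filename: 'value'`, as in the config above.

```shell
# Hypothetical check, not part of SILO: prints every file that the
# preprocessing config references but that is missing from the data directory.
# Assumption: references are top-level "<something>Filename: 'value'" lines.
list_missing_inputs() {
    config_file=$1
    data_dir=$2
    sed -n 's/^[A-Za-z]*Filename:[[:space:]]*//p' "$config_file" |
        tr -d "'\"" |
        while IFS= read -r name; do
            [ -e "$data_dir/$name" ] || printf '%s\n' "$name"
        done
}
```

`list_missing_inputs ~/lapisExample/config/preprocessing_config.yaml ~/lapisExample/data` prints nothing when all referenced files are in place.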

To start the preprocessing, you have to:

- start SILO in the `preprocessing` mode
- mount the data into the container to the default location
- mount the preprocessing config into the container to the default location
- mount the database config into the container to the default location
- mount the output directory into the container to the default location

Add a corresponding service to the `docker-compose.yaml`:

```yaml
# ~/lapisExample/docker-compose.yaml
version: '3.9'
services:
  silo-preprocessing:
    image: ghcr.io/genspectrum/lapis-silo
    command: --preprocessing
    volumes:
      - ~/lapisExample/data:/preprocessing/input
      - ~/lapisExample/config/preprocessing_config.yaml:/app/preprocessing_config.yaml
      - ~/lapisExample/config/database_config.yaml:/app/database_config.yaml
      - ~/lapisExample/output:/preprocessing/output
```

After this has completed, the output directory should contain the result of the preprocessing.
That result has to be provided to SILO in the next step.
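
A quick way to confirm that preprocessing produced something is to test whether the output directory is non-empty. The helper below is a sketch with a hypothetical name; it assumes that a successful run leaves at least one file or directory in the output directory and says nothing about whether that content is valid.

```shell
# Hypothetical helper: succeeds if the given directory exists and contains
# at least one entry (hidden entries included).
# Assumption: a successful preprocessing run writes at least one entry there.
has_preprocessing_output() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}
```

After `docker compose up silo-preprocessing` has finished, `has_preprocessing_output ~/lapisExample/output` should succeed.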

### Starting SILO

To start the SILO API, you have to:

- start SILO in the `api` mode,
- expose port 8081,
- mount the preprocessing result into the container,
- wait for the preprocessing to complete.

Add a corresponding service to the `docker-compose.yaml`:

```yaml {14-99}
# ~/lapisExample/docker-compose.yaml
version: '3.9'
services:
  silo-preprocessing:
    image: ghcr.io/genspectrum/lapis-silo
    command: --preprocessing
    volumes:
      - ~/lapisExample/data:/preprocessing/input
      - ~/lapisExample/config/preprocessing_config.yaml:/app/preprocessing_config.yaml
      - ~/lapisExample/config/database_config.yaml:/app/database_config.yaml
      - ~/lapisExample/output:/preprocessing/output
  silo-api:
    image: ghcr.io/genspectrum/lapis-silo
    command: --api
    ports:
      - '8081:8081'
    volumes:
      - ~/lapisExample/output:/data
    depends_on:
      silo-preprocessing:
        condition: service_completed_successfully
```

Execute

```bash title="Start the services"
docker compose up
```

Now SILO should be available at http://localhost:8081
and http://localhost:8081/info should show that SILO contains sequences.
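
The API only becomes reachable once preprocessing has completed and SILO has loaded the database, so a first request may fail. A generic retry helper, sketched below, can be used to poll until the endpoint responds. The helper is illustrative and not part of LAPIS or SILO.

```shell
# Illustrative retry helper: runs a command up to <attempts> times and sleeps
# <delay> seconds between failed tries. Succeeds as soon as the command does.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry 30 2 curl --silent --fail http://localhost:8081/info` would poll the info endpoint for up to about a minute.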

### Starting LAPIS

Now you can start LAPIS. You have to:

- expose port 8080 to the host,
- mount the database configuration and the reference genomes to the default locations in the Docker container,
- provide LAPIS with the SILO URL.

Add a corresponding service to the `docker-compose.yaml`:

```yaml {4-12}
# ~/lapisExample/docker-compose.yaml
version: '3.9'
services:
  lapis:
    image: ghcr.io/genspectrum/lapis-v2
    command: --silo.url=http://silo-api:8081
    ports:
      - '8080:8080'
    volumes:
      - ~/lapisExample/config/database_config.yaml:/workspace/database_config.yaml
      - ~/lapisExample/data/reference_genomes.json:/workspace/reference_genomes.json
  silo-preprocessing:
    image: ghcr.io/genspectrum/lapis-silo
    command: --preprocessing
    volumes:
      - ~/lapisExample/data:/preprocessing/input
      - ~/lapisExample/config/preprocessing_config.yaml:/app/preprocessing_config.yaml
      - ~/lapisExample/config/database_config.yaml:/app/database_config.yaml
      - ~/lapisExample/output:/preprocessing/output
  silo-api:
    image: ghcr.io/genspectrum/lapis-silo
    command: --api
    ports:
      - '8081:8081'
    volumes:
      - ~/lapisExample/output:/data
    depends_on:
      silo-preprocessing:
        condition: service_completed_successfully
```

Execute the following command again:

```bash title="Start the services"
docker compose up
```

Now LAPIS should be available at http://localhost:8080.
LAPIS offers a [Swagger UI](http://localhost:8080/swagger-ui/index.html) that serves as a good starting point for exploring its functionality.

:::note
`--silo.url=http://silo-api:8081` makes use of Docker Compose's internal network.
:::

## Further Reading

- Documentation of SILO in its [GitHub repository](https://github.com/GenSpectrum/LAPIS-SILO/).
- Our tests use a [Docker Compose file](https://github.com/GenSpectrum/LAPIS/blob/main/lapis2/docker-compose.yml) that can serve as a further example.
  The CI makes sure that it keeps working.

This file was deleted.

3 changes: 2 additions & 1 deletion lapis2-docs/tests/docs.spec.ts
@@ -10,14 +10,15 @@ const pages = [
'Open API / Swagger',
'Reference Genome',
'Database Config',
+ 'Configuration',
'Data versions',
'Mutation filters',
'Pango lineage query',
'Request methods: GET and POST',
'Response format',
'Variant query',
'Plot the global distribution of all sequences in R',
- 'Start your own instance of LAPIS and SILO',
+ 'Start LAPIS and SILO',
'Introduction and Goals',
'Architecture and Constraints',
'System Scope and Context',