POS — Open Targets Pipeline Output Stage

Create Platform backend (OpenSearch and Clickhouse) and release data.

Summary

This application uses the Otter library to define the steps of the pipeline that create and push all the output artifacts for a run of the Open Targets pipeline.

Check out the config.yaml file to see the steps and the tasks that make them up.

Installation and running

Dependencies

uv is the package manager for POS. It is compatible with pip, so you can fall back to pip if you are more comfortable with it.
just is the command runner that serves as the POS interface. It is similar to GNU make but better suited to this purpose.
terraform is the IaC tool used to assemble and destroy the necessary infrastructure.

Recipes

$ just
Platform Output Support
Set the profile with `just profile=foo <RECIPE>` to use `profiles/foo.tfvars`. Defaults to `profiles/default.tfvars` if no profile is set.
    help
    snapshots    # Create Google cloud disk snapshots (Clickhouse and OpenSearch).
    clean        # Clean the credentials and the infrastructure used for creating the Google cloud disk snapshots
    clean_all
    bigquerydev  # Big Query Dev
    bigqueryprod # Big Query Prod
    gcssync      # Sync data to production Google Cloud Storage
    ftpsync      # Sync data to FTP

Private recipes are prefixed with '_' in the justfile.

Configuring the profile for any of the recipes

All the configuration you need should be possible by modifying a profile such as the default one (profiles/default.tfvars). This file is symlinked to terraform.tfvars when the recipes are executed. To use a different profile, copy the default to profiles/foo.tfvars and pass the profile name whenever you run just. The profile parameter must come before the recipe:

just profile=foo <RECIPE>
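The profile mechanism described above can be pictured as a symlink swap. The following is a minimal Python sketch of that idea, not the justfile's actual implementation. It assumes the profiles sit in a `profiles/` directory and that `terraform.tfvars` lives next to it; the function name `link_profile` and the `root` argument are hypothetical.

```python
from pathlib import Path


def link_profile(root: Path, profile: str = "default") -> Path:
    """Point root/terraform.tfvars at root/profiles/<profile>.tfvars.

    Hypothetical helper: the directory layout is an assumption taken
    from the recipe help text, not from the justfile itself.
    """
    source = root / "profiles" / f"{profile}.tfvars"
    if not source.is_file():
        raise FileNotFoundError(f"no such profile: {source}")

    link = root / "terraform.tfvars"
    # Remove a stale link (or plain file) left over from a previous run.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(source.resolve())
    return link
```

Calling `link_profile(Path("."), "foo")` would make `terraform.tfvars` resolve to `profiles/foo.tfvars`, and calling it with no profile falls back to the default, which mirrors the behaviour the recipes describe.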

Create the data backend for the platform

just snapshots

Starts a Google Compute Engine instance with external drives (one for Clickhouse, one for OpenSearch).
Runs the otter steps for croissant, clickhouse and opensearch (see the startup script).
Optional: creates tarballs (see Configuration).

Release data to BigQuery

# dev
just bigquerydev

# prod
just bigqueryprod

Creates a local otter config based on the terraform.tfvars profile.
Runs the otter step for releasing to Google BigQuery.

Release data to FTP

just ftpsync

Uses the terraform.tfvars profile as configuration.
Runs a shell script that starts a gcloud container on the EBI HPC.
From the container, syncs the data from GCS to the EBI FTP.

Release data to GCS

just gcssync

Uses the terraform.tfvars profile as configuration.
Runs a gcloud command to sync one GCS bucket with another.

Configuration

You should only ever need to configure the terraform profile. This is used as the point of configuration even where terraform is not actually used. See here for details.

Terraform will apply this configuration, or in the cases where terraform is not used, an HCL library will read and apply the configuration as needed.
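To make the non-terraform case concrete, reading a profile amounts to parsing its variable assignments. The sketch below is a deliberately simplified stand-in for an HCL library, not the one POS uses: it handles only flat `name = value` lines with quoted strings, numbers and booleans, and skips comment lines. The function name `parse_flat_tfvars` and the variable names in the usage example are hypothetical.

```python
import re

# One flat assignment per line: name = value
_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_-]*)\s*=\s*(.+)$")


def parse_flat_tfvars(text: str) -> dict:
    """Parse flat tfvars assignments into a dict.

    Simplified on purpose: no lists, maps, heredocs, escape sequences
    or trailing comments. A real HCL parser covers all of those.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "//")):
            continue  # blank line or whole-line comment
        match = _ASSIGNMENT.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {raw!r}")
        key, value = match.groups()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            values[key] = value[1:-1]
        elif value in ("true", "false"):
            values[key] = value == "true"
        else:
            # Raises ValueError for anything that is not a plain number.
            values[key] = float(value) if "." in value else int(value)
    return values
```

For example, `parse_flat_tfvars('release_id = "25.03"\ndisk_size = 500')` yields a dict with a string and an integer entry. Anything more complex than this should go through a proper HCL parser.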

All the configuration lives in a single folder, which contains the following:

Main config for otter: config.yaml
Config for datasets, data sources/table names/settings etc. for Clickhouse, OpenSearch, BigQuery: datasets.yaml
Clickhouse configs/schema/sql: clickhouse
OpenSearch Dockerfile/index settings: opensearch

By default it is configured to load all the necessary datasets, but this can be modified. Make sure that every dataset name in config.yaml has a corresponding entry in datasets.yaml.
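A quick consistency check can catch a dataset that is referenced but never defined. The sketch below works on two plain collections of names, because the exact layout of config.yaml and datasets.yaml is not described here; how those names are extracted from the YAML files is left to the reader. The function name and the dataset names in the usage note are purely illustrative.

```python
def missing_datasets(referenced, defined):
    """Return the referenced dataset names that have no definition.

    `referenced` stands for the names used in config.yaml and `defined`
    for the entries in datasets.yaml (both assumed to be already loaded).
    """
    defined = set(defined)
    return sorted({name for name in referenced if name not in defined})
```

With `referenced = ["target", "disease", "evidence"]` and `defined = ["target", "disease"]`, the function returns `["evidence"]`, which points at the entry still to be added. An empty list means the two files agree.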

Copyright

This software was developed as part of the Open Targets project. For more information please see: http://www.opentargets.org

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
