Genomic Variants search API, working from the VCF format.

bento-platform/gohan

Gohan - A Genomic Variants API


Prerequisites


TL;DR

Typical use-case walkthrough

# environment
cp ./etc/example.env .env # modify to your needs

# kickstart dockerized gohan environment
make init

# (optional): if you plan on modifying the api codebase before deploying
make init-dev

# gateway & certificates
mkdir -p gateway/certs/dev

openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/gohan_fullchain1.crt
openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/es_gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/es_gohan_fullchain1.crt


# build services
make build-gateway
make build-api

# run services
make run-gateway
make run-elasticsearch
make run-drs
make run-api


# initiate genes catalogue:
curl -k https://gohan.local/genes/ingestion/run

# monitor progress:
curl -k https://gohan.local/genes/ingestion/requests
curl -k https://gohan.local/genes/ingestion/stats

# view catalogue
curl -k https://gohan.local/genes/overview

# move vcf.gz files to `$GOHAN_API_VCF_PATH`

# ingest vcf.gz
curl -k https://gohan.local/variants/ingestion/run\?fileNames=<filename>\&assemblyId=GRCh37\&filterOutReferences=true\&dataset=00000000-0000-0000-0000-000000000000

# monitor progress:
curl -k https://gohan.local/variants/ingestion/requests
curl -k https://gohan.local/variants/ingestion/stats

# view variants
curl -k https://gohan.local/variants/overview

Getting started

Environment :

First, from the project root, create a local file for environment variables with default settings by running

cp ./etc/example.env .env

then adjust the values as needed, for example the Elasticsearch credentials GOHAN_ES_USERNAME and GOHAN_ES_PASSWORD for production.

Note: due to a known bug, GOHAN_ES_USERNAME must currently keep its default value.
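
A quick way to confirm that GOHAN_ES_USERNAME still matches its default is to compare .env against etc/example.env. The following is only a sketch: `env_value` and `same_env_value` are hypothetical helper names, and it assumes both files use plain KEY=value lines without quoting or `export`.

```shell
# Print the value assigned to KEY in a dotenv-style FILE (last assignment wins).
# Assumes plain KEY=value lines, no quoting and no `export` prefix.
env_value() {
    file=$1
    key=$2
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Return 0 when KEY has the same value in FILE_A and FILE_B.
# Arguments: FILE_A FILE_B KEY
same_env_value() {
    [ "$(env_value "$1" "$3")" = "$(env_value "$2" "$3")" ]
}

# Usage (from the project root):
#   same_env_value ./etc/example.env .env GOHAN_ES_USERNAME \
#       || echo "GOHAN_ES_USERNAME differs from the default" >&2
```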


Initialization

Run

make init

Elasticsearch & Kibana :

Run

make run-elasticsearch

and (optionally)

make run-kibana

DRS :

Run

make run-drs

Data Access Authorization with OPA (more on this to come) :

Run

make build-authz
make run-authz


Development

architecture

Gateway

To create and use development certs from the project root, run

mkdir -p gateway/certs/dev

openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/gohan_fullchain1.crt
openssl req -newkey rsa:2048 -nodes -keyout gateway/certs/dev/es_gohan_privkey1.key -x509 -days 365 -out gateway/certs/dev/es_gohan_fullchain1.crt

Note: when openssl prompts for the certificate fields, make sure the Common Name (CN) matches the hostname (gohan.local by default).

These will be incorporated into the Gateway service (using NGINX by default, see gateway/Dockerfile and gateway/nginx.conf for details). Be sure to add an entry for the hostname of your choice to your local hosts file: /etc/hosts on Linux, or C:\Windows\System32\drivers\etc\hosts on Windows.
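
To avoid the interactive prompts, the CN can be passed on the command line with openssl's `-subj` option. The sketch below wraps the command shown above in a hypothetical helper, `make_dev_cert`; the file naming follows the certificates above, and the hosts entry in the usage comment assumes the gateway is reachable on 127.0.0.1.

```shell
# Generate a self-signed development certificate whose CN is HOST.
# Arguments: output directory, file prefix, hostname.
make_dev_cert() {
    dir=$1
    prefix=$2
    host=$3
    mkdir -p "$dir"
    # -subj sets the subject non-interactively; stderr is silenced
    # because openssl prints key-generation progress there.
    openssl req -newkey rsa:2048 -nodes \
        -keyout "$dir/${prefix}_privkey1.key" \
        -x509 -days 365 \
        -subj "/CN=$host" \
        -out "$dir/${prefix}_fullchain1.crt" 2>/dev/null
}

# Usage (from the project root):
#   make_dev_cert gateway/certs/dev gohan gohan.local
#   # assumption: the gateway listens on the local machine
#   echo "127.0.0.1 gohan.local" | sudo tee -a /etc/hosts
```

The es_gohan certificate can be generated the same way, using whichever hostname your Elasticsearch gateway entry is served under.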

Next, run

make build-gateway
make run-gateway

API

Containerized:

 To simply run a working instance of the API "out of the box", build the Docker image and spawn the container with a fresh binary build by running

make build-api
make run-api

 and the docker-compose.yaml file will handle the configuration.


Local Development:

 This can be done in multiple ways.

  1. Terminal : From the project root, run
# load variables from local file
set -a
. ./.env
set +a

cd src/api

go run .
  2. IDE (preferably VSCode)
- follow the recommended instructions listed at https://code.visualstudio.com/docs/languages/go

- configure the `.vscode/launch.json` to inject the above mentioned variables as recommended by https://stackoverflow.com/questions/29971572/how-do-i-add-environment-variables-to-launch-json-in-vscode

- click 'Run & Debug' > "Play"

Local Release

 To build and test from source, run

make build-api-local-binaries

 The binary can then be found at bin/api_${GOOS}_${GOARCH} and executed locally with

# load variables from local file
set -a
. ./.env
set +a

# navigate to binary directory
cd bin/

# execute binary
./api_${GOOS}_${GOARCH}

Endpoints:

/variants

Request

  GET /variants/overview
   params: none


Response

{
    "chromosomes": {
        "<CHROMOSOME>": `number`,
        // ...
    },
    "sampleIDs": {
        "<SAMPLEID>": `number`,
        // ...
    },
    "variantIDs": {
        "<VARIANTID>": `number`,
        // ...
    }
}

Example:

 {
     "chromosomes": {
         "21": 90548
     },
     "sampleIDs": {
         "hg00096": 33664,
         "hg00099": 31227,
         "hg00111": 25657
     },
     "variantIDs": {
         ".": 90548
     }
 }


Requests

  GET /variants/get/by/variantId
   params:

  • chromosome : string
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele ( "A" | "C" | "G" | "T" | "N" or some combination thereof )
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of variant ID alphanumeric codes)
  • size : number (maximum number of results per id)
  • sortByPosition : string (<empty> | asc | desc)
  • includeInfoInResultSet : boolean (true | false)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )
  • getSampleIdsOnly : boolean (optional, default: false)

  GET /variants/count/by/variantId
   params:

  • chromosome : string
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of variant ID alphanumeric codes)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )

  GET /variants/get/by/sampleId
   params:

  • chromosome : string
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of sample ID alphanumeric codes)
  • size : number (maximum number of results per id)
  • sortByPosition : string (<empty> | asc | desc)
  • includeInfoInResultSet : boolean (true | false)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )

  GET /variants/count/by/sampleId
   params:

  • chromosome : string
  • lowerBound : number
  • upperBound : number
  • reference : string, an allele
  • alternative : string, an allele
  • alleles : string, an ordered comma-delimited list of alleles (max: 2)
  • ids : string (a comma-delimited list of sample ID alphanumeric codes)
  • genotype : string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )
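
Since all four endpoints take their parameters as a query string, a small helper can keep curl calls readable. This is a minimal sketch: `variants_url` is a hypothetical name, the parameter values in the usage comment are made up for illustration, and values are not URL-encoded, so it assumes they contain no reserved characters beyond commas.

```shell
# Build a query URL for one of the /variants get or count endpoints.
# Arguments: base URL, endpoint path, then any number of key=value pairs.
# Values are joined as-is (no URL-encoding is performed).
variants_url() {
    base=$1
    path=$2
    shift 2
    query=
    for pair in "$@"; do
        # Prefix with "&" only when the query already has content.
        query="${query:+$query&}$pair"
    done
    printf '%s%s%s\n' "$base" "$path" "${query:+?$query}"
}

# Usage (illustrative values):
#   curl -k "$(variants_url https://gohan.local /variants/get/by/sampleId \
#       chromosome=21 ids=hg00096 size=10 sortByPosition=asc)"
```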

Generalized Response Body Structure

{
    "status":  `number` (200 - 500),
    "message": `string` ("Success" | "Error"),
    "results": [
        {
            "query":  `string`,       // reflective of the type of id queried for, e.g. 'variantId:abc123' or 'sampleId:HG0001'
            "assemblyId": `string` ("GRCh38" | "GRCh37" | "NCBI36" | "Other"),    // reflective of the assembly id queried for
            "count":  `number`,   // this field is only present when performing a COUNT query
            "start":  `number`,   // reflective of the provided lowerBound parameter, 0 if none
            "end":  `number`,     // reflective of the provided upperBound parameter, 0 if none
            "chromosome":  `string`,       // reflective of the chromosome queried for - no `chr` prefix
            "calls": [            // this field is only present when performing a GET query
                {
                   "id": `string`, // variantId
                   "chrom":  `string`,
                   "pos": `number`,
                   "ref": `[]string`,  // list of alleles
                   "alt": `[]string`,  // list of alleles
                   "alleles": `[]string`,  // ordered list of alleles
                   "info": [
                       {
                           "id": `string`,
                           "value": `string`,
                       },
                       ...
                   ],
                   "format":`string`,
                   "qual": `number`,
                   "filter": `string`,
                   "sampleId": `string`,
                   "genotype_type": `string ( "HETEROZYGOUS" | "HOMOZYGOUS_REFERENCE" | "HOMOZYGOUS_ALTERNATE" )`,
                   "assemblyId": `string` ("GRCh38" | "GRCh37" | "NCBI36" | "Other"),
                },
                ...
            ]
        },
    ]
}
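
For a quick look at a COUNT response without a JSON parser, the `count` fields can be pulled out with grep. This is a rough sketch under the structure documented above: `result_counts` is a hypothetical helper, it relies on `grep -o` (available in GNU and BSD grep), and a proper JSON tool is the safer choice for anything beyond a spot check.

```shell
# Print every "count" value found in a COUNT response read from stdin,
# one per line, in the order the results appear in the body.
result_counts() {
    grep -Eo '"count"[[:space:]]*:[[:space:]]*[0-9]+' | grep -Eo '[0-9]+$'
}

# Usage (illustrative query):
#   curl -k -s "https://gohan.local/variants/count/by/sampleId?chromosome=21&ids=hg00096" \
#       | result_counts
```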

Examples :





Request

  GET /variants/ingestion/run
   params:

  • filename : string (required)

Response

 {
     "state":  `number`, // ("Queuing" | "Running" | "Done" | "Error")
     "id": `string`,
     "filename": `string`,
     "message": `string`,
 }


Request

  GET /variants/ingestion/requests
   params: none


Response

[
  {
    "state":  `number`, // ("Queuing" | "Running" | "Done" | "Error")
    "id": `string`,
    "filename": `string`,
    "message": `string`,
    "createdAt": `timestamp string`,
    "updatedAt": `timestamp string`
  },
  ...
]
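
The requests endpoint lends itself to a simple polling loop while an ingestion is in progress. The sketch below is one possible approach, not part of Gohan itself: `wait_for_ingestion` and `fetch_requests` are hypothetical names, and it assumes `state` is serialized as one of the strings listed above ("Queuing", "Running", "Done", "Error").

```shell
# Poll until no request in the list is still "Queuing" or "Running".
# Arguments: FETCH (a command that prints the JSON returned by
# /variants/ingestion/requests) and an optional interval in seconds.
# A failed fetch also ends the loop, since no pending state is seen.
wait_for_ingestion() {
    fetch=$1
    interval=${2:-5}
    while $fetch | grep -Eq '"state"[[:space:]]*:[[:space:]]*"(Queuing|Running)"'; do
        sleep "$interval"
    done
}

# Usage:
#   fetch_requests() { curl -k -s https://gohan.local/variants/ingestion/requests; }
#   wait_for_ingestion fetch_requests 10
```

Passing the fetch command as an argument keeps the loop independent of the hostname and makes it easy to try against a canned response.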


Deployments :

All in all, run

make run-elasticsearch
make run-drs
make build-gateway && make run-gateway
make build-api && make run-api

# and optionally
make run-kibana

For other handy tools, see the Makefile. Alongside the commands already mentioned here, it provides other build, run, stop and clean-up targets.


Tests :

Once Elasticsearch, DRS, the API, and the gateway are up, run

make test-api-dev

Dev Container debug

Interactive debugging in VSCode is only possible when using the development image of gohan-api.

Using the "Attach to PID(Bento)" debug config, select the PID associated with the following path:

/gohan-api/src/api/tmp/main