API Clients

Nathan Watson edited this page Dec 3, 2018 · 20 revisions

The official Python client for Pulsar LIMS is pulsarpy. There is also an extension, pulsarpy-dx, which adds support for importing sequencing results from the DNAnexus platform into Pulsar.

Here, I discuss how to configure the Heroku app to support running Python scripts - particularly a scheduled service that runs each day to look for new sequencing results on DNAnexus and import them into Pulsar. This will be the script import_seq_results.py from the pulsarpy-dx git repository.

Adding the Python buildpack

Heroku apps use what are called buildpacks to manage the deployment of various aspects of your app, a process that is triggered each time you push your code from GitHub to Heroku. Normally, your app begins with a single buildpack, which is set automatically when you push for the first time. In the case of Pulsar LIMS, that buildpack is heroku/ruby, which can be verified via the command heroku buildpacks. When a new version of the app is deployed, this buildpack installs the dependencies specified in our Gemfile.

It is often useful to have one or more additional buildpacks configured for an app, however, and this is the scenario we run into with Pulsar. Since the client is coded in Python, if we want the app to run scheduled Python jobs via Heroku Scheduler (see below), we need to enable Python support for our app. This is done by adding the Python buildpack:

heroku buildpacks:add heroku/python

Heroku Scheduler

Heroku Scheduler is a free add-on that can be used to set up cron-like jobs. Jobs are run in one-off dynos.

heroku addons:create scheduler:standard

The scheduler's browser interface, where jobs can be scheduled, can be opened with:

heroku addons:open scheduler

A job can be scheduled to run daily, hourly, or every 10 minutes. For finer-grained control over job scheduling, Heroku recommends using what they call clock processes, which are more complex to set up.

The scheduler suggests using rake tasks for Rails apps, but since the client API is built in Python, rake tasks won't always be the right method to use. A Python script can be run instead. Note that one-off dynos run in the app directory, so to run a Python script you need to provide the script's path relative to the app directory. The exception is when the script is installed as part of a package, e.g. via pip from PyPI, which is the case for pulsarpy-dx.

Specifying an explicit Python version in runtime.txt

You can be explicit about which version of Python you get by adding a file named runtime.txt to the app's root directory and specifying a version in it, e.g. python-3.7.1; see Specifying a Python Runtime for more details. Otherwise, you'll get the default version, which will be some version of Python 3.
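As a rough illustration of the runtime.txt format, the following sketch compares a line such as python-3.7.1 against the interpreter a script is actually running under. The function name runtime_matches is hypothetical and is not part of Heroku or pulsarpy; it only assumes the python-X.Y.Z form shown above.

```python
import sys


def runtime_matches(runtime_line, version_info=sys.version_info):
    """Return True if a runtime.txt line such as 'python-3.7.1' matches the interpreter.

    Hypothetical helper for illustration; assumes the 'python-X.Y.Z' form.
    """
    prefix = "python-"
    line = runtime_line.strip()
    if not line.startswith(prefix):
        raise ValueError("expected a line of the form python-X.Y.Z")
    # Turn "3.7.1" into (3, 7, 1) and compare against the leading version fields.
    wanted = tuple(int(part) for part in line[len(prefix):].split("."))
    return tuple(version_info[:len(wanted)]) == wanted
```

A scheduled script could call this on the contents of runtime.txt at start-up and log a warning if the dyno's interpreter is not the one that was requested.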

Adding a requirements.txt file

Since our scheduled job will make use of the pulsarpy-dx package, which is in PyPI, we need to add it to a requirements.txt file in the app's root folder. I added a single line to that file which reads pulsarpy-dx==x.x.x, where x.x.x is the current version of the pulsarpy-dx package in PyPI. Note that it is important to always explicitly state the version of each package that you need in the requirements.txt file, because Heroku caches your dependencies to make rebuilds of your app quicker. Thus, the builder won't fetch a previously installed dependency unless a new version is explicitly stated.
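Because an unpinned entry can silently leave a stale cached version in place, it may help to check requirements.txt before pushing. The sketch below is one possible way to flag lines that lack an exact == pin. The function name unpinned_requirements and the simplified pattern are my own assumptions; they do not cover every requirement syntax that pip accepts.

```python
import re

# Simplified pattern (an assumption, not pip's full grammar):
# "name==version", optionally with extras such as "name[extra]==version".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that lack an exact '==' pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r / -e
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, unpinned_requirements("pulsarpy-dx\n") returns ["pulsarpy-dx"], whereas a line of the form pulsarpy-dx==x.x.x with a real version number yields an empty list.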

Job monitoring with DMS

DMS (Dead Man's Snitch) is an add-on that can be used in tandem with Heroku Scheduler to provide job monitoring alerts. It works by having the job "check in" to a unique URL, or "snitch", immediately after the job runs. DMS looks for a check-in event at configured intervals, and if it doesn't find one, you'll be alerted via email. Thus, if your job doesn't run, or does run but errors out, it won't check in and you'll be notified. This is quite handy since Heroku Scheduler doesn't provide this layer of support.

There are two main ways that a job can check in:

  1. Write your command line like so: do.py && curl https://nosnch.in/3d3315e51d
  2. Use a wrapper called FieldAgent, in which case the command looks something like: dms 3d3315e51d do.py

The latter is the approach that I recommend, because it will additionally capture STDOUT to a log file.