API Clients

Nathan Watson edited this page Aug 6, 2019 · 20 revisions

The official Python client for Pulsar LIMS is pulsarpy. There is also an extension called pulsarpy-dx which extends pulsarpy in order to add support for importing sequencing results from the DNAnexus platform into Pulsar.

There is another package called pulsarpy-to-encodedcc that is used for submitting datasets from Pulsar LIMS to the ENCODE Portal.

Here, I discuss how to configure the Heroku app to support running Python scripts - particularly a scheduled service that runs each day to look for new sequencing results on DNAnexus and import them into Pulsar. This will be the script import_seq_results.py from the pulsarpy-dx git repository.

Adding the Python buildpack

Heroku apps use what are called buildpacks to manage the deployment of various aspects of your app, a process that is triggered each time you push your code to Heroku. Normally, an app begins with a single buildpack, which is set automatically when you push for the first time. In the case of Pulsar LIMS, that buildpack is heroku/ruby, which can be verified via the command heroku buildpacks. When deploying a new version of the app, this buildpack installs the dependencies specified in our Gemfile.

It is often useful to have one or more additional buildpacks configured for an app, and this is the scenario we run into with Pulsar. Since the client is coded in Python, if we want the app to run scheduled Python jobs via Heroku Scheduler (see below), we need to enable Python support for our app. This is done by adding the Python buildpack:

heroku buildpacks:add heroku/python

Heroku Scheduler

Heroku Scheduler is a free add-on that can be used to set up cron-like jobs. Jobs are run in one-off dynos. Add it to the app with:

heroku addons:create scheduler:standard

The scheduler's browser interface, where jobs can be scheduled, can be opened with:

heroku addons:open scheduler

A job can be scheduled to run daily, hourly, or every 10 minutes. For finer-grained control over job scheduling, Heroku recommends using what they call clock processes, which are more complex to set up.

The scheduler suggests using rake tasks for Rails apps, but since the client API is built in Python, rake tasks won't always be the right tool; a Python script can be run instead. Note that one-off dynos run in the app directory, so to run a Python script you need to provide its path relative to the app directory. The exception is when the script is installed as part of a package, e.g. via pip from PyPI, which is the case for pulsarpy-dx.
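To make the path rule concrete, here is a minimal sketch of the lookup a scheduler command line effectively relies on: an installed script is found on PATH, while anything else has to be addressed relative to the app directory. The function name resolve_script and its arguments are hypothetical, not part of pulsarpy-dx or Heroku; it can be tried in a one-off dyno to see which case applies to a given script.

```python
import os
import shutil


def resolve_script(name, app_dir="."):
    """Locate a script roughly the way a scheduler command line would.

    Returns the full path of an executable found on PATH if there is one,
    otherwise the path under app_dir if the file exists there,
    otherwise None. (Hypothetical helper, for illustration only.)
    """
    installed = shutil.which(name)  # scripts installed via pip usually land on PATH
    if installed:
        return installed
    candidate = os.path.join(app_dir, name)  # fall back to a path inside the app
    return candidate if os.path.isfile(candidate) else None
```

If this returns None for a script, the job's command line most likely needs a different relative path.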

Specifying an explicit Python version in runtime.txt

You can be explicit about which Python version you get by adding a file named runtime.txt to the app's root directory containing a version, e.g. python-3.7.1; see Specifying a Python Runtime for more details. Otherwise, you'll get the default version, which will be some version of Python 3.
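One way to confirm that the dyno really runs the requested version is to compare the contents of runtime.txt against the running interpreter. The sketch below assumes the python-X.Y.Z format shown above; parse_runtime and runtime_matches are hypothetical helper names, not part of any Heroku tooling.

```python
import sys


def parse_runtime(text):
    """Turn a runtime.txt line such as 'python-3.7.1' into (3, 7, 1)."""
    name, _, version = text.strip().partition("-")
    if name != "python" or not version:
        raise ValueError(f"unexpected runtime spec: {text!r}")
    return tuple(int(part) for part in version.split("."))


def runtime_matches(text, actual=None):
    """Check whether the interpreter version matches the requested one.

    `actual` defaults to the running interpreter's (major, minor, micro).
    """
    if actual is None:
        actual = tuple(sys.version_info[:3])
    wanted = parse_runtime(text)
    # Compare only as many components as runtime.txt specifies.
    return actual[:len(wanted)] == wanted
```

For example, passing the contents of runtime.txt to runtime_matches in a one-off dyno tells you whether the build picked up the file.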

Adding a requirements.txt file

Since the scheduled job will make use of the pulsarpy-dx package, which is on PyPI, I need to add it to a requirements.txt file in the app's root folder. Note that Heroku recommends explicitly stating the versions of such dependencies in the requirements file, e.g. pulsarpy-dx==x.x.x, where x.x.x is the current version of the pulsarpy-dx package on PyPI, because Heroku caches the app's dependencies to make rebuilds faster. However, this does not work that well and I don't recommend relying on this strategy. If your package depends on another PyPI package and you want that dependency to always be at the latest version, the cached copy will prevent an updated version from being installed the next time a new version of pulsarpy-dx is installed. The quick, easy, and predictable way is to use a Heroku plugin to clear your build cache; details below.
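If you do follow Heroku's pinning recommendation, a small check can flag entries that lack an exact version. This is a rough sketch, not a full requirements parser: the name unpinned_requirements is hypothetical, and lines with environment markers or URLs would be reported as unpinned.

```python
import re

# Matches "name==version", optionally with extras such as name[extra]==1.2.3.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(text):
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Calling it on the contents of requirements.txt returns an empty list when every dependency is pinned.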

Clearing the app's build cache

Install the heroku-builds plugin:

heroku plugins:install heroku-builds

Then purge the cache:

heroku builds:cache:purge -a example-app

Finally, make a new commit to your app to trigger a rebuild. You may find using an empty commit to be handy for such occasions:

git commit --allow-empty -m "Trigger build"
git push
git push heroku master

Job monitoring with DMS

DMS (Dead Man's Snitch) is an add-on that can be used in tandem with Heroku Scheduler to provide job monitoring alerts. It works by having the job "check in" to a unique URL, or "snitch", immediately after the job runs. DMS looks for a check-in event at configured intervals, and if it doesn't find one, you'll be alerted via email. Thus, if your job doesn't run, or does run but errors out, it won't check in and you'll be notified. This is quite handy since Heroku Scheduler doesn't provide this layer of support.

There are two main ways that a job can check in:

  1. Write your command line like so:

    do.py && curl https://nosnch.in/3d3315e51d

  2. Use a wrapper called FieldAgent, in which case the command looks something like:

    dms 3d3315e51d do.py

The latter is the approach that I recommend, because it will additionally capture STDOUT to a log file.
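For reference, the logic of the first approach (run the job, then check in only if it succeeded) can also be sketched in Python. This is an illustration under assumptions, not part of DMS or pulsarpy-dx: run_and_check_in and its notify parameter are hypothetical names, and unlike the wrapper it does not capture the job's output.

```python
import subprocess
import sys
import urllib.request


def run_and_check_in(cmd, snitch_url, notify=None):
    """Run cmd and ping snitch_url only if it exits with status 0.

    Returns the command's exit code. `notify` is a hypothetical hook that
    receives the URL; it defaults to a plain HTTP GET.
    """
    if notify is None:
        def notify(url):
            # Equivalent in spirit to `curl <snitch_url>`.
            urllib.request.urlopen(url, timeout=10).close()

    result = subprocess.run(cmd)
    if result.returncode == 0:  # same condition as `&&` in the shell version
        notify(snitch_url)
    return result.returncode
```

A call such as run_and_check_in([sys.executable, "do.py"], "https://nosnch.in/3d3315e51d") mirrors the shell command above: a failing job never reaches the check-in, so the alert still fires.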