Detecting binary black hole mergers from gravitational wave strain timeseries data using neural networks, with an emphasis on
- Efficiency - making effective use of accelerated hardware like GPUs in order to minimize time-to-solution.
- Scale - validating hypotheses on large volumes of data to obtain high-confidence estimates of model performance.
- Flexibility - modularizing functionality to expose various levels of abstraction and make implementing new ideas simple.
- Physics first - taking advantage of the rich priors available in GW physics to build robust models and evaluate them according to meaningful metrics.
aframe represents a framework for optimizing neural networks to detect compact binary coalescence (CBC) events from time-domain strain, rather than any particular network architecture.
NOTE: Right now, aframe can only be run by LIGO members.
NOTE: Running aframe out-of-the-box requires access to an enterprise-grade GPU (e.g. P100, V100, T4, A30/A40/A100, etc.). There are several nodes on the LIGO Data Grid which meet these requirements.
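If you're unsure whether the node you've landed on has a suitable GPU, running `nvidia-smi` (available anywhere the NVIDIA drivers are installed) will list the devices visible to you:

```
nvidia-smi
```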
In order to access the LIGO data services required to run aframe, start by following the instructions here to set up a Kerberos keytab for passwordless authentication to LIGO data services:

```
$ ktutil
ktutil: addent -password -p albert.einstein@LIGO.ORG -k 1 -e aes256-cts-hmac-sha1-96
Password for albert.einstein@LIGO.ORG:
ktutil: wkt ligo.org.keytab
ktutil: quit
```

with `albert.einstein` replaced with your LIGO username. Move this keytab file to `~/.kerberos`:

```
mkdir ~/.kerberos
mv ligo.org.keytab ~/.kerberos
```
Then create directories for storing X509 credentials, input data, and aframe outputs:

```
mkdir -p ~/cilogon_cert ~/aframe/data ~/aframe/results
```

You'll also want to set the `KRB5_KTNAME` and `X509_USER_PROXY` environment variables (for data authentication) and the `LIGO_USERNAME` and `LIGO_GROUP` environment variables (for submitting jobs to condor) in your `~/.bash_profile` (or the equivalent for whichever shell you are using) so that they are set every time you log in:

```
echo export KRB5_KTNAME=~/.kerberos/ligo.org.keytab >> ~/.bash_profile
echo export X509_USER_PROXY=~/cilogon_cert/CERT_KEY.pem >> ~/.bash_profile
echo export LIGO_USERNAME=$USER >> ~/.bash_profile
echo export LIGO_GROUP=ligo.dev.o4.cbc.explore.test >> ~/.bash_profile
```
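These exports take effect at your next login; to apply them to your current shell session right away, you can source the file:

```
source ~/.bash_profile
```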
Finally, running

```
ligo-proxy-init --kerberos $KRB5_KTNAME
```

should generate your X509 credentials, which will automatically be stored at the location given by the `X509_USER_PROXY` environment variable set above. If you ever run into issues discovering data due to authentication problems, it is likely you will need to re-run the `ligo-proxy-init` command above to renew your credentials.
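If you want to check whether your current credentials are still valid, one quick option (assuming the `openssl` command-line tool is available on your node; it is not part of the aframe setup itself) is to inspect the proxy certificate's expiration date:

```
openssl x509 -in $X509_USER_PROXY -noout -enddate
```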
aframe leverages both Conda and Poetry to manage the environments of its projects. For this reason, end-to-end execution of the aframe pipeline relies on the `pinto` command-line utility. Please see the Conda-based installation instructions for `pinto` in its documentation, and continue once you have it installed. You can confirm your installation by running

```
pinto --version
```
The default aframe experiment is the `sandbox` pipeline found under the `projects` directory. If you're on a GPU-enabled node on the LIGO Data Grid (LDG) and have completed the steps above, start by defining a couple of environment variables:

```
# BASE_DIR is where we'll write all logs, training checkpoints,
# and inference/analysis outputs. This should be unique to
# each experiment you run
export BASE_DIR=~/aframe/results/my-first-run

# DATA_DIR is where we'll write all training/testing
# input data, which can be reused between experiment
# runs. Just be sure to delete existing data or use
# a new directory if a new experiment changes anything
# about how data is generated, because aframe by default
# will opt to use cached data if it exists.
export DATA_DIR=~/aframe/data
```

then, from the `projects/sandbox` directory, just run

```
BASE_DIR=$BASE_DIR DATA_DIR=$DATA_DIR pinto run
```
This will execute the training and inference pipeline, which will:
- Download background and glitch datasets and generate a dataset of raw gravitational waveforms
- Train a 1D ResNet architecture on this data
- Accelerate the trained model using TensorRT and export it for as-a-service inference
- Serve up this model with Triton Inference Server via Singularity, and use it to run inference on a dataset of timeshifted background and waveform-injected strain data
- Use these predictions to generate background and foreground event distributions
- Serve up an application for visualizing and analyzing those distributions at `localhost:5005`.
!! NOTE: You may run into issues with HDF5 I/O when running on the LDG. To mitigate these, consider running with `HDF5_USE_FILE_LOCKING=FALSE`, or setting this environment variable in the `.env` file discussed below !!
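For example, to disable file locking for a single run, you can prefix the pipeline command with the variable (`HDF5_USE_FILE_LOCKING` is a standard HDF5 environment variable, not something specific to aframe):

```
HDF5_USE_FILE_LOCKING=FALSE BASE_DIR=$BASE_DIR DATA_DIR=$DATA_DIR pinto run
```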
Note that the first execution may take a bit longer than subsequent runs, since `pinto` will build all the necessary environments at run time if they don't already exist. The environments for data generation and training in particular can be expensive to build, because the former is built with Conda and the latter requires building GPU libraries.
Since `pinto` supports using `.env` files to specify environment variables, consider creating a `projects/sandbox/.env` file and specifying `BASE_DIR` and `DATA_DIR` there:

```
BASE_DIR=$HOME/aframe/results/my-first-run
DATA_DIR=$HOME/aframe/data
HDF5_USE_FILE_LOCKING=FALSE
```

Then you can simplify the above expression to just

```
pinto run
```
Another useful way to set things up is to write `projects/sandbox/.env` like

```
BASE_DIR=$HOME/aframe/results/$PROJECT
DATA_DIR=$HOME/aframe/data
```

then redefine the `PROJECT` environment variable for each new experiment you run so that it's given its own results directory, e.g.

```
PROJECT=my-second-run pinto run
```

`pinto` will automatically pick up the local `.env` file and fill in the `$PROJECT` variable with the value set at the command line.
The gravitational wave signatures generated by the merger of binary black hole (BBH) systems are well understood within general relativity, and powerful models for simulating these waveforms are readily accessible via modern GW physics software libraries. These simulated waveforms (or, more accurately, their frequency-domain representations) can then be used for matched-filter searches, which represent the most common existing method for detecting BBH events.
Matched filtering in this context, however, has its limitations.
- The number of parameters of the BBH system, which conditions the waveform generated by its merger, is sufficiently large that matched filter template banks must contain on the order of 10,000 templates for high-sensitivity searches. This makes real-time searches computationally intensive.
- Un-modelled non-Gaussian artifacts in the interferometer background, or glitches, can reduce the sensitivity of matched filters. Templated searches implement many veto mechanisms to mitigate the impact of these glitches, but they remain a persistent source of false alarms.
Deep learning algorithms represent an attractive alternative to these methods because they can "bake-in" the cost of evaluating large template banks up-front during training, trading this for efficient inference evaluation at run-time and drastically reducing the compute resources required for online searches. Moreover, the same simulation methods that enable matched filtering also allow for generation of arbitrary volumes of training data that can be used to fit robust models of the signal space.
While glitches can also be problematic inputs for neural networks, networks offer a way to exclude them by sheer "brute force": by providing lots of glitch examples during training, a network can learn to distinguish them from real events.
aframe attempts to apply deep learning methods to this problem by combining these observations and leveraging both the powerful existing models of BBH signals and the enormous amount of existing data collected by LIGO to build robust datasets of both background and signal-containing samples. More specifically, we:
- Use the `ml4gw` library to project a dataset of pre-computed gravitational waveforms to interferometer responses on-the-fly on the GPU. This allows us to efficiently augment our dataset of signals by "observing" the same event at any point on the celestial sphere and at arbitrary distances (the latter achieved by remapping its SNR relative to the background PSD of the training set; note that this will, by extension, change the observed detector-frame masses of the black holes).
- Use the `pyomicron` utility to search through the training set (and periods before it) for glitches, which we can oversample during training and randomly use to replace each interferometer channel independently.
TODO: fill this out or just refer to the documentation of `infer`.
By default, `pinto` uses Poetry to install all local libraries editably. This means that changes you make to your local code will automatically be reflected in the libraries used at run time. For information on how to help your new code best fit the structure of this repository, see the contribution guidelines.
The code here is structured like a monorepo, with applications siloed off into isolated environments to keep dependencies lightweight, but built on top of a shared set of libraries to keep the code modular and consistent.
Note that this means there is no "aframe environment": you won't find an `environment.yaml` or Poetry config at this root level. Each project is associated with its own environment, which is defined locally with respect to the project itself. For instructions on installing each project, see its associated documentation (though in general, running `pinto build` from the project's directory will be sufficient).
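For example, to build just the training project's environment ahead of time (using the `projects/sandbox/train` directory referenced later in this README):

```
cd projects/sandbox/train
pinto build
```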
If you run the pipeline using the instructions above, the environment associated with each step in the pipeline (i.e. each child project's environment) will be built before running the step if it does not already exist. This is true of running `pinto run` for each step individually as well.
Note as well that each project is associated with one or more scripts or applications, which are defined in the `[tool.poetry.scripts]` table of the project's `pyproject.toml`. These scripts can be executed as command-line executables inside the project's environment, e.g.

```
# run in projects/sandbox/train
pinto run train -h
```
Some projects, such as `projects/sandbox/datagen`, will be associated with several scripts that perform different functionality using the same environment, e.g.

```
# run these in projects/sandbox/datagen
pinto run generate-background -h
pinto run generate-glitches -h
pinto run generate-timeslides -h
```
Given the more advanced structure of the repo outlined above, there are some best practices you can adopt while developing which will make your life easier and your code simpler to integrate:
- Most scripts within projects parse their commands using the `typeo` utility, which automatically creates a command-line parser from the arguments and associated type annotations of a function, and executes that parser when the function is called with no arguments. This means you can execute scripts by passing the arguments of the associated function explicitly:

  ```
  pinto run train --learning-rate 1e-3 ... resnet --layers 2 2 2 2 ...
  ```

  or by pointing to a `pyproject.toml` (or a directory containing a `pyproject.toml`) which defines all the relevant arguments:

  ```
  # this says "run the train command, but parse the arguments
  # for it from the [tool.poetry.scripts.train] table of the
  # pyproject.toml contained in the directory directly above this
  # one (signified by the ..), using the resnet subcommand (a sub-table
  # of the [tool.poetry.scripts.train] table)"
  pinto run train --typeo ..:train:resnet
  ```

  For information on how to read a `typeo` config, see its README.
- Make aggressive use of branching (`git checkout -b debug-some-minor-issue`), even from development branches (i.e. not `main`). This will ensure that good ideas don't get lost in the process of debugging, and that your main development branch remains as stable as possible. Once you've solved the issue you branched out to fix, you can `git checkout` back to your main development branch, `git merge debug-some-minor-issue` the fix in, then delete the temporary branch with `git branch -d debug-some-minor-issue`.
- When you want to pull the latest changes in from `upstream main` to fork off a new development branch, consider using `git pull --rebase upstream main`. This will ensure that `git pull` doesn't create an extraneous merge commit that starts to diverge the histories of your local `main` branch and the upstream `main` branch, which can make future pull requests harder to scrutinize.
- `pinto` is, at its core, a pretty thin wrapper around `conda` and `poetry`. If you're experiencing any issues with your environments, try running the relevant `conda` or `poetry` commands explicitly to help debug. If things seem truly hopeless, delete the environment entirely (e.g. `rm -rf ~/miniconda3/envs/<env-name>`) and rebuild it using the right combination of conda and poetry commands (`conda env create -f ...` followed by a `conda activate` and `poetry install` for conda-managed projects, and just plain old `poetry install` for poetry-managed projects).
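For a conda-managed project, that rebuild sequence might look something like the sketch below; the environment name and config filename are placeholders, so check the project's own files for the real ones:

```
# remove the broken environment entirely
rm -rf ~/miniconda3/envs/<env-name>

# recreate it from the project's conda config,
# then install its Python packages editably with poetry
conda env create -f environment.yaml
conda activate <env-name>
poetry install
```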