Skip to content

wbenoit26/aframe-legacy

 
 

Repository files navigation

aframe

Detecting binary black hole mergers from gravitational wave strain timeseries data using neural networks, with an emphasis on

  • Efficiency - making effective use of accelerated hardware like GPUs in order to minimize time-to-solution.
  • Scale - validating hypotheses on large volumes of data to obtain high-confidence estimates of model performance
  • Flexibility - modularizing functionality to expose various levels of abstraction and make implementing new ideas simple
  • Physics first - taking advantage of the rich priors available in GW physics to build robust models and evaluate them accoring to meaningful metrics

aframe represents a framework for optimizing neural networks for detection of CBC events from time-domain strain, rather than any particular network architecture.

Quickstart

NOTE: right now, aframe can only be run by LIGO members

NOTE: Running aframe out-of-the-box requires access to an enterprise-grade GPU (e.g. P100, V100, T4, A[30,40,100], etc.). There are several nodes on the LIGO Data Grid which meet these requirements.

1. Setting up your environment for data access

In order to access the LIGO data services required to run aframe, start by following the instructions here to set up a kerberos keytab for passwordless authentication to LIGO data services

$ ktutil
ktutil:  addent -password -p albert.einstein@LIGO.ORG -k 1 -e aes256-cts-hmac-sha1-96
Password for albert.einstein@LIGO.ORG:
ktutil:  wkt ligo.org.keytab
ktutil:  quit

with albert.einstein replaced with your LIGO username. Move this keytab file to ~/.kerberos

mkdir ~/.kerberos
mv ligo.org.keytab ~/.kerberos

Then, create directories for storing X509 credentials, input data, and aframe outputs.

mkdir -p ~/cilogon_cert ~/aframe/data ~/aframe/results

You'll also want to set the KRB5_KTNAME and X509_USER_PROXY environment variables (for data authentication) and the LIGO_USERNAME and LIGO_GROUP environment variables (for submitting jobs to condor) in your ~/.bash_profile (or, the equivalent for whichever shell you are using) so that they are set every time you login:

echo export KRB5_KTNAME=~/.kerberos/ligo.org.keytab >> ~/.bash_profile
echo export X509_USER_PROXY=~/cilogon_cert/CERT_KEY.pem >> ~/.bash_profile
echo export LIGO_USERNAME=$USER >> ~/.bash_profile
echo export LIGO_GROUP=ligo.dev.o4.cbc.explore.test >> ~/.bash_profile

Finally, running

ligo-proxy-init --kerberos $KRB5_KTNAME

should generate your X509 credentials which will be automatically stored at the location of the X509_USER_PROXY environment variable set above. If you ever find issues discovering data due to authentication problems, it is likely you will need to re run the above ligo-proxy-init command to renew your credentials.

2. Install pinto

aframe leverages both Conda and Poetry to manage the environments of its projects. For this reason, end-to-end execution of the aframe pipeline relies on the pinto command line utility. Please see the Conda-based installation instructions for pinto in its documentation and continue once you have it installed. You can confirm your installation by running

pinto --version

3. Run the sandbox pipeline

The default aframe experiment is the sandbox pipeline found under the projects directory. If you're on a GPU-enabled node on the LIGO Data Grid (LDG) and have completed the steps above, start by defining a couple environment variables

# BASE_DIR is where we'll write all logs, training checkpoints,
# and inference/analysis outputs. This should be unique to
# each experiment you run
export BASE_DIR=~/aframe/results/my-first-run

# DATA_DIR is where we'll write all training/testing
# input data, which can be reused between experiment
# runs. Just be sure to delete existing data or use
# a new directory if a new experiment changes anything
# about how data is generated, because aframe by default
# will opt to use cached data if it exists.
export DATA_DIR=~/aframe/data

then from the projects/sandbox directory, just run

BASE_DIR=$BASE_DIR DATA_DIR=$DATA_DIR pinto run

This will execute training and inference pipeline which will:

  • Download background and glitch datasets and generate a dataset of raw gravitational waveforms
  • Train a 1D ResNet architecture on this data
  • Accelerate the trained model using TensorRT and export it for as-a-service inference
  • Serve up this model with Triton Inference Server via Singularity, and use it to run inference on a dataset of timeshifted background and waveform-injected strain data
  • Use these predictions to generate background and foreground event distributions
  • Serve up an application for visualizing and analyzing those distributions at localhost:5005.

!! NOTE: You may run into issues with HDF5 I/O when running on LDG. To mitigate these, consider running with HDF5_USE_FILE_LOCKING=FALSE, or setting this environment variable in the .env file discussed below !!

Note that the first execution may take a bit longer than subsequent runs, since pinto will build all the necessary environments at run time if they don't already exist. The environments for data generation and training in particular can be expensive to build because the former is built with Conda and the latter requires building GPU libraries.

3b. Simplify with a .env

Since pinto supports using .env files to specify environment variables, consider creating a projects/sandbox/.env file and specifying BASE_DIR and DATA_DIR there:

BASE_DIR=$HOME/aframe/results/my-first-run
DATA_DIR=$HOME/aframe/data
HDF5_USE_FILE_LOCKING=FALSE

Then you can simplify the above expression to just

pinto run

Another useful way to set things up is to write projects/sandbox/.env like

BASE_DIR=$HOME/aframe/results/$PROJECT
DATA_DIR=$HOME/aframe/data

then redefine the PROJECT environment variable for each new experiment you run so that it's given its own results directory, e.g.

PROJECT=my-second-run pinto run

pinto will automatically pick up the local .env file and fill in the $PROJECT variable with the value set at the command line.

Experiment overview

Binary black hole detection with deep learning

The gravitational wave signatures generated by the merger of binary black hole (BBH) systems are well understood by general relativity, and powerful models for simulating these waveforms are easily accessible with modern GW physics software libraries. These simulated waveforms (or more accurately their frequency-domain representations) can then be used for matched-filter searches. This represents the most common existing method for detecting BBH events.

Matched filtering in this context, however, has its limitations.

  • The number of parameters of the BBH system, which conditions the waveform generated by its merger, is sufficiently large that matched filter template banks must contain on the order of 10,000 templates for high-sensitivity searches. This makes real-time searches computationally intensive.
  • Un-modelled non-Gaussian artifacts in the interferometer background, or glitches, can reduce the sensitivity of matched filters. Templated searches implement many veto mechanisms to mitigate the impact of these glitches, but they remain a persistent source of false alarms.

Deep learning algorithms represent an attractive alternative to these methods because they can "bake-in" the cost of evaluating large template banks up-front during training, trading this for efficient inference evaluation at run-time and drastically reducing the compute resources required for online searches. Moreover, the same simulation methods that enable matched filtering also allow for generation of arbitrary volumes of training data that can be used to fit robust models of the signal space.

While glitches can also represent problematic inputs for neural networks, they offer the potential to learn to exclude them by sheer "brute-force": providing networks with lots of examples of glitches during training in order to learn to distinguish them from real events.

aframe attempts to apply deep learning methods to this problem by combining these observations and leveraging both the powerful existing models of BBH signals and the enormous amount of existing data collected by LIGO to build robust datasets of both background and signal-containing samples. More specifially, we:

  • Use the ml4gw library to project a dataset of pre-computed gravitational waveforms to interferometer responses on-the-fly on the GPU. This allows us to efficiently augment our dataset of signals by "observing" the same event at any point on the celestial sphere and at arbitrary distances (the latter achieved by remapping its SNR relative to the background PSD of the training set. Note that this will by extension change the observed mass of the black holes in the detector frame)
  • Use the pyomicron utility to search through the training set (and periods before it) for glitches which we can oversample during training and randomly use to replace each interferometer channel independently.

Evaluating the performance of a trained network

TODO: fill this out or just refer to the documentation of infer.

Development instructions.

By default, pinto uses Poetry to install all local libraries editably. This means that changes you make to your local code will automatically be reflected in the libraries used at run time. For information on how to help your new code best fit the structure of this repository, see the contribution guidelines.

Code Structure

The code here is structured like a monorepo, with applications siloed off into isolated environments to keep dependencies lightweight, but built on top of a shared set of libraries to keep the code modular and consistent.

Note that this means there is no "aframe environment:" you won't find an environment.yaml or poetry config at this root level. Each project is associated with its own environment which is defined locally with respect to the project itself. For instructions on installing each project, see its associated documentation (though in general, running pinto build from the project's directory will be sufficient).

If you run the pipeline using the instructions above, the environment associated with each step in the pipeline (i.e. each child project's environment) will be built before running the step if it does not already exist. This is true of running pinto run for each step individually as well.

Note as well that each project is associated with one or multiple scripts or applications, which are defined in the [tool.poetry.scripts] table of the project's pyproject.toml. These scripts will be able to be executed as command-line executables inside the project's environment, e.g.

# run in projects/sandbox/train
pinto run train -h

Some projects, such as projects/sandbox/datagen will be associated with several scripts that perform different functionality using the same environment, e.g.

# run these in projects/sandbox/datagen
pinto run generate-background -h
pinto run generate-glitches -h
pinto run generate-timeslides -h

Tips and tricks

Given the more advanced structure of the repo outlined above, there are some best practices you can adopt while developing which can make your life easier and your code simpler to get integrated:

  • Most scripts within projects parse their commands using the typeo utility, which will automatically create a command-line parser using the arguments and associated type annotations of a function and execute this parser when the function is called with no arguments. This means you can execute scripts by passing the arguments of the associated function explicitly:
pinto run train --learning-rate 1e-3 ... resnet --layers 2 2 2 2  ...

or by pointing to a pyproject.toml (or directory containing a pyproject.toml) which defines all the relevant arguments

# this says "run the train command, but parse the arguments
# for it from [tool.poetry.scripts.train] table of the
# pyproject.toml contained in the directory directly above this
# (signified by the ..) using the resnet subcommand (a sub-table
# of the [tool.poetry.scripts.train] table)"
pinto run train --typeo ..:train:resnet

For information on how to read a typeo config, see its README.

  • Make aggressive use of branching (git chekout -b debug-some-minor-issue), even from development branches (i.e. not main). This will ensure that good ideas don't get lost in the process of debugging, and that your main development branch remains as stable as possible. Once you've solved the issue you branched out to fix, you can git checkout back to your main development branch, git merge debug-some-minor-issue the fix in, then delete the temporary branch git branch -d debug-some-minor-issue
  • When you want to pull the latest changes in from upstream main to fork off a new development branch, consider using git pull --rebase upstream main. This will ensure that git pull doesn't create an extraneous merge commit that starts to diverge the histories of your local main branch and the upstream main branch, which can make future pull requests harder to scrutinize.
  • pinto is, at its core, a pretty thin wrapper around conda and poetry. If you're experiencing any issues with your environments, try running the relevant conda or poetry commands explictly to help debug. If things seem truly hopeless, delete the environment entirely (e.g. rm -rf ~/minicondae3/envs/<env-name>) and rebuild it using the right combination of conda and poetry commands (conda env create -f ... followed by a conda activate and poetry install for conda-managed projects and just plain old poetry install for poetry-managed projects).

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 50.7%
  • Python 49.3%