The gPhoton project aims to produce derived data products that are both quantitatively more correct and qualitatively more capable than original GALEX mission products, with a special emphasis on enabling high-precision short-time-domain science.
gPhoton 2 is a substantial rewrite of the original gPhoton library intended to improve deployment flexibility and help support complex, full-catalog surveys on short timelines. It is 1-3 orders of magnitude faster than original gPhoton and features several significant calibration improvements. It is currently in early beta.
gPhoton 2 is not currently distributed via any package manager. The easiest
way to install it is simply to clone this repository. We recommend installing
its requirements by feeding the provided environment.yml
file to conda
or mamba
. If you have git
and mamba
installed, this might be as easy as (modulo environment differences):
git clone https://github.com/MillionConcepts/gPhoton2.git
cd gphoton2
mamba env create -f environment.yml
gPhoton 2 also relies on several large aspect metadata files that are not
distributed along with this repository.
They are currently available here.
Place these files in the gPhoton/aspect/
directory after downloading
them.
While many of the components of gPhoton 2 can be used piecemeal, the
gPhoton.pipeline.execute_pipeline
function is the primary interface to its
integrated processing pipelines. It can be called either within a script
or using simple CLI hooks. gPhoton/run_pipeline.py
is a very simple
example of calling it from a script; gPhoton/pipeline_cli.py
is a simple
CLI hook. To use the CLI hook, pass arguments on the command line, using --
for keyword arguments. For instance, running:
python pipeline_cli.py 23456 NUV --depth=30
will call execute_pipeline()
with positional arguments 23456
(eclipse number) and NUV (band), and the keyword argument depth=30. This will
fetch raw GALEX telemetry (if it's not in your path already) over HTTP from
MAST
for eclipse 23456, reduce it to a 'photonlist' file, then use that photonlist
to make a gzip-compressed FITS image, 30-second-bin lightcurves (based on
automatically-detected source positions), and a table of
exposure times for each bin of that lightcurve. It will write all of these
files to the gPhoton/test_data
subdirectory.
Please refer to docstrings in gPhoton/pipeline.py
for a complete description
of all accepted arguments to this function.
More comprehensive documentation for gPhoton 2 is forthcoming.
gPhoton 2 development to date has been focused on performance and stability in our primary cloud deployment environments -- AWS EC2 instances running Ubuntu Linux. As such, Linux is the preferred OS for gPhoton 2, although it is also compatible with MacOS. Multithreading is poorly performant on MacOS. Please run gPhoton 2 in single-threaded mode if you are using it on MacOS. gPhoton 2 should also run on Windows Subsystem for Linux, although it has not been extensively tested.
gPhoton 2 does not have hard-and-fast system requirements. It is highly configurable and not all of the portions of its pipelines are equally expensive. Also, the GALEX raw data archive is extremely diverse, and some visits / eclipses are much cheaper to process than others -- processing a brief visit on a sparse field could easily take less than 20% as many resources as processing a long visit on a dense field.
However, here are some general notes:
- It should be possible to process any MIS-like eclipse to 30-second depth (bin size) in ~6 GB of free memory.
- gPhoton 2 processes each "leg" (nominal boresight position) of an eclipse separately, so eclipses with many "legs", like much of the All-Sky Imaging Survey (AIS) will often be very memory-cheap to process relative to overall visit time.
- We do not recommend executing the full gPhoton pipeline in an environment with less than ~8 GB of free memory, although some features of the pipeline may function well with considerably less, particularly on smaller raw data files.
- The most memory-expensive portion of the pipeline tends to be writing video files; if you are not writing video files to disk, you can generally get away with less memory.
- Decreasing the depth / increasing temporal resolution will tend to sharply increase the required memory for both producing and writing movies. If you only want to produce full-depth images with gPhoton 2 and neither want to write movies or generate photometry, time and memory requirements will tend to go sharply down.
- If you pass the
--burst=True
parameter, gPhoton 2 will write each frame of a movie as a separate file, which reduces memory pressure significantly but is somewhat slower. - There is no particular minimum CPU requirement; gPhoton 2 will run on quite slow processors, but can use all the processing power you can throw at it. Integrating video tends to be the most CPU-bound portion of the pipeline (especially at very high temporal resolution), followed by counting photometry on video frames, followed by producing photonlists.
- Running gPhoton 2 in multithreaded mode will tend to increase average (although not necessarily peak) working memory. Even in multithreaded mode, some portions of gPhoton 2 are thread-bound, and single-thread performance remains important.
- More than 8 parallel processes do not tend to be useful, except for high-resolution photometry on very dense fields.
- An x86_64 processor architecture is recommended but not required.
- If you are downloading raw6 files, a fast network connection will of course be useful.
- Some output files may be large enough that portions of the pipeline become I/O bound, and we do not recommend using a HDD or networked storage as working space for gPhoton 2.
Because of the diversity of the GALEX archive, the sizes of gPhoton's input and output files vary greatly, but typical sizes for files associated with NUV data from a MIS-like eclipse are:
- photonlist (as Snappy-compressed .parquet): 1 GB
- raw6 (as gzip-compressed FITS): 200 MB
- 30-second-depth movie (as gzip-compressed FITS): 300 MB
- full-depth image (as gzip-compressed FITS): 60 MB
- lightcurves (as flat CSV): 4.5 MB
- exposure time table (as flat CSV): 3 KB
FUV data will tend to be significantly smaller due to the relatively lower density of FUV events in almost all GALEX visits.
Making different choices about temporal resolution and compression can alter these figures significantly. These files tend to be much larger uncompressed, particularly high-temporal-resolution videos -- compression ratio increases almost linearly with temporal resolution due to increasing array sparsity.
gPhoton 2 requires Python 3.9 or 3.10. 3.10 is recommended. (3.11 support
pends upstream changes in numba
and is anticipated sometime in Q1 2023.)
It also depends on the following Python libraries:
- astropy
- dustgoggles
- fast-histogram
- fitsio
- more-itertools
- numba
- numpy
- pandas
- photutils
- pip
- pyarrow
- requests
- rich
- scipy
- sh
The following dependencies are optional:
- astroquery (sky object search functions)
- fire (pipeline CLI)
- icc_rt (performance improvements to numba)
- igzip (some gzip-handling features)
gPhoton 2 is an active component of multiple ongoing research projects. In particular, continued development is a core component of the GALEX Legacy Catalog (GLCAT) project. It is almost certain that it will receive meaningful new features and improvements, so interface stability is not guaranteed.
gPhoton 2 currently lacks comprehensive software tests and has been primarily (although not exclusively) deployed and tested in our preferred deployment environments. Some features or components may be unstable in other environments.
We would love to receive bug reports. We are also happy to clarify points of confusion or consider feature requests. Please file GitHub Issues. We also invite contributions; if you are interested in adding work to gPhoton 2, please email the repository maintainers and let us know what you might be interested in fixing, tweaking, or implementing.
If you use this software to support your research, please cite: M. St. Clair, C. Million, R. Albach, S.W. Fleming (2022) gPhoton2. DOI:10.5281/zenodo.7472061
This project has been supported by NASA Grants 80NSSC18K0084 and 80NSSC21K1421.