DM-45860: Expand support for SIAv2 and VOTable #18

Merged Oct 4, 2024 (71 commits)
Changes from 53 commits
Commits
802c1db
Ignore _build files
timj Aug 29, 2024
04910d3
Simple config suitable for ci_hsc_gen3 repo
timj Aug 27, 2024
4a31061
Add ObsCore felis definition
timj Aug 27, 2024
4bdef29
Add VOTable exporter option
timj Aug 27, 2024
d8dc4be
First SIAv2 prototype
timj Aug 27, 2024
dc9e786
Reorganize SIA query to support multiple instruments
timj Aug 28, 2024
2b7afc6
Allow for verbose log level
timj Aug 28, 2024
88ea57b
Add EXPTIME support
timj Aug 28, 2024
295a8a7
Move SIAv2 query handling into single method
timj Aug 28, 2024
fab2dc1
Use new DimensionGroup API to get relevant time/region dimension
timj Aug 29, 2024
0f90581
Add do not merge check workflow
timj Aug 29, 2024
61d1660
Rearrange logic to allow "exposure" queries to work with regions
timj Aug 29, 2024
5e2fab7
Reorganize to have a standalone siav2_query() interface
timj Aug 29, 2024
daf3d00
Refactor dataset type selection in config
timj Aug 29, 2024
ced1fd1
Use new ability to Butler.import_ without datastore
timj Aug 29, 2024
071d29b
Modernize test export file to stop it warning
timj Aug 29, 2024
7b5e4a3
Add test for votable export
timj Aug 29, 2024
31bf392
Fix docstring for siav2_query
timj Aug 29, 2024
b1235df
Fix units in wavelengths for test config
timj Aug 29, 2024
fd09899
Fix the types of extra columns in dp02 config
timj Aug 29, 2024
63d1ac2
Include the extra columns in ci_hsc config for testing
timj Aug 29, 2024
f5781f1
If no timespan is defined assume all timespans are valid in query
timj Aug 30, 2024
a8a6aab
Refactor the SIAv2 handling into a class
timj Aug 30, 2024
b7ad963
Add two helper models for SIAv2
timj Aug 30, 2024
dcd4c17
Add --calib to SIAv2 query
timj Aug 31, 2024
c0c2e4e
Add explicit dependency on resources
timj Aug 31, 2024
b6b80a2
Track upstream change of name in DimensionGroup.region_dimension
timj Sep 3, 2024
fbb0ffd
Use the new public butler query API
timj Sep 6, 2024
d42ebcd
Modify the SIAv2Parameters handling to allow multiples
timj Sep 9, 2024
0e67d88
For multiple POS regions create a single UnionRegion
timj Sep 10, 2024
f7063ac
Add all SIAv2 fields to parameters model
timj Sep 10, 2024
965ce1f
Add full set of parameters to siav2_query_from_raw
timj Sep 10, 2024
71f6d0e
Specify which siav2_query parameters should be kwargs
timj Sep 10, 2024
f93a166
Add support for SIAv2 MAXREC
timj Sep 10, 2024
a4dc3e4
When combining WhereBind return single element if only one
timj Sep 10, 2024
06b7506
Add limit test for votable export
timj Sep 10, 2024
a4dfa17
Move shared test code to class to allow future SIAv2 testing
timj Sep 10, 2024
36ba332
Add raws to the test yaml data
timj Sep 11, 2024
ade6692
Add visit definition to export file to allow exposure region queries
timj Sep 11, 2024
80443d6
Add some tests for SIAv2
timj Sep 11, 2024
d0cdec8
Do not allow all butler collections to be searched
timj Sep 11, 2024
fb13a47
Test that batching works
timj Sep 11, 2024
b4f97c8
Add support for int32 and int16 arrow types
timj Sep 11, 2024
1493f07
Install YAML config files
timj Sep 18, 2024
d0e5ee6
Improve command-line help text for SIAv2
timj Sep 18, 2024
839f637
Add Return docs to method
timj Sep 18, 2024
829fda7
Pull valid CALIB values out into a constant
timj Sep 18, 2024
f61bf66
Do not add tmp butlers from tests
timj Sep 18, 2024
253b157
Use a bind parameter for instrument query
timj Sep 18, 2024
0baec02
Use "instrument IN" in multi-instrument queries
timj Sep 23, 2024
c97cc2d
Issue all warnings at once and include in VOTable
timj Sep 23, 2024
98f463d
Cache the Felis schema
timj Sep 23, 2024
f625c28
Add explicit SIAv2 test with known and unknown instrument
timj Sep 25, 2024
b634e8e
Simplify the is None mask check
timj Sep 30, 2024
e253bc1
Rewrite the VOTable construction code to use arrow_to_numpy
timj Sep 30, 2024
007f438
Stop using deprecated mambaforge in action
timj Sep 30, 2024
dfbeb70
Explicitly copy the config for testing before setting batch size
timj Sep 30, 2024
20ec130
Add utype to output
timj Oct 2, 2024
3056da8
Fix some typos
timj Oct 3, 2024
6c2a4f7
fixup! Cache the Felis schema
timj Oct 3, 2024
2c0013f
fixup! Modify the SIAv2Parameters handling to allow multiples
timj Oct 3, 2024
81ea21c
Add some tests for WhereBind and allow identical duplicate binds
timj Oct 3, 2024
dfb2e2c
Remove unnecessary results variable
timj Oct 3, 2024
443c1cf
Fix parameter types in siav2 script API docstrings
timj Oct 3, 2024
0cb2a19
Use model_copy to copy config
timj Oct 3, 2024
ded600c
Trap when WhereBind.combine is called with nothing
timj Oct 4, 2024
f3d99e5
Handle empty wheres that can now come from exptime calculation
timj Oct 4, 2024
5f7f6e1
Return overflow status rather than use object property
timj Oct 4, 2024
b0ed10d
Use single limit variable in loop
timj Oct 4, 2024
6e5a58f
Make SIAv2Parameters public
timj Oct 4, 2024
06a2bc1
Add a comment to the ci_hsc_gen3 config
timj Oct 4, 2024
41 changes: 41 additions & 0 deletions .github/workflows/do_not_merge.yaml
@@ -0,0 +1,41 @@
name: "Check commits can be merged"
on:
  push:
    branches:
      - main
  pull_request:

jobs:
  do-not-merge-checker:
    runs-on: ubuntu-latest

    steps:
      - name: Check that there are no commits that should not be merged
        uses: gsactions/commit-message-checker@v2
        with:
          excludeDescription: "true" # optional: this excludes the description body of a pull request
          excludeTitle: "true" # optional: this excludes the title of a pull request
          checkAllCommitMessages: "true" # optional: this checks all commits associated with a pull request
          accessToken: ${{ secrets.GITHUB_TOKEN }} # github access token is only required if checkAllCommitMessages is true
          # Check for message indicating that there is a commit that should
          # not be merged.
          pattern: ^(?!DO NOT MERGE)
          flags: "i"
          error: |
            "This step failed because there is a commit containing the text
            'DO NOT MERGE'. Remove this commit from the branch before merging
            or change the commit summary."

      - uses: actions/checkout@v4

      - name: Check requirements.txt for branches
        shell: bash
        run: |
          FILE="requirements.txt requirements/main.in requirements/test.in"
          MATCH=tickets/DM-
          if grep -q $MATCH $FILE
          then
            echo "Ticket branches found in $FILE:"
            grep -n $MATCH $FILE
            exit 1
          fi
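
Note on the two checks in this workflow: the `pattern` uses a negative lookahead, so it matches every commit message except those that begin with "DO NOT MERGE" (case-insensitive because of `flags: "i"`). The sketch below reproduces both checks in Python for local experimentation. It assumes the action tests the pattern against the start of each commit message; the helper names `blocked_commits` and `ticket_branch_lines` are hypothetical and not part of this PR.

```python
import re

# Same pattern and flag as the workflow. The lookahead succeeds only when the
# message does NOT start with "DO NOT MERGE", so a failed search means "blocked".
PATTERN = re.compile(r"^(?!DO NOT MERGE)", re.IGNORECASE)


def blocked_commits(messages: list[str]) -> list[str]:
    """Return the commit messages that would fail the check (hypothetical helper)."""
    return [msg for msg in messages if PATTERN.search(msg) is None]


def ticket_branch_lines(text: str, match: str = "tickets/DM-") -> list[tuple[int, str]]:
    """Mimic ``grep -n``: return (line number, line) pairs that contain ``match``."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if match in line
    ]


messages = ["Add EXPTIME support", "do not merge: debugging", "Fix DO NOT MERGE typo"]
print(blocked_commits(messages))
```

Only the message that starts with the marker is rejected; a commit that merely mentions "DO NOT MERGE" later in its summary passes.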
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,4 @@
+_build.*
 *.o
 *.so
 *.os
@@ -20,3 +21,4 @@ config.log
 .mypy_cache/
 python/lsst/dax/obscore/version.py
 tests/.tests/
+tests/tmp*
58 changes: 58 additions & 0 deletions configs/ci_hsc_gen3.yaml
@@ -0,0 +1,58 @@
facility_name: Subaru
obs_collection: LSST.CI
collections: ["HSC/runs/ci_hsc"]
dataset_types:
  raw:
    dataproduct_type: image
    dataproduct_subtype: lsst.raw
    calib_level: 1
    obs_id_fmt: "{records[exposure].obs_id}"
    o_ucd: phot.count
    access_format: image/fits
  calexp:
    dataproduct_type: image
    dataproduct_subtype: lsst.calexp
    calib_level: 2
    obs_id_fmt: "{records[visit].name}"
    o_ucd: phot.count
    access_format: image/fits
  deepCoadd:
    dataproduct_type: image
    dataproduct_subtype: lsst.coadd
    calib_level: 3
    obs_id_fmt: "{skymap}-{tract}-{patch}"
    o_ucd: phot.count
    access_format: image/fits
extra_columns:
  lsst_visit:
    template: "{visit}"
    type: "int"
  lsst_detector:
    template: "{detector}"
    type: "int"
  lsst_tract:
    template: "{tract}"
    type: "int"
  lsst_patch:
    template: "{patch}"
    type: "int"
  lsst_band:
    template: "{band}"
    type: "string"
  lsst_filter:
    template: "{physical_filter}"
    type: "string"
spectral_ranges:
  "HSC-G": [406.0e-9, 545.0e-9]
  "HSC-R": [543.0e-9, 693.0e-9]
  "HSC-R2": [542.0e-9, 693.0e-9]
  "HSC-I": [690.0e-9, 842.0e-9]
  "HSC-I2": [692.0e-9, 850.0e-9]
  "HSC-Z": [852.0e-9, 928.0e-9]
  "HSC-Y": [937.0e-9, 1015.0e-9]
  "N921": [914.7e-9, 928.1e-9]
  "g": [406.0e-9, 545.0e-9]
  "r": [542.0e-9, 693.0e-9]
  "i": [692.0e-9, 850.0e-9]
  "z": [852.0e-9, 928.0e-9]
  "y": [937.0e-9, 1015.0e-9]
4 changes: 2 additions & 2 deletions configs/dp02.yaml
@@ -58,10 +58,10 @@ extra_columns:
     type: "int"
   lsst_band:
     template: "{band}"
-    type: "str"
+    type: "string"
   lsst_filter:
     template: "{physical_filter}"
-    type: "str"
+    type: "string"
 spectral_ranges:
   "u": [330.0e-9, 400.0e-9]
   "g": [402.0e-9, 552.0e-9]
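
Note on the config templates: the `obs_id_fmt` and `extra_columns` entries in these YAML files look like Python format strings, and `spectral_ranges` maps a filter or band name to a wavelength interval in meters. The sketch below shows how such templates could be expanded with `str.format`. This is an assumption about the mechanism, not a statement of how the exporter does it; the record objects, data ID values, and the `expand` helper are all hypothetical.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for dimension records and a data ID.
records = {
    "exposure": SimpleNamespace(obs_id="HSCA90333400"),
    "visit": SimpleNamespace(name="HSCA90333400"),
}
data_id = {"skymap": "discrete/ci_hsc", "tract": 0, "patch": 69, "band": "r", "physical_filter": "HSC-R"}

# Two entries copied from the ci_hsc_gen3 config; values are wavelengths in meters.
SPECTRAL_RANGES = {"HSC-R": (543.0e-9, 693.0e-9), "r": (542.0e-9, 693.0e-9)}


def expand(template: str) -> str:
    """Expand a config template with str.format (hypothetical helper)."""
    # "{records[exposure].obs_id}" indexes the dict by key, then reads an attribute.
    return template.format(records=records, **data_id)


obs_id = expand("{records[exposure].obs_id}")
coadd_id = expand("{skymap}-{tract}-{patch}")
# An extra column declared with type "int" would be converted after expansion.
lsst_tract = int(expand("{tract}"))
em_min, em_max = SPECTRAL_RANGES[data_id["physical_filter"]]
print(obs_id, coadd_id, lsst_tract, em_min, em_max)
```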
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -29,6 +29,8 @@ dependencies = [
     "lsst-utils",
     "lsst-daf-butler",
     "lsst-sphgeom",
+    "lsst-resources",
+    "lsst-felis",
 ]
 dynamic = ["version"]
 
@@ -51,7 +53,7 @@ zip-safe = true
 license-files = ["COPYRIGHT", "LICENSE"]
 
 [tool.setuptools.package-data]
-"lsst.dax.obscore" = ["py.typed",]
+"lsst.dax.obscore" = ["py.typed", "configs/*.yaml"]
 
 [tool.setuptools.dynamic]
 version = { attr = "lsst_versions.get_lsst_version" }
79 changes: 77 additions & 2 deletions python/lsst/dax/obscore/cli/cmd/commands.py
@@ -59,8 +59,8 @@
 )
 @click.option(
     "--format",
-    help="Output format, one of 'parquet' or 'csv'; default: parquet.",
-    type=click.Choice(["csv", "parquet"]),
+    help="Output format, one of 'parquet', 'votable', or 'csv'; default: parquet.",
+    type=click.Choice(["csv", "parquet", "votable"]),
     default="parquet",
 )
 @dataset_type_option(
@@ -142,3 +142,78 @@
     used after adding obscore support to existing repository.
     """
     script.obscore_update_table(*args, **kwargs)
+
+
+@obscore.command(
+    short_help="Run SIAv2 query and return results.",
+    cls=ButlerCommand,
+)
+@repo_argument(required=True)
+@destination_argument(
+    required=True,
+    help="DESTINATION is the location of the output file.",
+    type=MWPath(file_okay=True, dir_okay=False, writable=True),
+)
+@click.option(
+    "--config",
+    "-c",
+    help="Location of the configuration file in YAML format, path or URL.",
+    required=True,
+)
+@dataset_type_option(
+    help=(
+        "Comma-separated list of Butler dataset types. "
+        "If specified it must be a subset of dataset types defined in configuration file."
+    )
+)
+@collections_option(
+    help="Butler collections (not SIAv2 collections) to include in the query. Default is to use the "
+    "collections specified in the SIAv2 configuration file."
+)
+@click.option(
+    "--instrument",
+    help="Name of instrument to use in query. If no instrument is specified all instruments are included.",
+    type=str,
+    multiple=True,
+)
+@click.option(
+    "--pos",
+    help="IVOA POS region to use to restrict results. CIRCLE, RANGE and POLYGON are supported.",
+    type=str,
+    multiple=True,
+)
+@click.option(
+    "--time",
+    help="A moment in time or a time span as a range to use to constrain the query. Uses MJD UTC.",
+    type=str,
+    multiple=True,
+)
+@click.option("--band", help="Wavelength range to constrain query. Units of meters.", type=str, multiple=True)
+@click.option(
+    "--exptime",
+    help="Exposure time ranges in seconds.",
+    type=str,
+    multiple=True,
+)
+@click.option(
+    "--calib",
+    help="Calibration level of the data. Allowed values are 0, 1, 2, and 3",
+    multiple=True,
+    type=int,
+)
+@click.option(
+    "--maxrec",
+    help="Maximum number of records to return. 0 means no records.",
+    type=int,
+)
+@options_file_option()
+def siav2(*args: Any, **kwargs: Any) -> None:
+    """Export Butler datasets as ObsCore Data Model in parquet format.
+
+    For details on the SIAv2 parameters see https://www.ivoa.net/documents/SIA/
+
+    Multiple values can be specified for a single parameter and the results are
+    ORed together. All range parameters allow numbers and include -Inf and
+    +Inf as options.
+    """
+    script.obscore_siav2(*args, **kwargs)
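
Note on the range parameters: the docstring above says that range parameters accept numbers plus `-Inf` and `+Inf`, and that multiple values for one parameter are ORed together. The sketch below illustrates those two rules for a simple numeric range such as `--exptime`. It is not the parser used in this PR; `parse_range` and `matches_any` are hypothetical names, and treating a single value as a degenerate range is an assumption.

```python
import math


def parse_range(value: str) -> tuple[float, float]:
    """Parse "x" or "lo hi" into a (lo, hi) tuple (hypothetical helper).

    Python's float() already understands "-Inf" and "+Inf".
    """
    parts = value.split()
    if len(parts) == 1:
        lo = hi = float(parts[0])
    elif len(parts) == 2:
        lo, hi = float(parts[0]), float(parts[1])
    else:
        raise ValueError(f"Expected one or two numbers, got {value!r}")
    if math.isnan(lo) or math.isnan(hi) or lo > hi:
        raise ValueError(f"Invalid range {value!r}")
    return lo, hi


def matches_any(x: float, ranges: list[str]) -> bool:
    """Return True if x falls in at least one range (values are ORed)."""
    return any(lo <= x <= hi for lo, hi in map(parse_range, ranges))


# Equivalent in spirit to: --exptime "0 15" --exptime "25 +Inf"
print(matches_any(30.0, ["0 15", "25 +Inf"]))
print(matches_any(20.0, ["0 15", "25 +Inf"]))
```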

81 changes: 76 additions & 5 deletions python/lsst/dax/obscore/config.py
@@ -21,18 +21,71 @@
 
 from __future__ import annotations
 
-__all__ = ["ExporterConfig"]
+__all__ = ["ExporterConfig", "WhereBind"]
 
+from collections.abc import Iterable
+from typing import Any, Literal
+
 from lsst.daf.butler.registry.obscore import ObsCoreConfig
+from pydantic import BaseModel, ConfigDict, Field
 
 
+class WhereBind(BaseModel):
+    """A where expression with associated bind parameters."""
+
+    model_config = ConfigDict(frozen=True)
+
+    where: str = ""
+    """User expression to restrict the output."""
+    bind: dict[str, Any] = Field(default_factory=dict)
+    """Bind values specified in the ``where`` expression."""
+    extra_dims: frozenset[str] = Field(default_factory=frozenset)
+    """Extra dimensions required to be included in query."""
+
+    @classmethod
+    def combine(cls, wheres: list[WhereBind], mode: Literal["AND"] | Literal["OR"] = "AND") -> WhereBind:
+        """Combine multiple clauses into a single where expression.
+
+        Parameters
+        ----------
+        wheres : `list` [ `WhereBind`]
+            The user expressions to combine.
+        mode : `str`
+            Combination mode. Can be ``AND`` or ``OR``.
+
+        Returns
+        -------
+        combo : `WhereBind`
+            A new `WhereBind` representing all the information of the input
+            clauses.
+        """
+        if len(wheres) == 1:
+            return wheres[0]
+        where = f" {mode} ".join(f"({w.where})" for w in wheres)
+        bind: dict[str, Any] = {}
+        extras: set[str] = set()
+        for w in wheres:
+            # Warn if we are overwriting bind keys.
+            duplicates = bind.keys() & w.bind.keys()
+            if duplicates:
+                raise ValueError(
+                    f"Combining multiple WHERE clauses with reused bind parameters of {duplicates}"
+                )
+            bind.update(w.bind)
+            extras.update(w.extra_dims)
+        return cls(where=where, bind=bind, extra_dims=extras)
+
+
 class ExporterConfig(ObsCoreConfig):
     """Complete configuration for ObscoreExporter."""
 
-    where: str = ""
-    """User expression to restrict the output. This value can be overridden
-    with command line options.
-    """
+    where: WhereBind = Field(default_factory=WhereBind)
+    """Default user expression to restrict the output. Not used if
+    per dataset type user expression is provided."""
+
+    dataset_type_constraints: dict[str, list[WhereBind]] = Field(default_factory=dict)
+    """Specific user expressions for a given dataset type. If a dataset type
+    is not specified here the default ``where`` will be used."""
 
     batch_size: int = 10_000
     """Number of records in a pyarrow RecordBatch"""
@@ -42,3 +95,21 @@
 
     csv_null_string: str = r"\N"
     """Value to use for NULLs in CSV output."""
+
+    def select_dataset_types(self, dataset_types: Iterable[str]) -> None:
+        """Update the configuration to include only these dataset types.
+
+        Parameters
+        ----------
+        dataset_types : `~collections.abc.Iterable` [ `str` ]
+            Names of dataset types to select.
+        """
+        dataset_type_set = set(dataset_types)
+        # Check that configuration has all requested dataset types.
+        if not dataset_type_set.issubset(self.dataset_types):
+            extras = dataset_type_set - set(self.dataset_types)
+            raise ValueError(f"Dataset types {extras} are not defined in configuration file.")
+        # Remove dataset types that are not needed.
+        self.dataset_types = {
+            key: value for key, value in self.dataset_types.items() if key in dataset_type_set
+        }
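
Note on `WhereBind.combine`: the method parenthesizes each clause, joins them with the chosen mode, merges the bind dictionaries, and refuses to merge when two clauses reuse a bind key. The sketch below mirrors that logic with a stdlib dataclass so it runs without pydantic or the Butler. `WhereBindSketch` and the example expressions are stand-ins for illustration, not the class from this diff.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class WhereBindSketch:
    """Stdlib stand-in for the pydantic WhereBind model in the diff."""

    where: str = ""
    bind: dict[str, Any] = field(default_factory=dict)
    extra_dims: frozenset[str] = frozenset()


def combine(wheres: list[WhereBindSketch], mode: str = "AND") -> WhereBindSketch:
    """Mirror of the combine() logic shown in the diff."""
    if len(wheres) == 1:
        return wheres[0]
    where = f" {mode} ".join(f"({w.where})" for w in wheres)
    bind: dict[str, Any] = {}
    extras: set[str] = set()
    for w in wheres:
        # A reused bind key would silently change another clause's meaning.
        duplicates = bind.keys() & w.bind.keys()
        if duplicates:
            raise ValueError(f"Reused bind parameters: {duplicates}")
        bind.update(w.bind)
        extras.update(w.extra_dims)
    return WhereBindSketch(where=where, bind=bind, extra_dims=frozenset(extras))


# Illustrative expressions only; they are not taken from the PR.
a = WhereBindSketch(where="instrument = inst", bind={"inst": "HSC"})
b = WhereBindSketch(
    where="visit.exposure_time > tmin", bind={"tmin": 30.0}, extra_dims=frozenset({"visit"})
)
combined = combine([a, b])
print(combined.where)
```

With an empty list this version returns an empty expression rather than failing; the commit list above includes a later commit, "Trap when WhereBind.combine is called with nothing", that addresses that case.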