vo-cutouts is versioned with semver. Dependencies are updated to the latest available version during each release, and aren't noted here.
Find changes for the upcoming release in the project's changelog.d directory.
- Switch to Wobbly for job storage. All previous job history will be lost unless the vo-cutouts database is converted into Wobbly's storage format and inserted into Wobbly's database.
- Catch errors from parsing the dataset ID or creating a Butler in the backend worker and report them as proper worker exceptions so that they don't produce uncaught exception errors.
- Append a colon after the error code when reporting UWS errors.
- Render all UWS XML output except for error VOTables using vo-models rather than hand-written XML templates.
- Use Alembic to manage the schema of the UWS database. When upgrading to this version, set `config.updateSchema` to true in the Helm configuration for the first deployment. This release contains no schema changes, but it needs to perform a migration to add the Alembic version information. The vo-cutouts components will now refuse to start if the database schema has changed and the database has not yet been migrated.
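For the first deployment after this upgrade, the Helm values change might look like the fragment below. Only the `config.updateSchema` key is taken from this entry; the surrounding structure is the usual values layout and is illustrative:

```yaml
config:
  # One-time switch: allows vo-cutouts to run the Alembic migration
  # that stamps the schema version. Set back to false after the first
  # successful deployment.
  updateSchema: true
```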
- Restore logging configuration during startup of the backend worker, which re-adds support for the logging profile and log level and optionally configures structlog to use a JSON log format. This does not yet extend to the log messages issued directly by arq.
- The database worker pod now deletes the records for all jobs that have passed their destruction times once per hour.
- Restore support for execution duration and change the default execution duration back to 10 minutes. Use a very ugly hack to enforce a timeout in the backend worker that will hopefully not be too fragile.
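The "ugly hack" is not described here; one common way to impose a timeout on synchronous worker code, shown purely as an assumption about the approach, is a SIGALRM-based guard:

```python
import signal
from contextlib import contextmanager


@contextmanager
def enforce_timeout(seconds: int):
    """Raise TimeoutError if the body runs longer than ``seconds``.

    Illustrative only: SIGALRM works only in the main thread on Unix
    and cannot interrupt C extension code at arbitrary points, which
    is why such an approach is inherently fragile.
    """

    def _raise(signum, frame):
        raise TimeoutError(f"Job exceeded {seconds}s execution duration")

    previous = signal.signal(signal.SIGALRM, _raise)
    signal.alarm(seconds)
    try:
        yield
    finally:
        # Always cancel the pending alarm and restore the old handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, previous)
```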
- Add support for aborting jobs.
- Re-add the `CUTOUT_TIMEOUT` configuration option to change the default and maximum execution duration for cutout jobs.
- Support pre-signed URLs returned by the backend worker. If the result URL is an `http` or `https` URL, pass it to the client unchanged.
- Abort jobs on deletion or expiration if they are pending, queued, or executing.
- Worker pods now wait for 30 seconds (UWS database workers) or 55 seconds (cutout workers) for jobs to finish on shutdown before cancelling them.
- Allow time durations in the configuration to be given in number of seconds as a string, which was accidentally broken in 3.0.0.
- Restore support for automatically starting an async job by setting `phase=RUN` in the POST body. The equivalent query parameter was always supported, but POST body support was accidentally dropped in 3.0.0.
- Add a colon after the error code and before the error message in error replies.
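The restored POST-body form of starting a job immediately might look like the request below. The route prefix and parameter names are taken from elsewhere in this changelog; the job-list path follows the usual UWS convention, and the dataset ID and coordinate values are hypothetical:

```
POST /api/cutout/jobs HTTP/1.1
Content-Type: application/x-www-form-urlencoded

ID=butler%3A%2F%2Frepo%2F%3Cuuid%3E&CIRCLE=55.74+-32.28+0.05&phase=RUN
```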
- Stop setting `isPost` when returning UWS parameters. This undocumented field is supposed to only be set if the parameter contains a raw POST value rather than a regular parameter, which is never the case here.
- Stop upgrading the operating system packages in the worker image because the base image is so old that the package repositories no longer exist. This will hopefully be fixed in a future release of the Science Pipelines base image based on AlmaLinux.
- Some XML output from UWS handlers is now handled by vo-models instead of hand-written XML templates. More responses will hopefully be converted in the future.
- Cancelling or aborting jobs is not supported by the combination of arq and sync worker functions. Properly reflect this in job metadata by forcing execution duration to 0 to indicate that no limit is applied. Substantially increase the default arq job timeout since the timeout will be ineffective anyway.
- Drop the `CUTOUT_TIMEOUT` configuration option since we have no way of enforcing a timeout on jobs.
- Upgrade the base image for the backend worker to the latest weekly. This includes a new version of `lsst.daf.butler`, which targets a new version of the Butler server with a backwards-incompatible REST API.
- Support human-readable `4h30m20s`-style strings for `CUTOUT_LIFETIME` and `CUTOUT_SYNC_TIMEOUT` in addition to numbers of seconds.
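A minimal sketch of the kind of parsing this implies, accepting either a bare number of seconds or a `4h30m20s`-style string. The helper is hypothetical; the real validation lives in the application's configuration model:

```python
import re
from datetime import timedelta

# Matches strings like "4h30m20s", "30m", or "45s"; every component
# is optional but at least one must be present.
_DURATION_RE = re.compile(
    r"^(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?(?:(?P<seconds>\d+)s)?$"
)


def parse_duration(value: str) -> timedelta:
    """Parse a '4h30m20s'-style string or a bare number of seconds."""
    if value.isdigit():
        return timedelta(seconds=int(value))
    match = _DURATION_RE.match(value)
    if not match or not any(match.groupdict().values()):
        raise ValueError(f"Invalid duration: {value!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v}
    return timedelta(**parts)
```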
- Unknown failures in the worker backend are now recorded as fatal UWS errors rather than transient errors. This is the more conservative choice for unknown exceptions.
- Change the job queuing system from Dramatiq to arq. This change should be transparent to users when creating new jobs, but any in-progress jobs at the time of the upgrade will be orphaned.
- Use workload identity for all authentication when deployed on Google Cloud. Separate service account keys are no longer required or used. The `vo-cutouts` Google service account now requires the `storage.legacyBucketWriter` role in addition to `storage.objectViewer`.
- Add support for `gs` storage URLs in addition to `s3` storage URLs. When a `gs` storage URL is used, the image cutout backend will use the Google Cloud Storage Python API to store the results instead of boto, which will work correctly with workload identity.
- Catch the error thrown when the cutout has no overlap with the specified image and return a more specific error message to the user.
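The scheme-based dispatch for result storage might be sketched as follows. The function name and return values are hypothetical; the real service would construct the google-cloud-storage or boto client rather than return a label:

```python
from urllib.parse import urlparse


def choose_storage_backend(storage_url: str) -> str:
    """Pick a result-storage implementation from the URL scheme.

    Hypothetical dispatcher: 'gs' URLs use the Google Cloud Storage
    Python API (which works with workload identity), while 's3' URLs
    use boto.
    """
    scheme = urlparse(storage_url).scheme
    if scheme == "gs":
        return "google-cloud-storage"
    if scheme == "s3":
        return "boto"
    raise ValueError(f"Unsupported storage URL scheme: {scheme!r}")
```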
- Add support for sending Slack notifications for uncaught exceptions in route handlers.
- Add support for sending Slack notifications for unexpected errors when processing cutout jobs.
- If the backend image processing code fails with an exception, include a traceback of that exception in the detail portion of the job error.
- Queuing a job for execution in the frontend is now async and will not block the event loop, which may help with performance under load.
- Report fatal (not transient) errors on backend failures. We have no way of knowing whether a failure will go away on retry, so make the conservative assumption that it won't.
- Update to the latest weekly as a base image for the cutout worker, which picks up new versions of lsst-resources and the Butler client.
- Add support for querying the Butler server rather than instantiating local Butler instances. To support this, vo-cutouts now requires delegated tokens from Gafaelfawr so that it can make API calls on behalf of the user.
- Send uvicorn logs through structlog for consistent JSON formatting and add context expected by Google Cloud Logging to each log message.
- Standardize the environment variables used for configuration. Rename `SAFIR_` environment variables to `CUTOUT_`, remove `SAFIR_LOG_NAME`, and add `CUTOUT_PATH_PREFIX` to control the API path prefix. This is handled by the Phalanx chart, so should be invisible to users.
- Add a change log maintained using scriv.
- Use Ruff for linting and formatting instead of Black, flake8, and isort.
There are no major functionality changes in this release. It updates dependencies, packaging, and coding style, makes more use of Safir utility functions, and bumps the version to 1.0.0 since this version is acceptable as a release candidate, even though we hope to add additional functionality later.
- Clip stencils at the edge of the image instead of raising an error in the backend. Practical experience with the Portal and deeper thought about possible scientific use cases have shown this to be a more practical and user-friendly approach.
- Stop masking pixels outside the cutout stencil. The current performance of masking is unreasonably slow for `CIRCLE` cutouts, and masking isn't required by the SODA standard. We may revisit this later with a faster algorithm.
- Dataset IDs are now Butler URIs instead of just the bare UUID. The `CUTOUT_BUTLER_REPOSITORY` configuration setting is no longer used. Instead, the backend maintains one instance of a Butler and corresponding cutout backend per named Butler repository, taken from the first component of the Butler URI.
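A sketch of extracting the repository label from such a dataset ID. The helper is hypothetical; the `butler://` scheme and first-component layout are inferred from this entry:

```python
from urllib.parse import urlparse


def butler_repository(dataset_id: str) -> str:
    """Return the named Butler repository from a Butler URI.

    Hypothetical helper: the repository label is the first component
    (the netloc) of a butler:// dataset URI.
    """
    parsed = urlparse(dataset_id)
    if parsed.scheme != "butler" or not parsed.netloc:
        raise ValueError(f"Not a Butler URI: {dataset_id!r}")
    return parsed.netloc
```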
- Build a Docker image (as `lsstsqre/vo-cutouts-worker`) for the backend worker, based on a Rubin stack container.
- Use `/api/cutout` as the prefix for all public routes. This was previously done via a rewrite in the ingress. Making the application's internal understanding of its routes match the exposed user-facing routes simplifies the logic and fixes the URLs shown in the `/api/cutout/capabilities` endpoint.
- Record all times in the database to second granularity, rather than storing microseconds for some times and not others.
- Fix retries of async database transactions when the database saw a simultaneous write to the same row from another worker.
- Enable results storage for the cutout worker to suppress a Dramatiq warning.
- Add logging to every state-changing route and for each Dramatiq worker operation.
This is the initial production candidate. Another release will be forthcoming to clean up some remaining issues, but this version contains the core functionality and uses a proper backend.
- The database schema of this version is incompatible with 0.1.0. The database must be wiped and recreated during the upgrade.
- Use `lsst.image_cutout_backend` as the backend instead of `pipetask` without conversion of coordinates to pixels.
- Dataset IDs are now Butler UUIDs instead of colon-separated tuples.
- Support POLYGON and CIRCLE stencils and stop supporting POS RANGE, matching the capabilities of the new backend.
- Use a separate S3 bucket to store the output rather than a Butler collection. Eliminate use of Butler in the frontend, in favor of using that S3 bucket directly. This eliminates the `CUTOUT_BUTLER_COLLECTION` configuration setting and adds new `CUTOUT_STORAGE_URL` and `CUTOUT_TMPDIR` configuration settings.
- Use a different method of generating signed S3 result URLs that works correctly with workload identity in a GKE cluster. This adds a new `CUTOUT_SERVICE_ACCOUNT` configuration setting specifying the service account to use for URL signing. The workload identity the service runs as must have the `roles/iam.serviceAccountTokenCreator` role so that it can create signed URLs.
- Add new `--reset` flag to `vo-cutouts init` to wipe the existing database.
- Stop using a FastAPI subapp. This was causing problems for error handling, leading to exceptions thrown in the UWS handlers to turn into 500 errors with no logged exception and no error details.
Initial version, which uses a colon-separated tuple as the `ID` parameter and has an initial proof-of-concept backend that runs `pipetask` via `subprocess` and does not do correct conversion of coordinates to pixels.
This is only a proof of concept release. Some of the things it does are very slow and block the whole asyncio process. The backend will be changed significantly before the first production release.