Releases: lsst-sqre/mobu
8.1.0
New features
- NotebookRunner flocks can now pick up changes to their notebooks without having to restart the whole mobu process. This refresh can happen via:
  - GitHub push webhook post to /mobu/github/webhook with changes to a repo and branch that matches the flock config
  - monkeyflocker refresh <flock>
  - POST to /mobu/flocks/{flock}/refresh
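As a rough illustration of the last refresh option, the following sketch builds (but does not send) a POST request to the refresh route named above. The base URL and flock name are placeholders, and any authentication headers a given deployment requires are omitted; both are assumptions, not part of these release notes.

```python
from urllib.parse import quote
from urllib.request import Request


def build_refresh_request(base_url: str, flock: str) -> Request:
    """Build a POST request for /mobu/flocks/{flock}/refresh.

    base_url is a hypothetical deployment URL. Authentication is
    deployment-specific and deliberately left out of this sketch.
    """
    url = f"{base_url.rstrip('/')}/mobu/flocks/{quote(flock, safe='')}/refresh"
    return Request(url, method="POST")


# Placeholder values; nothing is sent over the network here.
request = build_refresh_request("https://rsp.example.com/", "notebooks")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out so that the sketch has no side effects.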
8.0.0
Backwards-incompatible changes
- NotebookRunner business now runs all notebooks in a repo, at the root and in all subdirs recursively, by default.
- Add exclude_dirs option to NotebookRunner business to list directories in which notebooks will not be run.
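The combined effect of recursive discovery and exclude_dirs can be pictured with a short sketch. This is not mobu's implementation; it assumes that excluded directories are given relative to the repository root and that everything beneath an excluded directory is skipped as well.

```python
from collections.abc import Iterable
from pathlib import Path


def find_notebooks(repo: Path, exclude_dirs: Iterable[Path] = ()) -> list[Path]:
    """Return notebooks under repo, skipping any inside an excluded directory.

    Assumption: exclude_dirs entries are relative to the repository root.
    """
    excluded = {repo / directory for directory in exclude_dirs}
    notebooks = []
    for path in sorted(repo.rglob("*.ipynb")):
        # path.parents lists every ancestor directory, so nested
        # subdirectories of an excluded directory are skipped too.
        if any(parent in excluded for parent in path.parents):
            continue
        notebooks.append(path.relative_to(repo))
    return notebooks
```

Calling `find_notebooks(Path("repo"), [Path("experimental")])` would then return every notebook in the hypothetical `repo` checkout except those under `repo/experimental`.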
What's Changed
- Bump python from 3.12.2-slim-bookworm to 3.12.3-slim-bookworm by @dependabot in #344
- Update dependencies by @rra in #346
- DM-44397: Run all notebooks in repo directory by @fajpunk in #347
- DM-44397: Release 8.0.0 by @fajpunk in #348
Full Changelog: 7.1.1...8.0.0
7.1.1
Bug fixes
- Correctly extract cookies from the middle of the redirect chain caused by initial authentication to a Nublado lab. This fixes failures seen with labs containing jupyterhub 4.1.3.
What's Changed
- DM-43573: Handle cookies from JupyterLab redirects by @rra in #342
- DM-43573: Prepare 7.1.1 release by @rra in #343
Full Changelog: 7.1.0...7.1.1
7.1.0
New features
- Add GitLFSBusiness for testing Git LFS by storing and retrieving a Git LFS-managed artifact.
Bug fixes
- Properly handle the XSRF tokens for JupyterHub and the Jupyter lab by storing separate tokens for the hub and lab after initial login and sending the appropriate XSRF token in the X-XSRFToken header to the relevant APIs. This fixes a redirect loop at the Jupyter lab when running 4.1.0 or later.
Other changes
- mobu now uses uv to maintain frozen dependencies and set up a development environment.
What's Changed
- Bump actions/setup-python from 4 to 5 by @dependabot in #326
- Fix syntax in periodic CI by @rra in #327
- Install pre-commit before calling autoupdate by @rra in #328
- Update Python and pre-commit dependencies by @rra in #330
- Update Python dependencies by @rra in #332
- Update dependencies by @rra in #333
- Update Python and pre-commit dependencies by @rra in #335
- Bump python from 3.12.1-slim-bullseye to 3.12.2-slim-bullseye by @dependabot in #337
- Bump pre-commit/action from 3.0.0 to 3.0.1 by @dependabot in #336
- Update Python dependencies by @rra in #338
- DM-43423: Fix XSRF cookie and header handling for JupyterLab by @rra in #340
- tickets/DM:43203: add Git-LFS business by @athornton in #339
- DM-43423: Prepare 7.1.0 release by @rra in #341
Full Changelog: 7.0.0...7.1.0
7.0.0
Backwards-incompatible changes
- Drop support for cachemachine and Nublado v2. The cachemachine_image_policy and use_cachemachine configuration options are no longer supported and should be deleted.
- Rename the existing TAPQueryRunner business to TAPQuerySetRunner to more accurately capture what it does. Add a new TAPQueryRunner business that runs queries chosen randomly from a list. Based on work by @stvoutsin.
- Rename JupyterPythonLoop to NubladoPythonLoop to make it explicit that it requires Nublado and will not work with an arbitrary JupyterHub.
New features
- Convert all configuration options that took intervals in seconds to timedelta. Bare numbers will still be interpreted as a number of seconds, but any format Pydantic recognizes as a timedelta may now be used.
Other changes
- All environment variables used to configure mobu now start with MOBU_, and several have changed their names. The new settings are MOBU_ALERT_HOOK, MOBU_AUTOSTART_PATH, MOBU_ENVIRONMENT_URL, MOBU_GAFAELFAWR_TOKEN, MOBU_NAME, MOBU_PATH_PREFIX, MOBU_LOGGING_PROFILE, and MOBU_LOG_LEVEL. This is handled by the Phalanx application, so no configuration changes should be required.
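For anyone checking a deployment by hand, a small sketch like the following lists which of the settings named above are present in an environment. The function name is hypothetical, and mobu itself reads these variables through its own settings machinery rather than through code like this.

```python
import os
from collections.abc import Mapping

# Variable names taken from the list above.
MOBU_SETTINGS = (
    "MOBU_ALERT_HOOK",
    "MOBU_AUTOSTART_PATH",
    "MOBU_ENVIRONMENT_URL",
    "MOBU_GAFAELFAWR_TOKEN",
    "MOBU_NAME",
    "MOBU_PATH_PREFIX",
    "MOBU_LOGGING_PROFILE",
    "MOBU_LOG_LEVEL",
)


def mobu_environment(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return only the mobu settings that are set in the given environment."""
    return {name: environ[name] for name in MOBU_SETTINGS if name in environ}


# Print names only, since MOBU_GAFAELFAWR_TOKEN holds a secret.
print(sorted(mobu_environment()))
```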
What's Changed
- [neophile] Update dependencies by @neophile-square in #284
- DM-39989: Update GitHub Actions and dependencies by @rra in #285
- [neophile] Update dependencies by @neophile-square in #286
- [neophile] Update dependencies by @neophile-square in #287
- [neophile] Update dependencies by @neophile-square in #289
- [neophile] Update dependencies by @neophile-square in #290
- [neophile] Update dependencies by @neophile-square in #291
- [neophile] Update dependencies by @neophile-square in #292
- [neophile] Update dependencies by @neophile-square in #294
- Bump actions/checkout from 3 to 4 by @dependabot in #295
- Bump python from 3.11.4-slim-bullseye to 3.11.5-slim-bullseye by @dependabot in #293
- [neophile] Update dependencies by @neophile-square in #297
- [neophile] Update dependencies by @neophile-square in #298
- Update Python dependencies by @rra in #300
- [neophile] Update dependencies by @neophile-square in #301
- [neophile] Update dependencies by @neophile-square in #303
- [neophile] Update dependencies by @neophile-square in #305
- [neophile] Update dependencies by @neophile-square in #306
- [neophile] Update dependencies by @neophile-square in #307
- [neophile] Update dependencies by @neophile-square in #308
- [neophile] Update dependencies by @neophile-square in #309
- [neophile] Update dependencies by @neophile-square in #311
- DM-42182: Rename TAPQuerySetRunner, add new TAPQueryRunner by @rra in #312
- DM-42182: Update Python dependencies by @rra in #313
- DM-42182: Update to Pydantic v2 by @rra in #315
- Bump python from 3.11.5-slim-bullseye to 3.12.1-slim-bullseye by @dependabot in #314
- DM-42182: Update to Python 3.12 by @rra in #316
- DM-42182: Switch to Ruff for reformatting by @rra in #317
- DM-42182: Run pre-commit autoupdate with make update-deps by @rra in #318
- DM-42182: Tell Click testing to not catch exceptions by @rra in #319
- DM-42182: Use new Annotated syntax for handlers by @rra in #320
- DM-42182: Simplify the Docker build by @rra in #321
- DM-42182: Change some JupyterHub terminology to Nublado by @rra in #322
- DM-42225: Drop support for cachemachine by @rra in #323
- DM-42225: Convert intervals to timedeltas by @rra in #324
- DM-42225: Prepare 7.0.0 release by @rra in #325
New Contributors
- @neophile-square made their first contribution in #284
Full Changelog: 6.1.1...7.0.0
6.1.1
Bug fixes
- Rather than dumping the full monkey data when summarizing flocks, which can cause long enough delays that in-progress calls fail due to the huge amount of timing data, extract only the success and failure count from the running business. This should be considerably faster and avoid timeout problems.
- Improve error reporting by catching exceptions thrown while sending code to the lab WebSocket for execution.
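The flock summary fix can be pictured as reading two counters per business instead of serializing everything. The data model below is entirely hypothetical and exists only to show the difference in the amount of data touched.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessState:
    """Hypothetical per-monkey state; timings may grow very large."""

    success_count: int = 0
    failure_count: int = 0
    timings: list[float] = field(default_factory=list)


def summarize_flock(businesses: dict[str, BusinessState]) -> dict[str, int]:
    """Summarize a flock from the counters alone, never touching timings."""
    return {
        "monkey_count": len(businesses),
        "success_count": sum(b.success_count for b in businesses.values()),
        "failure_count": sum(b.failure_count for b in businesses.values()),
    }


flock = {
    "monkey1": BusinessState(10, 1, [0.5] * 1000),
    "monkey2": BusinessState(7, 0, [0.4] * 1000),
}
print(summarize_flock(flock))
```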
What's Changed
- [neophile] Update dependencies by @sqrbot in #272
- DM-39552: Ignore comm messages from the lab by @rra in #273
- DM-39552: Catch exceptions while sending to the WebSocket by @rra in #274
- [neophile] Update dependencies by @sqrbot in #275
- Bump python from 3.11.3-slim-bullseye to 3.11.4-slim-bullseye by @dependabot in #276
- DM-39552: Remove pytest-httpx dependency by @rra in #277
- DM-39627: Run neophile from GitHub Actions by @rra in #278
- [neophile] Update dependencies by @sqrbot in #279
- Update frozen Python dependencies by @rra in #281
- DM-39552: Generate less data when summarizing flocks by @rra in #282
- DM-39552: Prepare release 6.1.1 by @rra in #283
Full Changelog: 6.1.0...6.1.1
6.1.0
New features
- The timeout when talking to JupyterHub and Jupyter labs can now be configured in the business options (as jupyter_timeout). The default is now 60s instead of 30s.
Bug fixes
- When reporting httpx failures to Slack, put the response body into an attachment instead of a block so that it will be collapsed if long.
- Fix reporting of WebSocket open timeouts to Slack.
What's Changed
- DM-39360: Also ignore execute_result messages by @rra in #266
- DM-39360: Update dependencies by @rra in #267
- DM-39360: Fix timeout reporting opening WebSocket by @rra in #268
- [neophile] Update dependencies by @sqrbot in #269
- DM-39360: Make Jupyter client timeout configurable by @rra in #270
- DM-39360: Prepare for 6.1.0 release by @rra in #271
Full Changelog: 6.0.0...6.1.0
6.0.0
Backwards-incompatible changes
- Configuration of whether to use cachemachine and, if so, what image policy to use is now done at the business level instead of globally. This allows the same mobu instance to test both Nublado v2 and Nublado v3.
New features
- The maximum allowable size for a WebSocket message from the Jupyter lab is now configurable per business and defaults to 10MB instead of 4MB.
Bug fixes
- Revert change in 5.0.0 to number all cells, and go back to counting only code cells for numbering purposes. This matches the way cell numbers are displayed in the Jupyter lab UI.
- When reporting errors to Slack, mobu 5.0.0 mistakenly stripped ANSI escape sequences from the code being executed rather than from the error output. The code comes from local notebooks or configuration and should already be safe, whereas the error output is where Jupyter labs like to add formatting. mobu now strips ANSI escape sequences from the error output instead of the code.
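Stripping ANSI escape sequences from error output might look roughly like the sketch below. The regular expression is an assumption rather than the pattern mobu uses, and it covers only CSI sequences (such as color codes), which are the usual formatting in tracebacks.

```python
import re

# CSI sequence: ESC [ + parameter bytes + intermediate bytes + final byte.
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from text."""
    return _ANSI_CSI.sub("", text)


error_output = "\x1b[0;31mNameError\x1b[0m: name 'x' is not defined"
print(strip_ansi(error_output))
```

In this sketch the function is applied to the error output only; the code being executed is passed through unchanged.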
What's Changed
- [neophile] Update dependencies by @sqrbot in #260
- DM-39325: Revert cell numbering to only count code cells by @rra in #261
- DM-39325: Make max WebSocket message size configurable by @rra in #262
- DM-39325: Move cachemachine configuration into business options by @rra in #263
- DM-39325: Fix stripping of ANSI escapes in Slack messages by @rra in #264
- DM-39325: Prepare changes for 6.0.0 by @rra in #265
Full Changelog: 5.1.0...6.0.0
5.1.0
New features
- mobu now uses httpx instead of aiohttp for all HTTP requests (including websockets for WebSocket connections and httpx-sse for EventStream connections) and makes use of the Safir framework for parsing and reporting HTTP client exceptions. Alerts for failing web requests will be somewhat different and hopefully clearer.
- mobu now sends keep-alive pings on the WebSocket connection to the lab, hopefully allowing successful execution of cells that take more than five minutes to run.
- Nublado-based businesses can now set debug to true in the image specification to request that debugging be enabled in the spawned Jupyter lab.
- mobu now catches timeouts attempting to open a WebSocket to the lab and reports them to Slack with more details.
- Slack alerts from monkeys now include the flock and monkey name as a field in the alert.
- Unexpected business exceptions now include an "Exception type" heading and use "Failed at" instead of "Date" to match the display of expected exceptions.
- The prefix for mobu routes (/mobu by default) can now be configured with SAFIR_PATH_PREFIX.
- Uncaught exceptions from mobu's route handlers are now also reported to Slack.
Bug fixes
- The code to determine the Docker reference and description of the running Nublado image is now more robust against unexpected output.
- Node and cell information in Slack error reports for Nublado errors are now formatted as full blocks rather than fields, since they are often too wide to fit nicely in the limited width of a Slack Block Kit field.
Other changes
- The default error_idle_time for Nublado-based business is back to 60 seconds instead of 10 minutes. The problem the longer timeout was working around should be fixed in the new Nublado lab controller.
- Nublado-based notebooks now request the JUPYTER_IMAGE_SPEC environment variable instead of JUPYTER_IMAGE to get the running image for error reporting purposes. This is now the preferred environment variable and JUPYTER_IMAGE is deprecated.
- mobu now uses the Ruff linter instead of flake8, isort, and pydocstyle.
What's Changed
- DM-38425: Be more robust when getting the running image by @rra in #228
- [neophile] Update dependencies by @sqrbot in #229
- [neophile] Update dependencies by @sqrbot in #230
- [neophile] Update dependencies by @sqrbot in #231
- Bump python from 3.11.2-slim-bullseye to 3.11.3-slim-bullseye by @dependabot in #232
- [neophile] Update dependencies by @sqrbot in #233
- [neophile] Update dependencies by @sqrbot in #234
- [neophile] Update dependencies by @sqrbot in #236
- DM-38425: Switch to scriv for change log management by @rra in #237
- DM-38425: Revert error_idle_time change from 5.0.0 by @rra in #238
- DM-38425: Use JUPYTER_IMAGE_SPEC, not JUPYTER_IMAGE by @rra in #239
- DM-38425: Log image reference and description in monkey by @rra in #240
- DM-38425: Format node and cell as blocks by @rra in #241
- [neophile] Update dependencies by @sqrbot in #242
- DM-38425: Convert to httpx by @rra in #243
- DM-38425: Improve logged image description by @rra in #244
- DM-38425: Adopt current Safir conventions by @rra in #245
- DM-38425: Split out Gafaelfawr storage layer and use models by @rra in #246
- DM-38425: Change how we force a refresh of JupyterHub auth by @rra in #247
- DM-38425: Switch to websockets by @rra in #248
- DM-38425: Allow enabling of lab debugging by @rra in #249
- DM-38425: Remove WebSocket message size limit by @rra in #250
- DM-38425: Stream monkey logs instead of using FileResponse by @rra in #251
- DM-38425: Remove old testing machinery by @rra in #252
- DM-38425: Increase the WebSocket open timeout by @rra in #253
- DM-38425: Set maximum WebSocket message size to 4MB by @rra in #254
- DM-38425: Configure Ruff and fix things it found by @rra in #255
- DM-38425: Switch to Ruff for linting by @rra in #256
- DM-38425: Catch and report timeouts opening WebSocket by @rra in #257
- DM-38425: Include the monkey and flock in Slack exceptions by @rra in #258
- DM-38425: Prepare 5.1.0 release by @rra in #259
Full Changelog: 5.0.0...5.1.0
5.0.0
Backwards-incompatible changes
- Settings are now handled with Pydantic and undergo much stricter validation. In particular, the Slack web hook URL must now be a valid URL if provided.
- In order to enable stricter and more useful Pydantic validation of flock specifications, the syntax for creating a flock has changed. business is now a dictionary, the restart option has been moved under it, the type of business is specified with type, and the business configuration options have moved under that key as options. Options that are not applicable to a given business type are now rejected.
- The jupyter.url_prefix option is now just url_prefix, and jupyter.image is now just image. The names of the settings under image have changed.
- The TAPQueryRunner options tap_sync and tap_query_set are now just sync and query_set.
- lab_settle_time is no longer supported as a configuration option for the businesses that spawn a Nublado lab. It defaulted to 0 and we never set it.
- JupyterJitterLoginLoop has been retired. Instead, set the jitter option on JupyterPythonLoop.
- JupyterLoginLoop has been merged with JupyterPythonLoop. The only difference in the former is that no lab session was created and no code was run, which seems pointless and not worth the distinction. JupyterPythonLoop runs a simple addition by default, which should be an improvement over JupyterLoginLoop in every likely situation.
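The new shape of the business key, and the rule that inapplicable options are rejected, can be sketched with plain dictionaries. The key names type, restart, and options come from the notes above; the option tables are illustrative subsets, the placement of url_prefix and image under JupyterPythonLoop is an assumption, and the real validation is done by Pydantic models rather than by code like this.

```python
# Illustrative subsets only; not the complete option lists for either business.
ALLOWED_OPTIONS = {
    "JupyterPythonLoop": {"jitter", "url_prefix", "image"},
    "TAPQueryRunner": {"sync", "query_set"},
}


def check_business(business: dict) -> None:
    """Raise ValueError if options do not apply to the business type."""
    allowed = ALLOWED_OPTIONS[business["type"]]
    unknown = set(business.get("options", {})) - allowed
    if unknown:
        raise ValueError(f"options not valid for {business['type']}: {sorted(unknown)}")


# The restart flag and the options now live under the business dictionary.
business = {
    "type": "TAPQueryRunner",
    "restart": True,
    "options": {"sync": True, "query_set": "dp0.2"},
}
check_business(business)  # passes silently
```

Passing a jitter option to TAPQueryRunner in this sketch would raise ValueError, mirroring the stricter validation described above.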
New features
- When the production logging profile is used, the messages from monkeys are no longer reported to the main mobu log, only to the individual monkey logs. This should produce considerably less noise in external log aggregators.
- The notebook being run is now included in all Slack error reports, not just for code execution failures.
- The API documentation now shows only the relevant options for the type of business when showing how to create a flock.
- Add support for running a business once and returning its results, via a POST to the new /run endpoint.
- Add support for the new Nublado lab controller (see SQR-066).
- The time a business pauses after a failure before it is restarted is now configurable with the error_idle_time option and defaults to 10 minutes (instead of 1 minute) for Nublado businesses, since this is how long JupyterHub will wait for a lab to spawn before giving up.
Bug fixes
- The dp0.2 TAPQueryRunner query set is now lighter-weight and will consume less memory and CPU to execute, hopefully reducing timeout errors.
- Cell numbering in error reports is now across all cells, not just code cells.
- TAPQueryRunner no longer creates a TAP client in its __init__ method, since creating a TAP client makes HTTP requests to the TAP server that can fail and failure would potentially crash mobu. Instead, it creates the TAP client in startup and handles exceptions properly so that they're reported to Slack.
- Business failures during startup are now counted as a failed execution so that a business that fails repeatedly in startup doesn't report 100% success in the flock summary.
- The code run by JupyterPythonLoop and NotebookRunner to get the Kubernetes node on which the lab is running now uses lsst.rsp.get_node instead of the deprecated rubin_jupyter_utils.lab.notebook.utils.get_node.
Other changes
- Slightly improve logging when monkeys are shut down due to errors.
- mobu's internals have been extensively refactored following the design in SQR-072 to hopefully make future maintenance easier.
What's Changed
- [neophile] Update dependencies by @sqrbot in #171
- [neophile] Update dependencies by @sqrbot in #172
- [neophile] Update dependencies by @sqrbot in #173
- [neophile] Update dependencies by @sqrbot in #174
- Bump python from 3.10.6-slim-bullseye to 3.10.7-slim-bullseye by @dependabot in #175
- [neophile] Update dependencies by @sqrbot in #176
- [neophile] Update dependencies by @sqrbot in #177
- [neophile] Update dependencies by @sqrbot in #178
- [neophile] Update dependencies by @sqrbot in #179
- [neophile] Update dependencies by @sqrbot in #180
- [neophile] Update dependencies by @sqrbot in #182
- [neophile] Update dependencies by @sqrbot in #183
- [neophile] Update dependencies by @sqrbot in #185
- [neophile] Update dependencies by @sqrbot in #186
- [neophile] Update dependencies by @sqrbot in #187
- [neophile] Update dependencies by @sqrbot in #188
- [neophile] Update dependencies by @sqrbot in #189
- [neophile] Update dependencies by @sqrbot in #190
- [neophile] Update dependencies by @sqrbot in #192
- [neophile] Update dependencies by @sqrbot in #193
- [neophile] Update dependencies by @sqrbot in #194
- [neophile] Update dependencies by @sqrbot in #195
- [neophile] Update dependencies by @sqrbot in #196
- [neophile] Update dependencies by @sqrbot in #197
- Bump python from 3.10.7-slim-bullseye to 3.11.1-slim-bullseye by @dependabot in #191
- [neophile] Update dependencies by @sqrbot in #198
- [neophile] Update dependencies by @sqrbot in #200
- Bump docker/build-push-action from 3 to 4 by @dependabot in #201
- Bump python from 3.11.1-slim-bullseye to 3.11.2-slim-bullseye by @dependabot in #203
- [neophile] Update dependencies by @sqrbot in #202
- [neophile] Update dependencies by @sqrbot in #204
- [neophile] Update dependencies by @sqrbot in #206
- Downscale DP0.2 querys to approx. DP0.1 size by @fritzm in #205
- [neophile] Update dependencies by @sqrbot in #207
- DM-38339: Use the new Safir Slack webhook support by @rra in #208
- DM-38339: Convert to pyproject.toml by @rra in #209
- DM-38339: Update GitHub Actions configuration by @rra in #210
- DM-38339: Update mypy configuration and type annotations by @rra in #211
- DM-38339: Switch to backtracking resolver by @rra in #212
- DM-38339: Update to latest Safir, use Settings for config by @rra in #213
- DM-38339: Use relative imports for test modules by @rra in #214
- DM-38339: Remove types from docstrings by @rra in #215
- DM-38339: Redo monkey logging and state machine by @rra in #216
- DM-38339: Fix some coding style issues in TAP code by @rra in #217
- DM-38339: Add notebook to Slack error reports by @rra in #218
- [neophile] Update dependencies by @sqrbot in #219
- DM-38339: Reorganize source and clean up business type structure by @rra in #220
- DM-38339: Remove lab_settle_time configuration by @rra in #221
- DM-38339: Eliminate JupyterJitterLoginLoop by @rra in #222
- DM-38339: Merge JupyterLoginLoop and JupyterPythonLoop by @rra in #223
- DM-38339: Handle failures during TAPQueryRunner setup by @rra in #224
- DM-38339: Refactor mobu state management by @rra in #225
- DM-38339: Add support for running a business once by @rra in #226
- DM-38408: Add support for the new Nublado lab controller by @rra in #227
Full Changelog: 4.5.0...5.0.0