feat: regularly scheduled jobs (cron) #163

Merged
merged 66 commits into from
Jan 20, 2021
30197e5
Crontab README
benjie Dec 16, 2020
16cb9c7
Crontab parsing utilities
benjie Dec 16, 2020
a4cef07
Add @types/node
benjie Dec 16, 2020
60c05f7
Move types to interfaces
benjie Dec 16, 2020
c29edf8
Some infra
benjie Dec 16, 2020
ad01bae
Cron table
benjie Dec 16, 2020
34d0edc
More infra
benjie Dec 18, 2020
4f535df
backfillMinutes, KnownCrontabs
benjie Dec 18, 2020
3cc7551
Backfill period, no exclusion period
benjie Dec 18, 2020
d3f362d
Lots more work
benjie Dec 18, 2020
1ef6332
Handle crontab lines without options/payload
benjie Dec 18, 2020
1d7a254
More verbose logging
benjie Dec 18, 2020
f50a929
Upgrade README
benjie Dec 20, 2020
e71e464
Add notes on limiting backfilling
benjie Dec 20, 2020
6f46053
Allow negative priority, allow slightly more complex ID
benjie Dec 20, 2020
ece3c36
Clean up stopping and releasing of cron worker
benjie Dec 20, 2020
cbfc49c
Tweak TypeScript docs
benjie Dec 20, 2020
22915e5
Make date mutation clearer
benjie Dec 20, 2020
7855f1c
Comments
benjie Dec 20, 2020
e1f8cd8
More comments and minor tweaks
benjie Dec 20, 2020
44f0da8
More comments and fix priority parsing
benjie Dec 20, 2020
c0afe04
Refactor options parser
benjie Dec 20, 2020
5197dcb
Rewrite options to use query string
benjie Dec 20, 2020
87e89bd
Fix error
benjie Dec 20, 2020
d69eccd
Don't wait for next minute to exit
benjie Dec 20, 2020
8dfac3a
There's another migration now
benjie Dec 20, 2020
3742332
README fixes
benjie Dec 20, 2020
d53ebdc
Fix ?fill example
benjie Dec 21, 2020
2941c4d
Merge branch 'main' into cron
benjie Jan 12, 2021
884d72e
Formatting
benjie Jan 12, 2021
b41043f
Don't lint altschema
benjie Jan 12, 2021
bbc935f
Update database schema
benjie Jan 12, 2021
15aa23c
Add note on stability
benjie Jan 12, 2021
5ab5c74
Tweak crontab example
benjie Jan 12, 2021
69896e4
Merge branch 'main' into cron
benjie Jan 12, 2021
7ee616c
Merge branch 'main' into cron
benjie Jan 12, 2021
ae2af78
Merge branch 'main' into cron
benjie Jan 12, 2021
baebdbc
Trim line before parsing/checking for comments
benjie Jan 12, 2021
2dff290
Docs
benjie Jan 12, 2021
06423db
Fix issue with nextTimestamp advancing when clock skew is detected
benjie Jan 12, 2021
31eb0fa
Crontab parsing test
benjie Jan 12, 2021
4fda2a7
Test crontab error handling
benjie Jan 12, 2021
fe2dddc
Newlines shouldn't cause issues
benjie Jan 12, 2021
f6738d1
Helper for constructing CronItems by hand
benjie Jan 12, 2021
b6baf34
Expose parseCrontab function
benjie Jan 12, 2021
9e75af8
Rename CronItem/RawCronItem/etc and rewrite docs/comments for clarity
benjie Jan 13, 2021
72c4da1
Solve race condition in startup/release
benjie Jan 13, 2021
32a62f2
Test job identifier is registered
benjie Jan 13, 2021
cd0ec0a
Test simple backfilling
benjie Jan 13, 2021
215a7f5
Check longer backfill
benjie Jan 13, 2021
55d53f7
Move more helpers to helpers
benjie Jan 14, 2021
aecd43a
Fake timers
benjie Jan 14, 2021
351878c
Stronger mocking of Date (now covers new Date() constructor too)
benjie Jan 14, 2021
8258ca5
Cron test that job is scheduled
benjie Jan 14, 2021
4c77c54
Lint
benjie Jan 14, 2021
0b16662
Manual tweak
benjie Jan 14, 2021
79fd263
Don't use rounded timestamp when detailing clock skew/delay
benjie Jan 15, 2021
61547f9
Better logging for catching up
benjie Jan 15, 2021
4070c24
Tests for clock skew
benjie Jan 15, 2021
97d77bf
Round debug message
benjie Jan 15, 2021
f4ce94d
Typo
benjie Jan 15, 2021
59fe240
Rename function
benjie Jan 15, 2021
3986142
Some minimal validation of parsedCronItems
benjie Jan 15, 2021
ba81759
Finish renaming cronify
benjie Jan 15, 2021
cf2e473
Factor out jest-time-helpers module
benjie Jan 15, 2021
2638061
Improve error messages
benjie Jan 20, 2021
196 changes: 196 additions & 0 deletions README.md
@@ -174,6 +174,7 @@ https://graphile.org/support/
- Adding jobs to same named queue runs them in series
- Automatically re-attempts failed jobs with exponential back-off
- Customisable retry count (default: 25 attempts over ~3 days)
- Crontab-like scheduling feature for recurring tasks (with optional backfill)
- Task de-duplication via unique `job_key`
- Flexible runtime controls that can be used for complex rate limiting (e.g. via
[graphile-worker-rate-limiter](https://github.com/politics-rewired/graphile-worker-rate-limiter))
@@ -1146,6 +1147,201 @@ This method can be used to postpone or advance job execution, or to schedule a
previously failed or permanently failed job for execution. The updated jobs will
be returned (note that this may be fewer jobs than you requested).

## Recurring tasks (crontab)

**Stability: _experimental_**; we may make breaking changes to this
functionality in a minor release, so pay close attention to the changelog when
upgrading.

Graphile Worker supports triggering recurring tasks according to a cron-like
schedule. This is designed for recurring tasks such as sending a weekly email,
running database maintenance tasks every day, performing data roll-ups hourly,
downloading external data every 20 minutes, etc.

Graphile Worker's crontab support:

- guarantees (thanks to ACID-compliant transactions) that no duplicate task
schedules will occur
- can backfill missed jobs if desired (e.g. if the Worker wasn't running when
the job was due to be scheduled)
- schedules tasks using Graphile Worker's regular job queue, so you get all the
regular features such as exponential back-off on failure.

**NOTE**: Recurring tasks are not intended to be added for each of your
individual application users. Instead, you should have relatively few recurring
tasks, and those tasks can create additional jobs for the individual users (or
process multiple users) if necessary.

Tasks are by default read from a `crontab` file next to the `tasks/` folder (but
this is configurable in library mode). Please note that our syntax is not 100%
compatible with cron's, and our task payload differs. We only handle timestamps
in UTC. The following diagram details the parts of a Graphile Worker crontab
schedule:

```crontab
# ┌───────────── UTC minute (0 - 59)
# │ ┌───────────── UTC hour (0 - 23)
# │ │ ┌───────────── UTC day of the month (1 - 31)
# │ │ │ ┌───────────── UTC month (1 - 12)
# │ │ │ │ ┌───────────── UTC day of the week (0 - 6) (Sunday to Saturday)
# │ │ │ │ │ ┌───────────── task (identifier) to schedule
# │ │ │ │ │ │ ┌────────── optional scheduling options
# │ │ │ │ │ │ │ ┌────── optional payload to merge
# │ │ │ │ │ │ │ │
# │ │ │ │ │ │ │ │
# * * * * * task ?opts {payload}
```

Comment lines start with a `#`.

For the first 5 fields we support an explicit numeric value, `*` to represent
all valid values, `*/n` (where `n` is a positive integer) to represent all valid
values divisible by `n`, range syntax such as `1-5`, and any combination of
these separated by commas.
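
As an illustration of the field syntax described above, here is a minimal sketch that expands a single field into the sorted list of values it matches. The function name `expandCronField` is hypothetical, and this is not Graphile Worker's actual parser; in particular, rejecting ranges whose start exceeds their end is an assumption of this sketch.

```typescript
// Hypothetical helper (not part of Graphile Worker): expand one crontab field
// into the sorted, de-duplicated list of values it matches. `min` and `max`
// are the inclusive bounds for the field (e.g. 0 and 59 for minutes).
function expandCronField(field: string, min: number, max: number): number[] {
  const values = new Set<number>();
  for (const part of field.split(",")) {
    let match: RegExpExecArray | null;
    if (part === "*") {
      // Wildcard: every valid value.
      for (let v = min; v <= max; v++) values.add(v);
    } else if ((match = /^\*\/([1-9][0-9]*)$/.exec(part)) !== null) {
      // Step syntax: every valid value divisible by n.
      const n = Number(match[1]);
      for (let v = min; v <= max; v++) {
        if (v % n === 0) values.add(v);
      }
    } else if ((match = /^([0-9]+)-([0-9]+)$/.exec(part)) !== null) {
      // Range syntax such as 1-5 (inclusive at both ends).
      const lo = Number(match[1]);
      const hi = Number(match[2]);
      // Assumption: a range must lie within bounds and must not wrap around.
      if (lo < min || hi > max || lo > hi) {
        throw new Error(`Invalid range '${part}' (allowed: ${min}-${max})`);
      }
      for (let v = lo; v <= hi; v++) values.add(v);
    } else if (/^[0-9]+$/.test(part)) {
      // Explicit numeric value.
      const v = Number(part);
      if (v < min || v > max) {
        throw new Error(`Value '${part}' out of range (allowed: ${min}-${max})`);
      }
      values.add(v);
    } else {
      throw new Error(`Unsupported crontab field component '${part}'`);
    }
  }
  return Array.from(values).sort((a, b) => a - b);
}
```

For example, under these assumptions `expandCronField("*/4", 0, 23)` yields the hours `0, 4, 8, 12, 16, 20`.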

The task identifier should match the following regexp
`/^[_a-zA-Z][_a-zA-Z0-9:_-]*$/` (namely it should start with a letter or an
underscore, and it should only contain alphanumeric characters, colon,
underscore and hyphen). It should be the name of one of your Graphile Worker
tasks.
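
To make the layout of a crontab line concrete, the following sketch splits a line into its schedule pattern, task identifier, optional `opts` and optional payload, and checks the task identifier against the regexp above. The names `splitCrontabLine` and `CrontabLineParts` are hypothetical, and the splitting regexp is an assumption for illustration rather than the one Graphile Worker uses.

```typescript
// Hypothetical shape for the pieces of one crontab line.
interface CrontabLineParts {
  pattern: string; // the five time fields, single-space separated
  task: string; // task identifier
  opts: string | null; // query string without the leading `?`, if present
  payload: string | null; // raw JSON5 text starting with `{`, if present
}

// The task identifier regexp quoted in the documentation.
const TASK_IDENTIFIER = /^[_a-zA-Z][_a-zA-Z0-9:_-]*$/;

// Assumed line layout: five fields, task, optional `?opts`, optional `{payload}`.
const CRONTAB_LINE = /^((?:\S+\s+){4}\S+)\s+(\S+)(?:\s+\?(\S*))?(?:\s+(\{.*))?$/;

// Returns null for blank lines and comments; throws on unrecognised lines.
function splitCrontabLine(line: string): CrontabLineParts | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.charAt(0) === "#") {
    return null;
  }
  const match = CRONTAB_LINE.exec(trimmed);
  if (match === null) {
    throw new Error(`Unrecognised crontab line: '${line}'`);
  }
  const task = String(match[2]);
  if (!TASK_IDENTIFIER.test(task)) {
    throw new Error(`Invalid task identifier: '${task}'`);
  }
  return {
    pattern: String(match[1]).split(/\s+/).join(" "),
    task,
    opts: match[3] ?? null,
    payload: match[4] ?? null,
  };
}
```

This sketch only separates the parts; interpreting the time fields, the options and the JSON5 payload would be separate steps.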

The `opts`, if provided, must always be prefixed with a `?`. They configure the
scheduling of the task, such as what should be done if a previous occurrence was
not scheduled (e.g. because the Worker wasn't running). Options are specified
using HTTP query string syntax (with `&` separator).

Currently we support the following `opts`:

- `id=UID` where UID is a unique alphanumeric case-sensitive identifier starting
with a letter - specify an identifier for this crontab entry; by default this
will use the task identifier, but if you want more than one schedule for the
same task (e.g. with different payload, or different times) then you will need
to supply a unique identifier explicitly.
- `fill=t` where `t` is a "time phrase" (see below) - backfill any entries from
the last time period `t`, for example if the worker was not running when they
were due to be executed (by default, no backfilling).
- `max=n` where `n` is a small positive integer - override the `max_attempts` of
the job.
- `queue=name` where `name` is an alphanumeric queue name - add the job to a
named queue so it executes serially.
- `priority=n` where `n` is a relatively small integer - override the priority
of the job.

**NOTE**: changing the identifier (e.g. via `id`) can result in duplicate
executions, so we recommend that you explicitly set it and never change it.

**NOTE**: using `fill` will not backfill new tasks, only tasks that were
previously known.

**NOTE**: the higher you set the `fill` parameter, the longer the worker startup
time will be; when used you should set it to be slightly larger than the longest
period of downtime you expect for your worker.
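
As a rough illustration of the options syntax, the sketch below turns an `opts` string into a typed object. `parseCrontabOpts` and `CrontabOpts` are hypothetical names, the parser is a small hand-rolled query string split rather than Graphile Worker's own code, and the validation rules (for example, rejecting unknown keys) are assumptions.

```typescript
// Hypothetical result type covering the documented options.
interface CrontabOpts {
  id?: string;
  fill?: string; // raw time phrase, e.g. "2d"
  max?: number;
  queue?: string;
  priority?: number;
}

// Parse an integer option value; negative values are allowed here and
// rejected by the caller where they make no sense.
function parseIntegerOption(key: string, value: string): number {
  if (!/^-?[0-9]+$/.test(value)) {
    throw new Error(`Option '${key}' expects an integer, got '${value}'`);
  }
  return parseInt(value, 10);
}

function parseCrontabOpts(opts: string): CrontabOpts {
  // Accept the string with or without its leading `?`.
  const raw = opts.charAt(0) === "?" ? opts.slice(1) : opts;
  const result: CrontabOpts = {};
  for (const pair of raw.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const key = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
    const value = decodeURIComponent(eq === -1 ? "" : pair.slice(eq + 1));
    switch (key) {
      case "id":
        result.id = value;
        break;
      case "fill":
        result.fill = value;
        break;
      case "queue":
        result.queue = value;
        break;
      case "max": {
        const max = parseIntegerOption(key, value);
        if (max < 1) throw new Error("Option 'max' must be positive");
        result.max = max;
        break;
      }
      case "priority":
        result.priority = parseIntegerOption(key, value);
        break;
      default:
        // Assumption: unknown options are treated as errors.
        throw new Error(`Unsupported option '${key}'`);
    }
  }
  return result;
}
```

With this sketch, `?fill=2d&max=10` would become an object with `fill` set to `"2d"` and `max` set to `10`; the `fill` value is left as a raw time phrase.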

Time phrases are composed of a sequence of number-letter combinations, where
the number represents a quantity and the letter represents a time period. For
example, `5d` means `five days` and `3h` means `three hours`, while `4w3d2h1m`
represents `4 weeks, 3 days, 2 hours and 1 minute` (i.e. a period of 44761
minutes). The following time periods are supported:

- `s` - one second (1000 milliseconds)
- `m` - one minute (60 seconds)
- `h` - one hour (60 minutes)
- `d` - one day (24 hours)
- `w` - one week (7 days)
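
The arithmetic behind time phrases can be sketched as follows. `parseTimePhrase` is a hypothetical name and this is only an illustration of the rules listed above, not Graphile Worker's implementation; it returns the period in milliseconds.

```typescript
// Milliseconds represented by each supported period letter.
function periodToMilliseconds(unit: string): number {
  switch (unit) {
    case "s":
      return 1000;
    case "m":
      return 60 * 1000;
    case "h":
      return 60 * 60 * 1000;
    case "d":
      return 24 * 60 * 60 * 1000;
    case "w":
      return 7 * 24 * 60 * 60 * 1000;
    default:
      throw new Error(`Unsupported time period '${unit}'`);
  }
}

// Hypothetical helper: convert a time phrase such as `4w3d2h1m` into
// milliseconds by summing quantity * period for each number-letter pair.
function parseTimePhrase(phrase: string): number {
  // The whole phrase must consist of number-letter pairs and nothing else.
  if (!/^(?:[0-9]+[smhdw])+$/.test(phrase)) {
    throw new Error(`Invalid time phrase '${phrase}'`);
  }
  const pairPattern = /([0-9]+)([smhdw])/g;
  let total = 0;
  let match: RegExpExecArray | null;
  while ((match = pairPattern.exec(phrase)) !== null) {
    total += Number(match[1]) * periodToMilliseconds(String(match[2]));
  }
  return total;
}
```

Dividing `parseTimePhrase("4w3d2h1m")` by 60000 gives 40320 + 4320 + 120 + 1 = 44761 minutes, matching the figure quoted above.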

The `payload` is a JSON5 object; it must start with a `{`, must not contain
newlines or carriage returns (`\n` or `\r`), and must not contain trailing
whitespace. It will be merged into the default crontab payload properties.

Each crontab job will have a JSON object payload containing the key `_cron` with
the value being an object with the following entries:

- `ts` - ISO8601 timestamp representing when this job was due to execute
- `backfilled` - true if the task was "backfilled" (i.e. it wasn't scheduled on
time), false otherwise

### Crontab examples

The following schedules the `send_weekly_email` task at 4:30am (UTC) every
Monday:

```
30 4 * * 1 send_weekly_email
```

The following does the same, but also backfills any tasks missed over the last
two days (`2d`), sets max attempts to `10` and merges `{"onboarding": false}`
into the task payload:

```
30 4 * * 1 send_weekly_email ?fill=2d&max=10 {onboarding:false}
```

The following triggers the `rollup` task every 4 hours on the hour:

```
0 */4 * * * rollup
```

### Limiting backfill

When you ask Graphile Worker to backfill jobs, it will do so for all jobs
matching that specification that should have been scheduled over the backfill
period. Other than the period itself, you cannot place limits on the backfilling
(for example, you cannot say "backfill at most one job" or "only backfill if the
next job isn't due within the next 3 hours"); this is because we've determined
that there are many situations (back-off, overloaded worker, serially executed
jobs, etc.) in which such limits might lead to outcomes that the user did not
expect.

If you need these kinds of constraints on backfilled jobs, you should implement
them _at runtime_ (rather than at scheduling time) in the task executor itself,
which could use the `payload._cron.ts` property to determine whether execution
should continue or not.
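
A minimal sketch of such a runtime check follows. The `CronPayload` interface mirrors the `_cron` entries documented above; `isTooOld`, `weeklyEmailTask` and the three-hour threshold are hypothetical, and the task function is simplified to take only the payload.

```typescript
// Shape of the `_cron` entry documented above.
interface CronPayload {
  _cron: {
    ts: string; // ISO8601 timestamp of when the job was due
    backfilled: boolean;
  };
}

// Hypothetical policy: a job is "too old" if it was due more than
// `maxAgeMs` milliseconds before `now`.
function isTooOld(ts: string, now: Date, maxAgeMs: number): boolean {
  const due = Date.parse(ts);
  if (isNaN(due)) {
    throw new Error(`Invalid timestamp '${ts}'`);
  }
  return now.getTime() - due > maxAgeMs;
}

// Simplified, hypothetical task executor: bail out early when the job was
// due more than three hours ago, otherwise carry on with the real work.
async function weeklyEmailTask(payload: CronPayload): Promise<"skipped" | "done"> {
  const threeHours = 3 * 60 * 60 * 1000;
  if (isTooOld(payload._cron.ts, new Date(), threeHours)) {
    return "skipped";
  }
  // ... send the email here ...
  return "done";
}
```

Because the check uses `ts` rather than `backfilled`, it also covers jobs that were scheduled on time but executed late (for example due to back-off or an overloaded worker).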

### Specifying cron items in library mode

You have three options for specifying cron tasks in library mode:

1. `crontab`: a crontab string (like the contents of a crontab file)
2. `crontabFile`: the (string) path to a crontab file, from which to read the
rules
3. `parsedCronItems`: explicit parsed cron items (see below)

#### parsedCronItems

The Graphile Worker internal format for cron items lists all the matching
minutes/hours/etc uniquely and in numerically ascending order. It also has other
requirements and is to be treated as an opaque type, so you must not construct
this value manually.

Instead, you may specify the parsedCronItems using one of the helper functions:

1. `parseCrontab`: pass a crontab string and it will be converted into a list of
`ParsedCronItem`s
2. `parseCronItems`: pass a list of `CronItem`s and it will be converted into a
list of `ParsedCronItem`s

The `CronItem` type is designed to be written by humans (and their scripts) and
has the following properties:

- `task` (required): the string identifier of the task that should be executed
(same as the first argument to `add_job`)
- `pattern` (required): a cron pattern (e.g. `* * * * *`) describing when to run
this task
- `options`: optional options influencing backfilling, etc
  - `backfillPeriod`: how long (in milliseconds) to backfill (see above)
  - `maxAttempts`: the maximum number of attempts we'll give the job
  - `queueName`: if you want the job to run serially, you can add it to a named
    queue
  - `priority`: optionally override the priority of the job
- `payload`: an optional payload object to merge into the generated payload for
the job
- `identifier`: an optional string to give this cron item a permanent
identifier; if not given we will use the `task`. This is particularly useful
if you want to schedule the same task multiple times, perhaps on different
time patterns or with different payloads or other options (since every cron
item must have a unique identifier).
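
For illustration, here is what a hand-written list of `CronItem`s might look like. The interfaces below are a local mirror of the properties documented above, declared here so the sketch is self-contained (they are not imported from Graphile Worker), and `findDuplicateIdentifiers` is a hypothetical helper that checks the uniqueness requirement.

```typescript
// Local mirror of the documented `CronItem` properties (an assumption for
// this sketch, not the library's own type declarations).
interface CronItemOptions {
  backfillPeriod?: number; // milliseconds
  maxAttempts?: number;
  queueName?: string;
  priority?: number;
}

interface CronItem {
  task: string;
  pattern: string;
  options?: CronItemOptions;
  payload?: { [key: string]: unknown };
  identifier?: string;
}

const cronItems: CronItem[] = [
  {
    // Equivalent to: 30 4 * * 1 send_weekly_email ?fill=2d&max=10 {onboarding:false}
    task: "send_weekly_email",
    pattern: "30 4 * * 1",
    options: { backfillPeriod: 2 * 24 * 60 * 60 * 1000, maxAttempts: 10 },
    payload: { onboarding: false },
  },
  // No identifier given, so the task name `rollup` is used.
  { task: "rollup", pattern: "0 */4 * * *" },
  // Second schedule for the same task needs its own identifier.
  { task: "rollup", pattern: "30 0 * * *", identifier: "rollup_nightly" },
];

// Hypothetical helper: list identifiers that are used more than once,
// falling back to the task name when no identifier is given.
function findDuplicateIdentifiers(items: CronItem[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const item of items) {
    const id = item.identifier !== undefined ? item.identifier : item.task;
    if (seen.has(id)) {
      duplicates.add(id);
    } else {
      seen.add(id);
    }
  }
  return Array.from(duplicates);
}
```

A list like this would then be converted with `parseCronItems` before being supplied as `parsedCronItems`, as described above.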

## Forbidden flags

When a job is created (or updated via `job_key`), you may set its `flags` to a