Skip to content

Commit

Permalink
Merge branch 'main' into fix-optional-ids-open-api
Browse files Browse the repository at this point in the history
  • Loading branch information
desertaxle authored Dec 24, 2024
2 parents 92ce5dd + 8d506b2 commit bc6428b
Show file tree
Hide file tree
Showing 213 changed files with 7,092 additions and 4,281 deletions.
41 changes: 39 additions & 2 deletions .github/workflows/python-tests.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ jobs:
- name: Server Tests
modules: tests/server/ tests/events/server
- name: Client Tests
modules: tests/ --ignore=tests/server/ --ignore=tests/events/server --ignore=tests/test_task_runners.py --ignore=tests/runner --ignore=tests/workers
modules: tests/ --ignore=tests/typesafety --ignore=tests/server/ --ignore=tests/events/server --ignore=tests/test_task_runners.py --ignore=tests/runner --ignore=tests/workers
- name: Runner and Worker Tests
modules: tests/test_task_runners.py tests/runner tests/workers
database:
Expand Down Expand Up @@ -364,7 +364,7 @@ jobs:
- name: Run tests
run: |
echo "Using COVERAGE_FILE=$COVERAGE_FILE"
pytest tests \
pytest tests --ignore=tests/typesafety \
--numprocesses auto \
--maxprocesses 6 \
--dist worksteal \
Expand Down Expand Up @@ -449,3 +449,40 @@ jobs:
echo "## Coverage Report" >> $GITHUB_STEP_SUMMARY
echo "[Detailed Report](${{ steps.upload_combined_coverage_report.outputs.artifact_url }})" >> $GITHUB_STEP_SUMMARY
coverage report --format=markdown >> $GITHUB_STEP_SUMMARY
run-typesafety-test:
name: typesafety
runs-on: ubuntu-latest

steps:

- uses: actions/checkout@v4
with:
persist-credentials: false
fetch-depth: 0

- name: Set up Python 3.12
uses: actions/setup-python@v5
id: setup_python
with:
python-version: 3.12

- name: UV Cache
# Manually cache the uv cache directory
# until setup-python supports it:
# https://github.com/actions/setup-python/issues/822
uses: actions/cache@v4
id: cache-uv
with:
path: ~/.cache/uv
key: uvcache-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements-client.txt', 'requirements.txt', 'requirements-dev.txt') }}

- name: Install packages
run: |
python -m pip install -U uv
uv pip install --upgrade --system -e .[dev]
- name: Run tests
run: |
pytest tests/typesafety \
--disable-docker-image-builds
2 changes: 1 addition & 1 deletion .github/workflows/static-analysis.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ jobs:
fetch-depth: 0

- name: Set up uv
uses: astral-sh/setup-uv@v4
uses: astral-sh/setup-uv@v5
with:
python-version: "3.12"

Expand Down
3 changes: 2 additions & 1 deletion docs/mint.json
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,8 @@
"group": "For platform engineers",
"pages": [
"v3/tutorials/platform",
"v3/tutorials/debug"
"v3/tutorials/debug",
"v3/tutorials/alerts"
]
}
],
Expand Down
33 changes: 29 additions & 4 deletions docs/v3/api-ref/rest-api/server/schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -939,7 +939,7 @@
"type": "string",
"format": "date-time",
"description": "Only include runs that start or end after this time.",
"default": "0001-01-01T00:00:00",
"default": "0001-01-01T00:00:00+00:00",
"title": "Since"
},
"description": "Only include runs that start or end after this time."
Expand Down Expand Up @@ -20093,8 +20093,15 @@
"Graph": {
"properties": {
"start_time": {
"type": "string",
"format": "date-time",
"anyOf": [
{
"type": "string",
"format": "date-time"
},
{
"type": "null"
}
],
"title": "Start Time"
},
"end_time": {
Expand Down Expand Up @@ -20185,7 +20192,14 @@
"title": "Key"
},
"type": {
"type": "string",
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Type"
},
"is_latest": {
Expand Down Expand Up @@ -22100,6 +22114,17 @@
}
],
"title": "Task Parameters Id"
},
"traceparent": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Traceparent"
}
},
"type": "object",
Expand Down
64 changes: 64 additions & 0 deletions docs/v3/automate/events/automations-triggers.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -905,6 +905,70 @@ In the above example:
- `user_id_1` creates and then completes an order, triggering a run of our deployment.
- `user_id_2` creates an order, but no completed event is emitted so no deployment is triggered.

### Detect and respond to zombie flows

If the infrastructure a flow is running on suddenly fails (for example, the machine crashes or a container is evicted),
Prefect's orchestration engine will be unable to report state changes and the flow run will get stuck in the running state.

Fortunately, flow runs triggered via deployment can emit heartbeats as they are running, and a Prefect automation can update
a flow run's state to crashed if the server stops receiving heartbeats for that flow run.

<Note>
**Enable flow run heartbeat events**

You will need to ensure you're running Prefect version 3.1.8 or greater and set `PREFECT_RUNNER_HEARTBEAT_FREQUENCY`
to an integer greater than 30 to emit flow run heartbeat events.
</Note>

To create an automation that marks zombie flow runs as crashed, run this script:
```python
from datetime import timedelta
from prefect.automations import Automation
from prefect.client.schemas.objects import StateType
from prefect.events.actions import ChangeFlowRunState
from prefect.events.schemas.automations import EventTrigger, Posture
from prefect.events.schemas.events import ResourceSpecification
my_automation = Automation(
name="Crash zombie flows",
trigger=EventTrigger(
after={"prefect.flow-run.heartbeat"},
expect={
"prefect.flow-run.heartbeat",
"prefect.flow-run.Completed",
"prefect.flow-run.Failed",
"prefect.flow-run.Cancelled",
"prefect.flow-run.Crashed",
},
match=ResourceSpecification({"prefect.resource.id": ["prefect.flow-run.*"]}),
for_each={"prefect.resource.id"},
posture=Posture.Proactive,
threshold=1,
within=timedelta(seconds=90),
),
actions=[
ChangeFlowRunState(
state=StateType.CRASHED,
message="Flow run marked as crashed due to missing heartbeats.",
)
],
)
if __name__ == "__main__":
my_automation.create()
```

The trigger definition says after each heartbeat event for a flow run we expect to see another heartbeat event or a
terminal state event for that same flow run within 90 seconds of a heartbeat event.

If `PREFECT_RUNNER_HEARTBEAT_FREQUENCY` is set to `30`, the automation will trigger only after 3 heartbeats have been missed.
You can adjust `within` in the trigger definition and `PREFECT_RUNNER_HEARTBEAT_FREQUENCY` to change how quickly the automation
will fire after the server stops receiving flow run heartbeats.

You can also add additional actions to your automation to send a notification when zombie runs are detected.

## See also

- To learn more about Prefect events, which can trigger automations, see the [events docs](/v3/automate/events/events/).
Expand Down
2 changes: 1 addition & 1 deletion docs/v3/deploy/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ The best choice depends on your use case.

### Static infrastructure

When you have several flows running regularly, [the `serve` method](/v3/develop/write-flows/#serving-a-flow)
When you have several flows running regularly, [the `serve` method](/v3/deploy/run-flows-in-local-processes#serve-a-flow)
of the `Flow` object or [the `serve` utility](/v3/develop/write-flows/#serving-multiple-flows-at-once)
is a great option for managing multiple flows simultaneously.

Expand Down
103 changes: 103 additions & 0 deletions docs/v3/develop/manage-states.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,109 @@ The final state of a flow or task run depends on a number of factors; generally
- `FAILED`: a run in any `FAILED` state encountered an error during execution, such as a raised exception
- `CRASHED`: a run in any `CRASHED` state was interrupted by an OS signal such as a `KeyboardInterrupt` or `SIGTERM`

The flow of state transitions can be visualized here:

<Note>
States are represented by their name, with boxes behind states clarifying their underlying type. Dotted lines lead to terminal states.
</Note>

```mermaid
%%{
init: {
'theme': 'neutral',
'flowchart': {
'curve' : 'linear',
'rankSpacing': 80,
'nodeSpacing': 70,
'width': 5
}
}
}%%
flowchart TD
%% Style definitions
classDef scheduled fill:#fcd14edb,stroke:#fcd14edb
classDef pending fill:#A99FADdb,stroke:#A99FAD
classDef running fill:#1860f2db,stroke:#1860f2db
classDef paused fill:#a99faddb,stroke:#a99faddb
classDef completed fill:#2ac769db,stroke:#2ac769db,stroke-width:2px
classDef failed fill:#fb4e4ef5,stroke:#fb4e4ef5,stroke-width:2px
classDef crashed fill:#f97316db,stroke:#f97316db,stroke-width:2px
classDef cancelled fill:#3d3d3da8,stroke:#3d3d3da8,stroke-width:2px
classDef awaiting_concurrency_slot fill:#ede7f6,stroke:#4527a0,stroke-width:2px
%% States
subgraph scheduled_type[Scheduled]
Scheduled[Scheduled]:::scheduled
Late[Late]:::scheduled
AwaitingConcurrencySlot[AwaitingConcurrencySlot]:::scheduled
end
Running[Running]:::running
Failed[Failed]:::failed
subgraph scheduled_type2[Scheduled]
AwaitingRetry[Awaiting Retry]:::scheduled
end
subgraph running_type[Running]
Retrying[Retrying]:::running
end
Pending[Pending]:::pending
Cancelling[Cancelling]:::cancelled
Cancelled[Cancelled]:::cancelled
Cached[Cached]:::completed
RolledBack[Rolled Back]:::completed
Crashed[Crashed]:::crashed
Paused[Paused]:::paused
Completed[Completed]:::completed
%% Connections
Scheduled --> |Scheduled start time passes without entering Pending| Late
Scheduled --> |Worker/Runner successfully submits run| Pending
Scheduled --> |Worker encounters concurrency limit| AwaitingConcurrencySlot
AwaitingConcurrencySlot --> Pending
Late --> Pending
Pending --> |Preconditions met| Running
Running -.-> |Success| Completed
%%problematic section:
Retrying -.-> |Success| Completed
Failed --> |Retries remaining| AwaitingRetry
AwaitingRetry --> |Retry attempt| Retrying
Retrying -.-> |Failure| Failed
Running -.-> |Error| Failed
Running -.-> |Infrastructure issue| Crashed
Running -.-> |Cache hit| Cached
Running -.-> |Transaction rollback| RolledBack
Running --> |User cancels| Cancelling
Running --> |Manual pause| Paused
Paused --> |Resume| Running
Cancelling -.-> |Cleanup complete| Cancelled
```

### Task return values

A task will be placed into a `Completed` state if it returns _any_ Python object, with one exception:
Expand Down
36 changes: 36 additions & 0 deletions docs/v3/develop/settings-ref.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,18 @@ The URL of the Prefect API. If not set, the client will attempt to infer it.
**Supported environment variables**:
`PREFECT_API_URL`

### `auth_string`
The auth string used for basic authentication with a self-hosted Prefect API. Should be kept secret.

**Type**: `string | None`

**Default**: `None`

**TOML dotted key path**: `api.auth_string`

**Supported environment variables**:
`PREFECT_API_AUTH_STRING`

### `key`
The API key used for authentication with the Prefect API. Should be kept secret.

Expand Down Expand Up @@ -841,6 +853,18 @@ Number of seconds a runner should wait between queries for scheduled work.
**Supported environment variables**:
`PREFECT_RUNNER_POLL_FREQUENCY`

### `heartbeat_frequency`
Number of seconds a runner should wait between heartbeats for flow runs.

**Type**: `integer | None`

**Default**: `None`

**TOML dotted key path**: `runner.heartbeat_frequency`

**Supported environment variables**:
`PREFECT_RUNNER_HEARTBEAT_FREQUENCY`

### `server`

**Type**: [RunnerServerSettings](#runnerserversettings)
Expand All @@ -850,6 +874,18 @@ Number of seconds a runner should wait between queries for scheduled work.
---
## ServerAPISettings
Settings for controlling API server behavior
### `auth_string`
A string to use for basic authentication with the API; typically in the form 'user:password' but can be any string.

**Type**: `string | None`

**Default**: `None`

**TOML dotted key path**: `server.api.auth_string`

**Supported environment variables**:
`PREFECT_SERVER_API_AUTH_STRING`

### `host`
The API's host address (defaults to `127.0.0.1`).

Expand Down
Loading

0 comments on commit bc6428b

Please sign in to comment.