Add troubleshooting guide to docs site based on user questions.
craigwalton-dsit committed Dec 19, 2024
1 parent f585459 commit 9028af0
Showing 3 changed files with 80 additions and 5 deletions.
27 changes: 22 additions & 5 deletions docs/docs/tips/debugging-k8s-sandboxes.md
@@ -4,12 +4,17 @@ This section explains features of [Inspect](https://inspect.ai-safety-institute.
and [k9s](https://k9scli.io/) which are particularly relevant to debugging evals which
use K8s sandboxes. Please see the dedicated docs pages of each for more information.

## Inspect Log Levels
## Capture Inspect `SANDBOX`-level logs { #sandbox-log-level }

Using `--log-level sandbox` (or setting the `INSPECT_LOG_LEVEL` env var, or passing the
`log_level` argument to `eval()`) when running an Inspect eval will give you good
visibility into the Helm charts being installed, the commands being run within the
containers, and the output of those commands.
Useful sandbox-related messages, such as Helm installs and uninstalls and Pod operations
(`exec()` calls and their results, `read_file()`, `write_file()`), are logged at the
`SANDBOX` log level.

Set Inspect's log level to `SANDBOX` or lower via one of these methods:

* passing `--log-level sandbox` on the command line
* setting the `INSPECT_LOG_LEVEL=sandbox` environment variable
* passing the `log_level` argument to `eval()` or `eval_set()`

Example:

@@ -35,6 +40,18 @@ SANDBOX - K8S: Completed: Execute command in pod. {
}
```

Additionally, ensure the output of the Python `logging` module is written to a file on disk:

```sh
mkdir -p logs
export INSPECT_PY_LOGGER_FILE="logs/inspect_py_log.log"
```

The entries in this file include timestamps and are invaluable when piecing together an
ordered sequence of events.

Consider including the datetime in the log file name to keep logs separate.
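
As a minimal sketch of the suggestion above, the log file name can embed the current
date and time. The timestamp format and the `inspect_py_log_` prefix are illustrative
choices, not something Inspect requires:

```sh
mkdir -p logs
# Build a per-run file name such as logs/inspect_py_log_2024-12-19T10-30-00.log.
# Hyphens are used instead of colons so the name is safe on most filesystems.
INSPECT_PY_LOGGER_FILE="logs/inspect_py_log_$(date +%Y-%m-%dT%H-%M-%S).log"
export INSPECT_PY_LOGGER_FILE
```

Each eval started from a fresh shell session then writes to its own file.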

## Disabling Inspect Cleanup

By default, Inspect will clean up sandboxes (i.e. uninstall Helm releases) after an eval
57 changes: 57 additions & 0 deletions docs/docs/tips/troubleshooting.md
@@ -0,0 +1,57 @@
# Troubleshooting

For general K8s and Inspect sandbox debugging, see the [Debugging K8s
Sandboxes](debugging-k8s-sandboxes.md) guide.

## Capture Inspect `SANDBOX`-level logs

A good starting point for most issues is to capture the output of the Python `logging`
module at `SANDBOX` level. See the [`SANDBOX` log level
section](debugging-k8s-sandboxes.md#sandbox-log-level).

## View cluster events

Certain cluster events, such as a node failure, may affect your eval. List events sorted
by creation time:

```sh
kubectl get events --sort-by='.metadata.creationTimestamp'
```

To also see timestamps:

```sh
kubectl get events --sort-by='.metadata.creationTimestamp' \
-o custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason,Message:.message
```

To filter to a particular release or pod, either pipe into `grep` or use the
`--field-selector` flag:

```sh
kubectl get events --sort-by='.metadata.creationTimestamp' \
--field-selector involvedObject.name=agent-env-xxxxxxxx-default-0
```

Find the Pod name (including the random 8-character identifier) in the `SANDBOX`-level
logs or the stack trace.

To target a specific namespace, use the `-n` flag.
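
The field selector and namespace flag can be combined. The wrapper below is a
hypothetical convenience function (the name `events_for_pod` is not part of `kubectl` or
Inspect); it simply forwards any extra arguments, such as `-n`, to `kubectl`:

```sh
# Hypothetical helper: show events for one Pod, sorted by creation time.
# $1 is the Pod name; any remaining arguments (e.g. -n my-namespace) are
# passed straight through to kubectl.
events_for_pod() {
    pod="$1"
    shift
    kubectl get events --sort-by='.metadata.creationTimestamp' \
        --field-selector "involvedObject.name=${pod}" "$@"
}
```

Usage might look like `events_for_pod agent-env-xxxxxxxx-default-0 -n my-namespace`,
where `my-namespace` is a placeholder for your own namespace.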

## I'm seeing "Helm uninstall failed" errors

These errors most likely occur because the Helm chart was never installed. This typically
happens if you cancel an eval, or if an eval fails before a given sample's Helm chart has
been installed.

Check to see if any Helm releases were left behind:

```sh
helm list
```
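
Note that `helm list` only covers the current namespace. The sketch below widens the
search and removes a single leftover release by hand. The function names are
hypothetical, and the release and namespace you pass are whatever `helm list` reports in
your cluster:

```sh
# List releases in every namespace, not just the current one.
list_all_releases() {
    helm list --all-namespaces
}

# Uninstall one leftover release.
# $1 is the release name and $2 its namespace, both taken from the listing.
uninstall_release() {
    release="$1"
    namespace="$2"
    helm uninstall "$release" --namespace "$namespace"
}
```

Check that a release is not in use by a running eval before uninstalling it.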

## I'm seeing "Handshake status 404 Not Found" errors from Pod operations

This typically indicates that the Pod has been killed. This may be due to cluster issues
(see how to view cluster events above), or because the eval had already failed and the
Helm releases were uninstalled whilst some operations were queued or in flight.

Check the `.json` or `.eval` log produced by Inspect to see the underlying error.
1 change: 1 addition & 0 deletions docs/mkdocs.yaml
@@ -43,6 +43,7 @@ nav:
- "Examples": examples.md
- "Tips":
- "Debugging K8s Sandboxes": tips/debugging-k8s-sandboxes.md
- "Troubleshooting": tips/troubleshooting.md
- "Docker Images": tips/images.md
- "Hubble (Cilium UI)": tips/hubble.md
- "Concurrency": tips/concurrency.md