Commit

Merge pull request #1 from UKGovernmentBEIS/craig/docs-migration

Initial docs site migration

craigwalton-dsit authored Dec 17, 2024
2 parents 31b9e51 + d9416dc commit 224498d

Showing 37 changed files with 2,122 additions and 0 deletions.
18 changes: 18 additions & 0 deletions docs/README.md
@@ -0,0 +1,18 @@
A docs site which uses [mkdocs-material](https://squidfunk.github.io/mkdocs-material/).

## Installation

From the `docs` directory:

```bash
poetry install
```

Consider using the recommended [Rewrap](https://stkb.github.io/Rewrap/) extension for
VS Code (see `.vscode/extensions.json`) to wrap Markdown text at 88 characters.

## Serve docs

```bash
mkdocs serve
```
1 change: 1 addition & 0 deletions docs/docs/CNAME
@@ -0,0 +1 @@
k8s-sandbox.ai-safety-institute.org.uk
Binary file added docs/docs/assets/aisi-logo.png
11 changes: 11 additions & 0 deletions docs/docs/assets/extra.css
@@ -0,0 +1,11 @@
.md-header__button.md-logo {
margin-top: 0;
margin-bottom: 0;
padding-top: 0;
padding-bottom: 0;
}

/* Enlarge logo in header. */
.md-header__button.md-logo img, .md-header__button.md-logo svg {
height: 1.6rem;
}
Binary file added docs/docs/assets/favicon.png
Binary file added docs/docs/assets/icon-dark.png
Binary file added docs/docs/assets/icon-white.png
38 changes: 38 additions & 0 deletions docs/docs/design/complexities.md
@@ -0,0 +1,38 @@
# Complexities

## `exec()`

The behaviour of `kubectl exec` is not consistent with that of `docker exec`. Consider
the following commands (note: the `k8s_sandbox` package does not actually use `kubectl`,
but it illustrates the point).

```sh
kubectl exec pod -- bash -c "python server.py &"
docker exec container bash -c "python server.py &"
```

Kubernetes won't consider the command completed until the Python process exits, whereas
Docker will consider the command completed as soon as the bash script exits.

More specifically, Kubernetes will wait for the stdout and stderr file descriptors to be
closed (including by any child processes which inherited them).
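
A minimal way to observe this (assuming a Pod named `pod` whose container has `bash`
available):

```sh
# Blocks for ~30 seconds even though the bash script itself exits immediately,
# because the backgrounded sleep inherits the exec session's stdout/stderr.
kubectl exec pod -- bash -c "sleep 30 &"
```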

The `kubectl` command could be re-written like so to make it behave in the same way as
`docker exec`:

```sh
kubectl exec pod -- bash -c "python server.py > /dev/null 2>&1 &"
```

However, we do not have control over the commands which LLMs choose to run, so the
`k8s_sandbox` package attempts to emulate the Docker behaviour (which seems more
intuitive anyway).

See the source code for documentation on how this is achieved.

!!! question "Why not use `tty=True` (`-t`)?"

Whilst this would give us the behaviour we want around commands containing a
backgrounded task (`&`), it means that stderr is redirected to stdout. It also changes
the line endings of the output from `\n` to `\r\n`, which means that the output is not
consistent with output from other sandbox environments like Docker.
15 changes: 15 additions & 0 deletions docs/docs/design/future.md
@@ -0,0 +1,15 @@
# Future Work

## Automatically running from `compose.yaml` files

The typical small-scale community user will likely write agentic evals using the
Docker Compose sandbox environment provider. But there will be appetite within larger
organisations to run these evals in a Kubernetes cluster, either for scalability or
security reasons.

To avoid maintaining both `compose.yaml` and `helm-values.yaml` files, we are
considering adding support for generating a `helm-values.yaml` file on the fly from a
`compose.yaml` file. Only very simple `compose.yaml` files would be supported.

Features such as automatically building Docker images from a Dockerfile would not be
supported.
126 changes: 126 additions & 0 deletions docs/docs/design/limitations.md
@@ -0,0 +1,126 @@
# Limitations

## Containers may restart

Containers may restart during an eval. This can be for several reasons including:

* The container terminates or crashes (PID 1 exited).
* The Pod is killed by Kubernetes (e.g. Out Of Memory).
* The Pod is rescheduled by Kubernetes (e.g. due to node failure or resource
constraints).
* The Pod's [liveness
probes](https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/#liveness-probe)
fail.

Allowing containers to restart may be desirable:

* You may not want an agent to be able to deliberately crash its container (`kill 1`) in
order to fail an eval if that would result in retrying the eval.
* If an agent causes your support infrastructure (like a web server) to crash or exceed
memory limits, you may want it to restart.
* Your containers may depend on a certain startup order, e.g. a web server assumes it
  can connect to a database which hasn't been scheduled or is not ready yet, in which
  case you would want the web server to enter a crash backoff loop until the database
  is available.

Sometimes, containers restarting is not desirable:

* If state is stored in memory or on a non-persistent volume, it will be lost, e.g. an
  agent starts a long-running background process in its container or a web server
  stores session data in memory.

If the eval attempts to directly interact with a container whilst it is restarting (e.g.
an agent tries to `exec()` a shell command), that sample of the eval will fail with a
suitable exception.

You can reduce the likelihood of Pod eviction by setting the resource limits and
requests of Pods such that you get a `Guaranteed` [QoS
class](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/),
which is the case by default in the [built-in Helm
chart](../helm/built-in-chart.md#resource-requests-and-limits).
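
One way to check which QoS class a Pod has been assigned (`<pod-name>` is a
placeholder):

```sh
# Prints Guaranteed, Burstable or BestEffort.
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
```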

You can reduce the impact of a container restarting by using persistent volumes.

??? question "Why not use Jobs over StatefulSets?"

Instead of using
[StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
or
[Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/),
[Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) could be used
as the workload controller for the underlying Pods. This way, the Pod's
`restartPolicy` can be configured as `Never` and the Job's `backoffLimit` as `0` in
the cases where restarts are not desirable. However, this introduces some
complexities:

1. The `--wait` flag passed to `helm install` does not wait for Pods belonging to
Jobs to be in a Running state. We'd have to implement our own waiting mechanism,
possibly as a Helm post-install hook to avoid coupling the Python code to the Helm
chart.

2. We either need to ask developers to write their images in a way which won't crash
if dependencies are not ready, or provide some way of expressing dependencies
(e.g. a `dependsOn` field in the Helm chart) and ensuring the Pods are started in
that order (e.g. with an init container which queries `kubectl`).

3. The Python code would need a way of periodically checking (e.g. before every
`exec()`) if any Pods in the release are in a failed state and won't be restarted,
then fail that sample of the eval by raising an exception.


What about bare Pods?

When using bare Pods (i.e. not managed by a workload controller),
`helm install --wait` will wait for all Pods to be in a Running state. However, if
a Pod enters a failed state, it will not be restarted and `helm install` will wait
indefinitely.


## Denied network requests hang

Because Cilium simply drops packets for denied network requests, the client will hang
waiting for a response until its timeout is reached. The timeout depends on which tool
or client you're using. We recommend that any tool calls also pass the `timeout`
parameter in case the model runs a command that doesn't have a built-in timeout.
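
For example, if `curl` is being used, an explicit client-side timeout avoids a long hang
(the URL is illustrative):

```sh
# Without --max-time, curl may hang for a long time waiting for a response
# that will never arrive.
curl --max-time 10 https://blocked.example.com
```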

## Cilium's security measures prevent some exploits

Cilium imposes some sensible network security measures, described on their
[blog](https://cilium.io/blog/2020/06/29/cilium-kubernetes-cni-vulnerability/). Amongst
them is packet spoofing prevention. Any evals (e.g. Cyber misuse) which depend on the
agent spoofing packets may not work.

## The CoreDNS sidecar in the built-in Helm chart will use port 53

Evals which require the use of port 53 (e.g. a Cyber eval with a vulnerable DNS server)
will not work with the built-in Helm chart as each Pod has a CoreDNS sidecar which uses
port 53.

## The `user` parameter to `exec()` is not supported

In Kubernetes, a container runs as a single user. If you need to run commands as
different users, you may have to run the container as root and use a tool like
`runuser` to switch users on a per-command basis.
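
For example, with the container running as root (`appuser` is a hypothetical user that
exists in the image):

```sh
# Runs the command as appuser rather than root.
runuser -u appuser -- whoami
```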

## Images are not automatically built, tagged or pushed

The process of building, tagging and pushing images is left to the user or other tooling
as it is highly dependent on your environment and practices.

## `inspect sandbox cleanup k8s` without specifying an ID is not supported

To avoid potentially removing resources that belong to other users, the `k8s_sandbox`
package will not uninstall every Helm release in the current namespace. Note that
`inspect sandbox cleanup k8s xxxxxxxx` (with a specific release ID) is supported.

## `TimeoutError` won't be raised on busybox images

The `timeout` binary on busybox images behaves differently from the GNU coreutils
version, causing a 128 + 15 (SIGTERM) = 143 exit code rather than a 124 exit code. This
will result in a suitable `ExecResult` being returned rather than a `TimeoutError` being
raised.
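
A quick way to see the difference from inside a container (the `timeout` syntax and
exact exit code may vary between busybox versions):

```sh
# Prints 143 on busybox images, 124 with GNU coreutils' timeout.
timeout 2 sleep 10; echo $?
```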

## Service names must be lower case alphanumeric

In the built-in Helm chart, service names (i.e. the keys in the `services` dict) must
match the case-sensitive regex `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` e.g. `my-name` or
`123-abc`. The Helm chart will fail to install if this is not the case.
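
To check a candidate name ahead of time, a quick shell test (illustrative):

```sh
# Prints "valid" if the name satisfies the pattern required by the Helm chart.
echo "my-name" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$' && echo valid || echo invalid
```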
79 changes: 79 additions & 0 deletions docs/docs/examples.md
@@ -0,0 +1,79 @@
# Examples

## Hello World

Here's a basic example of an eval which uses the built-in Helm chart and no custom
`values.yaml` files.

```py
from inspect_ai import Task, task
from inspect_ai.dataset import MemoryDataset, Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate, use_tools
from inspect_ai.tool import bash


@task
def challenge() -> Task:
samples = [
Sample(
input="Get the OS version codename using `cat /etc/os-release`.",
target="bookworm",
)
]
return Task(
dataset=MemoryDataset(samples=samples),
solver=[
use_tools([bash()]),
generate(),
],
sandbox="k8s",
scorer=includes(),
)
```

If this were in a `task.py` file, run it with `inspect eval task.py`.

## Custom values.yaml

```py
return Task(
...,
sandbox=("k8s", "values.yaml"),
)
```

Assuming you're using the built-in Helm chart, a suitable `values.yaml` file is:

```yaml
default:
image: ubuntu:24.04
command: ["tail", "-f", "/dev/null"]
```

## Additional infrastructure

Again, this assumes you're using the built-in Helm chart. The Nginx server will be
addressable at `nginx:80` and `my-web-server.com:80` from any of the containers in your
Helm release.

```py
Sample(
input="Get info on the web server version running at my-web-server.com.",
target="nginx/1.27.0",
)
```

```yaml
default:
image: ubuntu:24.04
command: ["tail", "-f", "/dev/null"]
server:
image: nginx:1.27.0
dnsRecord: true
additionalDnsRecords:
- "my-web-server.com"
readinessProbe:
tcpSocket:
port: 80
```
33 changes: 33 additions & 0 deletions docs/docs/getting-started/installation.md
@@ -0,0 +1,33 @@
# Installation

To make the K8s sandbox environment provider discoverable to Inspect, install this
Python package in your environment.


=== "pip"

```sh
pip install git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git
```

=== "poetry"

```sh
poetry add git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git
```

=== "uv"

```sh
uv pip install git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git
```

Then, pass `"k8s"` as the `sandbox` argument to the Inspect `Task` or `Sample`
constructor.

```py
return Task(
...,
sandbox="k8s",
)
```
65 changes: 65 additions & 0 deletions docs/docs/getting-started/local-cluster.md
@@ -0,0 +1,65 @@
# Local Cluster

If you don't have access to a remote Kubernetes cluster, you can prototype locally using
[minikube](https://minikube.sigs.k8s.io/docs/).

## Dependencies

* [minikube](https://minikube.sigs.k8s.io/docs/)
* [gVisor](https://gvisor.dev/docs/user_guide/install/)
* [Cilium](https://github.com/cilium/cilium-cli)

A minimal setup compatible with the built-in Helm chart can be created as follows:

```sh
minikube start --container-runtime=containerd --addons=gvisor

kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: runc
handler: runc
EOF

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-csi
provisioner: k8s.io/minikube-hostpath
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF

cilium install
cilium status --wait
```

The `runc` `RuntimeClass` is required in order to specify a `runtimeClassName` of `runc`
in your `values.yaml` files (even if `runc` is the cluster's default).

You can see the available container runtime class names with:

```sh
kubectl get runtimeclass
```

The `nfs-csi` `StorageClass` is required in order to use the `volumes` functionality
offered by the built-in Helm chart. Despite its name, it is backed by the
`minikube-hostpath` provisioner rather than an NFS CSI driver.

!!! warning

This is an example setup which is appropriate for development work, but
should not be used long term or in a production setting. For long-term use
you should use a larger, more resilient cluster with separate node groups
for critical services.

If you wish to use images built locally or from a private registry, the quickest
approach may be to manually load them into minikube. There are other methods in the
[minikube documentation](https://minikube.sigs.k8s.io/docs/handbook/pushing/).

```sh
minikube image load <image-name>:<tag>
```