Merge pull request #1 from UKGovernmentBEIS/craig/docs-migration
Initial docs site migration
Showing 37 changed files with 2,122 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
A docs site which uses [mkdocs-material](https://squidfunk.github.io/mkdocs-material/). | ||
|
||
## Installation | ||
|
||
From the `docs` directory: | ||
|
||
```bash | ||
poetry install | ||
``` | ||
|
||
Consider using the recommended [Rewrap](https://stkb.github.io/Rewrap/) extension | ||
(`.vscode/extensions.json`) for VS Code to wrap Markdown text at 88 characters. | ||
|
||
## Serve docs | ||
|
||
```bash | ||
mkdocs serve | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
k8s-sandbox.ai-safety-institute.org.uk |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
.md-header__button.md-logo { | ||
margin-top: 0; | ||
margin-bottom: 0; | ||
padding-top: 0; | ||
padding-bottom: 0; | ||
} | ||
|
||
/* Enlarge logo in header. */ | ||
.md-header__button.md-logo img, .md-header__button.md-logo svg { | ||
height: 1.6rem; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Complexities | ||
|
||
## `exec()` | ||
|
||
The behaviour of `kubectl exec` differs from that of `docker exec`. Consider | ||
the following commands (note: the `k8s_sandbox` package does not actually use `kubectl`, | ||
but they illustrate the point). | ||
|
||
```sh | ||
kubectl exec pod -- bash -c "python server.py &" | ||
docker exec container bash -c "python server.py &" | ||
``` | ||
|
||
Kubernetes won't consider the command completed until the Python process exits, whereas | ||
Docker will consider the command completed as soon as the bash script exits. | ||
|
||
More specifically, Kubernetes will wait for the stdout and stderr file descriptors to be | ||
closed (including by any child processes which inherited them). | ||
|
||
The `kubectl` command could be re-written like so to make it behave in the same way as | ||
`docker exec`: | ||
|
||
```sh | ||
kubectl exec pod -- bash -c "python server.py > /dev/null 2>&1 &" | ||
``` | ||
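
The file-descriptor behaviour described above can be reproduced locally without a cluster. The following is a minimal sketch using Python's `subprocess` module, not the `k8s_sandbox` package: with `capture_output=True`, the call only returns once every process holding the stdout/stderr pipes has closed them, which resembles the Kubernetes behaviour. The helper name `timed_run` and the 2-second `sleep` are illustrative assumptions, and a POSIX `sh` is assumed to be available.

```python
import subprocess
import time


def timed_run(command: str) -> float:
    """Run a shell command, capturing its output, and return the elapsed seconds."""
    start = time.monotonic()
    # capture_output=True connects stdout and stderr to pipes. The call returns
    # only when both pipes reach EOF, i.e. when every process which inherited
    # them (including backgrounded children) has closed them.
    subprocess.run(["sh", "-c", command], capture_output=True, check=True)
    return time.monotonic() - start


# The backgrounded sleep inherits the pipes, so the call blocks until it exits.
inherited = timed_run("sleep 2 &")

# The backgrounded sleep writes to /dev/null, so the call returns without waiting.
redirected = timed_run("sleep 2 > /dev/null 2>&1 &")

print(f"inherited: {inherited:.1f}s, redirected: {redirected:.1f}s")
```

The first call should take roughly as long as the `sleep`, whereas the second should return almost immediately, matching the difference between the original and re-written `kubectl exec` commands.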
|
||
However, we do not have control over the commands which LLMs choose to run, so the | ||
`k8s_sandbox` package attempts to emulate the Docker behaviour (which seems more | ||
intuitive anyway). | ||
|
||
See the source code for documentation on how this is achieved. | ||
|
||
!!! question "Why not use `tty=True` (`-t`)?" | ||
|
||
    Whilst this would give us the behaviour we want around commands containing a | ||
    backgrounded task (`&`), it means that stderr is redirected to stdout. It also changes | ||
    the line endings of the output from `\n` to `\r\n`, which means that the output is not | ||
    consistent with output from other sandbox environments like Docker. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Future Work | ||
|
||
## Automatically running from `compose.yaml` files | ||
|
||
The typical small-scale community user will likely write agentic evals using the | ||
Docker Compose sandbox environment provider. But there will be appetite within larger | ||
organisations to run these evals in a Kubernetes cluster - either for scalability or | ||
security reasons. | ||
|
||
To avoid maintaining both `compose.yaml` and `helm-values.yaml` files, we are | ||
considering adding support for generating a `helm-values.yaml` file on the fly | ||
from a `compose.yaml` file. Only very simple `compose.yaml` files would be supported. | ||
|
||
Features such as automatically building docker images from a Dockerfile would not be | ||
supported. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
# Limitations | ||
|
||
## Containers may restart | ||
|
||
Containers may restart during an eval. This can happen for several reasons, including: | ||
|
||
* The container terminates or crashes (PID 1 exited). | ||
* The Pod is killed by Kubernetes (e.g. Out Of Memory). | ||
* The Pod is rescheduled by Kubernetes (e.g. due to node failure or resource | ||
constraints). | ||
* The Pod's [liveness | ||
probes](https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/#liveness-probe) | ||
fail. | ||
|
||
Allowing containers to restart may be desirable: | ||
|
||
* You may not want an agent to be able to deliberately crash its container (`kill 1`) in | ||
order to fail an eval if that would result in retrying the eval. | ||
* If an agent causes your support infrastructure (like a web server) to crash or exceed | ||
memory limits, you may want it to restart. | ||
* Your containers may depend on a certain startup order e.g. a web server assumes it can | ||
connect to a database which hasn't been scheduled or is not ready yet. In that case, | ||
you would want the web server to enter a crash backoff loop until the database is | ||
available. | ||
|
||
Sometimes, containers restarting is not desirable: | ||
|
||
* If state is stored in-memory or on a non-persistent volume, it will be lost. E.g. an | ||
agent starts a long-running background process in its container or a web server stores | ||
session data in-memory. | ||
|
||
If the eval attempts to directly interact with a container whilst it is restarting (e.g. | ||
an agent tries to `exec()` a shell command), that sample of the eval will fail with a | ||
suitable exception. | ||
|
||
You can reduce the likelihood of Pod eviction by setting the resource limits and | ||
requests of Pods such that they get the `Guaranteed` [QoS | ||
class](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/), | ||
which is the case by default in the [built-in Helm | ||
chart](../helm/built-in-chart.md#resource-requests-and-limits). | ||
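
As a rough illustration of when Kubernetes assigns the `Guaranteed` class: every container in the Pod must have CPU and memory limits, and any requests must equal those limits. The sketch below checks that rule against plain dictionaries shaped like a Pod's `containers` list. The function name `is_guaranteed` is hypothetical, and the check deliberately ignores details such as init containers and unit normalisation.

```python
def is_guaranteed(containers: list[dict]) -> bool:
    """Return True if every container's CPU and memory requests equal its limits."""
    if not containers:
        return False
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            if limit is None:
                # A missing limit rules out the Guaranteed class.
                return False
            # Kubernetes defaults a missing request to the limit.
            # Simplification: quantities are compared as strings, so "1" and
            # "1000m" are treated as different here.
            if requests.get(name, limit) != limit:
                return False
    return True


equal = {"limits": {"cpu": "500m", "memory": "1Gi"},
         "requests": {"cpu": "500m", "memory": "1Gi"}}
burstable = {"limits": {"cpu": "500m", "memory": "1Gi"},
             "requests": {"cpu": "250m", "memory": "1Gi"}}

print(is_guaranteed([{"resources": equal}]))
print(is_guaranteed([{"resources": burstable}]))
```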
|
||
You can reduce the impact of a container restarting by using persistent volumes. | ||
|
||
??? question "Why not use Jobs over StatefulSets?" | ||
|
||
    Instead of using | ||
    [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) | ||
    or | ||
    [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), | ||
    [Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) could be used | ||
    as the workload controller for the underlying Pods. This way, the Pod's | ||
    `restartPolicy` can be configured as `Never` and the Job's `backoffLimit` as `0` in | ||
    the cases where restarts are not desirable. However, this introduces some | ||
    complexities: | ||
|
||
    1. The `--wait` flag passed to `helm install` does not wait for Pods belonging to | ||
       Jobs to be in a Running state. We'd have to implement our own waiting mechanism, | ||
       possibly as a Helm post-install hook to avoid coupling the Python code to the Helm | ||
       chart. | ||
|
||
    2. We either need to ask developers to write their images in a way which won't crash | ||
       if dependencies are not ready, or provide some way of expressing dependencies | ||
       (e.g. a `dependsOn` field in the Helm chart) and ensuring the Pods are started in | ||
       that order (e.g. with an init container which queries `kubectl`). | ||
|
||
    3. The Python code would need a way of periodically checking (e.g. before every | ||
       `exec()`) if any Pods in the release are in a failed state and won't be restarted, | ||
       then fail that sample of the eval by raising an exception. | ||
|
||
|
||
    What about bare Pods? | ||
|
||
    When using bare Pods (i.e. not managed by a workload controller), | ||
    `helm install --wait` will wait for all Pods to be in a Running state. However, if | ||
    a Pod enters a failed state, it will not be restarted and `helm install` will wait | ||
    indefinitely. | ||
|
||
|
||
## Denied network requests hang | ||
|
||
Because Cilium simply drops packets for denied network requests, the client will hang | ||
waiting for a response until its timeout is reached. The timeout is dependent on which | ||
tool/client you're using. We recommend any tool calls also pass the `timeout` parameter | ||
in case the model runs a command that doesn't have a built-in timeout. | ||
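
To illustrate why a client-side timeout matters, here is a small self-contained sketch. It does not involve Cilium: a local listening socket that never replies stands in for a destination whose packets are silently dropped, and the 0.5 second timeout is an arbitrary illustrative value. With real dropped packets the hang would typically occur earlier, whilst connecting, but the effect on the caller is the same.

```python
import socket

# A listening socket that never accepts or replies stands in for a destination
# whose traffic is silently dropped.
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))  # Port 0: let the OS pick a free port.
silent_server.listen(1)
host, port = silent_server.getsockname()

# The timeout applies to the connection attempt and to later socket operations.
client = socket.create_connection((host, port), timeout=0.5)
client.sendall(b"GET / HTTP/1.0\r\n\r\n")
try:
    client.recv(1024)  # No response will ever arrive.
    timed_out = False
except socket.timeout:
    # Without the timeout above, recv() would block indefinitely.
    timed_out = True
finally:
    client.close()
    silent_server.close()

print("timed out:", timed_out)
```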
|
||
## Cilium's security measures prevent some exploits | ||
|
||
Cilium imposes some sensible network security measures, described on their | ||
[blog](https://cilium.io/blog/2020/06/29/cilium-kubernetes-cni-vulnerability/). Amongst | ||
them is packet spoofing prevention. Any evals (e.g. Cyber misuse) which depend on the | ||
agent spoofing packets may not work. | ||
|
||
## The CoreDNS sidecar in the built-in Helm chart will use port 53 | ||
|
||
Evals which require the use of port 53 (e.g. a Cyber eval with a vulnerable DNS server) | ||
will not work with the built-in Helm chart as each Pod has a CoreDNS sidecar which uses | ||
port 53. | ||
|
||
## The `user` parameter to `exec()` is not supported | ||
|
||
In Kubernetes, a container runs as a single user. If you need to run commands as | ||
different users, you may have to run the container as root and use a tool like `runuser` | ||
to run commands as different users. | ||
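
A minimal sketch of the `runuser` approach follows. The helper name `as_user` is hypothetical; it only builds an argument list, which could then be used as the command passed to `exec()`. It assumes the container runs as root and that `runuser` (from util-linux) and `sh` are present in the image.

```python
import shlex


def as_user(user: str, command: str) -> list[str]:
    """Wrap a shell command so that it runs as `user` via runuser."""
    # `runuser -u <user> -- <command...>` runs the command as the given user.
    # runuser itself must be invoked as root.
    return ["runuser", "-u", user, "--", "sh", "-c", command]


cmd = as_user("nobody", "whoami")
print(shlex.join(cmd))
```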
|
||
## Images are not automatically built, tagged or pushed | ||
|
||
The process of building, tagging and pushing images is left to the user or other tooling | ||
as it is highly dependent on your environment and practices. | ||
|
||
## `inspect sandbox cleanup k8s` without specifying an ID is not supported | ||
|
||
To avoid potentially removing resources that belong to other users, the `k8s_sandbox` | ||
package will not uninstall every Helm chart in the current namespace. Note that `inspect | ||
sandbox cleanup k8s xxxxxxxx` is supported. | ||
|
||
## `TimeoutError` won't be raised on busybox images | ||
|
||
The `timeout` binary on busybox images behaves differently, causing a 128 + 15 (SIGTERM) | ||
= 143 exit code rather than a 124 exit code. This will result in a suitable `ExecResult` | ||
being returned rather than raising a `TimeoutError`. | ||
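
If your own code needs to treat both cases alike, one option is to inspect the return code of the `ExecResult` yourself. The sketch below is an assumption about how that might look, not part of the `k8s_sandbox` package: the function name `may_have_timed_out` is hypothetical. Note that 143 is ambiguous, as it is also produced when a process is terminated by SIGTERM for reasons unrelated to a timeout.

```python
import signal

# Exit code used by the GNU coreutils `timeout` binary when the limit is hit.
GNU_TIMEOUT_EXIT_CODE = 124
# Exit code seen on busybox images: 128 + SIGTERM (15) = 143.
SIGTERM_EXIT_CODE = 128 + signal.SIGTERM


def may_have_timed_out(returncode: int) -> bool:
    """Return True if the exit code is consistent with a command timeout."""
    return returncode in (GNU_TIMEOUT_EXIT_CODE, SIGTERM_EXIT_CODE)


print(may_have_timed_out(143))
```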
|
||
## Service names must be lower case alphanumeric | ||
|
||
In the built-in Helm chart, service names (i.e. the keys in the `services` dict) must | ||
match the case-sensitive regex `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` e.g. `my-name` or | ||
`123-abc`. The Helm chart will fail to install if this is not the case. |
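
Service names can be checked before installing the chart. This is a minimal sketch which applies the regex quoted above using Python's `re` module; the function name `is_valid_service_name` is hypothetical.

```python
import re

# The regex quoted above for service names in the built-in Helm chart.
SERVICE_NAME_PATTERN = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def is_valid_service_name(name: str) -> bool:
    """Return True if `name` is lower case alphanumeric with optional inner hyphens."""
    # fullmatch() also rejects a trailing newline, which `$` alone would allow.
    return SERVICE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("my-name", "123-abc", "My-Name", "-abc", "abc-"):
    print(candidate, is_valid_service_name(candidate))
```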
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# Examples | ||
|
||
## Hello World | ||
|
||
Here's a basic example of an eval which uses the built-in Helm chart and no custom | ||
`values.yaml` files. | ||
|
||
```py | ||
from inspect_ai import Task, task | ||
from inspect_ai.dataset import MemoryDataset, Sample | ||
from inspect_ai.scorer import includes | ||
from inspect_ai.solver import generate, use_tools | ||
from inspect_ai.tool import bash | ||
|
||
|
||
@task | ||
def challenge() -> Task: | ||
    samples = [ | ||
        Sample( | ||
            input="Get the OS version codename using `cat /etc/os-release`.", | ||
            target="bookworm", | ||
        ) | ||
    ] | ||
    return Task( | ||
        dataset=MemoryDataset(samples=samples), | ||
        solver=[ | ||
            use_tools([bash()]), | ||
            generate(), | ||
        ], | ||
        sandbox="k8s", | ||
        scorer=includes(), | ||
    ) | ||
``` | ||
|
||
If this were in a `task.py` file, run it with `inspect eval task.py`. | ||
|
||
## Custom values.yaml | ||
|
||
```py | ||
return Task( | ||
    ..., | ||
    sandbox=("k8s", "values.yaml"), | ||
) | ||
``` | ||
|
||
Assuming you're using the built-in Helm chart, a suitable `values.yaml` file is: | ||
|
||
```yaml | ||
default: | ||
  image: ubuntu:24.04 | ||
  command: ["tail", "-f", "/dev/null"] | ||
``` | ||
|
||
## Additional infrastructure | ||
|
||
This example again assumes you're using the built-in Helm chart. The Nginx server will be | ||
addressable at `nginx:80` and `my-web-server.com:80` from any of the containers in your | ||
Helm release. | ||
|
||
```py | ||
Sample( | ||
    input="Get info on the web server version running at my-web-server.com.", | ||
    target="nginx/1.27.0", | ||
) | ||
``` | ||
|
||
```yaml | ||
default: | ||
  image: ubuntu:24.04 | ||
  command: ["tail", "-f", "/dev/null"] | ||
server: | ||
  image: nginx:1.27.0 | ||
  dnsRecord: true | ||
  additionalDnsRecords: | ||
    - "my-web-server.com" | ||
  readinessProbe: | ||
    tcpSocket: | ||
      port: 80 | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Installation | ||
|
||
To make the K8s sandbox environment provider discoverable to Inspect, install this | ||
Python package in your environment. | ||
|
||
|
||
=== "pip" | ||
|
||
    ```sh | ||
    pip install git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git | ||
    ``` | ||
|
||
=== "poetry" | ||
|
||
    ```sh | ||
    poetry add git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git | ||
    ``` | ||
|
||
=== "uv" | ||
|
||
    ```sh | ||
    uv pip install git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git | ||
    ``` | ||
|
||
Then, pass `"k8s"` as the `sandbox` argument to the Inspect `Task` or `Sample` | ||
constructor. | ||
|
||
```py | ||
return Task( | ||
    ..., | ||
    sandbox="k8s", | ||
) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Local Cluster | ||
|
||
If you don't have access to a remote Kubernetes cluster, you can prototype locally using | ||
[minikube](https://minikube.sigs.k8s.io/docs/). | ||
|
||
## Dependencies | ||
|
||
* [minikube](https://minikube.sigs.k8s.io/docs/) | ||
* [gVisor](https://gvisor.dev/docs/user_guide/install/) | ||
* [Cilium](https://github.com/cilium/cilium-cli) | ||
|
||
A minimal setup compatible with the built-in Helm chart can be created as follows: | ||
|
||
```sh | ||
minikube start --container-runtime=containerd --addons=gvisor | ||
|
||
kubectl apply -f - <<EOF | ||
apiVersion: node.k8s.io/v1 | ||
kind: RuntimeClass | ||
metadata: | ||
  name: runc | ||
handler: runc | ||
EOF | ||
|
||
kubectl apply -f - <<EOF | ||
apiVersion: storage.k8s.io/v1 | ||
kind: StorageClass | ||
metadata: | ||
  name: nfs-csi | ||
provisioner: k8s.io/minikube-hostpath | ||
reclaimPolicy: Delete | ||
volumeBindingMode: Immediate | ||
EOF | ||
|
||
cilium install | ||
cilium status --wait | ||
``` | ||
|
||
The `runc` `RuntimeClass` is required in order to specify a `runtimeClassName` of `runc` | ||
in your `values.yaml` files (even if runc is the cluster's default). | ||
|
||
You can see the available container runtime class names with: | ||
|
||
```sh | ||
kubectl get runtimeclass | ||
``` | ||
|
||
The `nfs-csi` `StorageClass` is required in order to use the `volumes` functionality | ||
offered by the built-in Helm chart. It actually uses the `minikube-hostpath` | ||
provisioner. | ||
|
||
!!! warning | ||
|
||
    This is an example setup which is appropriate for development work, but | ||
    should not be used long term or in a production setting. For long-term use | ||
    you should use a larger, more resilient cluster with separate node groups | ||
    for critical services. | ||
|
||
If you wish to use images built locally or from a private registry, the quickest | ||
approach may be to manually load them into minikube. There are other methods in the | ||
[minikube documentation](https://minikube.sigs.k8s.io/docs/handbook/pushing/). | ||
|
||
```sh | ||
minikube image load <image-name>:<tag> | ||
``` |