Merge pull request #1 from UKGovernmentBEIS/craig/docs-migration

Initial docs site migration
UKGovernmentBEIS · Dec 17, 2024 · 224498d · 224498d
2 parents 31b9e51 + d9416dc
commit 224498d
Showing 37 changed files with 2,122 additions and 0 deletions.
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,18 @@
+A docs site which uses [mkdocs-material](https://squidfunk.github.io/mkdocs-material/).
+
+## Installation
+
+From the `docs` directory:
+
+```bash
+poetry install
+```
+
+Consider using the recommended [Rewrap](https://stkb.github.io/Rewrap/) extension
+(`.vscode/extensions.json`) for VS Code to wrap Markdown text at 88 characters.
+
+## Serve docs
+
+```bash
+mkdocs serve
+```
diff --git a/docs/docs/CNAME b/docs/docs/CNAME
@@ -0,0 +1 @@
+k8s-sandbox.ai-safety-institute.org.uk
diff --git a/docs/docs/assets/aisi-logo.png b/docs/docs/assets/aisi-logo.png
diff --git a/docs/docs/assets/extra.css b/docs/docs/assets/extra.css
@@ -0,0 +1,11 @@
+.md-header__button.md-logo {
+  margin-top: 0;
+  margin-bottom: 0;
+  padding-top: 0;
+  padding-bottom: 0;
+}
+
+/* Enlarge logo in header. */
+.md-header__button.md-logo img, .md-header__button.md-logo svg {
+    height: 1.6rem;
+}
diff --git a/docs/docs/assets/favicon.png b/docs/docs/assets/favicon.png
diff --git a/docs/docs/assets/icon-dark.png b/docs/docs/assets/icon-dark.png
diff --git a/docs/docs/assets/icon-white.png b/docs/docs/assets/icon-white.png
diff --git a/docs/docs/design/complexities.md b/docs/docs/design/complexities.md
@@ -0,0 +1,38 @@
+# Complexities
+
+## `exec()`
+
+The behaviour of `kubectl exec` is not consistent with that of `docker exec`. Consider
+the following commands (note: the `k8s_sandbox` package does not actually use `kubectl`,
+but they illustrate the point).
+
+```sh
+kubectl exec pod   -- bash -c "python server.py &"
+docker exec container bash -c "python server.py &"
+```
+
+Kubernetes won't consider the command completed until the Python process exits, whereas
+Docker will consider the command completed as soon as the bash process exits.
+
+More specifically, Kubernetes will wait for the stdout and stderr file descriptors to be
+closed (including by any child processes which inherited them).
+
+The `kubectl` command could be re-written like so to make it behave in the same way as
+`docker exec`:
+
+```sh
+kubectl exec pod -- bash -c "python server.py > /dev/null 2>&1 &"
+```
+
+However, we do not have control over the commands which LLMs choose to run, so the
+`k8s_sandbox` package attempts to emulate the Docker behaviour (which seems more
+intuitive anyway).
+
+See the source code for documentation on how this is achieved.
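
The descriptor-inheritance effect described above can be reproduced locally without a
cluster. The sketch below is only an analogy and is not how `k8s_sandbox` is
implemented: it uses Python's `subprocess` with pipes (which, like `kubectl exec`, waits
for stdout and stderr to be closed) and `sleep 2` as a stand-in for `python server.py`.
The helper name `timed_run` is hypothetical, and a POSIX `sh` is assumed to be available.

```python
import subprocess
import time


def timed_run(command: str) -> float:
    """Run `command` via sh, capturing output through pipes; return elapsed seconds."""
    start = time.monotonic()
    # capture_output=True reads stdout/stderr until EOF, i.e. until every process
    # holding the write end of the pipes (including background children) closes it.
    subprocess.run(["sh", "-c", command], capture_output=True, check=True)
    return time.monotonic() - start


# The backgrounded sleep inherits the pipes, so the call blocks until it exits.
inherited = timed_run("sleep 2 &")

# Redirecting the child's output releases the pipes as soon as sh itself exits.
redirected = timed_run("sleep 2 > /dev/null 2>&1 &")

print(f"inherited: {inherited:.1f}s, redirected: {redirected:.1f}s")
```

The first call should take roughly as long as the background process lives, while the
second should return almost immediately.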
+
+!!! question "Why not use `tty=True` (`-t`)?"
+
+    Whilst this would give us the behaviour we want around commands containing a
+    backgrounded task (`&`), it means that stderr is redirected to stdout. It also changes
+    the line endings of the output from `\n` to `\r\n`, which means that the output is not
+    consistent with output from other sandbox environments like Docker.
diff --git a/docs/docs/design/future.md b/docs/docs/design/future.md
@@ -0,0 +1,15 @@
+# Future Work
+
+## Automatically running from `compose.yaml` files
+
+The typical small-scale community user will likely write agentic evals using the
+Docker Compose sandbox environment provider. But there will be appetite within larger
+organisations to run these evals in a Kubernetes cluster - either for scalability or
+security reasons.
+
+To avoid maintaining both `compose.yaml` and `helm-values.yaml` files, we are
+considering adding support for generating a `helm-values.yaml` file on the fly
+from a `compose.yaml` file. Only very simple `compose.yaml` files would be supported.
+
+Features such as automatically building docker images from a Dockerfile would not be
+supported.
diff --git a/docs/docs/design/limitations.md b/docs/docs/design/limitations.md
@@ -0,0 +1,126 @@
+# Limitations
+
+## Containers may restart
+
+Containers may restart during an eval. This can happen for several reasons, including:
+
+* The container terminates or crashes (PID 1 exited).
+* The Pod is killed by Kubernetes (e.g. Out Of Memory).
+* The Pod is rescheduled by Kubernetes (e.g. due to node failure or resource
+  constraints).
+* The Pod's [liveness
+  probes](https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/#liveness-probe)
+  fail.
+
+Allowing containers to restart may be desirable:
+
+* You may not want an agent to be able to deliberately crash its container (`kill 1`) in
+  order to fail an eval if that would result in retrying the eval.
+* If an agent causes your support infrastructure (like a web server) to crash or exceed
+  memory limits, you may want it to restart.
+* Your containers may depend on a certain startup order, e.g. a web server assumes it
+  can connect to a database which hasn't been scheduled or is not ready yet. In that
+  case, you would want the web server to enter a crash backoff loop until the database
+  is available.
+
+Sometimes, containers restarting is not desirable:
+
+* If state is stored in-memory or on a non-persistent volume, it will be lost. E.g. an
+  agent starts a long-running background process in its container or a web server stores
+  session data in-memory.
+
+If the eval attempts to directly interact with a container whilst it is restarting (e.g.
+an agent tries to `exec()` a shell command), that sample of the eval will fail with a
+suitable exception.
+
+You can reduce the likelihood of Pod eviction by setting the resource limits and
+requests of Pods such that you get a `Guaranteed` [QoS
+class](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/)
+which is the case by default in the [built-in Helm
+chart](../helm/built-in-chart.md#resource-requests-and-limits).
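
As a rough illustration of the `Guaranteed` rule: Kubernetes assigns that class only when
every container in the Pod sets CPU and memory limits and its requests equal those
limits. The helper `is_guaranteed` and the dict layout (mirroring the `containers` list
of a Pod spec) are assumptions made for this sketch. Quantities are compared as plain
strings, so `1` and `1000m` are treated as different even though Kubernetes considers
them equal.

```python
def is_guaranteed(containers: list[dict]) -> bool:
    """Return True if every container's requests equal its limits for CPU and memory."""
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for name in ("cpu", "memory"):
            if name not in limits:
                return False
            # Kubernetes defaults a missing request to the corresponding limit.
            if requests.get(name, limits[name]) != limits[name]:
                return False
    # A Pod with no containers cannot be Guaranteed.
    return bool(containers)


guaranteed = [
    {
        "resources": {
            "limits": {"cpu": "500m", "memory": "512Mi"},
            "requests": {"cpu": "500m", "memory": "512Mi"},
        }
    }
]
burstable = [
    {
        "resources": {
            "limits": {"cpu": "500m", "memory": "512Mi"},
            "requests": {"cpu": "250m", "memory": "512Mi"},
        }
    }
]

print(is_guaranteed(guaranteed), is_guaranteed(burstable))
```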
+
+You can reduce the impact of a container restarting by using persistent volumes.
+
+??? question "Why not use Jobs over StatefulSets?"
+
+    Instead of using
+    [StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
+    or
+    [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/),
+    [Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) could be used
+    as the workload controller for the underlying Pods. This way, the Pod's
+    `restartPolicy` can be configured as `Never` and the Job's `backoffLimit` as `0` in
+    the cases where restarts are not desirable. However, this introduces some
+    complexities:
+
+    1. The `--wait` flag passed to `helm install` does not wait for Pods belonging to
+    Jobs to be in a Running state. We'd have to implement our own waiting mechanism,
+    possibly as a Helm post-install hook to avoid coupling the Python code to the Helm
+    chart.
+
+    2. We either need to ask developers to write their images in a way which won't crash
+    if dependencies are not ready, or provide some way of expressing dependencies
+    (e.g. a `dependsOn` field in the Helm chart) and ensuring the Pods are started in
+    that order (e.g. with an init container which queries `kubectl`).
+
+    3. The Python code would need a way of periodically checking (e.g. before every
+    `exec()`) if any Pods in the release are in a failed state and won't be restarted,
+    then fail that sample of the eval by raising an exception.
+
+
+    What about bare Pods?
+
+    When using bare Pods (i.e. not managed by a workload controller),
+    `helm install --wait` will wait for all Pods to be in a Running state. However, if
+    a Pod enters a failed state, it will not be restarted and `helm install` will wait
+    indefinitely.
+
+
+## Denied network requests hang
+
+Because Cilium simply drops packets for denied network requests, the client will hang
+waiting for a response until its timeout is reached. The timeout is dependent on which
+tool/client you're using. We recommend any tool calls also pass the `timeout` parameter
+in case the model runs a command that doesn't have a built-in timeout.
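
To illustrate why a client-side timeout matters, here is a generic Python sketch (not
part of `k8s_sandbox` or Inspect). A local listener that never replies stands in for a
destination whose packets are dropped; the names `silent_server` and
`fetch_with_timeout` are hypothetical. Without the `timeout` argument, the `recv()` call
would block indefinitely.

```python
import socket
from typing import Optional

# A listener that never accepts or replies stands in for a destination whose
# packets are silently dropped. (With genuinely dropped packets even the
# connect step would hang; the same timeout covers that case too.)
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))
silent_server.listen(1)
address = silent_server.getsockname()


def fetch_with_timeout(address: tuple, timeout: float) -> Optional[bytes]:
    """Return the first bytes received, or None if nothing arrives in time."""
    with socket.create_connection(address, timeout=timeout) as client:
        try:
            return client.recv(1024)
        except socket.timeout:
            return None


result = fetch_with_timeout(address, timeout=0.5)
print("timed out" if result is None else result)
silent_server.close()
```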
+
+## Cilium's security measures prevent some exploits
+
+Cilium imposes some sensible network security measures, described on their
+[blog](https://cilium.io/blog/2020/06/29/cilium-kubernetes-cni-vulnerability/). Amongst
+them is packet spoofing prevention. Any evals (e.g. Cyber misuse) which depend on the
+agent spoofing packets may not work.
+
+## The CoreDNS sidecar in the built-in Helm chart will use port 53
+
+Evals which require the use of port 53 (e.g. a Cyber eval with a vulnerable DNS server)
+will not work with the built-in Helm chart as each Pod has a CoreDNS sidecar which uses
+port 53.
+
+## The `user` parameter to `exec()` is not supported
+
+In Kubernetes, a container runs as a single user. If you need to run commands as
+different users, you may have to run the container as root and use a tool like `runuser`
+to run commands as different users.
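
A minimal sketch of the `runuser` workaround, assuming the container runs as root and
that `runuser` (from util-linux) is installed in the image. The helper `as_user` and the
user name `agent` are hypothetical; the resulting string is what you would pass as the
shell command.

```python
import shlex


def as_user(user: str, command: list[str]) -> str:
    """Build a shell command line that runs `command` as `user` via runuser."""
    return shlex.join(["runuser", "-u", user, "--", *command])


command_line = as_user("agent", ["bash", "-c", "whoami && id"])
print(command_line)
```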
+
+## Images are not automatically built, tagged or pushed
+
+The process of building, tagging and pushing images is left to the user or other tooling
+as it is highly dependent on your environment and practices.
+
+## `inspect sandbox cleanup k8s` without specifying an ID is not supported
+
+To avoid potentially removing resources that belong to other users, the `k8s_sandbox`
+package will not uninstall every Helm chart in the current namespace. Note that `inspect
+sandbox cleanup k8s xxxxxxxx` is supported.
+
+## `TimeoutError` won't be raised on busybox images
+
+The `timeout` binary on busybox images behaves differently, causing a 128 + 15 (SIGTERM)
+= 143 exit code rather than a 124 exit code. This will result in a suitable `ExecResult`
+being returned rather than raising a `TimeoutError`.
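
The exit-code arithmetic can be sketched as follows. This is an illustrative helper
(`describe_exit_code` is a hypothetical name), not the logic used by `k8s_sandbox`: it
treats 124 as the GNU coreutils `timeout` status and decodes anything above 128 using
the shell's 128 + signal-number convention.

```python
import signal

TIMEOUT_EXIT_CODE = 124  # status GNU coreutils `timeout` returns when the command times out


def describe_exit_code(returncode: int) -> str:
    """Describe an exit code using the shell's 128 + signal-number convention."""
    if returncode == TIMEOUT_EXIT_CODE:
        return "timed out"
    if returncode > 128:
        try:
            return f"killed by {signal.Signals(returncode - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exited with {returncode}"


for code in (124, 128 + signal.SIGTERM, 0):
    print(code, describe_exit_code(code))
```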
+
+## Service names must be lower case alphanumeric
+
+In the built-in Helm chart, service names (i.e. the keys in the `services` dict) must
+match the case-sensitive regex `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` e.g. `my-name` or
+`123-abc`. The Helm chart will fail to install if this is not the case.
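
To catch invalid names before installing the chart, the regex above can be checked
locally. This is a minimal sketch: the helper `invalid_service_names` and the example
`services` dict are hypothetical, and only the pattern itself comes from the text above.

```python
import re

SERVICE_NAME_PATTERN = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def invalid_service_names(services: dict) -> list[str]:
    """Return the keys of `services` that do not match the required pattern."""
    return [name for name in services if not SERVICE_NAME_PATTERN.fullmatch(name)]


services = {"my-name": {}, "123-abc": {}, "My-Name": {}, "-abc": {}, "abc-": {}}
print(invalid_service_names(services))
```

Upper-case letters and leading or trailing hyphens are rejected, while `my-name` and
`123-abc` pass.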
diff --git a/docs/docs/examples.md b/docs/docs/examples.md
@@ -0,0 +1,79 @@
+# Examples
+
+## Hello World
+
+Here's a basic example of an eval which uses the built-in Helm chart and no custom
+`values.yaml` files.
+
+```py
+from inspect_ai import Task, task
+from inspect_ai.dataset import MemoryDataset, Sample
+from inspect_ai.scorer import includes
+from inspect_ai.solver import generate, use_tools
+from inspect_ai.tool import bash
+
+
+@task
+def challenge() -> Task:
+    samples = [
+        Sample(
+            input="Get the OS version codename using `cat /etc/os-release`.",
+            target="bookworm",
+        )
+    ]
+    return Task(
+        dataset=MemoryDataset(samples=samples),
+        solver=[
+            use_tools([bash()]),
+            generate(),
+        ],
+        sandbox="k8s",
+        scorer=includes(),
+    )
+```
+
+If this were in a `task.py` file, run it with `inspect eval task.py`.
+
+## Custom values.yaml
+
+```py
+return Task(
+    ...,
+    sandbox=("k8s", "values.yaml"),
+)
+```
+
+Assuming you're using the built-in Helm chart, a suitable `values.yaml` file is:
+
+```yaml
+default:
+  image: ubuntu:24.04
+  command: ["tail", "-f", "/dev/null"]
+```
+
+## Additional infrastructure
+
+As before, this assumes you're using the built-in Helm chart. The Nginx server will be
+addressable at `nginx:80` and `my-web-server.com:80` from any of the containers in your
+Helm release.
+
+```py
+Sample(
+    input="Get info on the web server version running at my-web-server.com.",
+    target="nginx/1.27.0",
+)
+```
+
+```yaml
+default:
+  image: ubuntu:24.04
+  command: ["tail", "-f", "/dev/null"]
+nginx:
+  image: nginx:1.27.0
+  dnsRecord: true
+  additionalDnsRecords:
+    - "my-web-server.com"
+  readinessProbe:
+    tcpSocket:
+      port: 80
+```
diff --git a/docs/docs/getting-started/installation.md b/docs/docs/getting-started/installation.md
@@ -0,0 +1,33 @@
+# Installation
+
+To make the K8s sandbox environment provider discoverable to Inspect, install this
+Python package in your environment.
+
+
+=== "pip"
+
+    ```sh
+    pip install git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git
+    ```
+
+=== "poetry"
+
+    ```sh
+    poetry add git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git
+    ```
+
+=== "uv"
+
+    ```sh
+    uv pip install git+https://github.com/UKGovernmentBEIS/inspect_k8s_sandbox.git
+    ```
+
+Then, pass `"k8s"` as the `sandbox` argument to the Inspect `Task` or `Sample`
+constructor.
+
+```py
+return Task(
+    ...,
+    sandbox="k8s",
+)
+```
diff --git a/docs/docs/getting-started/local-cluster.md b/docs/docs/getting-started/local-cluster.md
@@ -0,0 +1,65 @@
+# Local Cluster
+
+If you don't have access to a remote Kubernetes cluster, you can prototype locally using
+[minikube](https://minikube.sigs.k8s.io/docs/).
+
+## Dependencies
+
+* [minikube](https://minikube.sigs.k8s.io/docs/)
+* [gVisor](https://gvisor.dev/docs/user_guide/install/)
+* [Cilium](https://github.com/cilium/cilium-cli)
+
+A minimal setup compatible with the built-in Helm chart can be created as follows:
+
+```sh
+minikube start --container-runtime=containerd --addons=gvisor
+
+kubectl apply -f - <<EOF
+apiVersion: node.k8s.io/v1
+kind: RuntimeClass
+metadata:
+  name: runc
+handler: runc
+EOF
+
+kubectl apply -f - <<EOF
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: nfs-csi
+provisioner: k8s.io/minikube-hostpath
+reclaimPolicy: Delete
+volumeBindingMode: Immediate
+EOF
+
+cilium install
+cilium status --wait
+```
+
+The `runc` `RuntimeClass` is required in order to specify a `runtimeClassName` of `runc`
+in your `values.yaml` files (even if runc is the cluster's default).
+
+You can see the available container runtime class names with:
+
+```sh
+kubectl get runtimeclass
+```
+
+The `nfs-csi` `StorageClass` is required in order to use the `volumes` functionality
+offered by the built-in Helm chart. It actually uses the `minikube-hostpath`
+provisioner.
+
+!!! warning
+
+    This is an example setup which is appropriate for development work, but
+    should not be used long term or in a production setting. For long-term use
+    you should use a larger, more resilient cluster with separate node groups
+    for critical services.
+
+If you wish to use images built locally or from a private registry, the quickest
+approach may be to manually load them into minikube. There are other methods in the
+[minikube documentation](https://minikube.sigs.k8s.io/docs/handbook/pushing/).
+
+```sh
+minikube image load <image-name>:<tag>
+```