From f68162ad76652c2cffd03236a0ed7872dca93889 Mon Sep 17 00:00:00 2001
From: Mo Zhou
Date: Tue, 10 Dec 2024 11:29:33 +0800
Subject: [PATCH] docs: improve `code sandbox` documentation

---
 docs/explanation/code-sandbox.md       | 190 ++++++++++++++++++++++++-
 docs/howto/incluster-code-execution.md |  16 ---
 2 files changed, 189 insertions(+), 17 deletions(-)

diff --git a/docs/explanation/code-sandbox.md b/docs/explanation/code-sandbox.md
index be89942..b469b17 100644
--- a/docs/explanation/code-sandbox.md
+++ b/docs/explanation/code-sandbox.md
@@ -1,3 +1,191 @@
 # Code Sandbox
 
-
+`tablegpt-agent` directs `tablegpt` to generate Python code for data analysis. However, the generated code may contain vulnerabilities or unexpected errors. Running such code directly in a production environment could threaten the system's stability and security.
+
+`Code Sandbox` is designed to address this challenge. By leveraging sandbox technology, it confines code execution to a controlled environment, preventing malicious or unexpected behavior from impacting the main system. This provides an isolated and reliable space for running code safely.
+
+`Code Sandbox` is built on the [pybox](https://github.com/edwardzjl/pybox) library and supports three main execution modes:
+
+- **Local Environment**: Executes code in a local sandbox for quick *deployment* and *validation*.
+- **Remote Environment**: Creates remote environments through `Jupyter Enterprise Gateway` to enable shared computing.
+- **Cluster Environment**: Bypasses the need for proxy services such as `Jupyter Enterprise Gateway` by communicating directly with kernel pods.
+
+`Code Sandbox` is designed around the following key principles:
+
+- **Security**: Limits code access using sandbox technology to ensure a safe and reliable execution environment.
+- **Isolation**: Provides independent execution environments for each task, ensuring strict separation of resources and data.
+- **Scalability**: Adapts to diverse computing environments, from local setups to Kubernetes clusters, supporting dynamic resource allocation and efficient task execution.
+
+
+## Local Environment
+
+In a local environment, `Code Sandbox` uses the `pybox` library to create and manage sandbox environments, providing a secure code execution platform. By isolating code execution from the host system's resources and imposing strict permission controls, it ensures safety and reliability. This approach is especially suitable for **development** and **debugging** scenarios.
+
+If you want to run `tablegpt-agent` in a local environment, you can enable the **local mode**. The installation steps and a usage guide follow.
+
+### Installing
+
+To use `tablegpt-agent` in local mode, install the library with the following command:
+
+```sh
+pip install "tablegpt-agent[local]"
+```
+
+### Configuring
+
+`tablegpt-agent` comes with several built-in features, such as auxiliary methods for data analysis and support for displaying Chinese fonts. **These features are automatically added to the sandbox environment by default**. If you need advanced customization (e.g., adding specific methods or fonts), refer to the [TableGPT IPython Kernel Configuration Documentation](https://github.com/tablegpt/tablegpt-agent/tree/main/ipython) for further guidance.
+
+### Creating and Running
+
+The following code demonstrates how to use the `pybox` library to set up a sandbox, execute code, and retrieve results in a local environment:
+
+```python
+from uuid import uuid4
+from pybox import LocalPyBoxManager, PyBoxOut
+
+# Initialize the local sandbox manager
+pybox_manager = LocalPyBoxManager()
+
+# Assign a unique Kernel ID for the sandbox
+kernel_id = str(uuid4())
+
+# Start the sandbox environment
+box = pybox_manager.start(kernel_id)
+
+# Define the test code to execute
+test_code = """
+import math
+result = math.sqrt(16)
+result
+"""
+
+# Run the code in the sandbox
+out: PyBoxOut = box.run(code=test_code)
+
+# Print the execution result
+print(out)
+```
+
+### Example Output
+
+Running the above code prints the following output, indicating successful execution with no errors:
+```text
+data=[{'text/plain': '4.0'}] error=None
+```
+
+With `Code Sandbox` in local execution mode, developers get the safety of sandbox isolation at minimal cost while maintaining flexibility and efficiency. This lays a solid foundation for more complex remote or cluster-based scenarios.
+
+
+## Remote Environment
+
+In a remote environment, `Code Sandbox` uses the `pybox` library and its `RemotePyBoxManager` to create and manage sandbox environments. The remote mode relies on the [Enterprise Gateway](https://github.com/jupyter-server/enterprise_gateway) service to dynamically create remote sandboxes and execute code in them. This mode allows multiple services to connect to the same remote environment, enabling shared access to resources.
+
+### Configuring
+
+To use `tablegpt-agent` in **remote mode**, first start the `enterprise_gateway` service. Refer to the [Enterprise Gateway Deployment Guide](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/operators/index.html#deploying-enterprise-gateway) for detailed instructions on configuring and starting the service.
+
+Once the service is up and running, ensure that the service address is accessible. For example, assume the `enterprise_gateway` service is available at `http://example.com`.
+
+### Creating and Running
+
+The following code demonstrates how to create a remote sandbox using `RemotePyBoxManager` and execute code within it:
+
+```python
+from uuid import uuid4
+from pybox import RemotePyBoxManager, PyBoxOut
+
+# Initialize the remote sandbox manager; replace the host with your actual Enterprise Gateway service address
+pybox_manager = RemotePyBoxManager(host="http://example.com")
+
+# Assign a unique Kernel ID
+kernel_id = str(uuid4())
+
+# Start the remote sandbox environment
+box = pybox_manager.start(kernel_id)
+
+# Define the test code
+test_code = """
+import math
+result = math.sqrt(16)
+result
+"""
+
+# Run the code in the sandbox
+out: PyBoxOut = box.run(code=test_code)
+
+# Print the execution result
+print(out)
+```
+
+### Example Output
+
+Running the above code prints the following output, indicating successful execution without any errors:
+
+```plaintext
+data=[{'text/plain': '4.0'}] error=None
+```
+
+### Advanced Environment Configuration
+
+`RemotePyBoxManager` provides the following advanced configuration options for customizing the sandbox execution environment:
+
+1. **`env_file`**: Loads environment variables from a file to configure the remote sandbox.
+2. **`kernel_env`**: Passes environment variables directly as key-value pairs, simplifying the setup process.
+
+To learn more about the parameters and configuration options, refer to the [Kernel Environment Variables](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/users/kernel-envs.html) documentation.
+
+
+## Cluster Environment
+
+In a Kubernetes cluster, `Code Sandbox` uses the `KubePyBoxManager` provided by the `pybox` library to create and manage sandboxes. Unlike the remote environment, the cluster environment **communicates directly with Kernel Pods** created by the [Jupyter Kernel Controller](https://github.com/edwardzjl/jupyter-kernel-controller), eliminating the need for an intermediary service like `Enterprise Gateway`.
+
+### Configuring
+
+Before using the cluster environment, you need to deploy the `jupyter-kernel-controller` service. You can create the required CRDs and Deployments by following the [Deploy Documentation](https://github.com/edwardzjl/jupyter-kernel-controller?tab=readme-ov-file#build-run-deploy).
+
+### Creating and Running
+
+Once the `jupyter-kernel-controller` service is deployed and running, you can create and run a cluster sandbox using the following code:
+
+```python
+from uuid import uuid4
+from pybox import KubePyBoxManager, PyBoxOut
+
+# Initialize the cluster sandbox manager; replace the placeholders with your actual path and environment variables
+pybox_manager = KubePyBoxManager(
+    env_file="YOUR_ENV_FILE_PATH",  # Path to the environment variable file
+    kernel_env={"YOUR_ENV_NAME": "YOUR_ENV_VALUE"},  # Kernel environment variables as key-value pairs
+)
+
+# Assign a unique Kernel ID
+kernel_id = str(uuid4())
+
+# Start the cluster sandbox environment
+box = pybox_manager.start(kernel_id)
+
+# Define the test code
+test_code = """
+import math
+result = math.sqrt(16)
+result
+"""
+
+# Run the code in the sandbox
+out: PyBoxOut = box.run(code=test_code)
+
+# Print the execution result
+print(out)
+```
+
+### Example Output
+
+Running the code above prints the following output, indicating successful execution without any errors:
+
+```plaintext
+data=[{'text/plain': '4.0'}] error=None
+```
+
+**NOTE:** The `env_file` and `kernel_env` parameters required by `KubePyBoxManager` are essentially the same as those for `RemotePyBoxManager`. For details about these parameters, see [RemotePyBoxManager Advanced Environment Configuration](#advanced-environment-configuration).
+
+
+With the above configuration, you can create and manage secure, reliable sandboxes directly in a Kubernetes cluster.
diff --git a/docs/howto/incluster-code-execution.md b/docs/howto/incluster-code-execution.md
index 76d3c91..9b1efe4 100644
--- a/docs/howto/incluster-code-execution.md
+++ b/docs/howto/incluster-code-execution.md
@@ -1,19 +1,3 @@
 # Incluster Code Execution
 
 The `tablegpt-agent` directs `tablegpt` to generate Python code for data analysis. This code is then executed within a sandbox environment to ensure system security. The execution is managed by the [pybox](https://github.com/edwardzjl/pybox) library, which provides a simple way to run Python code outside the main process.
-
-## Usage
-
-If you're using the local executor (pybox.LocalPyBoxManager), follow these steps to configure the environment:
-
-1. Install the dependencies required for the `IPython Kernel` using the following command:
-
-   ```sh
-   pip install -r ipython/requirements.txt
-   ```
-
-2. Copy the code from the `ipython/ipython-startup-scripts` folder to the `$HOME/.ipython/profile_default/startup/` directory.
-
-   This folder contains the functions and configurations needed to perform data analysis with `tablegpt-agent`.
-
-   Note: The `~/.ipython` directory must be writable for the process launching the kernel, otherwise there will be a warning message: `UserWarning: IPython dir '/home/jovyan/.ipython' is not a writable location, using a temp directory.` and the startup scripts won't take effects.