Commit

docs: improve code sandbox documentation
weekenthralling committed Dec 10, 2024
1 parent 1668779 commit f68162a
Showing 2 changed files with 189 additions and 17 deletions.
190 changes: 189 additions & 1 deletion docs/explanation/code-sandbox.md
@@ -1,3 +1,191 @@
# Code Sandbox

<!-- Placeholder -->
`tablegpt-agent` directs `tablegpt` to generate Python code for data analysis. However, the generated code may contain potential vulnerabilities or unexpected errors. Running such code directly in a production environment could threaten the system's stability and security.

`Code Sandbox` is designed to address this challenge. By leveraging sandbox technology, it confines code execution to a controlled environment, effectively preventing malicious or unexpected behaviors from impacting the main system. This provides an isolated and reliable space for running code safely.

`Code Sandbox` is built on the [pybox](https://github.com/edwardzjl/pybox) library and supports three main execution modes:

- **Local Environment**: Executes code in a local sandbox for quick *deployment* and *validation*.
- **Remote Environment**: Creates remote environments through `Jupyter Enterprise Gateway` so that computing resources can be shared.
- **Cluster Environment**: Communicates directly with kernel pods, bypassing the need for proxy services such as `Jupyter Enterprise Gateway`.
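
The three modes map onto the three manager classes used later in this document. As a rough sketch only, a small lookup can make that mapping explicit. The mode labels (`"local"`, `"remote"`, `"cluster"`) and the helper `resolve_manager_class` are hypothetical names introduced for illustration; only the class names come from this document.

```python
import importlib

# Manager class names as they appear in this document. The mode keys are
# illustrative labels, not identifiers defined by pybox.
MANAGER_CLASS_BY_MODE = {
    "local": "LocalPyBoxManager",
    "remote": "RemotePyBoxManager",
    "cluster": "KubePyBoxManager",
}


def resolve_manager_class(mode: str):
    """Return the pybox manager class for a mode label (hypothetical helper)."""
    try:
        class_name = MANAGER_CLASS_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown sandbox mode: {mode!r}") from None
    # Imported lazily so the mapping can be inspected without pybox installed.
    pybox = importlib.import_module("pybox")
    return getattr(pybox, class_name)
```

Resolving the class lazily keeps configuration code importable on machines where only one mode's dependencies are installed.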

Code Sandbox is designed based on the following key principles:

- **Security**: Limits code access using sandbox technology to ensure a safe and reliable execution environment.
- **Isolation**: Provides independent execution environments for each task, ensuring strict separation of resources and data.
- **Scalability**: Adapts to diverse computing environments, from local setups to Kubernetes clusters, supporting dynamic resource allocation and efficient task execution.


## Local Environment

In a local environment, Code Sandbox utilizes the `pybox` library to create and manage sandbox environments, providing a secure code execution platform. By isolating code execution from the host system's resources and imposing strict permission controls, it ensures safety and reliability. This approach is especially suitable for **development** and **debugging** scenarios.

If you want to run `tablegpt-agent` in a local environment, you can enable the **local mode**. Below are the installation steps and a detailed operation guide.

### Installing

To use `tablegpt-agent` in local mode, install the library with the following command:

```sh
pip install tablegpt-agent[local]
```

### Configuring

`tablegpt-agent` comes with several built-in features, such as auxiliary methods for data analysis and support for displaying Chinese fonts. **These features are automatically added to the sandbox environment by default**. If you need advanced customization (e.g., adding specific methods or fonts), refer to the [TableGPT IPython Kernel Configuration Documentation](https://github.com/tablegpt/tablegpt-agent/tree/main/ipython) for further guidance.

### Creating and Running

The following code demonstrates how to use the `pybox` library to set up a sandbox, execute code, and retrieve results in a local environment:

```python
from uuid import uuid4
from pybox import LocalPyBoxManager, PyBoxOut

# Initialize the local sandbox manager
pybox_manager = LocalPyBoxManager()

# Assign a unique Kernel ID for the sandbox
kernel_id = str(uuid4())

# Start the sandbox environment
box = pybox_manager.start(kernel_id)

# Define the test code to execute
test_code = """
import math
result = math.sqrt(16)
result
"""

# Run the code in the sandbox
out: PyBoxOut = box.run(code=test_code)

# Print the execution result
print(out)
```

### Example Output

Running the code above prints the following output, which indicates successful execution with no errors:

```text
data=[{'text/plain': '4.0'}] error=None
```
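
The printed result suggests an object with a `data` list of MIME bundles and an `error` field. The sketch below shows one way to read such a result defensively. `SandboxResult` is a stand-in that only mirrors the printed shape; it is not pybox's `PyBoxOut` class, and `first_plain_text` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SandboxResult:
    """Stand-in mirroring the printed shape `data=[...] error=None` (not pybox's class)."""

    data: list
    error: Optional[Any] = None


def first_plain_text(out) -> Optional[str]:
    """Return the first 'text/plain' value, or raise if the run reported an error."""
    if out.error is not None:
        raise RuntimeError(f"sandbox execution failed: {out.error}")
    for bundle in out.data:
        if "text/plain" in bundle:
            return bundle["text/plain"]
    return None


result = SandboxResult(data=[{"text/plain": "4.0"}])
print(first_plain_text(result))  # prints 4.0
```

Checking `error` before touching `data` keeps a failed run from being mistaken for an empty result.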

With `Code Sandbox` in local execution mode, developers can enjoy the safety of sandbox isolation at minimal cost while maintaining flexibility and efficiency. This lays a solid foundation for more complex remote or cluster-based scenarios.


## Remote Environment

In a remote environment, `Code Sandbox` uses the `pybox` library and its `RemotePyBoxManager` to create and manage sandbox environments. The remote mode relies on the [Enterprise Gateway](https://github.com/jupyter-server/enterprise_gateway) service to dynamically create and execute remote sandboxes. This mode allows multiple services to connect to the same remote environment, enabling shared access to resources.

### Configuring

If `tablegpt-agent` is used in **remote mode**, the first step is to start the `enterprise_gateway` service. You can refer to the [Enterprise Gateway Deployment Guide](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/operators/index.html#deploying-enterprise-gateway) for detailed instructions on configuring and starting the service.

Once the service is up and running, ensure that the service address is accessible. For example, assume the `enterprise_gateway` service is available at `http://example.com`.
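
A quick reachability check can catch a wrong address before any sandbox is created. This is a minimal sketch using only the standard library, and it assumes the gateway serves the standard Jupyter REST path `/api/kernelspecs`; the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def kernelspecs_url(base_url: str) -> str:
    """Build the kernelspecs URL for a gateway base address."""
    return base_url.rstrip("/") + "/api/kernelspecs"


def is_gateway_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the gateway answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(kernelspecs_url(base_url), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 401 or 403), so it is reachable.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection failure, timeout, or malformed address.
        return False
```

For example, `is_gateway_reachable("http://example.com")` would be called with the actual service address before constructing the manager.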

### Creating and Running

The following code demonstrates how to create a remote sandbox using `RemotePyBoxManager` and execute code within it:

```python
from uuid import uuid4
from pybox import RemotePyBoxManager, PyBoxOut

# Initialize the remote sandbox manager, replacing with the actual Enterprise Gateway service address
pybox_manager = RemotePyBoxManager(host="http://example.com")

# Assign a unique Kernel ID
kernel_id = str(uuid4())

# Start the remote sandbox environment
box = pybox_manager.start(kernel_id)

# Define the test code
test_code = """
import math
result = math.sqrt(16)
result
"""

# Run the code in the sandbox
out: PyBoxOut = box.run(code=test_code)

# Print the execution result
print(out)
```

### Example Output

After executing the above code, the system will return the following output, indicating successful execution without any errors:

```plaintext
data=[{'text/plain': '4.0'}] error=None
```

### Advanced Environment Configuration

The `RemotePyBoxManager` provides the following advanced configuration options to allow for flexible customization of the sandbox execution environment:

1. **`env_file`**: Allows you to load environment variables from a file to configure the remote sandbox.
2. **`kernel_env`**: Enables you to pass environment variables directly as key-value pairs, simplifying the setup process.

To learn more about the parameters and configuration options, refer to the [Kernel Environment Variables](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/users/kernel-envs.html) documentation.
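
As an illustration of the `kernel_env` option, the sketch below assembles a mapping from two variables described in the Enterprise Gateway kernel environment documentation (`KERNEL_USERNAME` and `KERNEL_WORKING_DIR`). The helper `build_kernel_env` and the sample values are hypothetical, and the commented-out constructor call is an assumption based on the options listed above.

```python
from typing import Dict, Optional


def build_kernel_env(username: str, working_dir: Optional[str] = None) -> Dict[str, str]:
    """Assemble a kernel_env mapping using KERNEL_-prefixed names (hypothetical helper)."""
    env = {"KERNEL_USERNAME": username}
    if working_dir is not None:
        env["KERNEL_WORKING_DIR"] = working_dir
    return env


kernel_env = build_kernel_env("alice", working_dir="/tmp/analysis")

# Assumed usage (requires pybox and a running gateway):
# pybox_manager = RemotePyBoxManager(host="http://example.com", kernel_env=kernel_env)
```

Building the mapping in one place makes it easy to reuse the same settings for `KubePyBoxManager`.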


## Cluster Environment

In a Kubernetes cluster, `Code Sandbox` leverages the `KubePyBoxManager` provided by the `pybox` library to create and manage sandboxes. Unlike the `remote environment`, the cluster environment **communicates directly with Kernel Pods** created by the [Jupyter Kernel Controller](https://github.com/edwardzjl/jupyter-kernel-controller), eliminating the need for an intermediary service like `Enterprise Gateway`.

### Configuring

Before using the cluster environment, you need to deploy the `jupyter-kernel-controller` service. You can quickly create the required CRDs and Deployments using the [Deploy Documentation](https://github.com/edwardzjl/jupyter-kernel-controller?tab=readme-ov-file#build-run-deploy).

### Creating and Running

Once the `jupyter-kernel-controller` service is successfully deployed and running, you can create and run a cluster sandbox using the following code:

```python
from uuid import uuid4
from pybox import KubePyBoxManager, PyBoxOut

# Initialize the cluster sandbox manager, replacing with actual paths and environment variable configurations
pybox_manager = KubePyBoxManager(
    env_file="YOUR_ENV_FILE_PATH",  # Path to the environment variable file
    kernel_env={"YOUR_ENV_VAR": "YOUR_VALUE"},  # Kernel environment variables as a dict (placeholder values)
)

# Assign a unique Kernel ID
kernel_id = str(uuid4())

# Start the cluster sandbox environment
box = pybox_manager.start(kernel_id)

# Define the test code
test_code = """
import math
result = math.sqrt(16)
result
"""

# Run the code in the sandbox
out: PyBoxOut = box.run(code=test_code)

# Print the execution result
print(out)
```

### Example Output

After executing the code above, the following output will be returned, indicating successful execution without any errors:

```plaintext
data=[{'text/plain': '4.0'}] error=None
```

**NOTE:** The `env_file` and `kernel_env` parameters accepted by `KubePyBoxManager` are essentially the same as those for `RemotePyBoxManager`. For details, refer to the [RemotePyBoxManager Advanced Environment Configuration](#advanced-environment-configuration).


With the above configuration, you can create and manage secure, isolated sandboxes directly in a Kubernetes cluster.
16 changes: 0 additions & 16 deletions docs/howto/incluster-code-execution.md
@@ -1,19 +1,3 @@
# Incluster Code Execution

The `tablegpt-agent` directs `tablegpt` to generate Python code for data analysis. This code is then executed within a sandbox environment to ensure system security. The execution is managed by the [pybox](https://github.com/edwardzjl/pybox) library, which provides a simple way to run Python code outside the main process.

## Usage

If you're using the local executor (`pybox.LocalPyBoxManager`), follow these steps to configure the environment:

1. Install the dependencies required for the `IPython Kernel` using the following command:

```sh
pip install -r ipython/requirements.txt
```

2. Copy the code from the `ipython/ipython-startup-scripts` folder to the `$HOME/.ipython/profile_default/startup/` directory.

This folder contains the functions and configurations needed to perform data analysis with `tablegpt-agent`.

Note: The `~/.ipython` directory must be writable by the process that launches the kernel. Otherwise, IPython emits the warning `UserWarning: IPython dir '/home/jovyan/.ipython' is not a writable location, using a temp directory.` and the startup scripts will not take effect.
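
The copy step and the writability requirement can be combined in a short script. This is a sketch under stated assumptions: it assumes the startup scripts are `*.py` files, and `install_startup_scripts` is a hypothetical helper, not part of `tablegpt-agent`.

```python
import os
import shutil
from pathlib import Path


def install_startup_scripts(src_dir, ipython_dir=None) -> list:
    """Copy *.py startup scripts into the default IPython profile (hypothetical helper)."""
    ipython_dir = Path(ipython_dir) if ipython_dir else Path.home() / ".ipython"
    startup_dir = ipython_dir / "profile_default" / "startup"
    startup_dir.mkdir(parents=True, exist_ok=True)
    # The kernel process needs write access, otherwise IPython falls back to a temp dir.
    if not os.access(startup_dir, os.W_OK):
        raise PermissionError(f"{startup_dir} is not writable")
    copied = []
    for script in sorted(Path(src_dir).glob("*.py")):
        shutil.copy2(script, startup_dir / script.name)
        copied.append(script.name)
    return copied
```

Calling `install_startup_scripts("ipython/ipython-startup-scripts")` from the repository root would target `$HOME/.ipython/profile_default/startup/`.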
