MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for LLMs.
https://arxiv.org/abs/2410.23074
We propose MPLSandbox, an out-of-the-box sandbox designed to provide unified compiler feedback across multiple programming languages. Additionally, it integrates traditional code analysis tools, delivering comprehensive code information to LLMs from numerous perspectives. MPLSandbox simplifies code analysis for researchers and can be seamlessly integrated into LLM training and application processes to enhance the performance of LLMs in a range of code-related tasks.
MPLSandbox consists of three core modules:
This module provides unified compiler feedback by compiling and executing the code. The code and unit test cases are sent to the sub-sandbox of the corresponding programming language for isolated execution, and the resulting compiler feedback is returned. The sandbox ensures that the program executes safely, without jeopardizing the external environment or interrupting the training process.
This module includes multiple traditional analysis tools and provides a comprehensive code analysis report from numerous perspectives, covering static analysis (e.g., potential bug detection and code smell analysis) and dynamic analysis (e.g., fuzz testing and efficiency analysis). It can also assess input information other than the code itself, such as evaluating the unit-test coverage of the code, helping researchers improve the quality of these unit tests.
This module integrates the compiler feedback and the various analysis results and delivers them to LLMs, improving the quality of generated code and enhancing their performance on a range of complex code-related tasks.
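The flow across these three modules can be pictured with the following minimal sketch. It is purely illustrative: every name in it is a placeholder, not part of the actual MPLSandbox API (the real API is shown in the quick-start example below).

# Purely illustrative sketch of the three-module flow; all names are placeholders.
def execute_in_sandbox(code, unit_cases, lang):
    # Module 1: compile/run the code against unit tests inside an isolated container.
    return {"compile_ok": True, "unit_tests_passed": False}

def analyze_code(code, unit_cases, lang):
    # Module 2: static analysis (bug detection, code smells) and dynamic analysis
    # (fuzz testing, efficiency profiling, unit-test coverage).
    return {"static": {}, "dynamic": {}}

def integrate(feedback, analysis):
    # Module 3: merge both kinds of information into one report for the LLM.
    return {"compiler_feedback": feedback, "analysis_report": analysis}

code, tests, lang = "print(1 + 1)", {"inputs": [""], "outputs": ["2"]}, "python"
report = integrate(execute_in_sandbox(code, tests, lang), analyze_code(code, tests, lang))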
Users can clone and install MPLSandbox using the following commands:
git clone git@github.com:Ablustrund/MPLSandbox.git
cd MPLSandbox
pip install .
# pip install -e . ## for editable mode
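As a quick sanity check that the installation succeeded, the package should import without errors:

# Sanity check: the import succeeds only if installation worked.
import mplsandbox
print(mplsandbox.__file__)  # prints the installation path of the package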
First, users need to set up the Docker images on the host machine. After extensive testing, we have installed the necessary dependencies for each supported language in Docker containers and packaged these customized containers into the corresponding images listed below. We encourage users to use our open-source images directly, as this reduces the hassle of installing dependencies for each language.
Python: mplsandbox-python-3.9.19-v1
Java: mplsandbox-java-11.0.12-v1
JavaScript: mplsandbox-javascript-22-v1
Go: mplsandbox-golang-1.17.0-v1
Ruby: mplsandbox-ruby-3.0.2-v1
TypeScript: mplsandbox-typescript-1-22-v1
Bash: mplsandbox-bash-v1
We recommend that users manually download these image files and then use the following command to import them into Docker:
docker load < <path_to_downloaded_image>
If users wish to use custom images, we recommend modifying the DefaultImage class in /mplsandbox/const.py to define their own images.
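For example, pointing a language at a custom image could look like the sketch below. The attribute names inside DefaultImage are assumptions made for illustration; check /mplsandbox/const.py for the actual fields.

# /mplsandbox/const.py (illustrative sketch; the real attribute names may differ)
class DefaultImage:
    PYTHON = "my-registry/mplsandbox-python:custom"  # custom image for Python code
    JAVA = "my-registry/mplsandbox-java:custom"      # custom image for Java code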
Users can start mplsandbox and run it with the following lines of code:
from mplsandbox import MPLSANDBOX
data = {
"question":"Define get_sum_of_two_numbers():\n \"\"\"Write a function that takes two integers as input and returns their sum.\n\n -----Input-----\n \n The input consists of multiple test cases. Each test case contains two integers $a$ and $b$ ($-10^9 \\le a, b \\le 10^9$).\n \n -----Output-----\n \n For each test case, print the sum of the two integers.\n \n -----Example-----\n Input\n 3\n 1 2 ↵\n -1 1 ↵\n 1000000000 1000000000\n \n Output\n 3\n 0\n 2000000000\n \"\"\"",
"code": 'def get_sum_of_two_numbers():\n a, b = map(int, input().split(" "))\n print(a * b)\nget_sum_of_two_numbers()',
"unit_cases": {
"inputs": ["1 2", "3 4"],
"outputs": ["3", "7"]
},
"lang": "python"
} # or a JSON file path
executor = MPLSANDBOX(data)
result = executor.run(analysis_type="all")
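Note that the sample code above multiplies the two inputs instead of adding them, so its outputs will not match the expected unit-test outputs; surfacing exactly this kind of discrepancy is what the unified feedback is for. The structure of result depends on the requested analysis type; a minimal way to inspect it is simply to print it:

# `result` aggregates the compiler feedback and the analysis report;
# its exact structure depends on the requested analysis_type.
print(result)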
The fields in data are described as follows:

Field | Description |
---|---|
question | (Required) Specifies the natural-language problem description (question) associated with the code. |
code | (Required) Specifies the code to be executed. |
unit_cases | (Required) Specifies the unit test cases, including inputs and expected outputs. |
lang | (Optional) Specifies the language of the code. If omitted or set to "AUTO", the language is recognized automatically. |
libraries | (Optional) Specifies a list of dependency library names that need to be installed. |
client | (Optional) Specifies the Docker client instance to be used. |
image | (Optional) Specifies the Docker image used to run the code. |
dockerfile | (Optional) Specifies the path to the Dockerfile used to build a custom Docker image. |
keep_template | (Optional) If set to True, the template files are kept after the code is run. |
verbose | (Optional) If set to True, verbose output is enabled to assist with debugging and diagnosing issues. |
app | (Optional) If set to True, app mode is enabled, facilitating the deployment of services on the server. |
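Putting these fields together, a JSON equivalent of the data dictionary above can also be saved to a file (e.g., data.json) and passed to the command-line interface described below. The values here mirror the quick-start example, with a corrected implementation that actually sums the inputs:

{
  "question": "Write a function that takes two integers as input and returns their sum.",
  "code": "def get_sum_of_two_numbers():\n    a, b = map(int, input().split())\n    print(a + b)\nget_sum_of_two_numbers()",
  "unit_cases": {
    "inputs": ["1 2", "3 4"],
    "outputs": ["3", "7"]
  },
  "lang": "python"
}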
We also provide the following command-line interface to scan the data.json file and output the report to the report.txt file:
mplsandbox --data /path/to/your/data.json --report /path/to/your/report.txt
MPLSandbox often serves as a node for emitting code-related feedback signals, so configuring the corresponding services is important. We provide a simple service demo in the scripts directory, which users can run with the following command:
cd scripts
python ./app.py
Then, users can access the service using curl or other methods; an example request format is provided in scripts/test_app.sh:
./test_app.sh
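For reference, a programmatic request could look like the sketch below. The host, port, and route here are assumptions made purely for illustration; the authoritative endpoint and request format are the ones used in scripts/app.py and scripts/test_app.sh.

import requests  # assumes the requests package is installed

# Hypothetical request; replace the URL with the endpoint actually exposed by scripts/app.py.
payload = {
    "question": "Write a function that takes two integers as input and returns their sum.",
    "code": "print(sum(map(int, input().split())))",
    "unit_cases": {"inputs": ["1 2"], "outputs": ["3"]},
    "lang": "python",
}
response = requests.post("http://localhost:8000/run", json=payload, timeout=60)
print(response.json())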
We are working to refactor and improve the open-source version of MPLSandbox so that it closely matches the functionality of the version used internally by the Meituan LLM Team. We are currently reconstructing the analysis tools for languages such as Go, JavaScript, and Ruby to achieve better code analysis and automated testing.
@misc{dou2024MPLSandbox,
title={Multi-Programming Language Sandbox for LLMs},
author={Shihan Dou and Jiazheng Zhang and Jianxiang Zang and Yunbo Tao and Haoxiang Jia and Shichun Liu and Yuming Yang and Shenxi Wu and Shaoqing Zhang and Muling Wu and Changze Lv and Limao Xiong and Wenyu Zhan and Lin Zhang and Rongxiang Weng and Jingang Wang and Xunliang Cai and Yueming Wu and Ming Wen and Rui Zheng and Tao Ji and Yixin Cao and Tao Gui and Xipeng Qiu and Qi Zhang and Xuanjing Huang},
year={2024},
eprint={2410.23074},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2410.23074},
}
@article{dou2024s,
title={What's Wrong with Your Code Generated by Large Language Models? An Extensive Study},
author={Dou, Shihan and Jia, Haoxiang and Wu, Shenxi and Zheng, Huiyuan and Zhou, Weikang and Wu, Muling and Chai, Mingxu and Fan, Jessica and Huang, Caishuang and Tao, Yunbo and others},
journal={arXiv preprint arXiv:2407.06153},
year={2024}
}