
FullStack Bench: Evaluating LLMs as Full Stack Coders

Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders"

🏠 FullStack Bench Code • 📊 Benchmark Data • 📚 SandboxFusion

📌Introduction

FullStack Bench is a multilingual benchmark for full-stack programming that covers a wide range of application domains and 16 programming languages with 3K test samples. It substantially pushes the limits of code LLMs on the code-related abilities required in real-world development scenarios.

Task Examples

Compared with existing code evaluation benchmarks, FullStack Bench covers more mainstream application domains. In one visualized example from FullStack Bench, the model is tasked with solving a problem in the desktop and web development domain using HTML.

Refer to our paper or dataset for more details.

Results

Refer to our paper for more results.

📚SandboxFusion

SandboxFusion is an effective code sandbox execution tool for evaluating programming tasks across different languages. It incorporates over 10 coding-related evaluation datasets, uses a standardized data format, and is accessible via a uniform HTTP API.
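As a minimal sketch of what a uniform HTTP API looks like in practice, the Python snippet below submits a code string to a locally running sandbox. The `/run_code` endpoint and the `code`, `language`, `status`, `run_result`, `return_code`, and `stdout` fields follow our reading of the SandboxFusion documentation and should be treated as assumptions; the helper names are hypothetical. Check the Tutorial for the authoritative request and response schema.

```python
import json
import urllib.request

# Assumption: the sandbox exposes POST /run_code accepting {"code", "language"}
# and returns JSON with "status" and a "run_result" object. Verify these names
# against the SandboxFusion tutorial before relying on them.
SANDBOX_URL = "http://localhost:8080/run_code"


def build_run_code_request(code: str, language: str,
                           url: str = SANDBOX_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the sandbox."""
    payload = json.dumps({"code": code, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_run_result(response: dict) -> tuple[bool, str]:
    """Return (succeeded, stdout) from a decoded sandbox response."""
    run = response.get("run_result") or {}
    ok = response.get("status") == "Success" and run.get("return_code") == 0
    return ok, run.get("stdout", "")


def run_code(code: str, language: str, timeout: float = 30.0) -> tuple[bool, str]:
    """Send code to a running sandbox and summarize the result."""
    req = build_run_code_request(code, language)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return summarize_run_result(json.load(resp))
```

With the sandbox server from the Usage section running, `run_code('print("Hello, world!")', "python")` should return a success flag and the captured output. Request construction and response parsing are kept in separate functions so they can be checked without a live server.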

Refer to our paper and the 📚 Tutorial for more details.

📊Data

Dataset	Download
FullStack Bench Dataset	🤗 HuggingFace
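After downloading the dataset, a quick sanity check is to count samples per language; with 16 languages, one would expect 16 distinct keys. The sketch below assumes the data is stored as JSON Lines with each record's language under a `labels.programming_language` field. Both the file layout and the field name are assumptions to check against the dataset card, and `count_by_language` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_language(lines: Iterable[str],
                      key: str = "programming_language") -> Counter:
    """Count samples per language in JSON Lines records.

    Assumption: each record stores its language under record["labels"][key];
    adjust the lookup to match the actual dataset schema.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Records without the assumed field are counted as "unknown".
        counts[record.get("labels", {}).get(key, "unknown")] += 1
    return counts
```

For example, `with open("fullstack_bench.jsonl", encoding="utf-8") as f: print(count_by_language(f).most_common())`, where the file name is a placeholder for wherever the download was saved.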

💻Usage

Start the sandbox server:

docker run -d --rm -p 8080:8080 volcengine/sandbox-fusion:server-20241204

For users in mainland China, the following mirror is provided:

docker run -d --rm -p 8080:8080 vemlp-cn-beijing.cr.volces.com/preset-images/code-sandbox:server-20241204
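Because `docker run -d` returns as soon as the container starts, a script may try to reach the sandbox before it is listening. A minimal readiness check, assuming the default `-p 8080:8080` mapping used above, polls the port until it accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of this repository.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8080,
                  timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only means the port is open, not that the
            # sandbox has finished initializing.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)  # back off briefly before retrying
```

An open port does not guarantee that the service is fully initialized, so the first request may still need a retry.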

Then, run the benchmark:

git clone https://github.com/bytedance/FullStackBench.git
cd FullStackBench
pip install -r requirements.txt
# modify the model configs in src/main.py
python src/main.py
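Once a run finishes, results are usually compared per language. The sketch below computes the fraction of passing samples per language from per-sample records. The `language` and `passed` field names and the `pass_rate_by_language` helper are hypothetical and may not match what `src/main.py` actually writes.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def pass_rate_by_language(results: Iterable[Mapping]) -> dict[str, float]:
    """Compute the fraction of passing samples per language.

    Assumption: each record has hypothetical fields "language" (str) and
    "passed" (bool); the repository's real output format may differ.
    """
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in results:
        lang = r["language"]
        totals[lang] += 1
        passes[lang] += int(bool(r["passed"]))
    # Sort languages alphabetically for stable, readable output.
    return {lang: passes[lang] / totals[lang] for lang in sorted(totals)}
```

Because languages contribute different numbers of samples, an overall score computed as total passes divided by total samples will generally differ from an unweighted mean of the per-language rates; which one to report depends on whether a micro or macro average is intended.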

📖Citation

If you find our work helpful, please cite it as follows:

@misc{liu2024fullstackbenchevaluatingllms,
      title={FullStack Bench: Evaluating LLMs as Full Stack Coders}, 
      author={Siyao Liu and He Zhu and Jerry Liu and Shulin Xin and Aoyan Li and Rui Long and Li Chen and Jack Yang and Jinxiang Xia and Z. Y. Peng and Shukai Liu and Zhaoxiang Zhang and Ge Zhang and Wenhao Huang and Kai Shen and Liang Xiang},
      year={2024},
      eprint={2412.00535},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.00535}, 
}
