This repository contains the dataset used for the paper "SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities" (CVF, arXiv). The paper was accepted to the Workshop on Graphic Design Understanding and Generation (GDUG), held at CVPR 2024.
The benchmark consists of six SVG editing tasks, each with its own folder. Each task folder has two subfolders: `answer` and `query`. The `query` folder contains the prompts for the LLM, each including the SVG code before editing. The `answer` folder contains the ground-truth answer images.
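Given that layout, the query/answer pairs for each task can be collected with a short helper. This is a sketch, not code from the repository: the function name `list_task_pairs` is ours, and it assumes each answer file shares its name with the corresponding query file.

```python
from pathlib import Path

def list_task_pairs(root):
    """Pair each prompt in a task's `query` folder with the file of the
    same name in its `answer` folder (assumed naming convention)."""
    pairs = {}
    for task_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        query_files = sorted((task_dir / "query").glob("*"))
        pairs[task_dir.name] = [
            (q, task_dir / "answer" / q.name) for q in query_files
        ]
    return pairs
```

With the repository cloned, `list_task_pairs("SVGEditBench")` would return a dict mapping each task folder name to its list of (prompt, answer) paths.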
We selected 100 images from the twemoji dataset (by Twitter, Inc. and other contributors, licensed under CC BY 4.0) and created the input prompts and answer images from them; the answer images were produced by modifying the originals. Refer to the paper for details on how we created the dataset.
To use the dataset to test your own LLM, clone the repository and feed the prompts in the `query` folders to the LLM.
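That evaluation loop can be sketched as below. The `run_llm` parameter is a placeholder for whatever client you use; this helper simply reads each prompt file and records the model's raw text response.

```python
from pathlib import Path

def collect_responses(query_dir, run_llm):
    """Send every prompt in a `query` folder to an LLM and collect the
    raw responses; `run_llm` is any callable mapping a prompt string
    to the model's text output (hypothetical, supplied by the user)."""
    responses = {}
    for prompt_file in sorted(Path(query_dir).glob("*")):
        responses[prompt_file.name] = run_llm(prompt_file.read_text())
    return responses
```

The returned SVG code can then be rasterized and compared against the images in the matching `answer` folder, as described in the paper.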
If you want to change the number of cases, try out a new task, or use a different dataset to generate the cases, you can build your own dataset with the `CaseGenerator.py` script. Follow these steps to do so:
- Copy the `CaseGenerator.py` file.
- Clone the twemoji dataset into the same folder as the downloaded `CaseGenerator.py` file.
> [!NOTE]
> If you plan to use a different SVG dataset, make sure to update the path to the SVG files and the name for each image in the `CaseGenerator.py` file.
- Run the code.
> [!TIP]
> Use the latest version of Python to minimize the chance of images being excluded from the dataset because their emoji names are missing from the `unicodedata` module.
- The dataset will be built in the same structure as this repository.
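The Tip above can be probed directly: the standard `unicodedata` module ships the Unicode tables of the Python build it belongs to, so a recently added emoji may have no name on an older interpreter. A minimal check (the helper `emoji_name` is ours, not part of `CaseGenerator.py`):

```python
import unicodedata

def emoji_name(codepoint):
    """Return the Unicode name for a code point, or None when this
    Python build's Unicode tables do not include it."""
    try:
        return unicodedata.name(chr(codepoint))
    except ValueError:
        return None

print(emoji_name(0x1F600))  # a long-established emoji: GRINNING FACE
```

Any twemoji file whose code point returns `None` here would be skipped during generation.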
- Code and prompts licensed under the MIT License
- Images licensed under the CC BY 4.0 License