
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges

ScratchEval provides a series of highly challenging questions designed to test the visual code reasoning ability of large multimodal models (LMMs).

Specifically, we designed a series of challenging multiple-choice questions using the visual block-based programming language Scratch and found that even the most advanced LMMs still perform poorly on our benchmark.

All our data is stored in the ./data folder.
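As a quick-start sketch, loading the benchmark questions might look like the snippet below. The file name and field layout are assumptions for illustration (the repo does not document them here); check the contents of ./data for the actual format.

```python
import json
from pathlib import Path

DATA_DIR = Path("./data")

def load_questions(filename: str = "scratcheval.json") -> list[dict]:
    """Load ScratchEval multiple-choice questions from the data folder.

    NOTE: the file name and any field names (e.g. "question", "options",
    "answer", "image") are illustrative assumptions, not the repo's
    documented schema -- inspect ./data for the real layout.
    """
    with open(DATA_DIR / filename, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    questions = load_questions()
    print(f"Loaded {len(questions)} questions")
```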

📜 License

The benchmark was annotated and developed by the authors of this paper, and the dataset is released under the Apache 2.0 license.
