ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
ScratchEval provides a series of challenging questions designed to test the visual code reasoning ability of large multimodal models (LMMs).
Specifically, we designed a set of challenging multiple-choice questions using the block-based visual programming language Scratch and found that even the most advanced LMMs still perform poorly on our benchmark.
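Since each question is multiple-choice, evaluation reduces to comparing a model's predicted choice letter against the gold answer. The sketch below shows one plausible way to score this; it is an illustration under assumed input formats (lists of single-letter strings), not the paper's actual evaluation code.

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of questions where the predicted choice letter matches the gold answer.

    Assumes both lists hold single-letter choices like "A"/"B"/"C"/"D"
    (hypothetical format; adapt to the actual answer fields in the dataset).
    """
    correct = sum(p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold))
    return correct / len(gold)


# Example: 3 of 4 predictions match the gold answers -> 0.75
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))
```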
All of our data is stored in the `./data` folder.
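As a starting point, here is a minimal sketch for loading the benchmark. The file name and record schema are assumptions (a hypothetical `./data/questions.json` with one record per question); adjust them to match the actual files shipped in `./data`.

```python
import json
from pathlib import Path

# Hypothetical file name; replace with the actual file(s) in ./data.
DATA_FILE = Path("./data/questions.json")

with DATA_FILE.open(encoding="utf-8") as f:
    questions = json.load(f)

for q in questions:
    # Assumed fields: the Scratch program image, the question text,
    # the candidate answers, and the gold choice.
    print(q.get("image"), q.get("question"), q.get("options"), q.get("answer"))
```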
The benchmark was annotated and developed by the authors of this paper, and the dataset is released under the Apache 2.0 license.