
AugARC - Augmented Abstraction and Reasoning Corpus

Introduction

This repository contains the source code for producing the augmented ARC datasets, training LLMs, and evaluating them on the AugARC benchmark.

AugARC provides an easy and unified benchmark for evaluating the 3-shot accuracy of LLMs on reasoning tasks. In AugARC, each ARC task starts with a textual description explaining the format of the problem, and each ARC grid is represented as a 2D matrix of numbers.
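As a rough illustration, a grid can be rendered as rows of space-separated numbers and combined with a task description and demonstration pairs into a single prompt. This is only a minimal sketch; the exact prompt wording and serialisation used by AugARC may differ.

```python
# Minimal sketch: serialising an ARC grid as a 2D matrix of numbers inside an LLM prompt.
# The task description text and formatting here are illustrative assumptions.

def grid_to_text(grid):
    """Render an ARC grid (list of lists of ints 0-9) as rows of space-separated numbers."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def build_prompt(train_pairs, test_input, description):
    """Assemble a prompt: textual task description, demonstration pairs, then the test input."""
    parts = [description]
    for i, (inp, out) in enumerate(train_pairs, start=1):
        parts.append(f"Example {i} input:\n{grid_to_text(inp)}")
        parts.append(f"Example {i} output:\n{grid_to_text(out)}")
    parts.append(f"Test input:\n{grid_to_text(test_input)}\nTest output:")
    return "\n\n".join(parts)
```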

In AugARC, the first prediction is based on the original ARC task, whereas the second and third are based on 90° and 270° clockwise-rotated versions of the same task. The AugARC benchmark is tailored to LLM architectures, as these models process inputs in an auto-regressive, sequential manner. By rotating the ARC tasks, LLMs are presented with different sequences of numbers (2D matrices) that contain the same abstract logic.

The transformations applied to an ARC task to obtain its Augmented ARC variants are visualised below.

[Figure: ARC task 770cc55f shown in its base, 90° rotated, and 270° rotated forms]
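A sketch of how the two rotated variants can be derived from a base task is shown below. The grid layout follows the public ARC JSON format ("train"/"test" lists of input/output grids); the function names are illustrative and not necessarily those used in this repository.

```python
# Sketch of the rotation augmentation: from a base ARC task, produce the
# 90° and 270° clockwise-rotated variants.

def rotate_90_cw(grid):
    """Rotate a 2D grid (list of lists) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def rotate_270_cw(grid):
    """Rotate a 2D grid 270 degrees clockwise (i.e. 90 degrees counter-clockwise)."""
    return [list(row) for row in zip(*grid)][::-1]

def augment_task(task):
    """Return the three AugARC variants of an ARC task: base, 90° rotated, 270° rotated."""
    def rotate_task(t, rot):
        return {
            split: [{"input": rot(pair["input"]), "output": rot(pair["output"])}
                    for pair in pairs]
            for split, pairs in t.items()
        }
    return [task, rotate_task(task, rotate_90_cw), rotate_task(task, rotate_270_cw)]
```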

Data

All the augmented ARC data is also available from:

https://osf.io/r58ks/

Citation

If you use our data, please cite our paper:

https://openreview.net/pdf?id=cgUTWzgvCj

AugARC: Augmented Abstraction and Reasoning Benchmark for Large Language Models, Kiril Bikov, Mikel Bober-Irizar, Soumya Banerjee, AAAI Workshop on Preparing Good Data for Generative AI: Challenges and Approaches

Contact

Kiril Bikov and Soumya Banerjee

kmb85@cam.ac.uk

sb2333@cam.ac.uk
