Adding New Task SLR-Bench as a Community Task : Scalable Logical Reasoning Benchmark #983

Ahmad21Omar · 2025-09-22T21:09:44Z

We are pleased to propose adding SLR-Bench as a new Community Task, authored in collaboration with @lukashelff.

SLR-Bench is a large-scale benchmark for scalable logical reasoning with language models, comprising 19,000 prompts organized into 20 curriculum levels. The tasks progressively increase in relational, arithmetic, and recursive complexity, requiring models to synthesize Prolog rules that classify train compositions.
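To make the task format described above concrete, here is a minimal sketch of what "a rule that classifies train compositions" means and how such a rule could be scored against labeled examples. It is a pure-Python stand-in, not the benchmark's evaluation code: SLR-Bench expects Prolog rules, and the `Car` fields, the `eastbound`/`westbound` label names, the example trains, and the helper names `candidate_rule` and `rule_accuracy` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Car:
    """Hypothetical car description; the real dataset schema may differ."""
    color: str
    length: str  # "short" or "long"


# A train is an ordered tuple of cars, paired with a class label.
Train = Tuple[Car, ...]
Example = Tuple[Train, str]


def candidate_rule(train: Train) -> bool:
    """Hypothetical rule: a train is eastbound if it contains a short red car."""
    return any(car.color == "red" and car.length == "short" for car in train)


def rule_accuracy(rule: Callable[[Train], bool], examples: List[Example]) -> float:
    """Fraction of examples whose label matches the rule's prediction."""
    correct = 0
    for train, label in examples:
        predicted = "eastbound" if rule(train) else "westbound"
        if predicted == label:
            correct += 1
    return correct / len(examples)


# Illustrative examples only; these are not taken from the dataset.
EXAMPLES: List[Example] = [
    ((Car("red", "short"), Car("blue", "long")), "eastbound"),
    ((Car("green", "long"),), "westbound"),
    ((Car("red", "long"), Car("blue", "short")), "westbound"),
    ((Car("red", "short"),), "eastbound"),
]

if __name__ == "__main__":
    print(rule_accuracy(candidate_rule, EXAMPLES))
```

In this toy setup a rule is correct only if it separates every labeled train, which mirrors the idea of checking a synthesized rule against examples rather than comparing it to a reference string.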

Link to the Paper: https://arxiv.org/abs/2506.15787
Link to the Dataset: https://huggingface.co/datasets/AIML-TUDA/SLR-Bench

HuggingFaceDocBuilderDev · 2025-09-23T07:46:18Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NathanHB

hey ! thanks for the addition it looks great, only have a few nits :)

community_tasks/slr_bench_evals.py

Ahmad21Omar · 2025-09-23T21:00:15Z

> hey ! thanks for the addition it looks great, only have a few nits :)

Hello there! :)

Thank you for your quick feedback and review. I have addressed the mentioned issues in my latest commit.

If you notice any issues with my recent changes or have any additional suggestions for improvement, please feel free to let us know.

Best regards!

community_tasks/slr_bench_evals.py

NathanHB · 2025-09-25T09:31:09Z

looks good ! thanks for the fixes :)

add slr_bench evals function

512449b

NathanHB reviewed Sep 23, 2025

community_tasks/slr_bench_evals.py (outdated, resolved)

community_tasks/slr_bench_evals.py (outdated, resolved)

community_tasks/slr_bench_evals.py (outdated, resolved)

NathanHB added the new-task label Sep 23, 2025

implement feedback on PR

e1add28

NathanHB reviewed Sep 24, 2025

community_tasks/slr_bench_evals.py (outdated, resolved)

remove logging and raise exception when judge not loaded

85ed489

NathanHB approved these changes Sep 25, 2025

NathanHB merged commit c7a063a into huggingface:main Sep 25, 2025
4 checks passed
