SPaRC: A Spatial Pathfinding Reasoning Challenge
1,000 unique puzzles · 98% human success rate · 15.8% best AI performance · 15+ steps per solution

Overview

Existing multi-step reasoning benchmarks often conflate symbolic tasks with surface-level linguistic pattern matching, leaving fundamental spatial reasoning abilities under-explored. We introduce SPaRC, a dataset of 1,000 two-dimensional grid puzzles that require joint path planning, rule satisfaction, and numeric-geometric arithmetic. Humans solve 98% of puzzles in under 20 seconds on average, whereas state-of-the-art language models, both reasoning and instruction-tuned, achieve only 1–16% accuracy, underscoring a significant reasoning gap.

Key Features

Multi‑Constraint Problem Solving

Puzzles require simultaneously satisfying multiple spatial constraints, forcing models to integrate various rules (counting, segregation, shape logic) while planning a single path. Wrong steps can lead to irreversible errors, requiring careful hypothesis revision and deep abstract reasoning.
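To make this concrete, a solution can be modeled as a single path that must satisfy a conjunction of independent rule checks, as in the minimal sketch below. The grid and path representation and the two example rules are illustrative assumptions, not the dataset's actual schema or the official solution checker.

# Minimal sketch: a candidate path solves a puzzle only if every rule predicate
# holds at once. The representations here are assumptions, not the SPaRC schema.

def passes_all_dots(path, dots):
    """Counting-style rule: every dot cell must lie on the path."""
    return set(dots) <= set(path)

def avoids_all_gaps(path, gaps):
    """Exclusion-style rule: the path must not enter any gap cell."""
    return not (set(path) & set(gaps))

def is_valid(path, puzzle):
    """A path is a solution only if all rule checks pass simultaneously."""
    return all([
        passes_all_dots(path, puzzle.get("dots", [])),
        avoids_all_gaps(path, puzzle.get("gaps", [])),
        # further rules (stars, triangles, polyominoes, ...) would be added here
    ])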

Joint Planning and Pathfinding

Tasks demand step-by-step planning combined with pathfinding and logic skills. Solvers must understand rule interactions, perform long-term planning, and navigate through spatial constraints—a core human ability that modern AI systems struggle with.
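The sketch below illustrates one way such joint planning can be operationalized: a depth-first backtracking search that enumerates simple paths from the start to the exit and returns the first one accepted by a rule checker (such as the hypothetical is_valid above). The 4-connected moves and the start, exit, and grid_size fields are assumptions for illustration, not the dataset's actual mechanics.

def find_solution(puzzle, is_valid):
    """Illustrative backtracking solver: explore 4-connected, non-revisiting
    paths from the start cell and return the first complete path to the exit
    that the injected rule checker accepts."""
    width, height = puzzle["grid_size"]          # assumed (width, height) tuple
    start, exit_cell = puzzle["start"], puzzle["exit"]

    def neighbors(cell):
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)

    def dfs(path, visited):
        current = path[-1]
        if current == exit_cell:
            # A complete path can only be judged once all rules are checked together.
            return path if is_valid(path, puzzle) else None
        for nxt in neighbors(current):
            if nxt not in visited:
                found = dfs(path + [nxt], visited | {nxt})
                if found is not None:
                    return found
        return None  # dead end: undo this step and try another branch

    return dfs([start], {start})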

Significant Human-AI Performance Gap

Humans solve 98% of puzzles easily (including 94.5% of the hardest level 5 puzzles), while state-of-the-art reasoning models achieve only 15.8% accuracy overall and just 1.1% on difficult puzzles—revealing fundamental limitations in current AI spatial reasoning.

Puzzle Rules

The puzzles combine seven rule types: Dots, Gaps, Stones, Stars, Triangles, Polyominoes, and Ylop.

Dots (Item Collection)

The solution path needs to pass through every dot.
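A tiny worked example of the Dots rule, with illustrative coordinates rather than the dataset's actual encoding:

# The path visits cells (0, 1) and (2, 1), so a puzzle whose dots are exactly
# those two cells is satisfied; a dot at (1, 0) would not be.
path = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
print({(0, 1), (2, 1)} <= set(path))   # True: every dot lies on the path
print({(1, 0)} <= set(path))           # False: this dot is missed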

Example Puzzle

Each puzzle comes with an ID, a difficulty level, a grid size, and a solution length, along with a natural-language description:

"Start at the marked position. Navigate through the grid while adhering to the rules defined by the colored elements. Find a valid path to the exit point while satisfying all constraints."

The interactive viewer on the project page can also display the optimal solution path.

Leaderboard

Model performance on SPaRC puzzles by difficulty level. Model accuracy generally drops sharply as difficulty increases.

Model                  All     Level 1   Level 2   Level 3   Level 4   Level 5
Human                  98.0%   100.0%    100.0%    100.0%    94.4%     94.5%

Reasoning Models
OpenAI o4-mini         15.8%   47.7%     19.5%     10.7%     1.2%      1.1%
OpenAI o3-mini          8.2%   29.1%     10.2%      2.5%     1.2%      0.0%
Qwen QwQ 32B            5.8%   20.9%      5.9%      2.5%     1.2%      0.0%
DeepSeek R1 70B         4.0%   17.4%      2.5%      1.7%     0.0%      0.0%

Instruction Models
OpenAI GPT-4.1          1.6%    7.0%      0.8%      0.8%     0.0%      0.0%
Google Gemma-3 27B      1.2%    3.5%      0.8%      0.8%     0.0%      1.1%
Qwen Qwen 2.5 72B       0.4%    0.0%      1.7%      0.0%     0.0%      0.0%
Error Analysis by Model Type

Figure: Error analysis by model type and reasoning category (radar chart of failure modes).

Error patterns differ across model types, with reasoning models showing better performance on baseline spatial tasks but struggling with complex rule interactions.

Reasoning Analysis Example

Case study of a model reasoning failure: analysis of puzzle 80a59619e323acba with an incorrect solution attempt by DeepSeek R1 Llama 70B. The model tends to commit to a sequence of reasoning steps without thoroughly validating each one, leading to careless mistakes.

Tools & Resources

Puzzle Visualizer

Interactive Tool

Solve any puzzle with our interactive visualization tool, which allows you to examine solutions, rules, and constraints.


Annotation Interface

Research Tool

Use our annotation interface to label puzzle samples and compare human capabilities with those of AI models.


Result Analyser

Coming Soon

Analyze model performance patterns and compare reasoning strategies across different AI systems to identify key areas for improvement.

Dataset

The SPaRC dataset contains 1,000 spatial reasoning puzzles; the complete "all" configuration is available on HuggingFace.

Browse the dataset on HuggingFace: https://huggingface.co/datasets/lkaesberg/SPaRC

HuggingFace Usage

Python
# Install the library first: pip install datasets
from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("lkaesberg/SPaRC", "all", split="test")

# Access puzzle data
puzzle = ds[0]
print(puzzle["id"])                # Puzzle identifier
print(puzzle["difficulty_level"])  # Difficulty (1-5)
print(puzzle["grid_size"])         # Grid dimensions
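
Beyond loading individual puzzles, the standard datasets filtering API can slice the benchmark, for example by difficulty. The field names match the snippet above; the rest is a sketch rather than an official recipe.

from datasets import load_dataset

ds = load_dataset("lkaesberg/SPaRC", "all", split="test")

# Keep only the hardest puzzles (difficulty level 5)
hardest = ds.filter(lambda example: example["difficulty_level"] == 5)
print(len(hardest), "level-5 puzzles")

# Inspect a few of them
for puzzle in hardest.select(range(min(3, len(hardest)))):
    print(puzzle["id"], puzzle["grid_size"])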

Citation

@article{kaesberg2025sparc,
  title  = {SPaRC: A Spatial Pathfinding Reasoning Challenge},
  author = {Kaesberg, Lars Benedikt and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela},
  year   = {2025},
  url    = {https://huggingface.co/datasets/lkaesberg/SPaRC}
}

Contact

For questions, feedback, or collaboration opportunities:

l.kaesberg@uni-goettingen.de