SPaRC: A Spatial Pathfinding Reasoning Challenge

Kaesberg, Lars Benedikt; Wahle, Jan Philip; Ruas, Terry; Gipp, Bela

Unique Puzzles: 1,000
Human Success Rate: 98%
Best AI Performance: 15.8%
Steps Per Solution: 15+

Overview

Existing multi-step reasoning benchmarks often conflate symbolic tasks with surface-level linguistic pattern matching, leaving fundamental spatial reasoning abilities under-explored. We introduce SPaRC, a dataset of 1,000 two-dimensional grid puzzles that require joint path-planning, rule satisfaction, and numeric-geometric arithmetic. Humans solve 98% of puzzles in under 20 seconds on average, whereas state-of-the-art language-vision models achieve only 1–16% accuracy, underscoring a significant reasoning gap.

Key Features

Multi‑Constraint Problem Solving

Puzzles require simultaneously satisfying multiple spatial constraints, forcing models to integrate various rules (counting, segregation, shape logic) while planning a single path. Wrong steps can lead to irreversible errors, requiring careful hypothesis revision and deep abstract reasoning.

Joint Planning and Pathfinding

Tasks demand step-by-step planning combined with pathfinding and logic skills. Solvers must understand rule interactions, perform long-term planning, and navigate through spatial constraints—a core human ability that modern AI systems struggle with.

Significant Human-AI Performance Gap

Humans solve 98% of puzzles (including 94.5% of the hardest, level 5 puzzles), while the best reasoning model, o4-mini, achieves only 15.8% accuracy overall and just 1.1% on level 5 puzzles. This gap reveals fundamental limitations in current AI spatial reasoning.

Puzzle Rules

Rule types: Dots, Gaps, Stones, Stars, Triangles, Polyominoes, Ylop

Item Collection

Dots

The solution path needs to pass through every dot.
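The dot rule above can be sketched as a simple membership check. This is a minimal illustration, not the benchmark's own validator: the `(x, y)` tuple coordinates, the `path` and `dots` variables, and both function names are assumptions made for this example, and the actual SPaRC grid encoding may differ.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical coordinate type: an (x, y) position on the puzzle grid.
Cell = Tuple[int, int]


def missing_dots(path: List[Cell], dots: Iterable[Cell]) -> List[Cell]:
    """Return the dot positions that the path never visits."""
    visited: Set[Cell] = set(path)
    return [dot for dot in dots if dot not in visited]


def passes_all_dots(path: List[Cell], dots: Iterable[Cell]) -> bool:
    """Return True if the path passes through every dot."""
    return not missing_dots(path, dots)


# Illustrative data only (not taken from the dataset).
path = [(0, 0), (1, 0), (1, 1), (2, 1)]
dots = [(1, 0), (2, 1)]

print(passes_all_dots(path, dots))            # every dot lies on the path
print(missing_dots(path, [(1, 0), (0, 1)]))   # (0, 1) is never visited
```

Returning the list of missed dots, rather than only a boolean, makes it easier to see which constraint a candidate path violates.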

Example Puzzle

Puzzle Description

"Start at the marked position. Navigate through the grid while adhering to the rules defined by the colored elements. Find a valid path to the exit point while satisfying all constraints."

Leaderboard

Model performance on SPaRC puzzles by difficulty level. Performance generally decreases sharply as difficulty increases.

Model	All	Level 1	Level 2	Level 3	Level 4	Level 5
Human	98.0%	100.0%	100.0%	100.0%	94.4%	94.5%
Reasoning Models
o4-mini	15.8%	47.7%	19.5%	10.7%	1.2%	1.1%
o3-mini	8.2%	29.1%	10.2%	2.5%	1.2%	0.0%
QwQ 32B	5.8%	20.9%	5.9%	2.5%	1.2%	0.0%
R1 70B	4.0%	17.4%	2.5%	1.7%	0.0%	0.0%
Instruction Models
GPT-4.1	1.6%	7.0%	0.8%	0.8%	0.0%	0.0%
Gemma-3 27B	1.2%	3.5%	0.8%	0.8%	0.0%	1.1%
Qwen 2.5 72B	0.4%	0.0%	1.7%	0.0%	0.0%	0.0%

Error Patterns by Model Type

Radar chart showing different failure modes

Error patterns differ across model types, with reasoning models showing better performance on baseline spatial tasks but struggling with complex rule interactions.

Reasoning Analysis Example

Case study of model reasoning failure

Analysis of puzzle 80a59619e323acba with an incorrect solution attempt by DeepSeek R1 Llama 70B. The model tends to commit to a sequence of reasoning steps without thoroughly validating each one, leading to careless mistakes.

Tools & Resources

Puzzle Visualizer

Interactive Tool

Solve any puzzle with our interactive visualization tool. Allows examining solutions, rules, and constraints.

Annotation Interface

Research Tool

Use our annotation interface to label puzzle samples and analyze human capabilities compared to AI models.

Result Analyser

Coming Soon

Analyze model performance patterns and compare reasoning strategies across different AI systems to identify key areas for improvement.

Dataset

The SPaRC dataset contains 1,000 spatial reasoning puzzles with the complete "all" set available on HuggingFace.

HuggingFace Usage

Python

# Install the library first (run this in a shell, not in Python):
#   pip install datasets
from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("lkaesberg/SPaRC", "all", split="test")

# Access puzzle data
puzzle = ds[0]
print(puzzle["id"])                # Puzzle identifier
print(puzzle["difficulty_level"])  # Difficulty (1-5)
print(puzzle["grid_size"])         # Grid dimensions
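To produce a per-level breakdown like the one in the leaderboard, results can be grouped by the `difficulty_level` field. The sketch below is one possible way to do that and is not the authors' evaluation code: `accuracy_by_level`, `solved_ids`, and the stand-in `rows` are hypothetical names and data for illustration. It only relies on the `id` and `difficulty_level` fields shown above, so it should also accept rows iterated from the loaded dataset.

```python
from collections import defaultdict


def accuracy_by_level(puzzles, solved_ids):
    """Return {difficulty_level: percent solved} for an iterable of puzzle rows.

    `puzzles` yields dict-like rows with "id" and "difficulty_level" keys;
    `solved_ids` is the set of puzzle ids a solver got right (hypothetical input).
    """
    totals = defaultdict(int)
    solved = defaultdict(int)
    for puzzle in puzzles:
        level = puzzle["difficulty_level"]
        totals[level] += 1
        if puzzle["id"] in solved_ids:
            solved[level] += 1
    return {level: 100.0 * solved[level] / totals[level] for level in sorted(totals)}


# Stand-in rows for illustration; real rows would come from the loaded dataset.
rows = [
    {"id": "a", "difficulty_level": 1},
    {"id": "b", "difficulty_level": 1},
    {"id": "c", "difficulty_level": 5},
]
result = accuracy_by_level(rows, solved_ids={"a"})
print(result)
```

With the real data, the same call would be `accuracy_by_level(ds, solved_ids)`, where `solved_ids` comes from whatever evaluation procedure is used.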

Citation

@article{kaesberg2025sparc,
  title  = {SPaRC: A Spatial Pathfinding Reasoning Challenge},
  author = {Kaesberg, Lars Benedikt and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela},
  year   = {2025},
  url    = {https://huggingface.co/datasets/lkaesberg/SPaRC}
}

Contact

For questions, feedback, or collaboration opportunities:

l.kaesberg@uni-goettingen.de