[Aug 15, 2025] Daily most popular paper on Hugging Face "We-Math 2.0"
A “MathBook” curriculum that teaches MLLMs math, then rewards them for thinking step-by-step.
✨️Why this matters✨️
Despite rapid progress, most multimodal LLMs still stumble on complex, visual math because datasets are patchy, difficulty is human-centric (not model-centric), and methods don’t generalize reasoning to related sub-problems. We-Math 2.0 tackles all three at once.
✨️What they built✨️
1. MathBook Knowledge System — a five-level hierarchy: 491 knowledge points linked to 1,819 fundamental principles spanning primary→university math.
2. MathBook-Standard & MathBook-Pro — handcrafted GeoGebra visuals; principled 3-D difficulty modeling (step, visual, context) that expands each problem into 7 levels for progressive training.
3. MathBook-RL — two-stage training: (i) Cold-Start SFT to internalize knowledge-oriented CoT; (ii) Progressive-Alignment RL with an average-reward mechanism plus dynamic scheduling (incl. knowledge-increment routing when the model errs).
4. MathBookEval — a benchmark of 1,000 fully annotated problems covering all 491 knowledge points; reasoning depth stratified (Level-1: 1–3 steps; Level-2: 4–6; Level-3: 7–10) and evaluated with GPT-4o as judge.
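To make the depth stratification in item 4 concrete, here is a minimal sketch that buckets a problem by its annotated step count. The function name `depth_level` is hypothetical (it is not from the paper's code); only the step ranges come from the summary above.

```python
def depth_level(num_steps: int) -> int:
    """Map an annotated step count to a MathBookEval reasoning-depth level.

    Ranges follow the summary above: Level-1 is 1-3 steps, Level-2 is 4-6,
    Level-3 is 7-10. The function name is illustrative, not the authors' API.
    """
    if 1 <= num_steps <= 3:
        return 1
    if 4 <= num_steps <= 6:
        return 2
    if 7 <= num_steps <= 10:
        return 3
    # Step counts outside 1-10 are not covered by the stated levels.
    raise ValueError(f"step count {num_steps} is outside the 1-10 range")


if __name__ == "__main__":
    for steps in (2, 5, 9):
        print(steps, "steps -> Level", depth_level(steps))
```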
❓️Does it work? (7B scale)
On four standard benchmarks, MathBook-7B averages 48.7: 73.0 (MathVista), 28.0 (MathVision), 48.4 (We-Math), and 45.2 (MathVerse). That is competitive with recent open-source reasoning models while using only ~1K SFT + 9.8K RL samples.
They attribute the “less-is-more” data efficiency to the structured knowledge system and training scheme.
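A quick sanity check on the reported average: the sketch below takes the four benchmark scores quoted above and computes their unweighted mean. It assumes the average is a plain mean over the four suites, which matches the numbers given.

```python
# Benchmark scores for MathBook-7B as quoted in this post.
scores = {
    "MathVista": 73.0,
    "MathVision": 28.0,
    "We-Math": 48.4,
    "MathVerse": 45.2,
}

# Unweighted mean over the four benchmarks (assumed averaging scheme).
average = sum(scores.values()) / len(scores)

print(f"Average: {average:.2f}")
```

The mean comes out at about 48.65, which rounds to the reported 48.7.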
✨️Why it’s interesting✨️
A model-centric difficulty space (not tied to human grade levels) plus curriculum RL is a neat recipe for visual math.
Benchmarking aligns to explicit knowledge points and multi-step depth, filling gaps in existing suites.
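One way to see where "7 levels" could come from: three difficulty dimensions (step, visual, context) have exactly 7 non-empty combinations. The sketch below enumerates them. This is only a reading that happens to match the count; the post does not spell out how the paper defines the 7 variants, and `difficulty_variants` is a hypothetical helper.

```python
from itertools import combinations

# The three difficulty dimensions named in the post.
DIMENSIONS = ("step", "visual", "context")


def difficulty_variants(dimensions=DIMENSIONS):
    """List every non-empty combination of difficulty dimensions.

    Assumption: each variant raises difficulty along one or more dimensions.
    This is an illustrative reading, not the authors' confirmed procedure.
    """
    variants = []
    for size in range(1, len(dimensions) + 1):
        variants.extend(combinations(dimensions, size))
    return variants


if __name__ == "__main__":
    for index, variant in enumerate(difficulty_variants(), start=1):
        print(index, " + ".join(variant))
```

With three dimensions this yields 2^3 - 1 = 7 combinations: three single-dimension variants, three pairs, and one that raises all three.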
📝Notes
Early version; authors flag it as “working in progress.” Evaluation uses LLM-as-a-judge (GPT-4o), which is consistent with prior math-vision work but has known trade-offs.
🔗Source & links
Paper: arXiv 2508.10433 • Project: we-math2.github.io • Code: GitHub (We-Math2.0) • Datasets: We-Math 2.0 Standard / Pro.
📈Metrics (today)
Hugging Face upvotes: 121
GitHub stars: 115
#MLLM #MathReasoning #VisionLanguage #ReinforcementLearning #Benchmark #WeMath #MathBook #AIResearch
✨️Follow us for more daily picks and researcher-ready breakdowns, and connect with the community on reveal.ac✨️