Math1089

May 4

Coming . . . math1089.in/a-journey-throug… #errors #mistake #MathMistakes #matherrors #commonmistakes #commonerrors #MathTrap #traps #motivation #fallacy #mathfallacies

132

Optimize Science @OptimizeScience

Mar 18

SAT Math Challenge 🚨 (No Calculator)

What is the value of (x^2 - 9)/(x - 3) when x = 3?

A) 0
B) 3
C) 6
D) Undefined

“Most people say 6. Are they right?”

#SATPrep #SATMath #MathTrap #NoCalculator #STEM #LearnOnX #TestPrep #OptimizeScience
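
One way to check the trap in the question above is to evaluate the expression exactly. The sketch below is only an illustration (the helper names `f` and `value_or_undefined` are made up for this example): at x = 3 the expression is 0/0, so it is undefined (option D), while nearby inputs give values close to 6, which is the limit and the source of the common wrong answer.

```python
from fractions import Fraction


def f(x):
    """Evaluate (x^2 - 9)/(x - 3) with exact rational arithmetic.

    Raises ZeroDivisionError when x == 3, because the denominator is zero.
    """
    x = Fraction(x)
    return (x * x - 9) / (x - 3)


def value_or_undefined(x):
    """Return the exact value, or the string "undefined" where division by zero occurs."""
    try:
        return f(x)
    except ZeroDivisionError:
        return "undefined"


# Away from x = 3 the expression simplifies to x + 3, so values near 3 approach 6.
for x in ("2.9", "2.99", "3.01", "3.1", "3"):
    print(x, value_or_undefined(x))
```

Cancelling the factor (x - 3) is only valid when x ≠ 3, which is why the simplified form x + 3 gives 6 but the original expression has no value at that point.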

Empowerment Through Adversity

@Empoweradverse

10 Nov 2025

Replying to @IQTestBrain

Let’s be real — This ain’t a test of intelligence, it’s a social media fishing net for comments like “I solved it 😎.” Meanwhile, real legends are out there solving life bills ÷ sanity = chaos. 💸☕ #MathTrap #SocialMediaFlex #LegendInMyOwnCalculator

1,634

Rohan Paul

@rohanpaul_ai

13 Oct 2024

LLMs struggle to combine learned knowledge for novel math problems, unlike humans.

**Original Problem** 🔍:

LLMs struggle with systematic compositionality in mathematical reasoning, despite impressive performance on complex tasks. This paper investigates their ability to combine learned knowledge components to solve novel problems.

-----

**Solution in this Paper** 💡:

• Constructs MATHTRAP dataset by adding logical traps to MATH/GSM8K problems
• Traps require combining math knowledge with trap-related knowledge
• Evaluates LLMs on original, trap, and conceptual problems
• Explores interventions: prompts, few-shot demos, fine-tuning

-----

**Key Insights from this Paper** 💡:

• LLMs fail to spontaneously combine knowledge to solve trap problems
• Stark performance gap between humans and LLMs on compositional tasks
• External interventions can improve LLM performance on trap problems
• Compositional generalization remains a key challenge for LLMs

-----

**Results** 📊:

• Closed-source LLMs: >70% on conceptual problems, <50% accuracy ratio on traps
• Open-source: ~40% on conceptual/original, <20% accuracy ratio on traps
• Humans: 83.8% on traps without notice, 95.1% with notice
• Interventions improved performance, e.g. 5-shot demos boosted GPT-3.5 from 7.6% to 23.9%

2,141
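
To put the quoted results in perspective, the short sketch below works out the size of two changes reported in the post: the GPT-3.5 improvement with 5-shot demos and the human improvement when given notice. It uses only the four figures quoted above; the variable and function names are illustrative, and the post does not say whether the GPT-3.5 figures are raw accuracy or accuracy ratio, so the code treats them simply as percentages.

```python
# Percentages exactly as quoted in the post; nothing else is assumed.
gpt35_without_demos = 7.6
gpt35_five_shot = 23.9
human_without_notice = 83.8
human_with_notice = 95.1


def point_gain(before, after):
    """Absolute change in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


def relative_gain(before, after):
    """Multiplicative change (after / before), rounded to two decimal places."""
    return round(after / before, 2)


print("GPT-3.5 with 5-shot demos:",
      point_gain(gpt35_without_demos, gpt35_five_shot), "points,",
      relative_gain(gpt35_without_demos, gpt35_five_shot), "x")
print("Humans with notice:",
      point_gain(human_without_notice, human_with_notice), "points")
```

Even after roughly tripling, the quoted GPT-3.5 figure stays far below the quoted human figure without notice.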

Rohan Paul

@rohanpaul_ai

8 Oct 2024

Paper: "Exploring the Compositional Deficiency of LLMs in Mathematical Reasoning"

🧠 Key Points:

- Investigates LLMs' compositionality in math reasoning
- Created MathTrap dataset
  - Logical traps in MATH and GSM8K problems
  - "Unseen" cases for LLMs
- Requires combining:
  - Math knowledge from original problems
  - Knowledge related to introduced traps

🔬 Findings:

- LLMs possess both knowledge components
- Fail to spontaneously combine for novel cases
- Compositionality remains an open challenge

🛠️ Mitigation Methods:

- Natural language prompts
- Few-shot demonstrations
- Fine-tuning

------

Generated this podcast with Google's Illuminate.

5:18

2,110

Rohan Paul

@rohanpaul_ai

17 Sep 2024

GPT-o1 preview is ~2x better than Llama 3.1-70B on this MathTrap dataset, according to this Reddit post.

📊 Performance on MathTrap Private Dataset

GPT-o1 preview API: 38.0% accuracy
GPT-4 API: 36.0% accuracy
Llama-3.1-8B: 13.5% accuracy
Llama-3.1-70B: 19.4% accuracy

------

🧩 The MathTrap dataset is a benchmark designed to evaluate the compositional generalization capabilities of large language models (LLMs) in mathematical reasoning. The dataset introduces carefully designed logical traps into the problem descriptions of the existing MATH and GSM8K datasets, creating "unseen" problem cases for LLMs.

🧩 The MathTrap dataset is divided into public (open-source) and private (closed-source) datasets.

3,828
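
The "~2x" claim can be checked directly from the four accuracies quoted in the post. The snippet below is a small sketch that uses only those figures; the dictionary and the helper `ratio_vs` are names chosen for this illustration.

```python
# Accuracy (%) on the MathTrap private dataset, as quoted in the post.
accuracy = {
    "GPT-o1 preview API": 38.0,
    "GPT-4 API": 36.0,
    "Llama-3.1-8B": 13.5,
    "Llama-3.1-70B": 19.4,
}


def ratio_vs(baseline, model="GPT-o1 preview API"):
    """Return how many times higher `model`'s accuracy is than `baseline`'s."""
    return accuracy[model] / accuracy[baseline]


for name in accuracy:
    print(f"GPT-o1 preview vs {name}: {ratio_vs(name):.2f}x")
```

On these numbers the ratio against Llama-3.1-70B is a little under 2, while the lead over GPT-4 is only about two percentage points.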