SAT Math Challenge 🚨 (No Calculator) What is the value of (x^2 - 9)/(x - 3) when x = 3 ? A) 0 B) 3 C) 6 D) Undefined “Most people say 6. Are they right?” #SATPrep #SATMath #MathTrap #NoCalculator #STEM #LearnOnX #TestPrep #OptimizeScience
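A quick way to check the question in the post above is to evaluate the expression with exact rational arithmetic. The sketch below is illustrative only; the helper names `f` and `value_at` are hypothetical and not part of the post. At x = 3 both numerator and denominator are zero, so the expression has no value there (option D), while the value 6 that most people give is the limit as x approaches 3.

```python
from fractions import Fraction

def f(x):
    """Evaluate (x^2 - 9) / (x - 3) exactly using rational arithmetic."""
    x = Fraction(x)
    return (x * x - 9) / (x - 3)

def value_at(x):
    """Return f(x), or the string "undefined" where the denominator is zero."""
    try:
        return f(x)
    except ZeroDivisionError:
        return "undefined"

# Exactly at x = 3 the division is 0/0, so there is no value.
at_three = value_at(3)

# Just to the right of 3 the expression equals x + 3, which approaches 6.
nearby = [value_at(Fraction(3) + Fraction(1, 10**k)) for k in (1, 2, 3)]

print(at_three)
print([str(v) for v in nearby])
```

Cancelling the common factor gives x + 3 only for x ≠ 3, which is why the simplified form and the original expression disagree at exactly one point.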
Replying to @IQTestBrain
Let’s be real — This ain’t a test of intelligence, it’s a social media fishing net for comments like “I solved it 😎.” Meanwhile, real legends are out there solving life bills ÷ sanity = chaos. 💸☕ #MathTrap #SocialMediaFlex #LegendInMyOwnCalculator
LLMs struggle to combine learned knowledge for novel math problems, unlike humans.

**Original Problem** 🔍:
LLMs struggle with systematic compositionality in mathematical reasoning, despite impressive performance on complex tasks. This paper investigates their ability to combine learned knowledge components to solve novel problems.

-----

**Solution in this Paper** 💡:
• Constructs MATHTRAP dataset by adding logical traps to MATH/GSM8K problems
• Traps require combining math knowledge with trap-related knowledge
• Evaluates LLMs on original, trap, and conceptual problems
• Explores interventions: prompts, few-shot demos, fine-tuning

-----

**Key Insights from this Paper** 💡:
• LLMs fail to spontaneously combine knowledge to solve trap problems
• Stark performance gap between humans and LLMs on compositional tasks
• External interventions can improve LLM performance on trap problems
• Compositional generalization remains a key challenge for LLMs

-----

**Results** 📊:
• Closed-source LLMs: >70% on conceptual problems, <50% accuracy ratio on traps
• Open-source: ~40% on conceptual/original, <20% accuracy ratio on traps
• Humans: 83.8% on traps without notice, 95.1% with notice
• Interventions improved performance, e.g. 5-shot demos boosted GPT-3.5 from 7.6% to 23.9%
Paper: "Exploring the Compositional Deficiency of LLMs in Mathematical Reasoning"

🧠 Key Points:
- Investigates LLMs' compositionality in math reasoning
- Created MathTrap dataset
  - Logical traps in MATH and GSM8K problems
  - "Unseen" cases for LLMs
  - Requires combining:
    - Math knowledge from original problems
    - Knowledge related to introduced traps

🔬 Findings:
- LLMs possess both knowledge components
- Fail to spontaneously combine for novel cases
- Compositionality remains an open challenge

🛠️ Mitigation Methods:
- Natural language prompts
- Few-shot demonstrations
- Fine-tuning

------

Generated this podcast with Google's illuminate.
GPT-o1 preview is ~2x better than Llama 3.1-70B on the MathTrap dataset, according to this Reddit post.

📊 Performance on MathTrap Private Dataset
GPT-o1 preview API: 38.0% accuracy
GPT-4 API: 36.0% accuracy
Llama-3.1-8B: 13.5% accuracy
Llama-3.1-70B: 19.4% accuracy

------

🧩 The MathTrap dataset is a benchmark designed to evaluate the compositional generalization capabilities of large language models (LLMs) in mathematical reasoning. It introduces carefully designed logical traps into the problem descriptions of the existing MATH and GSM8K datasets, creating "unseen" problem cases for LLMs.

🧩 The MathTrap dataset is divided into public (open-source) and private (closed-source) datasets.
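The "~2x" figure in the post above can be checked directly from the four accuracies it lists. This is only an arithmetic sketch over the numbers as quoted; the `accuracy` dictionary and the `ratio` helper are hypothetical names introduced for illustration.

```python
# Accuracies (%) on the MathTrap private dataset, copied from the post above.
accuracy = {
    "GPT-o1 preview API": 38.0,
    "GPT-4 API": 36.0,
    "Llama-3.1-8B": 13.5,
    "Llama-3.1-70B": 19.4,
}

def ratio(model_a, model_b):
    """Return how many times higher model_a's accuracy is than model_b's."""
    return accuracy[model_a] / accuracy[model_b]

o1_vs_llama70b = ratio("GPT-o1 preview API", "Llama-3.1-70B")
o1_vs_llama8b = ratio("GPT-o1 preview API", "Llama-3.1-8B")

print(f"o1 preview vs Llama-3.1-70B: {o1_vs_llama70b:.2f}x")
print(f"o1 preview vs Llama-3.1-8B:  {o1_vs_llama8b:.2f}x")
```

The ratio against Llama-3.1-70B comes out just under 2, which matches the post's rounding; against the 8B model the gap is closer to 2.8x.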
17 Nov 2022
So true 😂 I'm loving this
Replying to @Mathtrap
Yes, going up later today
Yes Mercedes pace was genuine (e.g. vs Ferrari and PER), but VER had damage too
7 Oct 2022
Replying to @F1
@KingMiky_ @Mathtrap @ChWyLeee @CyrNogues he turned in before the chicane 😭 I dread the day this legend is no longer in F1 😭
12 Sep 2022
Moths, moths, moths! I told her she needs bigger hands! 🫢😂 Privet stealing the show as always! 😉 #moths #britishmoths #mathtrap #mothsmatter
melee is like post hardcore mathtrap 🫣
Replying to @Mathtrap
Part of fixing the problem
21 Mar 2022
Replying to @Mathtrap
The real car in the studio
17 Feb 2022
This is getting embarrassing @MickA__Jr @CyrNogues
17 Feb 2022
Replying to @Dernier_Secteur
The future world champion @KingMiky_ @Mathtrap @CyrNogues @ChWyLeee
9 Feb 2022
It's even more terrible than I thought. This rage that never goes away is an endless joy 🥰
9 Feb 2022
LOL the loop is even more terrible than at Dadam!!! Wake up, it's over