Joined January 2016
3,526 Photos and videos
Pinned Tweet
19 Oct 2023
Datamap retweeted
This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly: Can LLMs actually discover science, or are they just good at talking about it?
The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder: Can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?
Here’s what the authors did differently 👇
• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence
What they found is sobering. LLMs are decent at suggesting hypotheses, but brittle at everything that follows.
✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They confuse correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth
Most striking result: `High benchmark scores do not correlate with scientific discovery ability.`
Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.
Why this matters: Real science is not one-shot reasoning. It’s feedback, failure, revision, and restraint.
LLMs today:
• Talk like scientists
• Write like scientists
• But don’t think like scientists yet
The paper’s core takeaway: Scientific intelligence is not language intelligence. It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.”
Until models can reliably do that, claims about “AI scientists” are mostly premature.
This paper doesn’t hype AI. It defines the gap we still need to close. And that’s exactly why it’s important.
Datamap retweeted
You rarely solve hard problems in a flash of insight. It's more typically a slow, careful process of exploring a branching tree of possibilities. You must pause, backtrack, and weigh every alternative. You can't fully do this in your head, because your working memory is too limited. Writing is the external medium that affords the time and precision necessary. Serious thinking must be done in writing. And that's why you can't outsource your writing, because then you're outsourcing your thinking.
Datamap retweeted
🧵5/n. 🧪 Critical thinking effects
When answers arrive instantly, people practice evaluation and reasoning less, and the paper reports measurable declines in critical-thinking scores among heavy users, explained by offloading behavior. The punchline is not anti-tool; it is that over-delegation breeds standardized critical thinking, where everyone leans on the same shortcuts.
Datamap retweeted
"As a result of [China's] massive supply, the cost of generating electricity from solar has now fallen to a global average of around $0.04 per kilowatt hour—making it the cheapest energy source in history". currentaffairs.org/news/chin… Meanwhile, Western officials complain about China's so-called "overcapacity", which is precisely what is making a transition away from fossil fuels possible for the world. As this physicist writes: "one thing is clear: while China is making political decisions based on scientific evidence and while it is flooding the market with cheap solar energy, the Western world is sinking in a quagmire of self-righteous debate consisting of right-wing lies and left-wing virtue signaling. We need to get serious about how China is offering a way to combat climate change".
Datamap retweeted
I like the analogy of the "bicycle for the mind", because riding a bike requires effort from you, and the bike multiplies the effect of that effort. I don't think the end goal of technology should be to let you sit around and twiddle your thumbs.
Datamap retweeted
Software engineers shouldn't fear being replaced by AI. They should fear being asked to maintain the sprawling mess of AI-generated legacy code their employer's systems will soon run on. Because that one will actually happen.
Datamap retweeted
Will this be the last Keeling curve update from NOAA for CO2 at Mauna Loa? June 2025: 429.61 ppm. This may be a tragic moment:
Datamap retweeted
An international call for action just got louder: Today, 7 Nobel Laureates have issued a powerful call for a minimum tax on the ultra-wealthy in Le Monde. Here’s a quick breakdown of the debate, and where things stand globally: lemonde.fr/en/opinion/articl… 🧵
Datamap retweeted
1/ This graph from @JonBruner tells an important story: America's current dominance in science only began after the mid-1930s, when persecuted scientists began fleeing universities in Germany and then elsewhere in occupied Europe.
Datamap retweeted
You've heard of the studies where they give the same dataset/research question to a bunch of researchers and they tend to get different answers, right? Why is that? This new working paper shows that it has a lot to do with data cleaning. This is consistent with Gelman's "garden of forking paths" analogy. Small researcher coding decisions greatly influence results, often without being explicitly acknowledged.
Datamap retweeted
Mexico's president Claudia Sheinbaum is an energy systems expert. She is positioning Mexico to lead in the global green economy, from EVs & batteries to renewables, critical minerals, and HVAC manufacturing. Her Plan Mexico is at a critical juncture. Our report: netzeropolicylab.com/mexico-…
I'm excited to share our @NZpolicylab analysis of Mexico's industrial policy and its potential in the energy transition. The report examines pathways for green investments and sustainable development. Read the full report: netzeropolicylab.com/mexico-…
Datamap retweeted
Overall, many employers in their sample have a distinct Democratic tilt. Look at how few sectors are dominated by Republicans! (This could have something to do with what orgs are in their database, but their sample is quite large!)
Datamap retweeted
Now, looping back around, how does a dive bar tie into this? When our team visited the mill, we would stay in Memphis, about an hour away. At the end of the day, we would swing by the closest bar—Bar Dog—for a nightcap. And the bar was usually full of people speaking German.
Datamap retweeted
1 Apr 2025
ai has convinced me that there are millions of people out there who functionally just do not have a working brain. their motor functions are on autopilot but they lack intelligence in the same way a bacterium does
Datamap retweeted
29 Mar 2025
amazing graph on open source access to research and its byproducts by @percyliang
Datamap retweeted
NEW 🧵: Is human intelligence starting to decline? Recent results from major international tests show that the average person’s capacity to process information, use reasoning and solve novel problems has been falling since around the mid 2010s. What should we make of this?
14 Mar 2025
Datamap retweeted
Paris, rue La Fayette, 12 March 2025, 9:00 a.m.
Datamap retweeted
10 Mar 2025
"[AI agents] are threatening to break the blood-brain barrier between the application layer and the OS layer." Signal President Meredith Whittaker (@mer__edith) warns of "real danger" in agentic AI hype. To be "magic genie bots" (concert booking etc.), they need root access to all your data: browser, cards, messages. Cloud-processed & unencrypted: "privacy & security guarantees" at profound risk.
Datamap retweeted
NEW: The actions of Trump and Vance in recent weeks highlight something under-appreciated. The American right is now ideologically closer to countries like Russia, Turkey and in some senses China, than to the rest of the west (even the conservative west).