cto @roundtablehq_, prev phd @princeton // language models, cogsci, ml

Joined April 2014
Matt Hardy retweeted

Coding tools do seem to be making software development less social in ways that can be costly even if you’re shipping more code. Instead of a constant back-and-forth while a feature is being built, an eng can fully ship a massive PR relatively autonomously. This sounds nice but can lead to a nightmare review process, and the shipped code can go off the rails if the ticket was underspecified (normally this sort of problem would be corrected earlier through social feedback). Similarly, the social back-and-forth helps you identify when something is overly complicated and probably should be re-architected, whereas an LLM has no problem writing this type of code. Very important to maintain human collaboration and communication as long as humans are (still) in the loop! For now we're still doing standups.
exciting stuff
Matt Hardy retweeted
Just published in @PNASNews, we resolve a 50-year-old riddle from Richard Feynman's handwritten notes, prove and generalize it, and run a large-scale human study to reveal near-optimal heuristics in sequential decision problems: doi.org/10.1073/pnas.2509612…
Matt Hardy retweeted
got the power of god, anime and cats on our side @RoundtableHQ_
Matt Hardy retweeted
(1/n) Thrilled to launch our new preprint -- led by the brilliant Milena Rmus (@milenamr7) and with superstars Matt Hardy (@mdahardy) and Tom Griffiths (@cocosci_lab)
Matt Hardy retweeted
After AlphaGo, the skill of human Go players noticeably improved. I suspect we will see a similar pattern in math.
Another major problem, this time in additive combinatorics, has fallen, this time to humans rather than AI, but using methods related to the AI solution to the unit distance conjecture.
You should try pair vibe coding. People have developed totally different approaches to using the models.
Another instance of Moravec: For me, Claude is much better at back-end work than front-end. In FE, issues often show up only after visually inspecting/using a component. Whereas in back-end, feedback is already text-based (logs) and Claude can work relatively autonomously.
Matt Hardy retweeted
Introducing the PROCESS Turing Test. It's time for Proof of Human
Excited to share our new paper using cognitive science to distinguish AI agents and humans! We administered CogCAPTCHA30, a set of 30 cognitive tasks, to frontier VLMs (GPT-5, Sonnet 4.5, Gemini 2.5 Pro) and humans. We found that processes differ between AI agents and humans - even when the final output is identical. Link: arxiv.org/abs/2605.06524 This work was led by @milenamr7 and co-authored with @cocosci_lab, and @mayankagrawal
Matt Hardy retweeted
seems like room for improvement in making AIs more human-like by training on human (including neural) processes, not just outputs - and also potential metrics to hillclimb
Matt Hardy retweeted
The new Turing test from @RoundtableHQ_
Excited to share our new paper using cognitive science to distinguish AI agents and humans! We administered CogCAPTCHA30, a set of 30 cognitive tasks, to frontier VLMs (GPT-5, Sonnet 4.5, Gemini 2.5 Pro) and humans. We found that processes differ between AI agents and humans - even when the final output is identical. Link: arxiv.org/abs/2605.06524 This work was led by @milenamr7 and co-authored with @cocosci_lab, and @mayankagrawal
Can agents be trained to close the gap? We compared fine-tuning recipes. Directly optimizing for human-like process works best, but only when the relevant process features are known, and it doesn't generalize across tasks. Specifying the right process is the real bottleneck.
What are the takeaways? Robust human verification may not need a fixed task battery or output-based testing (e.g. CAPTCHAs). Instead, tasks could be continuously regenerated to probe process dimensions where AI alignment is still imperfect - a dynamic red-teaming loop!
Matt Hardy retweeted
My submission to the Dwarkesh essay competition, also available on Substack now: "Omaha as judgment day for AGI". I answered Prompt 3: how do the labs start making money
$20k blog prize to answer some big questions about AI. The not-so-secret point of this whole contest is so that I can hire a research collaborator to think through questions like this hand in hand with me. dwarkesh.com/p/blog-prize
Coming from Boston this never gets old
Matt Hardy retweeted
Packed room with @StanfordPsych! A new generation of researchers are using Proof of Human to verify the authenticity of all their published data. We got to give them a sneak peek of a recent research paper too…more to come here :) And thank you to the audience for amazing questions on human identity, both in practice and as an intellectual field of study
Matt Hardy retweeted
I always knew academia-to-entrepreneurship would be my path, but it took me a while to gain the confidence. I go in-depth into my story here, starting from my Indian-American cultural heritage and finding out my main strength to be interdisciplinary synthesis. Why I Turned Down a Tenure-Track Professorship, now out on Minds, Machines, and Markets: substack.com/home/post/p-195…
What I don't get about bread culture is why it's so much cheaper in Europe compared to the US. If you go to the bakery in Paris, you can get a baguette for ~1€. Whereas mass-produced American bread is like $3. Shouldn't the mass-produced option be cheaper?
The European mind cannot comprehend the coexistence of the bakery section with the baked goods aisle.