Mayank Agrawal

Tweets

Matt Hardy retweeted

Mayank Agrawal

@mayankagrawal

Jun 10

x.com/i/article/206474972736…

Matt Hardy

@mdahardy

Jun 4

Coding tools do seem to be making software development less social in ways that can be costly even if you’re shipping more code. Instead of a constant back-and-forth while a feature is being built, an eng can ship a massive PR relatively autonomously. This sounds nice, but it can lead to a nightmare review process, and the shipped code can go off the rails if the ticket was underspecified (normally this sort of problem would be corrected earlier through social feedback). Similarly, the social back-and-forth helps you identify when something is overly complicated and should probably be re-architected, whereas an LLM has no problem writing this type of code. Very important to maintain human collaboration and communication as long as humans are (still) in the loop! For now we're still doing standups.

Packy McCormick

@packyM

Jun 4

exciting stuff

Matt Hardy retweeted

Brian Christian

@brianchristian

Jun 3

Just published in @PNASNews, we resolve a 50-year-old riddle from Richard Feynman's handwritten notes, prove and generalize it, and run a large-scale human study to reveal near-optimal heuristics in sequential decision problems: doi.org/10.1073/pnas.2509612…

Resolving Feynman’s restaurant problem reveals optimal solutions and human strategies | PNAS

In the 1970s, physicist Richard Feynman turned lunch with a friend into a math problem—how to optimize dish selection over multiple meals—but his h...

pnas.org

Matt Hardy retweeted

Milena Rmus @milenamr7

May 29

got the power of god, anime and cats on our side @RoundtableHQ_

Matt Hardy retweeted

Mayank Agrawal

@mayankagrawal

May 29

(1/n) Thrilled to launch our new preprint -- led by the brilliant Milena Rmus (@milenamr7) and with superstars Matt Hardy (@mdahardy) and Tom Griffiths (@cocosci_lab)

Matt Hardy retweeted

Noam Brown

@polynoamial

May 28

After AlphaGo, the skill of human Go players noticeably improved. I suspect we will see a similar pattern in math.

Timothy Gowers @wtgowers

May 28

Another major problem, this time in additive combinatorics, has fallen, this time to humans rather than AI, but using methods related to the AI solution to the unit distance conjecture.

Matt Hardy

@mdahardy

May 25

You should try pair vibe coding. People have developed totally different approaches to using the models.

Matt Hardy

@mdahardy

May 21

Another instance of Moravec's paradox: For me, Claude is much better at back-end work than front-end. In FE, issues often show up only after visually inspecting or using a component, whereas in back-end work, feedback is already text-based (logs) and Claude can work relatively autonomously.

Matt Hardy retweeted

Mayank Agrawal

@mayankagrawal

May 21

Introducing the PROCESS Turing Test. It's time for Proof of Human

Matt Hardy

@mdahardy

May 19

Excited to share our new paper using cognitive science to distinguish AI agents and humans! We administered CogCAPTCHA30, a set of 30 cognitive tasks, to frontier VLMs (GPT-5, Sonnet 4.5, Gemini 2.5 Pro) and humans. We found that processes differ between AI agents and humans, even when the final output is identical. Link: arxiv.org/abs/2605.06524. This work was led by @milenamr7 and co-authored with @cocosci_lab and @mayankagrawal.

Matt Hardy retweeted

Bogdan Ionut Cirstea @BogdanIonutCir2

May 20

seems like room for improvement in making AIs more human-like by training on human (including neural) processes, not just outputs - and also potential metrics to hillclimb

Matt Hardy

@mdahardy

May 19

Matt Hardy retweeted

Fernanda Palacios

@fernandaplcs

May 19

The new Turing test from @RoundtableHQ_

Matt Hardy

@mdahardy

May 19

Matt Hardy

@mdahardy

May 19

Matt Hardy

@mdahardy

May 19

Can agents be trained to close the gap? We compared fine-tuning recipes. Directly optimizing for human-like process works best, but only when the relevant process features are known, and it doesn't generalize across tasks. Specifying the right process is the real bottleneck.

Matt Hardy

@mdahardy

May 19

What are the takeaways? Robust human verification may not need a fixed task battery or output-based testing (e.g. CAPTCHAs). Instead, tasks could be continuously regenerated to probe process dimensions where AI alignment is still imperfect: a dynamic red-teaming loop!

Matt Hardy retweeted

Mayank Agrawal

@mayankagrawal

May 13

My submission to the Dwarkesh essay competition, also available on Substack now: "Omaha as judgment day for AGI." I answered Prompt 3: how do the labs start making money?

Dwarkesh Patel

@dwarkesh_sp

Apr 24

$20k blog prize to answer some big questions about AI. The not-so-secret point of this whole contest is so that I can hire a research collaborator to think through questions like this hand in hand with me. dwarkesh.com/p/blog-prize

Matt Hardy

@mdahardy

May 13

Coming from Boston this never gets old

Matt Hardy retweeted

Mayank Agrawal

@mayankagrawal

May 7

Packed room with @StanfordPsych! A new generation of researchers is using Proof of Human to verify the authenticity of all their published data. We got to give them a sneak peek of a recent research paper too… more to come here :) And thank you to the audience for amazing questions on human identity, both in practice and as an intellectual field of study.

Matt Hardy retweeted

Mayank Agrawal

@mayankagrawal

May 4

I always knew academia-to-entrepreneurship would be my path, but it took me a while to gain the confidence. I go in-depth into my story here, starting from my Indian-American cultural heritage and discovering that my main strength is interdisciplinary synthesis. Why I Turned Down a Tenure-Track Professorship, now out on Minds, Machines, and Markets: substack.com/home/post/p-195…

Matt Hardy

@mdahardy

May 1

What I don't get about bread culture is why it's so much cheaper in Europe compared to the US. If you go to a bakery in Paris, you can get a baguette for ~1€, whereas mass-produced American bread is like $3. Shouldn't the mass-produced option be cheaper?

Todd of Mischief @AndToddsaid

Apr 29

The European mind cannot comprehend the coexistence of the bakery section with the baked goods aisle.