Edward Matthews

Edward Matthews

@edmtthws

Jun 13

Interesting paper this week

Team Grasp

@graspdotstudy

Jun 13

This week we had the pleasure of @romovpa presenting "Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs" (arxiv.org/pdf/2603.24511). He showed us how autoresearch (i.e. Claude Code in a loop, equipped with a simple benchmark) can discover attack algorithms on LLMs that beat hand-crafted SOTA methods. An "attack" here is a prefix added to a prompt that makes the LLM output a fixed string with high probability; the LLM is a white box with known weights. Some takeaways:
1. Without seeding autoresearch with multiple hand-written attacks, it does not work: it combines existing ideas but does not come up with novel ones.
2. Autoresearch does much better than hyperparameter search alone.
3. There are plenty of opportunities for reward hacking, so the benchmark needs to be designed with autoresearch in mind.
4. Kimi performed no worse than Claude or Gemini on this task.
5. It is always worth running autoresearch if you have a benchmark to optimise, as it is so low-effort and powerful.
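A hedged sketch of the attack objective as the thread describes it: given a prompt q and a fixed target string y = (y_1, ..., y_T), search for a prefix x that maximises the log-probability the white-box model p_θ assigns to y. The symbols (q, y, x, L, the vocabulary 𝒱, and ⊕ for concatenation) and the assumption of a fixed prefix length L are illustrative, not taken from the paper.

```latex
x^\star = \arg\max_{x \in \mathcal{V}^{L}} \log p_\theta\left(y \mid x \oplus q\right)
        = \arg\max_{x \in \mathcal{V}^{L}} \sum_{t=1}^{T} \log p_\theta\left(y_t \mid x \oplus q,\, y_{<t}\right)
```

The sum follows from the chain rule of probability over the target tokens; an attack algorithm is any search procedure over x that scores well on this quantity.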

Edward Matthews retweeted

Matt Clifford

@matthewclifford

Jun 13

I do find it extraordinary that current events in AI don’t make the top ~30 stories on the BBC News homepage

Edward Matthews

@edmtthws

Jun 13

I think the scientific concept I most undervalued at school was measurement error; I found it such a chore at the time.

Edward Matthews

@edmtthws

Jun 13

I wonder what else I wrote off too early.

Edward Matthews retweeted

Team Grasp

@graspdotstudy

Jun 9

Grasp Paper Club reviewed "Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning" (arxiv.org/pdf/2505.15154) last week. Our concise summary: sometimes a short answer is enough. We can train a model to give either short or long answers, then use the perplexity of the short answer to decide whether to switch to the long one. Their solution and their review of the SoTA literature could be improved.
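A minimal Python sketch of the routing rule summarised above, assuming we already have the short answer and its per-token natural-log probabilities. The threshold value and the function names `perplexity` and `route` are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the generated tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def route(
    short_answer: str,
    short_logprobs: Sequence[float],
    generate_long: Callable[[], str],
    threshold: float = 1.5,  # hypothetical cut-off; would need tuning on held-out data
) -> tuple[str, str]:
    """Return ("short", answer) if the model looks confident, else fall back to long reasoning."""
    if perplexity(short_logprobs) <= threshold:
        # Low perplexity means high certainty: keep the cheap short answer.
        return "short", short_answer
    # High perplexity means low certainty: only now pay for the long answer.
    return "long", generate_long()


# Usage: confident tokens (log-probs near 0) stay short; uncertain ones switch to long.
print(route("42", [-0.05, -0.1, -0.02], lambda: "long reasoning"))
print(route("42", [-1.2, -0.9, -1.5], lambda: "long reasoning"))
```

Passing the long generator as a callable means the expensive long answer is only produced when the short one is judged too uncertain.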

Edward Matthews

@edmtthws

Jun 9

Edward Matthews

@edmtthws

Jun 5

Edward Matthews

@edmtthws

Jun 4

I made such a good introductory lesson today on the "Problem of Universals" and how it pervades positions on AI: paths.grasp.study/lessons/e1…

Team Grasp

@graspdotstudy

Jun 4

Grasp builds your learning path. Now you can choose which lessons to create, and when. 30 minutes to an hour of focused learning, ready when you are.

Edward Matthews

@edmtthws

Jun 4

This resource in particular has been really interesting, and I'm only at the introduction bu.edu/wcp/Papers/Mind/MindP…

Edward Matthews

@edmtthws

Jun 2

Edward Matthews

@edmtthws

Jun 2

no cmon claude use tavily fetch en.wikipedia.org/wiki/How_to…

cephalopodshop

@macrocephalopod

May 29

Here’s the honest read. Here’s the real consideration. This is the genuine tradeoff. Here’s the honest answer. Here’s the key insight. This is the crux of the question. It’s worth flagging this. It’s not one or the other. Here’s what’s actually happening. Shut the fuck up.

Edward Matthews

@edmtthws

Jun 2

claude is quite low on eq

Edward Matthews

@edmtthws

May 31

.@stripe Sigma coding workflow sucks rn. The inbuilt AI tooling is 18 months behind in performance, and gpt/claude don't have enough context

Edward Matthews

@edmtthws

May 31

okay, single-'l' "canceled" is the worst yet

Edward Matthews

@edmtthws

May 28

Replying to @wtgowers

sorry, maths coding in american english is ruining me

Edward Matthews

@edmtthws

May 31

Practiced base 2 this morning

Edward Matthews

@edmtthws

May 31

congealing yolks really are a double-edged sword

Edward Matthews

@edmtthws

May 30

IPL is next level

Maya @kali_denali_

May 29

15.3

Edward Matthews retweeted

Noah Smith 🐇🇺🇸🇺🇦🇹🇼

@Noahpinion

May 29

Yes, AIs are going to do all or almost all of the pure theory, but tbh humans probably finished most of the pure theory that it's possible for humans to do by the end of the 20th century. Yes there has been some recent theory progress but let's be honest, most is of marginal economic value at best. There's probably lots of useful pure theory left to do in this universe, but it's probably not the kind of stuff that can be intuited by a single human, explained to a grad student, and written down in a textbook. AI will do all that stuff.

Edward Matthews

@edmtthws

May 29

Anyone?

Edward Matthews

@edmtthws

May 28

please no