Guy Davidson (@guyd33) | Aguea

Tweets

Pinned Tweet

Guy Davidson @guyd33

23 May 2025

New preprint alert! We often prompt ICL tasks using either demonstrations or instructions. How much does the form of the prompt matter to the task representation formed by a language model? Stick around to find out 1/N

Guy Davidson retweeted

Dr. Karen Ullrich @karen_ullrich

10 Dec 2025

If “getting started with agents” feels like setup hell — same. So we made a starter tutorial: First agent running in <14 minutes, no Docker/AWS. Laptop API key only. 👇 youtube.com/watch?v=gzNW_LXE…

Starting with LLM Agents: Installing and Running OpenApps

A pythonic, lightweight, scalable, and open-source UI-agent environ...

Guy Davidson @guyd33

1 Dec 2025

Like ~everyone, I'll also be at #NeurIPS this week! Please reach out to chat about past (goal representations, cognitive science, interp) or current interests (LLM mental state inference, social environments for RL). Also if you have leads on great coffee, craft beer, or tacos.

Guy Davidson @guyd33

3 Dec 2025

We're also presenting some work! Our (@adinamwilliams @LakeBrenden @todd_gureckis) interpretability work on task representations from different prompting forms will be poster #1016 on Friday's afternoon session (4:30-7:30, hall C/D/E) x.com/guyd33/status/19259677…

Guy Davidson @guyd33

3 Dec 2025

@jcyhc_ai will present SAGE-Eval, our (w/ @LakeBrenden) systematic generalization safety benchmark at poster #1104 on Friday AM (11-2). John does fantastic work and he's open to RE/RS roles or PhD positions in AI Safety. If you're hiring, talk to him! x.com/jcyhc_ai/status/192811…

John (Yueh-Han) Chen @jcyhc_ai

29 May 2025

Do LLMs show systematic generalization of safety facts to novel scenarios? Introducing our work SAGE-Eval, a benchmark consisting of 100 safety facts and 10k scenarios to test this!
- Claude-3.7-Sonnet passes only 57% of facts evaluated
- o1 and o3-mini passed <45%! 🧵

Guy Davidson retweeted

Dr. Karen Ullrich @karen_ullrich

3 Dec 2025

Stop by the Meta booth tomorrow, Wednesday Dec 3rd at #NeurIPS in San Diego! 🤖📱 We demo our new research environment, OpenApps, for digital agents. Generate thousands of app versions to train and evaluate multimodal agents to use apps like humans do. Not attending? Stay tuned

Guy Davidson retweeted

Cédric @cedcolas

1 Dec 2025

In San Diego for #NeurIPS. Happy to chat about open-endedness, self goal-generation, intrinsic motivations, self-improvement, human-machine collective intelligence. Open to hear about research scientist opportunities too. Don't hesitate to reach out!

Guy Davidson @guyd33

20 Oct 2025

My team at FAIR at Meta is recruiting interns for next summer! If you're a PhD student interested in questions around theory of mind in language models for social, multi-agent settings, and have relevant background and/or experience: metacareers.com/jobs/1821713…

Guy Davidson @guyd33

20 Oct 2025

Other great humans you might end up working with include @bvp22294, @AnsongNi, and @real_asli (in addition to several twitter-less folks). Feel free to reach out with any questions! (though it may take me a bit to reply)

Guy Davidson @guyd33

19 Sep 2025

Belated update #2: my year at FAIR @AIatMeta through the AIM program was so nice that I’m sticking around for the long haul. I’m excited to stay at FAIR and work with @real_asli and friends on fun LLM questions; I’ll be working from the New York office so we’re sticking around.

Guy Davidson @guyd33

17 Sep 2025

Belated update #1: I defended my PhD about a month ago! I appreciate the warm reception from everyone who made it in-person and virtually. Thanks to my committee, @LerrelPinto, @togelius, and Mark Ho for your feedback and fun questions.

Guy Davidson @guyd33

17 Sep 2025

I owe tremendous thanks to many other people, all (or, hopefully, at least most) of whom I mentioned in my acknowledgments. I’m also so grateful my dad could represent my family, and for my wife, Sarah, for, well, everything.

Guy Davidson @guyd33

17 Sep 2025

Tune in tomorrow for belated update #2, on post-PhD plans!

Guy Davidson retweeted

smitha milli @SmithaMilli

16 Sep 2025

do you use Letterboxd? would you be willing to participate in a 30-min research study where you use movie recommenders based on your Letterboxd ratings? DM me! (you will receive $20 for participating)

Guy Davidson @guyd33

6 Aug 2025

Friends and virtual acquaintances! I’m defending my PhD tomorrow morning at 11:30 AM ET. If anyone would like to watch, let me know and I’ll send you the Zoom link (and if you’re in NYC and feel compelled to join in person, that works, too!)

Guy Davidson @guyd33

30 Jul 2025

#CogSci2025 friends! I'm here all week and would love to chat. I'd particularly love to talk to anyone thinking about Theory of Mind and how to evaluate it better (in both minds and machines, in different settings/contexts), and about goals and their representations. Find me at:

Guy Davidson @guyd33

30 Jul 2025

Saturday's poster session (P3-D-44) to talk about our goal inference work, in a new, physics-based environment we developed: escholarship.org/uc/item/6tb…

Guy Davidson @guyd33

30 Jul 2025

Wherever good coffee is to be found, the rest of the time. Don't hesitate to reach out! (also happy to talk about job search in industry and what that looks and feels like these days)

Guy Davidson @guyd33

28 Jul 2025

Compositionality and planning: maybe not quite solved yet (g.co/gemini/share/3d33f49bb4… if anyone is curious about the actual prompt, in which there was no attempt to trick the model or anything)

Guy Davidson @guyd33

28 Jul 2025

For what it's worth, Imagen 4 through Whisk isn't any better (and the above is Gemini 2.5 Pro)

Guy Davidson @guyd33

28 Jul 2025

This reward function on the left feels like trying to motivate myself to finish this figure
