cs phd student @harvard · prev @allen_ai, @cocosci_lab, undergrad @princeton · she/her

Joined May 2022
Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666
Replying to @CogInterp
The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at coginterp.github.io/neurips2… (3/3)
bruce is great at making research resources and this one has been a huge help for my human studies in the stream! check it out ✨
New AI Control Toolkit - Mini Control Arena
For the past few months, we have been doing research with our custom AI Control evaluation library, Mini Control Arena. Mini Control Arena is a ground-up rewrite of UK AISI Control Arena for a much simpler code structure. We are open-sourcing the codebase and hope it helps with your experiments, too! github.com/brucewlee/mini-co…
Sonia Murthy retweeted
My rockstar MATS mentee @BruceWLee2 has just open-sourced his sleek and elegant codebase for AI control research, ppl should give it a try!
Sonia Murthy retweeted
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
Sonia Murthy retweeted
Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!
We're drowning in language models: there are over 2 mil. of them on Huggingface! Can we use some of them to understand which computational ingredients (architecture, scale, post-training, etc.) help us build models that align with human representations? Read on to find out 🧵
Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧡
We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints for 8 unique combinations of base model x feedback dataset x alignment algorithm. We see the largest shifts in values early on in training, with the strongest effects coming from base model choice.
Thanks to my lovely collaborators @rosieyzh, @_jennhu, @ShamKakade6, @m_wulfmeier, Peng Qian, and @TomerUllman and the Kempner Institute! 🧠 [end]
Sonia Murthy retweeted
In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧡
Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟
(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: arxiv.org/abs/2411.04427
Sonia Murthy retweeted
NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. Read more: bit.ly/4hNjtiI @soniakmurthy @tomerullman @_jennhu
Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱