At the Alignment Research Center, formerly at OpenAI

Joined November 2012
Jacob Hilton retweeted
We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵
ARC and @aicrowdHQ are launching a ≥$100k contest for white-box estimation algorithms: given the weights of an MLP, the goal is to estimate the expected output of the network on Gaussian inputs. (Thread)
It's rare to find good, easily-measurable metrics for progress in alignment. But we are cautiously optimistic that top submissions will produce ideas that meaningfully advance our research.
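For contrast with the white-box goal, here is a minimal sketch of the black-box alternative: estimating the expected output by sampling Gaussian inputs and running the network. It assumes a bias-free ReLU MLP with standard normal inputs; the function names, the architecture, and the sample count are illustrative assumptions, not the contest's actual setup or scoring.

```python
import numpy as np

def mlp_forward(weights, x):
    """Apply a bias-free ReLU MLP (an assumed architecture) to a batch x of shape (n, d_in)."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        # ReLU on every layer except the last, which is left linear.
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def monte_carlo_mean(weights, n_samples=100_000, seed=0):
    """Estimate E[f(x)] for x ~ N(0, I) by averaging the network output over samples."""
    rng = np.random.default_rng(seed)
    d_in = weights[0].shape[0]
    x = rng.standard_normal((n_samples, d_in))
    return mlp_forward(weights, x).mean(axis=0)

# Hypothetical toy network: one input, one hidden ReLU unit, one output.
toy_weights = [np.array([[2.0]]), np.array([[1.0]])]
print(monte_carlo_mean(toy_weights))
```

For this toy network the exact expectation is 2/sqrt(2π) ≈ 0.798, so the sampled estimate should land close to that value. A white-box method would aim to reach a comparable answer from the weights alone, without drawing samples.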
Jacob Hilton retweeted
May 19
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
Jacob Hilton retweeted
Some personal news: I've started a new AI safety standards org, and our first two standards are out today. We're called Guidelight, co-founded with fellow ex-OpenAI safety researcher, Page Hedley. (1/n)
Can you estimate the average behavior of a neural network without running it? In ARC's latest paper, we address this question for wide randomly-initialized MLPs with Gaussian inputs. (Thread)
Our current approach works at the start of training, but we have a lot more work to do to produce methods that work throughout training (even for small models like the AlgZoo models we shared a few months ago).
Jacob Hilton retweeted
Seems like a great opportunity for technical talent to come into government and help the USG make sound, technically informed decisions on AI
CAISI is hiring for a bunch of exciting new roles, from partnerships to technical experts in AI x bio / chem and more. They're serious about bringing in strong researchers & engineers and letting them do good work. Based in DC or SF: nist.gov/caisi/careers-caisi
A challenge to the mechanistic interpretability community: fully interpret our 432-parameter RNN. (Thread)
Thanks to Zihao Chen, George Robinson, David Matolcsi, Jacob Stavrianos, Jiawei Li and Michael Sklar for work on this and other 2nd argmax models.