CS PhD student at Stanford / Researcher at the Alignment Research Center

Joined March 2017
A cute question about inner product sketching came up in our research; any leads would be appreciated! 🙂 cstheory.stackexchange.com/q…

Victor Lecomte retweeted
A new legal letter aimed at OpenAI lays out in stark terms the money and power grab OpenAI is trying to trick its board members into accepting, what one analyst calls "the theft of the millennium." The simple facts of the case are both devastating and darkly hilarious. I'll explain for your amusement.

The letter 'Not For Private Gain' is written for the relevant Attorneys General and is signed by 3 Nobel Prize winners among dozens of top ML researchers, legal experts, economists, ex-OpenAI staff and civil society groups. (I'll link below.) It says that OpenAI's attempt to restructure as a for-profit is simply totally illegal, like you might naively expect. It then asks the Attorneys General (AGs) to take some extreme measures I've never seen discussed before. Here's how they build up to their radical demands.

For 9 years OpenAI and its founders went on ad nauseam about how non-profit control was essential to:
1. Prevent a few people concentrating immense power
2. Ensure the benefits of artificial general intelligence (AGI) were shared with all humanity
3. Avoid the incentive to risk other people's lives to get even richer

They told us these commitments were legally binding and inescapable. They weren't in it for the money or the power. We could trust them.

"The goal isn't to build AGI, it's to make sure AGI benefits humanity" said OpenAI President Greg Brockman. And indeed, OpenAI’s charitable purpose, which its board is legally obligated to pursue, is to “ensure that artificial general intelligence benefits all of humanity” rather than advancing “the private gain of any person.”

100s of top researchers chose to work for OpenAI at below-market salaries, in part motivated by this idealism. It was core to OpenAI's recruitment and PR strategy.

Now along comes 2024. That idealism has paid off. OpenAI is one of the world's hottest companies. The money is rolling in.

But now suddenly we're told the setup under which they became one of the fastest-growing startups in history, the setup that was supposedly totally essential and distinguished them from their rivals, and the protections that made it possible for us to trust them, ALL HAVE TO GO ASAP:
1. The non-profit's (and therefore humanity at large’s) right to super-profits, should they make tens of trillions? Gone. (Guess where that money will go now!)
2. The non-profit’s ownership of AGI, and ability to influence how it’s actually used once it’s built? Gone.
3. The non-profit's ability (and legal duty) to object if OpenAI is doing outrageous things that harm humanity? Gone.
4. A commitment to assist another AGI project if necessary to avoid a harmful arms race, or if joining forces would help the US beat China? Gone.
5. Majority board control by people who don't have a huge personal financial stake in OpenAI? Gone.
6. The ability of the courts or Attorneys General to object if they betray their stated charitable purpose of benefitting humanity? Gone, gone, gone!

Screenshotting from the letter: (I'll do a new tweet after each image so they appear right.) 1/
Victor Lecomte retweeted
New Redwood Research (@redwood_ai) paper in collaboration with @AnthropicAI: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)
18 Dec 2024
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
Victor Lecomte retweeted
22 Nov 2024
How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50 human experts on 7 challenging research engineering tasks.
Victor Lecomte retweeted
The Alignment Research Center (ARC) just released our first empirical paper: Estimating the Probabilities of Rare Outputs in Language Models. In this thread, I'll motivate the problem of low probability estimation and describe our setting/methods. 🧵
Victor Lecomte retweeted
Last week, ARC put out a new paper! The paper is a research update on the "heuristic estimation" direction of our research into explaining neural network behavior. The paper starts by explaining what we mean by "heuristic estimation", through an example and three analogies 🧵
Service update: Looks like my custom domain name is currently serving a scam site, despite everything looking fine on Hover. I'll try to fix this as soon as possible, but in the meantime, if you want to access any of my notes, please use vlecomte.github.io/. Sorry!
And GitHub never informed me that my account had been temporarily demoted, or that the custom domain had been taken over by another user. I've verified the domain name now, but it's kind of wild to me that the system was built on such shaky ground to start with. [3/3]
(In fact, I set up my custom domain in October 2021, less than one month before GitHub Pages added the ability to verify custom domains: web.archive.org/web/20211115…)

This is the first paper I've worked on at ARC, and I think it's pretty cool! 🙂
ARC's latest ML theory paper is on a formal backdoor detection game between an attacker, who must insert a backdoor into a function at a randomly-chosen input, and a defender, who must detect when the backdoor is active at runtime. (Thread)
Victor Lecomte retweeted
We sent this letter to @GavinNewsom this morning. He should sign SB 1047! 🧵
Victor Lecomte retweeted
Excited to share the first paper of my undergrad: "Incidental Polysemanticity" arxiv.org/abs/2312.03096! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with @vclecomte @tmychow @RylanSchaeffer @sanmikoyejo (1/n)
Interested in mech interp of representations that deep networks learn? If so, check out a new type of polysemanticity we call: 💥💥Incidental Polysemanticity 💥💥 Led by @vclecomte @kushal1t @tmychow @sanmikoyejo at @stai_research @StanfordAILab arxiv.org/abs/2312.03096 1/N
My first dabble in studying learning dynamics (and in AI safety-related work)! It was a lot of fun figuring out the exact speed at which encodings get sparser under L1-regularization; I didn't expect the math to end up being so nice. 🙂
I've been getting into ML theory recently, and it was a pleasure to learn from @RylanSchaeffer while working on this project!
Excited to begin announcing our #NeurIPS2023 workshop & conference papers (1/10)! 🔥🚀An Information-Theoretic Understanding of Maximum Manifold Capacity Representations🚀🔥 w/ amazing cast @vclecomte @BerivanISIK @sanmikoyejo @ziv_ravid @Andr3yGR @KhonaMikail @ylecun 1/7
I'm still writing notes, but I've been slow at uploading them, and even slower at tweeting about them. I've now uploaded them all (you can browse them at victorlecomte.com/notes/), and I will highlight some of them over the next week or two (feel free to pester me if I forget). 😀

It ends up being the right quantity to look at for proving a bunch of things, including the edge-isoperimetric inequality (victorlecomte.com/notes/edge…), the level-1 inequality (victorlecomte.com/notes/leve…), and through that, even the Hoeffding bound (victorlecomte.com/notes/hoef…)!

Aside from that, I also wrote a note about Shearer's lemma (victorlecomte.com/notes/shea…) and one about Gilmer's breakthrough lower bound on the union-closed set conjecture (victorlecomte.com/notes/a-co…), which uses an entropic argument.
