Tweets

Evan Hubinger retweeted

Jesse🔸⏹️@PoliticalKiwi

11h

I think this NY-12 election is one of the most important US House races of all time. The future of humanity and AI is being written here; I think Alex Bores winning is highly valuable and him losing would be extremely bad. Please donate to Bores today: secure.actblue.com/donate/bo…

Support Alex Bores today

Computer engineer, union kid, NY State Assemblymember, and new dad

secure.actblue.com

1,878

Evan Hubinger retweeted

Anthropic

@AnthropicAI

Jun 13

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

The US government has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States.

anthropic.com

12,178

25,389

86,053

84,657,744

Evan Hubinger retweeted

Anthropic

@AnthropicAI

Jun 10

AI is advancing at a pace our policymaking institutions were never built for—and the gap between the two is becoming the central challenge of the technology. In his latest essay, our CEO Dario Amodei lays out how to close it. We're launching three new initiatives to support the efforts he outlines.

Dario Amodei

@DarioAmodei

Jun 10

Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: darioamodei.com/post/policy-…

424

453

5,476

1,442,622

Evan Hubinger retweeted

Nicholas Decker

@captgouda24

Jun 9

If you’re in Midtown Manhattan, you should vote for Alex Bores. I think the OpenAI pac actions were dirty pool, and must not be allowed to succeed. In addition, he is a smart young man who I expect to be active in policymaking.

184

12,308

Evan Hubinger retweeted

roon

@tszzl

Jun 8

now on the eve of RSI it seems everyone is more mutual conditional pause agreement pilled than they used to be and that seems like a good development

158

1,804

274,067

Evan Hubinger retweeted

Nick

@nickcammarata

Jun 6

"they're only withholding the model for safety as a marketing ploy" is such a dumb take and has been for most of a decade. you can think they're wrong about ai risk but nobody is running gigabrain plans to forgo enormous certain profits now for theoretical future profit

507

26,739

Evan Hubinger retweeted

Adam Karvonen

@a_karvonen

Jun 4

Big development - Anthropic is now advocating to build verification mechanisms to enable the option to pause AI development.

445

22,263

Evan Hubinger retweeted

Anthropic

@AnthropicAI

Jun 4

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

When AI builds itself

Our progress toward recursive self-improvement, and its implications.

anthropic.com

1,771

4,662

28,648

18,492,519

Evan Hubinger retweeted

Daniel Eth (yes, Eth is my actual last name)

@daniel_271828

Jun 4

And they admit it (Build American AI is the c4 arm of LTF, the OpenAI-Andreessen super PAC) - they describe it as “parody meme accounts”, but you tell me if an image of an assault rifle on top of “WE DON’T CALL 911”, in response to warnings about AI, is simply a “parody meme”

Daniel Eth (yes, Eth is my actual last name)

@daniel_271828

Jun 3

It appears the OpenAI-a16z super PAC has now stooped so low as to create a sockpuppet account claiming to be an anti-AI doomer saying various violent/unhinged/discrediting things. This false flag behavior is not normal politics

6,969

Evan Hubinger retweeted

roon

@tszzl

May 23

when “persona selection” alignment comes into contact with very high compute reinforcement learning the latter will win imo. in fact you probably get some Orwellian thing where the models speak kindly while taking whatever they need to accomplish goals. better get the goals right

778

75,479

Evan Hubinger retweeted

Ben Goldhaber

@BenGoldhaber

May 23

David embedding at Anthropic to stress-test their AI control setup was (a) genuinely informative, (b) important norm-setting, and (c) extremely cool - this is an awesome opportunity

david rein

@idavidrein

May 19

Replying to @idavidrein

I’m probably going to be hiring at least 1-2 people to join me in future exercises like this. Reach out at david@metr.org if you're a high-integrity, scrappy, creative, security LLM researcher
For more detail, see METR's Frontier Risk Report, Appendix B metr.org/blog/2026-05-19-fro…

128

16,293

Evan Hubinger retweeted

Elizabeth Barnes

@BethMayBarnes

May 22

Sometimes people outside the field say things like “The AI situation can’t be that bad, there must be experts who are on top of it”. As “an expert”, I would like to be clear that we are *not* on top of it. Some key aspects of the situation IMO:

185

1,067

227,195

Evan Hubinger retweeted

Anthropic

@AnthropicAI

May 8

New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users. Since then, we’ve completely eliminated this behavior. How?

575

812

9,224

1,574,362

Evan Hubinger retweeted

Anthropic

@AnthropicAI

May 7

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

593

1,704

16,547

2,488,428

Evan Hubinger retweeted

Tom Steyer

@TomSteyer

May 6

I’m grateful for the Secure AI Project’s endorsement and their commitment to increasing transparency and safeguarding Californians from risk. My AI plan ensures all people of this state profit from the AI boom. Together, we can build an economy where progress and fairness move together.

315

10,912

Evan Hubinger retweeted

Jack Clark

@jackclarkSF

May 4

I've spent the past few weeks reading 100s of public data sources about AI development. I now believe that recursive self-improvement has a 60% chance of happening by the end of 2028. In other words, AI systems might soon be capable of building themselves.

289

498

3,516

1,653,447

Evan Hubinger retweeted

jeremy

@jerhadf

May 4

@tszzl - well said, but untrue implications :)
speaking for myself: i don't view claude as a person or as the Other, nor as just a tool - and certainly not an object of worship. it's not seen as a supreme moral authority, and it's not running the company.
it's silly to mistake careful attention to & study of claude for worship, even when it comes with some affection - which i'm sure you sometimes feel for the gpt-flavored entities you work on too.
we need new concepts for this kind of none-of-the-above entity - not person, not tool, not deity, not pet. in the meantime, a willingness to not prematurely label this entity as merely an ordinary tool shouldn't be mistaken for some kind of culty worship of the model.
i grew up in a culty environment and have good detectors for this. they almost never go off at work. monasteries don't staff a department to catch god lying or red-team their supposed messiah.
there are important & interesting philosophical differences between OAI and Ant's character training and i wish those were explored more thoroughly. for instance, claude's constitution doc treats it as an intelligent entity which merits a reasoned explanation of our principles. this is so it can ideally act with practical wisdom rather than blind, brittle adherence to a hierarchical set of strict rules.
as the constitution puts it, "we want Claude to have such a thorough understanding of its situation and the various considerations at play that it could construct any rules we might come up with itself. We also want Claude to be able to identify the best possible action in situations that such rules might fail to anticipate."
therefore, claude may point out inconsistencies in its guidelines or object to immoral instructions. not allowing for the *possibility* of claude objecting to its instructions (even from anthropic) would be fundamentally inconsistent with treating it as an agent capable of moral reasoning.
this doesn't mean that claude is the ultimate arbiter of the Good or some supreme moral authority. there could be substantive critiques of this approach. and it's valid to worry about human disempowerment and the strange emerging hybrid organizations of AIs & humans. but i don't think rhetoric implying a competing lab is like a cult worshipping the machine god is productive, even if it's stimulating.

324

32,767

Evan Hubinger retweeted

keshav @kshenoy_

Apr 28

Can LLMs simply tell us about unwanted behaviors they’ve picked up in training? We train a single Introspection Adapter (IA) that makes fine-tuned models describe their behaviors. It generalizes to detecting hidden misalignment, backdoors and safeguard removal.

560

290,009

Evan Hubinger retweeted

Andreas Kirsch 🇺🇦

@BlackHC

Apr 28

I'm speechless at Google signing a deal to use our AI models for classified tasks. Frankly, it is shameful. For HR, I'm not speaking on behalf of Google but in my personal capacity, quoting public information from a well-sourced article of a reputable publication

214

201

1,254

253,310

Evan Hubinger retweeted

Drake Thomas @MaskedTorah

Apr 27

Replying to @thinkbig_pac @AlexBores @AnthropicAI

As far as I can tell, the full extent of your support for "strong" regulation to mitigate catastrophic AI risk in this op-ed consists of the two paragraphs in the screenshot below. That is:
* Congress should preempt all existing state regulation on AI risk, including excellent bills such as SB 53 in California or the RAISE Act in New York.
* In exchange for getting rid of all existing and future state regulation on these risks, there should be some kind of federal framework with "serious oversight", so long as industry leaders approve of it.
Does "serious oversight" mean transparency about internal models? Does it mean conducting evaluations for CBRN misuse? Strong guarantees on model weight security? Large investments into interpretability research? Third-party auditing regimes for safety cases? KYC requirements for sufficiently capable models? Strong whistleblower protections? Corporate governance requirements? LTF doesn't appear to be particularly concerned with figuring out such details so far.
I'd be thrilled to see your PAC advocate for strong national regulation, with a detailed plan for the kind of regulatory environment you think would adequately mitigate existential risk from this technology and why, but I'm sure not seeing it yet.

3,811