Nathaniel Li

Nathaniel Li

4 Photos and videos

Tweets

Nathaniel Li @natliml

Jun 5

We're releasing the preparedness report for Muse Spark Contemplating, MSL's extreme reasoning model, benchmarking its capabilities and behaviors in biology, cybersecurity, and more!

13,385

Nathaniel Li

Nathaniel Li @natliml

Jun 5

ai.meta.com/static-resource/…

746

Dina Powell McCormick

Nathaniel Li retweeted

Dina Powell McCormick

@DinaPowellMcC

Jun 3

President Trump's AI Executive Order helps keep America in the lead on AI — encouraging innovation and taking security seriously. We appreciate the Administration's support for public-private sector collaboration and look forward to continuing to work with the White House as it implements the President's plan.

Director Michael Kratsios

@mkratsios47

Jun 2

Today @POTUS signed an EO that keeps America leading in AI while putting frontier AI capabilities to work strengthening our cyber defenses. AI systems are now the most powerful tools we have ever had to harden our cyber infrastructure and stay ahead of adversaries. It is a real blessing that these capabilities are being developed by American industry, and not by those who would use them against us. whitehouse.gov/presidential-…

270

41,896

Nathaniel Li

Nathaniel Li @natliml

May 4

My team at MSL is hiring for talented ML researchers & engineers with experience in agentic security, dual-use capability evaluations, and/or post-training of frontier AI models! If this sounds exciting, please reach out and DM me with your resume!

334

38,854

Nathaniel Li

Nathaniel Li @natliml

May 4

metacareers.com/profile/job_…

4,168

David Sacks

Nathaniel Li retweeted

David Sacks

@davidsacks47

Jan 15

224

1,480

111,279

Nathaniel Li

Nathaniel Li @natliml

Apr 12

Leading on AI will accelerate scientific discovery, empower workers, and power the next great generation of American enterprise. With Muse Spark, we released Meta's Advanced AI Scaling Framework to show how we build AI to benefit and protect the public. ai.meta.com/blog/scaling-how…

Scaling How We Build and Test Our Most Advanced AI

As we build more capable, personalized AI, reliability, security, and user protections are more important than ever.

ai.meta.com

Alexandr Wang

@alexandr_wang

Apr 8

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

454

Alexandr Wang

Nathaniel Li retweeted

Alexandr Wang

@alexandr_wang

Apr 8

745

1,191

10,372

4,552,813

The White House

Nathaniel Li retweeted

The White House

@WhiteHouse

25 Nov 2025

President Donald J. Trump launches the GENESIS MISSION - a national effort to accelerate scientific breakthroughs using AI and propel America into the GOLDEN AGE OF INNOVATION. 🚀

1,354

2,996

12,672

656,759

Nathaniel Li

Nathaniel Li @natliml

24 Sep 2025

We're releasing a Preparedness report for Code World Model, assessing its bio, cyber, and chemical security capabilities in alignment with our Frontier AI Framework. We also provide nascent evaluations on the propensities of CWM, including honesty! ai.meta.com/research/publica…

Summer Yue

@summeryue0

24 Sep 2025

We’re excited to share our preparedness report on Code World Model (CWM), FAIR’s latest open-weight model for code generation and reasoning. This report was developed by the SEAL team and the AI Security team, marking our first external publication since part of SEAL joined Meta just 1.5 months ago. It reflects our ongoing commitment to safety and alignment as we look ahead to future open-source and frontier models. We’re actively working to broaden our evaluation coverage and improve our processes, and we welcome collaboration and feedback from the broader research community. For more details, please read the full report: ai.meta.com/research/publica…

1,464

Nathaniel Li

Nathaniel Li @natliml

9 Aug 2025

I joined @Meta AI, running preparedness and security evaluations with @summeryue0 and @_julianmichael_ to ensure that Superintelligence's newest models enable a prosperous future. Grateful for the team they built at @Scale_AI and excited for the critical work ahead.

188

19,377

Nathaniel Li

Nathaniel Li @natliml

22 Apr 2025

We're releasing a multimodal benchmark for troubleshooting complex virology protocols. Expert virologists score only 22.1%, even with internet access and answering questions in their subdomains of expertise. Frontier LLMs score up to 43.8%.

Dan Hendrycks

@hendrycks

22 Apr 2025

Can AI meaningfully help with bioweapons creation? On our new Virology Capabilities Test (VCT), frontier LLMs display the expert-level tacit knowledge needed to troubleshoot wet lab protocols. OpenAI’s o3 now outperforms 94% of expert virologists.

1,512

Nathaniel Li

Nathaniel Li @natliml

23 Jan 2025

LLM benchmarks are getting quickly saturated. In response, we're releasing Humanity's Last Exam, a benchmark of extremely hard questions across math, sciences, humanities & more. Thanks to @cais, @scaleai, and thousands of expert question contributors.

Dan Hendrycks

@hendrycks

23 Jan 2025

We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. State-of-the-art AIs get <10% accuracy and are highly overconfident. @ai_risk @scaleai

1,905

Nathaniel Li

Nathaniel Li @natliml

23 Jan 2025

Dataset and code can be found at lastexam.ai!

417

Alexandr Wang

Nathaniel Li retweeted

Alexandr Wang

@alexandr_wang

7 Sep 2024

ICYMI: New research from SEAL that demonstrates human red-teamers massively outperform automated methods over multiple turns. We also release MHJ, a dataset of multi-turn jailbreaks to help enable further research into multiturn red teaming

Nathaniel Li @natliml

27 Aug 2024

Who's better at LLM mischief — humans or AIs? Spoiler: It's us. Human red teamers achieve 70% attack success rates against LLM defenses that stump automated adversarial attacks. Why? We’re better at adversarial yapping.🧵

20,421

Summer Yue

Nathaniel Li retweeted

Summer Yue

@summeryue0

27 Aug 2024

Can robust LLM defenses be jailbroken by humans? We show that Scale Red teamers successfully break defenses on 70 % of harmful behaviors, while most automated adversarial attacks yield single-digit success rates. 🧵

24,238

Riley Goodside

Nathaniel Li retweeted

Riley Goodside

@goodside

27 Aug 2024

Humans, noted virtuosi of adversarial yap, remain #1 at trolling LLMs! New research from @scale_AI's SEAL team shows human red teamers achieve 70% success rates against LLM defenses that stump automated attacks, exploiting their susceptibility to multi-turn jailbreaks.

Nathaniel Li @natliml

27 Aug 2024

15,270

Hugh Zhang

Nathaniel Li retweeted

Hugh Zhang

@hughbzhang

27 Aug 2024

I had the great pleasure of sitting next to Nat during his internship. Nat would always come in with a huge smile on his face and brighten the day of everyone around him. He’s been hard at work all summer, and I’m super glad to see this work finally come out!

Nathaniel Li @natliml

27 Aug 2024

2,987