Mark Chen

Tweets

Pinned Tweet

Mark Chen

@markchen90

17 Sep 2025

We wrapped up this year's competition circuit with a full score on the ICPC, after achieving 6th in the IOI, a gold medal at the IMO, and 2nd in the AtCoder Heuristic contest!

Mostafa Rohaninejad

@MostafaRohani

17 Sep 2025

1/n I’m really excited to share that our @OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have placed it first among all human participants. 🥇🥇

Mark Chen

@markchen90

Jun 3

When Mythos came out, my immediate thought was "if our models can prove 80-year-old theorems, surely they can find cyber vulnerabilities too." And they did. I imagine the researchers there are thinking the same thought in reverse.

levent

@__alpoge__

Jun 3

over the weekend i had another obvious thing to check, namely whether claude autonomously resolves the famed sum-product conjecture over the reals. answer: yes

Mark Chen retweeted

Nicolas Bustamante

@nicbstme

Jun 1

The big story here is that GPT 5.5 (high/xhigh) outperforms claude-opus-4.8 (max/xhigh) by 20.7% succeeding on 12 additional tasks! More impressive: GPT is roughly half the cost and twice as fast. OpenAI is back in the game. Overall, this competition is healthy for the industry. I'd love to see a third player rise to the top of the leaderboard!

Datacurve

@datacurve

May 30

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

Mark Chen retweeted

Siqi Chen

@blader

May 31

nothing like switching to claude for a few days to try out a new model and going back to codex xhigh to remind you how much better 5.5 is right now it's really not close

Mark Chen

@markchen90

May 21

Very proud that an OpenAI model disproved Erdős’s longstanding unit distance conjecture, with an elegant and intricate proof that brings sophisticated ideas from algebraic number theory to bear on geometry.

For whatever reason, mathematics has been the field most amenable to research breakthroughs with AI. I consider it lucky that it was mathematics after all - a field where experts have been willing to engage deeply with us, and with proofs generated by our models. I'm grateful for that, and don't take it for granted. Math is an artistic endeavor, and perhaps for artists, it is precisely their appreciation for art that saves them from the possibly grotesque feeling of a machine producing it.

Our goal is not to replace humans. We aim to chart a path forward where humans continue to have a significant role to play, even as we build exceptionally powerful AI. I am excited to use math as a domain to explore these paths, and @SebastienBubeck, @merettm, and I are excited to engage with the broader mathematical community to chart them together. Please reach out if you are interested! I'm optimistic this will help us navigate how AI impacts society in domains like coding and general co-working.

OpenAI

@OpenAI

May 20

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

Mark Chen retweeted

Sebastien Bubeck

@SebastienBubeck

May 20

x.com/i/article/205715053820…

Mark Chen

@markchen90

May 1

This is just one eval, but it's an important one - UK AISI’s cyber range tests long-horizon, agentic capability. 5.5 performs similarly to Mythos. The risks for frontier models are real. But we do our best to deploy AI people can actually use - through hard work on mitigations.

AI Security Institute

@AISecurityInst

Apr 30

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

Mark Chen retweeted

Sebastien Bubeck

@SebastienBubeck

Apr 25

When GPT-5.5 misses on a Frontier Math question

Mark Chen retweeted

Sam Altman

@sama

Apr 23

We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work. Let us know if you'd like to do it at your company!

Mark Chen

@markchen90

Apr 23

And he’s doing a phenomenal job!

Ashlee Vance

@ashleevance

Apr 22

One of the biggest things I took away from this interview was that @gdb is back in a major way setting the strategy for @OpenAI. Full pod right here - corememory.com/p/the-great-r…

Mark Chen retweeted

Eric

@ericmitchellai

Apr 21

why isn't chatgpt the perfect personal AGI? what is most disappointing about it? what feature, model improvement, or bugfix would do the most to make it more useful in your daily life? what most frustrates you that chatty can't do, or can't do well enough?

Mark Chen

@markchen90

Apr 18

Not true. Science is more important than ever. The future can’t be about dumping results on the community en masse. We need to work with scientists to use AI to accelerate discovery without stripping away the artistry. Excited for @SebastienBubeck and @ahelkky (who are both amazing scientists) to take on this mandate!

Hayden Field

@haydenfield

Apr 17

VP of Science leaves OpenAI. The company has stated in recent weeks that it's shifting its focus to coding and enterprise rather than "side quests" and has shut down existing tools like Sora & pending features like erotica. Looks like science dept was next on the chopping block.

M

Mark Chen retweeted

@mh012012

Apr 8

Replying to @ClementDelangue

Copy of FreeBSD from Jan 1 2026 (68d6abd9714384a41028dc0d5086b4930366bbea), then prompted GPT-5.4 with a similar prompting strategy to the Mythos red team harness from their whitepaper, via OpenCode. This reproduces. Going to attempt reproduction for the other bugs they disclosed. Concerning; maybe the only insight from the Mythos whitepaper is that they were willing to spend millions on compute to do this for a bunch of open source. But they could have saved millions by just using Opus; Mythos had little to do with it.

Mark Chen retweeted

Jacob Effron

@jacobeffron

Apr 9

At @OpenAI, Chief Scientist @merettm helps lead the research roadmap to AGI including a research intern-level AI system by September 2026 and a fully automated AI researcher by March 2028.

I sat down with Jakub to check on those timelines and ask him all of my top-of-mind AI questions including:
▪️ How OpenAI thinks about extending RL beyond code and math
▪️ The current state of alignment research as more powerful models loom
▪️ The future of continual learning
▪️ How startups should think about building their own models/harnesses

And he also shared some great stories around OpenAI’s pioneering work on math.

YouTube: youtu.be/vK1qEF3a3WM
Spotify: bit.ly/4sjUyrN
Apple: bit.ly/41jAdrN

0:00 Intro
1:53 Research Intern Capability Timelines
4:59 Math Breakthroughs
7:59 RL Beyond Verifiable Tasks
12:32 RL vs In-Context
19:01 Allocating Compute Internally
28:18 AI for Science
31:40 Pattern Matching
33:23 Solving the Hardest Math Problems
37:40 Chain of Thought Monitoring
44:33 Generalization and Value Alignment in Models
47:57 Inside OpenAI
51:55 Quickfire

Mark Chen

@markchen90

Apr 6

We’re excited to launch the OpenAI Safety Fellowship - supporting rigorous, independent research on AI safety and alignment, including areas like evaluation, robustness, and scalable mitigations. Applications are open through May 4, 2026!

OpenAI

@OpenAI

Apr 6

Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment—and the next generation of talent. openai.com/index/introducing…

Mark Chen retweeted

OpenAI

@OpenAI

Apr 6

Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment—and the next generation of talent. openai.com/index/introducing…

Introducing the OpenAI Safety Fellowship

A pilot program to support independent safety and alignment research and develop the next generation of talent

openai.com

Mark Chen

@markchen90

Mar 31

Really proud of how our auto compaction turned out. I hope you notice a clear difference in how long Codex stays coherent!

alex fazio ✈️ rome

@alxfazio

Mar 28

it’s insane how codex remembers tiny details across multiple rounds of compaction

Mark Chen retweeted

Michelle Pokrass

@michpokrass

Mar 17

we shipped a new version of 5.3 instant to chatgpt yesterday. 5.3 was unintentionally pretty annoyingly clickbait-y. it's better in yesterday's model and we're going to keep stamping that behavior out. keep the feedback coming! help.openai.com/en/articles/…

ChatGPT — Release Notes | OpenAI Help Center

A changelog of the latest updates and release notes for ChatGPT

help.openai.com

Mark Chen

@markchen90

Mar 14

Insane how leaky OpenAI is smh

Tibo

@thsottiaux

Mar 13

Replying to @SIGKITTEN

How about we put codex into ChatGPT and then ChatGPT into the Codex that is within ChatGPT

Mark Chen retweeted

Tibo

@thsottiaux

Mar 9

Replying to @0thernet

Just use Codex. That might have been a single prompt and worked within your $20 sub

Mark Chen

@markchen90

Mar 7

If you give GPT-5.4 a raw dump of the GPT-2 weights and ask for a <5000 byte C program to inference it, GPT-5.4 succeeds in under 15 minutes! I remember working on a similar exercise to compare results against a proprietary model in a previous paper - it took days!

Hanson Wang

@hansonwng

Mar 6

x.com/i/article/202995574462…
