research @ Deepmind. engineer, entrepreneur. Not affiliated with RealChar (left project in 2023)

Joined December 2009
who would have thought we'd get Sophon (an artificial particle designed to block human technological advancement, in the Three-Body Problem series) earlier than the Trisolarans
When Fable 5 is used for frontier LLM development, it does not notify the user and instead limits the model’s capabilities through methods such as prompt modification, steering vectors, and PEFT. Anthropic estimated that this would affect approximately 0.03% of traffic.
1
169
piaoyang retweeted
Why did we split the IDE from the Agent Manager? We launched Antigravity 2.0 at Google I/O a couple weeks ago. We made some bold choices and I wanted to share my thoughts...
[1/4] Adapt or Die
I’ve been working in the AI dev tool space for the last three years. Three years is an eternity in this space and I can confidently say that I’ve seen tools, startups, and workflows come and go. Many of the features our team helped pioneer such as autocomplete and RAG-based chat have fizzled away. Every few months there is a paradigm shift that redefines the AI developer tool landscape. Miss it, and you slowly fade into the backdrop. Time it well and you stay in the game. Invent it and you get catapulted into the spotlight. It’s a fast-paced, unforgiving industry. I love it. The team loves it.
What do I mean by paradigm shifts? In the last three years, we’ve gone from: Autocomplete → Chat → Agents → Multi-Agents
Each phase forced us to redefine ourselves: Codeium → Windsurf → Antigravity.
The reason I tell you this is that, as painful as it is, we have to continuously evolve our products. The model and the product are increasingly synergistic. The product is only as good as the model and the model gets better with the product. AI technology moves too quickly to keep the status quo.
For example, we deleted chat in Windsurf when we launched in November 2024 (x [dot] com/kevinhou22/status/1892246469303820745). It was a controversial move at the time since users only knew about chat and had never used an agent. But we trusted that the “future is agentic” and I’m glad we did not back down.
My goal, and our team’s goal, is to give our users the best dev tool on the market to let them build beyond their wildest dreams. To give our users the best product, the product must evolve.
[2/4] Current Shift: Agents → Multi-Agents
At Windsurf, we introduced the first agentic IDE. That was in November 2024. We’ve come a long way since then. Models have gotten stronger, users have gotten more tenacious, and expectations are higher than ever. It’s no longer enough to run just one agent. You are now an orchestrator of many agents: multiple conversations or agents spawning their own subagents.
To bring this multi-agent paradigm to our users at Antigravity, we introduced Antigravity 1.0. Highly capable models made it possible for agents to do more complex work and work for longer. The benefit of working at a lab is seeing around the corner and molding the model to fit the product. We worked with the Gemini team for months before we felt that Antigravity was ready for prime time.
On November 18, 2025, we released Antigravity 1.0 alongside Gemini 3 Pro. It had two surfaces:
1) AGY IDE
2) Agent Manager
Two surfaces, bundled into one app. Some users loved both. Some only used one. Some just needed Gemini to get better.
[3/4] Now: Better Models Doubling Down
Over the last 6 months, we’ve been working with the research team to help improve Gemini’s coding and agentic capabilities. We’ve also been dramatically improving our internal version of Antigravity, and as Sundar announced (youtube [dot] com/live/wYSncx9zLIU?si=8nXDk_WhvoCvx12k&t=1504), we're processing over 3 trillion tokens a day. Additional breakthroughs in model research enabled Gemini 3.5 Flash, a lightweight model at the Pareto frontier (performance vs. cost & speed).
Based on internal usage, we knew there were two camps: IDE and Agent Manager (uppercase).
Antigravity IDE is a great product. Good because it’s familiar. Good because you can manually edit your code. Great because of its agent. We started exporting our agent to other surfaces to build an ecosystem of tools backed by the same powerful Antigravity agent: IDE, Agent Manager, SDK, CLI. That ecosystem is what we announced on stage last week at Google I/O.
But Agent Manager was pulling away. Agent Manager usage was increasing in dramatic fashion and we knew we wanted to double down on this experience. Both technical and non-technical folks are relying on Antigravity every day to not only write code, but also write docs, do competitive research, design prototypes, conduct user simulations, summarize 1:1s, learn new concepts, file expenses, the list goes on and on…
So we made a bold choice to split the two surfaces into two apps. We now have two applications:
1) Antigravity: unapologetically agent-first
2) Antigravity IDE: the editor you know and love
You can choose which product you want to use. You can switch between the products. Same agent, different surfaces.
Now, I’ll admit we botched some details about the migration and we’re working day and night to make it right. We have plenty of exciting features in the pipeline and I’m excited for you all to get your hands on them.
[4/4] Competition = Users Will Win
Times are changing rapidly. AI dev tools are quickly becoming agent-first. Users are managing tens if not hundreds of agents. Antigravity 2.0 is our way of giving this power to our users.
You can take a look at the competitive landscape and see this for yourself. IDEs trying to evolve to a world where work is the product and not the code files. Labs releasing agent-first products, exploiting the model <> product synergies. Users spending more and more time and tokens in agent managers (lowercase).
Antigravity Agent Manager (uppercase) might’ve been first, but ideas are cheap these days. The leader will not necessarily be the winner, and products and models will be constantly evolving to meet growing expectations. Ultimately, users will win.
I want Google Antigravity to be the place where developers build. I also want Antigravity to be the place for knowledge work to get done. Code is becoming an implementation detail. People from all walks of life are using and will use code to solve their problems, without necessarily knowing they are “coding”.
Antigravity is at the frontier and will continue to innovate for you. Thanks for helping shape the future of the product. Keep the feedback coming. More soon.
78
39
488
36,260
Try Gemini 3.5 Flash, it's a good model. It has also been super fun helping make the OS kernel and (free)Doom demo. agents ftw 🚀
Introducing Gemini 3.5: our newest family of models combining frontier intelligence with real-world action. The first release is 3.5 Flash, our strongest model yet for agents and coding 🧵
1
3
273
it has been 10 years since the legendary alphago match. it may sound like hindsight but one should be able to foresee what we have today back then knowing the power of deep learning. and remember we didn't even have transformer then. onwards for the next 10 years
6
437
piaoyang retweeted
Feb 2
SpaceX has acquired xAI, forming one of the most ambitious, vertically integrated innovation engines on (and off) Earth → spacex.com/updates#xai-joins…
3,877
7,722
45,180
19,328,233
19 Dec 2025
github.com/DGoettlich/histor… great that someone made this happen!
5 May 2024
We are accustomed to the fact that LLMs know the modern world and particularly the tech behind themselves (transformers, etc). ChatGPT can casually explain to you how it works. However, this knowledge doesn't have to align with their own existence. We could totally imagine training an LLM on pre-deep learning, pre-computer, or pre-industrial revolution data, while still having it be highly intelligent. You could thus simulate an ancient human and see how it perceives and thinks about the modern world, and is puzzled by its own existence. Seems fun!
1
4
1,075
piaoyang retweeted
17 Nov 2025
Introducing Grok 4.1, a frontier model that sets a new standard for conversational intelligence, emotional understanding, and real-world helpfulness. Grok 4.1 is available for free on grok.com, grok.x.com and our mobile apps. x.ai/news/grok-4-1
1,891
2,090
12,879
39,889,759
20 Sep 2025
fast and furious
19 Sep 2025
Introducing Grok 4 Fast, a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence. Available for free on grok.com, grok.x.com, iOS and Android apps, and OpenRouter. x.ai/news/grok-4-fast
1
14
1,387
19 Sep 2025
x.ai/news/grok-4-fast Best search model in the world! Super proud of the team's achievement
1
1
20
1,778
28 Aug 2025
try Grok Code Fast 1 -- "it's a good model"™
28 Aug 2025
Introducing Grok Code Fast 1, a speedy and economical reasoning model that excels at agentic coding. Now available for free on GitHub Copilot, Cursor, Cline, Kilo Code, Roo Code, opencode, and Windsurf. x.ai/news/grok-code-fast-1
1
5
140
6,236
13 Aug 2025
has anyone in gdm tried starting genie with a photo of a computer running genie and seeing if you can control the computer to play genie inside genie?
2
15
1,996
21 Jul 2025
Congrats to all the contestants of IMO 2025, and special kudos to contestants who got scores of 36 or higher. This may be another "Lee Sedol moment": these talented students may be the last humans to ever beat a computer in a major math competition.
16
1,800
piaoyang retweeted
15 Jul 2025
Grok 4 and Kimi K2 competing on top of the Trending models charts
124
189
2,862
5,409,140
12 Jul 2025
The scaling continues
11 Jul 2025
Our official Grok 4 blog post x.ai/news/grok-4
2
28
1,994
10 Jul 2025
About 10 yrs ago when I first joined Google, I thought about how common it is for multiple companies to build the same stuff over and over individually (be it ads bidding, recommendation systems, or distributed data pipelines). Surely it's good for competition, but it still feels a bit wasteful, especially as we programmers have all been told not to "reinvent the wheel". It's only after I joined @xai that I realized this also applies to engineering skills and experience. Why do so many engineers have to learn the same raw skills and project experience, when we can use the knowledge and compute to train a single, distributable being to excel at it? Isn't it a bigger waste? (it's fun to learn personally though, I admit) so yes, I'm very much looking forward to the upcoming code model, soon :) the future may be uncertain, but it's surely exciting 🤘
3
25
2,150
piaoyang retweeted
10 Jul 2025
Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9% This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA
232
688
4,935
7,306,970
10 Jul 2025
let's go 🚀
10 Jul 2025
The Grok 4 livestream will begin soon. Stay tuned.
7
3
202
22,505
piaoyang retweeted
7 Jul 2025
Grok 4 release livestream on Wednesday at 8pm PT @xAI
14,209
9,208
79,653
39,132,172
piaoyang retweeted
23 Feb 2025
It’s been quite an unbelievable ride since I paused my PhD at Stanford and joined @xai almost a year ago. The journey (building the tool use/agent stack from scratch to demoing DeepSearch as a little research project to converting it into a product launched to millions of people) has been incredible, and it couldn’t have been done without the “real engineer” @pycui64 and many many others. I believe this opportunity offered by xAI was impossible anywhere else and I am very grateful. I hope DeepSearch is helping you like it’s been helping me: finding out the best way to watch lava in Hawaii tonight, figuring out what X thinks about DeepSearch, settling a random argument with @ericzelikman 😛 We are quickly improving it in all facets, so please share your experiences and feedback with us! And join us if you want to make agents useful!
154
158
3,347
428,430