Co-Founder and CEO @SakanaAILabs 🎏

Joined November 2014
Pinned Tweet
I’m incredibly proud of The AI Scientist team for this milestone publication in @Nature. We started this project to explore if foundation models could execute the entire research lifecycle. Seeing this work validated at this level is a special moment. I truly believe AI will forever change the landscape of how scientific discoveries and scientific progress are made.
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature
Nature: nature.com/articles/s41586-0…
Blog: sakana.ai/ai-scientist-natur…
When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible.
Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.
Today, we are happy to announce that “The AI Scientist: Towards Fully Automated AI Research,” our paper describing all of this work, along with fresh new insights, has been published in @Nature!
This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement.
Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science. As the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to exponentially increase, future versions of The AI Scientist will be substantially more capable.
Building upon our previous open-source releases (github.com/SakanaAI/AI-Scien…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune
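The claim that an automated reviewer can agree with human decisions about as well as humans agree with each other can be made concrete with a simple agreement score. The sketch below is illustrative only: `balanced_accuracy` is an assumed metric, the variable names are hypothetical, and the labels are made-up toy values rather than data from the paper.

```python
def balanced_accuracy(reference, predicted):
    """Mean per-class recall for binary accept (1) / reject (0) decisions."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    recalls = []
    for cls in (0, 1):
        total = sum(1 for r in reference if r == cls)
        if total == 0:
            continue  # class absent from the reference; skip it
        hits = sum(1 for r, p in zip(reference, predicted) if r == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Made-up toy labels, for illustration only (not data from the paper).
final_decisions = [1, 0, 0, 1, 0, 0, 1, 0]
reviewer_a = [1, 0, 1, 0, 0, 0, 1, 1]
reviewer_b = [1, 0, 0, 1, 0, 1, 1, 0]

print(balanced_accuracy(final_decisions, reviewer_a))
print(balanced_accuracy(final_decisions, reviewer_b))
```

Balanced accuracy is used here because accept/reject labels at selective venues tend to be imbalanced, so plain accuracy would reward a reviewer that rejects everything.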
Today, we are officially launching the Sakana AI RSI Lab in Tokyo to build open-ended, adaptive AI systems that collectively self-improve. I am incredibly proud of our team’s work over the past 2 years, shipping the breakthrough research that laid the foundations for this moment. Building in Japan provides us with the ultimate design constraint. Just like Japan’s historical dominance in manufacturing was achieved by fundamentally redesigning the factory floor to do more with less, we are focused on compute-efficiency. We are not building the most compute-hungry self-improvement engine. We are building the most sample-efficient one. If you are entirely unsatisfied with the brute-force status quo and ready to build the self-improving future in Japan, come join us.
Building AI that Builds AI: Introducing the Sakana AI RSI Lab 🚀 sakana.ai/rsi-lab
Today, we are announcing the Sakana AI Recursive Self-Improvement (RSI) Lab: a dedicated research group in Tokyo tasked with redesigning the AI development process itself using AI.
While the industry increasingly speculates about the theoretical potential of self-improving AI, we’ve spent the last two years actively laying the foundations to make it a reality:
▪ LLM²: AI models automating research to invent better preference optimization algorithms.
▪ Darwin Gödel Machine: Agents autonomously rewriting their own codebase to double software-engineering performance.
▪ ShinkaEvolve: Hyper-sample-efficient program evolution that builds novel loss functions for MoE models.
▪ ALE-Agent: Reinforcement agents outperforming hundreds of human experts via self-learning.
▪ Digital Red Queen: Open-ended adversarial coevolution laying the groundwork for RSI in cybersecurity.
▪ The AI Scientist: Towards end-to-end automation of AI research, recently published in Nature.
Now, we are unifying these breakthroughs. The Sakana AI RSI Lab is officially tasked with building open-ended, adaptive architectures that collectively self-improve.
Human intelligence did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. We are applying this exact principle to AI. We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn’t be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good.
We’re scaling our team to execute this mission. We are looking for frontier scientists and engineers who are entirely unsatisfied with the brute-force status quo. If you are ready to break away from standard benchmarking and build the self-improving future in Japan, come build with us.
We’ve been laying the foundations for RSI over the last 2 years. Now, I am looking for a select group of distinguished frontier researchers and engineers to join our core RSI team. sakana.ai/careers/member-of-… If you have a proven track record at the frontier, but find yourself entirely bored with the status quo of brute-force scaling, this is your call.
Member of Technical Staff (RSI Lab) sakana.ai/careers/member-of-… If you are a visionary builder ready to move to Tokyo and engineer the engine of recursive discovery, we invite you to apply.
On TV Tokyo’s WBS (@wbs_tvtokyo) tonight I’ll be discussing Sakana AI’s upcoming 1T parameter model project, supported by METI’s GENIAC initiative. We are scaling up to build Japan’s first 1T parameter agent-native model, specifically optimized for long-horizon deep research and autonomous tool use. Many exciting announcements coming up very soon, stay tuned! 🚀
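Airing tonight at 22:00 on TV Tokyo’s WBS (@wbs_tvtokyo): we were interviewed about our selection for GENIAC, the AI development support project run by the Ministry of Economy, Trade and Industry (METI). Our CEO David Ha (@hardmaru) and Research Scientist Suganuma talk about our strategy and the potential for AI from Japan to change the world. Please tune in!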
Why Japanese companies do so many different things: The internal logic of the world’s strangest corporations davidoks.blog/p/why-japanese…
TL;DR
> you have a firm that has lots of lifetime employees who can’t be fired, and whose skills are tailored to what your firm needs rather than to a particular occupational category transferable to any employer
> the system only makes sense if the company is also insulated from outside pressure
> the Japan-style company, run by its employees and largely indifferent to the interests of shareholders, exists simply to continue existing
> And that basic impulse toward survival is why Japanese companies are so insistent on diversification. If you’ve made a commitment to keep people employed for life, then you need to create jobs for them if their current jobs stop making sense
> If you’re not very worried about profitability, and have lots of well-trained generalist employees, then it makes perfect sense to reinvest your company’s earnings by expanding into new industries
hardmaru retweeted
Glad that Elon is also interested in our latest 🐟 research! Thank you! 🚀
Replying to @hardmaru
Interesting
For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation pub.sakana.ai/diffusionblock…
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.
With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.
We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers
In each case, performance is competitive with end-to-end training while using a fraction of the memory.
This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.
Read our paper and code to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffusio… 🐟
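To make the block-wise idea concrete, here is a minimal sketch of the bookkeeping it implies: each block gets its own slice of the denoising process, and only one block's activations need to be held for backprop at a time. Everything here is an assumption for illustration. The function names are hypothetical, the uniform split in log-noise is just one possible choice (the paper may partition differently), and the memory count is a crude proxy that ignores parameters and optimizer state.

```python
import math


def split_noise_range(sigma_min, sigma_max, num_blocks):
    """Split [sigma_min, sigma_max] into contiguous ranges, uniform in log-sigma.

    Returns (high, low) pairs ordered from noisiest to cleanest, so block 0
    handles the noisiest inputs and the last block the cleanest (an assumption).
    """
    if num_blocks < 1 or not (0 < sigma_min < sigma_max):
        raise ValueError("need num_blocks >= 1 and 0 < sigma_min < sigma_max")
    log_lo, log_hi = math.log(sigma_min), math.log(sigma_max)
    step = (log_hi - log_lo) / num_blocks
    edges = [math.exp(log_hi - i * step) for i in range(num_blocks + 1)]
    edges[0], edges[-1] = sigma_max, sigma_min  # avoid float drift at the ends
    return [(edges[i], edges[i + 1]) for i in range(num_blocks)]


def peak_activation_units(num_layers, num_blocks, blockwise):
    """Rough proxy: number of layers whose activations are stored at once."""
    if num_layers % num_blocks != 0:
        raise ValueError("num_layers must be divisible by num_blocks")
    return num_layers // num_blocks if blockwise else num_layers


for high, low in split_noise_range(0.01, 100.0, 4):
    print(f"block handles sigma in [{low:.3g}, {high:.3g}]")
print(peak_activation_units(24, 4, blockwise=False))  # end-to-end
print(peak_activation_units(24, 4, blockwise=True))   # one block at a time
```

Under this framing, training block `i` would sample noise levels only from its own range and minimize a denoising loss there, so no gradients cross block boundaries.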
Forecasting Scientific Progress with Artificial Intelligence arxiv.org/abs/2605.22681 Turns out AI is just as bad at forecasting biology and physics breakthroughs as we are. To be fair, most breakthroughs cannot be predicted. Science is more like an evolutionary search process. Though ironically, LLMs are pretty good at predicting their own AI benchmarks…
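How far can AI predict scientific progress?
A paper examining how well state-of-the-art AI can forecast future scientific results has been published, co-authored with researchers from the University of Oxford, Stanford University, @Allen_AI, and others. Sakana AI research scientist Yutaro Yamada is a co-author. arxiv.org/abs/2605.22681 seanwu25.github.io/CUSP-Scie…
The study proposes CUSP, a benchmark for evaluating AI’s scientific forecasting ability, and tests it on 4,760 scientific events. The results show that current frontier models can identify promising research directions, but struggle to predict whether those directions will be realized and when. The study also shows that these limitations cannot be explained by the amount of training data alone.
What these results make clear once again is that science remains an open-ended endeavor, and that even state-of-the-art AI finds it hard to predict the direction of its development. AI will likely be most effective not as an oracle that foretells scientific progress, but as a collaborator that advances the exploration together with humans.
As diverse AIs and human creativity combine, science should continue to unfold in unpredictable directions. Sakana AI will continue working on AI that contributes to the advancement of science, including The AI Scientist, which Yamada has helped develop.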
People keep asking if AI will replace software engineers. I believe the exact opposite. Thanks to the Jevons paradox, AI tools are making great engineers 10x more productive, allowing us to tackle much harder, larger-scale problems. We’re expanding our SWE teams at @SakanaAILabs. We have 5 new open roles, including English-speaking R&D and Platform roles. Come build the future of AI with us in Tokyo! 🐟
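[Hiring] Five “Software Engineer” positions are now open! sakana.ai/careers
“As AI advances, will software engineering jobs disappear?” At Sakana AI, we believe the exact opposite.
AI tools are dramatically improving development efficiency, but as the Jevons paradox suggests, the breadth and scale of the problems we can solve is expanding, and demand for excellent Software Engineers is higher than ever.
In fact, Sakana AI is ramping up hiring, on an unprecedented scale, of Software Engineers who make full use of AI-assisted tools, work at the front lines, and bring AI itself into real-world use!
We are currently recruiting in the following five specialist areas. See the link for details.
🐙 The challenges that await you
・Enterprise: End-to-end design, development, and operation, from frontend to backend, of applications that incorporate AI technology
・Defense & Intelligence: Contribute to Japan’s defense and intelligence sectors with AI-powered software (*due to the nature of this position, requirements such as Japanese citizenship apply)
・Product: Full-stack development of our in-house AI products, from UI/UX to backend and infrastructure
・Platform: Design and build the robust infrastructure and data platform that supports LLM agents (English required, Japanese is a plus)
・Research and Development: Bridge ML research and product development by building tools and full-stack infrastructure that accelerate research (English required, Japanese is a plus)
🐡 Who we are looking for
・People with hands-on experience in more than one of Frontend / Backend / Infrastructure
・People who use AI-assisted coding tools and can drive development autonomously within a team
・Experience building AI systems or launching products from zero to one is an even bigger plus!
In addition to full-time roles, flexible arrangements such as contract work and internships are available (*varies by position).
If you want to deliver cutting-edge AI technology to society with your own hands and create a wave of change, please apply.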
hardmaru retweeted
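On May 25, Sakana AI’s Ren Ito took part in a roundtable discussion held at the Prime Minister’s Official Residence. He presented real-world examples of advanced AI deployments specialized for individual industries such as finance. He also had a valuable opportunity to exchange views with Prime Minister Takaichi on concrete ways to secure the autonomy of Japan’s defense and its data sovereignty using Japan’s own technology, while also making use of leading overseas models.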
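Today, I had the opportunity to exchange views on Japan’s startup strategy with the executives of four startups (Atomis, Sakana AI, Oceanic Constellations, and OptQC). The session was moderated by DeNA Chairperson Namba, who chairs Keidanren’s startup committee.
The Takaichi Cabinet has set out the goal of becoming a “new technology-driven nation,” which aims to strengthen the foundations of science and technology research, including basic research, and to secure economic growth and international standing through innovation.
Listening to today’s discussion, I was strongly reminded that startups are extremely important as the “main drivers” of turning Japan’s outstanding research results into practical use.
I also became convinced that the advanced technologies these four companies are working on (next-generation materials, domestically developed AI, unmanned surface vehicles, and photonic quantum computers) will open up winning paths in the 17 strategic fields set out by the Takaichi Cabinet.
On the role of government, I received important suggestions, including:
・Promoting investment to enable the large-scale investment needed to industrialize technology seeds
・Making government procurement more predictable by introducing a new mechanism for the government to trial startups’ products
・Developing a financial environment in which venture capital firms (such as Andreessen Horowitz, whom I met the other day) act as an engine for Japanese startups and support their expansion into global markets
In addition, the “Startup Comprehensive Creation Package” compiled last week by Minister Kiuchi, the minister in charge of startups, will fundamentally strengthen the SBIR program and create a new trial-adoption framework that goes beyond conventional R&D support and leads to full-scale procurement.
We will make sure all of these are firmly reflected in the “Japan Growth Strategy.”
We will continue to work, with the public and private sectors as one, toward the further development of Japan’s startup ecosystem.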
The human brain🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it. One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math. We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens. Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements. Paper: arxiv.org/abs/2603.23198 Blog: pub.sakana.ai/sparser-faster… Code: github.com/SakanaAI/sparser-… ⚡️
How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️
Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models:
Paper: arxiv.org/abs/2603.23198
Blog: pub.sakana.ai/sparser-faster…
Code: github.com/SakanaAI/sparser-…
While LLMs are undoubtedly powerful, they are increasingly expensive to train and deploy, with a large part of this cost coming from their feedforward layers. Yet, an interesting phenomenon occurs inside these layers: For any given token, only a small fraction of the hidden activations actually matter. The rest approximate zero, wasting computation. With ReLU and very mild L1 regularization, this sparsity can exceed 95% with little to no impact on downstream performance.
So, can we leverage this sparsity to make LLMs faster? The challenge is hardware. Modern GPUs are optimized for dense matrix multiplications. Traditional sparse formats introduce irregular memory access and overheads that cancel out their theoretical savings for GEMM operations.
Our contribution is twofold:
1/ We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly in the same optimized tiled matmul kernels without disrupting execution.
2/ We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes.
We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy.
This work will be presented at #ICML2026. Please check out our blog and technical paper for a deep dive!
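For readers unfamiliar with ELLPACK-style storage, the sketch below shows the general idea behind a fixed-width sparse format with a dense fallback. It is not the TwELL layout or the released CUDA kernels: it packs whole rows rather than tiles, runs in plain Python, and the names `pack_hybrid_ell` and `hybrid_matmul` are hypothetical. Each row stands for one token's feedforward activations, and `weight` stands for the following dense projection.

```python
def pack_hybrid_ell(rows, width):
    """Pack mostly-zero rows into fixed-width (values, column index) slots.

    Rows with at most `width` nonzeros use the ELL slots, zero-padded.
    Rarer, heavier rows are stored densely as a fallback.
    """
    ell_values, ell_cols, dense_rows = [], [], {}
    for r, row in enumerate(rows):
        nonzeros = [(c, v) for c, v in enumerate(row) if v != 0.0]
        if len(nonzeros) > width:
            dense_rows[r] = list(row)  # fallback path for a heavy row
            nonzeros = []              # leave its ELL slot empty
        pad = width - len(nonzeros)
        ell_cols.append([c for c, _ in nonzeros] + [0] * pad)
        ell_values.append([v for _, v in nonzeros] + [0.0] * pad)
    return ell_values, ell_cols, dense_rows


def hybrid_matmul(packed, weight):
    """Multiply the packed activations by a dense weight matrix (list of rows)."""
    ell_values, ell_cols, dense_rows = packed
    n_out = len(weight[0])
    out = []
    for r, (vals, cols) in enumerate(zip(ell_values, ell_cols)):
        pairs = enumerate(dense_rows[r]) if r in dense_rows else zip(cols, vals)
        acc = [0.0] * n_out
        for c, v in pairs:
            if v == 0.0:
                continue  # skips padding and true zeros
            for j in range(n_out):
                acc[j] += v * weight[c][j]
        out.append(acc)
    return out


activations = [
    [0.0, 0.0, 2.0, 0.0],  # sparse token
    [1.0, 0.0, 0.0, 3.0],  # sparse token
    [1.0, 2.0, 3.0, 4.0],  # heavy token, goes to the dense fallback
]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
packed = pack_hybrid_ell(activations, width=2)
print(hybrid_matmul(packed, weight))
```

The fixed width is what makes the layout regular: every sparse row occupies the same number of slots, so memory access is predictable, and only the rows that overflow the width pay the dense cost.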

Image alt text: Sparser, Faster, Lighter Transformer Language Models. Scaling autoregressive LLMs has driven unprecedented progress but comes with vast computational costs. In this work, we tackle these costs by leveraging unstructured sparsity within an LLM's feedforward layers, the components accounting for most of the model parameters and execution FLOPs. To achieve this, we introduce a new sparse packing format and a set of CUDA kernels designed to seamlessly integrate with the optimized execution pipelines of modern GPUs, enabling efficient sparse computation during LLM inference and training. To substantiate our gains, we provide a quantitative study of LLM sparsity, demonstrating that simple L1 regularization can induce over 99% sparsity with negligible impact on downstream performance. When paired with our kernels, we show that these sparsity levels translate into substantial throughput, energy efficiency, and memory usage benefits that increase with model scale.

If you want to look under the hood at the actual custom CUDA kernels and see exactly how we implemented the TwELL format for H100 GPUs, we’ve released the reference code. GitHub: github.com/SakanaAI/sparser-… Blog: pub.sakana.ai/sparser-faster… 🐟
Reproducing all of Schmidhuber’s papers (1990-2025) using an AI coding assistant. Cool project by @yaroslavvb! It even reproduced the “World Models” paper by me and @SchmidhuberAI with a toy env, with a full VAE RNN world model implementation. Project: github.com/cybertronai/schmi…
The agent also reimplemented the 2002 “Blues Improvisation” experiment by @douglas_eck and @SchmidhuberAI, which showed that LSTMs can learn temporal structure in music. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks sferics.idsia.ch/pub/juergen…
hardmaru retweeted
Great collab with @SakanaAILabs on an #ICML26 paper about sparse transformer kernels and formats optimized for modern NVIDIA GPU execution. • TwELL sparse packing • Fused CUDA kernels • 20% inference/training speedups at scale Paper and code below 👇
For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead. By training a Conductor model with RL, we found that it naturally learns to write highly effective, custom instructions for a whole pool of other models. It essentially learns to ‘manage’ them in natural language. What surprised me most was how it dynamically adapts. For simple factual questions, it just queries one model. But for hard coding problems, it autonomously spins up a whole pipeline of planners, coders, and verifiers. Really excited to see where this paradigm of “AI managing AI” goes next, especially as we start moving from single-agent chain-of-thought to multi-agent “chain-of-command”. Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388 Along with our TRINITY paper which we announced earlier, this work also powers our new multi-agent system: Sakana Fugu (sakana.ai/fugu-beta) 🐡
Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 arxiv.org/abs/2512.04388
What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs?
To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.
We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language.
For any given question, the Conductor specifies:
1/ Which agent to call
2/ What specific subtask to give them (acting as an expert prompt engineer)
3/ What previous messages they can see in their context window
Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.
The results are very promising: The 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.
One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.
This research shows that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (sakana.ai/fugu-beta) 🐡 OpenReview: openreview.net/forum?id=U23A… (ICLR 2026)
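The three things the Conductor specifies per step (which agent, what subtask, which earlier messages are visible) map naturally onto a small data structure. The sketch below is only an illustration of that structure: the `Step` class, `run_workflow`, and the stub workers are hypothetical stand-ins, not the paper's implementation, and real workers would be model calls rather than string formatters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    agent: str                                        # which worker to call
    subtask: str                                      # instruction written for that worker
    visible: List[int] = field(default_factory=list)  # earlier outputs it may read


def run_workflow(steps: List[Step],
                 workers: Dict[str, Callable[[str, List[str]], str]]) -> List[str]:
    """Run steps in order; each worker sees only the outputs listed in `visible`."""
    outputs: List[str] = []
    for i, step in enumerate(steps):
        if any(j < 0 or j >= i for j in step.visible):
            raise ValueError(f"step {i} can only see earlier steps")
        context = [outputs[j] for j in step.visible]
        outputs.append(workers[step.agent](step.subtask, context))
    return outputs


def make_stub(name: str) -> Callable[[str, List[str]], str]:
    """Stub worker standing in for a real model call (hypothetical)."""
    def worker(subtask: str, context: List[str]) -> str:
        return f"{name}({subtask}; saw {len(context)} msg)"
    return worker


workers = {name: make_stub(name) for name in ("planner", "coder", "verifier")}

# A hard task gets a pipeline; an easy one gets a single call.
hard_task = [
    Step("planner", "Outline an approach"),
    Step("coder", "Implement the plan", visible=[0]),
    Step("verifier", "Check the code against the plan", visible=[0, 1]),
]
easy_task = [Step("coder", "Answer directly")]

for line in run_workflow(hard_task, workers):
    print(line)
```

In this framing, the recursive variant described above would amount to registering the conductor itself as one of the workers, so that a later step can read earlier outputs and emit a corrective workflow.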
Learning to Orchestrate Agents in Natural Language with the Conductor Fugu Blog: sakana.ai/fugu-beta Paper: arxiv.org/abs/2512.04388 🐡
Sakana Fugu: A Multi-Agent Orchestration System as a Foundation Model sakana.ai/fugu-beta/
If GitHub were built in: Japan 🇯🇵 China 🇨🇳 North Korea 🇰🇵 The EU 🇪🇺
If Japan built GitHub