I'm a software engineer and runner.

Joined May 2007
yoppi retweeted
After AlphaGo, the skill of human Go players noticeably improved. I suspect we will see a similar pattern in math.
Another major problem, this one in additive combinatorics, has fallen, this time to humans rather than AI, but using methods related to the AI solution to the unit distance conjecture.
From Markdown to HTML, huh. XML tags already worked before, so I hope things get unified.
yoppi retweeted
What if instead of building one giant AI, we evolved a coordinator to orchestrate a diverse team of specialized AIs? 🐟

Excited to share our new paper: “TRINITY: An Evolved LLM Coordinator”, published as a conference paper at #ICLR2026!

Paper: arxiv.org/abs/2512.04695

In nature, complex problems are rarely solved by a single monolithic entity, but rather by the coordinated efforts of specialized individuals working together. Yet, modern AI development is heavily focused on endlessly scaling up single, massive monolithic models, yielding diminishing returns. While model merging offers a way to combine different skills, it is often impractical due to mismatched neural architectures and the closed-source nature of top-performing models.

To address this, we took a macro-level approach: test-time model composition. We introduce TRINITY, a system that fuses the complementary strengths of diverse, state-of-the-art models without needing to modify their underlying weights.

TRINITY processes queries over multiple turns. At each step, a lightweight coordinator assigns one of three distinct roles to an LLM from its available pool:
1/ Thinker: Devises high-level strategies and analyzes the current state.
2/ Worker: Executes concrete problem-solving steps.
3/ Verifier: Evaluates if the current solution is complete and correct.

By dynamically assigning these roles, the coordinator effectively offloads complex reasoning and skill execution onto the external models.

What makes TRINITY unique is its extreme efficiency. The coordinator relies on the hidden states of a compact language model and a small routing head. In total, it has fewer than 20K learnable parameters.

Training this system presented a massive challenge. Traditional Reinforcement Learning (REINFORCE) failed because the gradients had a low signal-to-noise ratio due to binary rewards and weak parameter coupling. Imitation learning (Supervised Fine-Tuning) was ruled out because generating multi-turn labels is prohibitively expensive.

Our solution? We turned to nature-inspired algorithms. We optimized the coordinator using a derivative-free evolutionary algorithm. We found that evolution is uniquely suited to optimize this tight, high-dimensional coordination problem where traditional gradient-based methods fail.

The results are very promising. In our experiments, TRINITY consistently outperforms existing multi-agent methods and individual models across various benchmarks. At the time of publication, it set a new state-of-the-art record on LiveCodeBench, achieving an 86.2% pass@1 score.

More importantly, it demonstrated incredible generalization. Without any retraining, TRINITY transferred zero-shot to four unseen tasks (AIME, BigCodeBench, MT-Bench, and GPQA). On average, the evolved coordinator surpassed every individual constituent model in its pool, including GPT-5, Gemini 2.5-Pro, and Claude-4-Sonnet (the top frontier models available at the time of our #ICLR2026 submission last year).

This work is central to Sakana AI's vision. We believe the future of AI isn't just about scaling monolithic models, but engineering collaborative, diverse AI ecosystems that can adapt and combine their strengths. We invite the community to read the paper and explore these ideas!

Paper: arxiv.org/abs/2512.04695
OpenReview: openreview.net/forum?id=5HaR…

This foundational research is part of the core engine powering our multi-agent product: Sakana Fugu 🐡👇
We’re launching the beta for our new commercial AI product: Sakana Fugu 🐡, a multi-agent orchestration system!

Blog: sakana.ai/fugu-beta

Fugu hits SOTA on SWE-Pro, GPQA-D, and ALE-Bench, and has been our internal secret weapon. It dynamically coordinates frontier models, autonomously selecting the optimal agent combinations and roles for each task. Because Fugu is available as an OpenAI-compatible API, you can seamlessly integrate it into your existing workflows with minimal changes.

🐟 Fugu Mini: High-speed orchestration optimized for latency
🐡 Fugu Ultra: Full model pool utilization for deep, complex reasoning

Apply for the beta test here: forms.gle/BtKkhc2CfLKk1dvNA
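The TRINITY thread describes a coordinator made of a small routing head on top of a compact LM's hidden states. It is trained with a derivative-free evolutionary algorithm against binary rewards. The toy sketch below illustrates that setup, but it is not TRINITY's code, and it rests on several assumptions. Random vectors stand in for the LM's hidden states. A randomly drawn "oracle" head defines which routing decision earns reward 1. The optimizer is a simple elitist (1+1)-style evolution strategy, which may well differ from the algorithm the paper actually uses. `RoutingHead`, `make_toy_dataset`, `fitness`, and `evolve` are hypothetical names.

```python
import math
import random

ROLES = ("thinker", "worker", "verifier")


class RoutingHead:
    """Hypothetical linear routing head: hidden state -> (model index, role).

    One weight row per (model, role) action, so the parameter count is
    hidden_dim * num_models * len(ROLES).
    """

    def __init__(self, hidden_dim, num_models, params=None):
        self.hidden_dim = hidden_dim
        self.num_models = num_models
        self.num_actions = num_models * len(ROLES)
        size = hidden_dim * self.num_actions
        self.params = list(params) if params is not None else [0.0] * size

    def choose(self, hidden):
        # Pick the (model, role) action with the highest linear score.
        best, best_score = 0, -math.inf
        for a in range(self.num_actions):
            row = self.params[a * self.hidden_dim:(a + 1) * self.hidden_dim]
            score = sum(w * h for w, h in zip(row, hidden))
            if score > best_score:
                best, best_score = a, score
        model, role = divmod(best, len(ROLES))
        return model, ROLES[role]


def make_toy_dataset(n, hidden_dim, num_models, seed=1):
    # Assumption: random vectors replace real LM hidden states, and a random
    # "oracle" head decides which routing choice counts as correct.
    rng = random.Random(seed)
    size = hidden_dim * num_models * len(ROLES)
    oracle = RoutingHead(hidden_dim, num_models, [rng.gauss(0, 1) for _ in range(size)])
    data = []
    for _ in range(n):
        h = [rng.gauss(0, 1) for _ in range(hidden_dim)]
        data.append((h, oracle.choose(h)))
    return data


def fitness(head, dataset):
    # Binary reward per query (1 if the routing decision is correct), averaged.
    hits = sum(1 for h, target in dataset if head.choose(h) == target)
    return hits / len(dataset)


def evolve(dataset, hidden_dim, num_models, steps=300, sigma=0.3, seed=0):
    # Elitist (1+1)-style evolution strategy: add Gaussian noise to the current
    # best parameters and keep the mutant if its fitness is at least as good.
    # Only fitness values are compared; no gradients are computed.
    rng = random.Random(seed)
    parent = RoutingHead(hidden_dim, num_models).params
    parent_fit = fitness(RoutingHead(hidden_dim, num_models, parent), dataset)
    for _ in range(steps):
        child = [p + rng.gauss(0.0, sigma) for p in parent]
        child_fit = fitness(RoutingHead(hidden_dim, num_models, child), dataset)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return RoutingHead(hidden_dim, num_models, parent), parent_fit


if __name__ == "__main__":
    data = make_toy_dataset(100, hidden_dim=8, num_models=2)
    head, fit = evolve(data, hidden_dim=8, num_models=2)
    print(f"params={len(head.params)} accuracy={fit:.2f}")
```

Each query yields only a 0/1 reward, which is the regime where the thread says REINFORCE struggled. The evolutionary loop sidesteps this because it only ranks whole parameter vectors by fitness. In this toy, the head has hidden_dim × num_models × 3 parameters (48 in the example call). That is an assumption of the sketch, not the paper's configuration.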
Going to run the Ōmine Okugake Trail (大峯奥駈道) again! Planning a 3-day, 2-night trip.
I'm starting to get the hang of Rust a little.
Tried solving some competitive programming problems and was shocked at how much I've lost the ability to write solutions. Sad...
With CC's Channels you can do something like LimeChat for IRC (feels like we always end up back there).
Got to meet both Giwa-san and Inamura-san. The NLP community is a nice place where you can stay loosely connected.
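My routine lately:
* Before bed, sort out my tasks, hand them off to Agent Teams, and go to sleep
* In the morning, check the results and organize the TODO tasks with a Vim plugin
* Repeat steadily, then go back to the first step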
Who'd have thought my well-trained tmux skills would pay off in the AI era.
I tried migrating to neovim, but in the end I can't escape MacVim... (subtle differences in how typing feels)
Right now I have access to foundation models, but I'm already so completely dependent on them that I'd be at a loss if that access were taken away.
yoppi retweeted
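I had the chance to help develop "SoftMatcha 2," which searches huge LLM pretraining datasets blazingly fast. We've just released the demo, paper, source code, and more, so please give it a try! softmatcha.github.io/v2/

It searches trillion-token-scale data in the 0.1-second range while supporting substitutions, insertions, and deletions based on semantic similarity, which is pretty insane performance. I believe it substantially outperforms all existing tools, including infini-gram-mini, an EMNLP'25 Best Paper. It achieves fast search by using a disk-aware suffix array with a data layout specialized for this use case, while pruning the substitution, insertion, and deletion candidates (which would otherwise grow exponentially) based on the actual data.

As an example of the benefits of being able to search pretraining data at this scale, the paper examines benchmark contamination. There appear to be cases of contamination that exact-match search alone, as in infini-gram-mini, cannot find.

The demo currently lets you try searching data on the scale of several hundred billion tokens. The code is also public, so if you host it yourself you can try even larger cases.

🌐 Demo: softmatcha-2.s3-website-ap-n…
📄 Paper: arxiv.org/abs/2602.10908
💻 Code: github.com/softmatcha/softma…

Working with the SoftMatcha team, starting with the young talent @e869120, was very stimulating and I learned a lot. It was fun! Thank you! @shiatsumat @go2oo2 @ksuenaga @MasWag @sho_yokoi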

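We've ended up building a tool that can search a trillion-word corpus for usage examples in about 0.1 seconds. It also supports semantic substitutions, insertions, and deletions. Two hardcore pros joined us: the world-class Takuya Akiba, and E869120, who took 2nd place in the world at the ICPC, a first in its history. Somehow it runs at a scale I thought could never work. Please play around with it.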
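The SoftMatcha 2 thread mentions two core ideas. The first is a suffix array over the corpus. The second is data-driven pruning of the edited query variants, which would otherwise grow exponentially. Both can be illustrated with a small in-memory toy, under loud assumptions. The corpus is a short token list, and the suffix array is built naively in memory (SoftMatcha 2's disk-aware layout is not attempted here). `similar` is a hypothetical hand-written table standing in for embedding-based semantic similarity. Only substitutions are shown; insertions and deletions are left out. The function names are illustrative, not SoftMatcha's API.

```python
def build_suffix_array(tokens):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    # O(n^2 log n); fine for a toy corpus, far too slow for trillions of tokens.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def sa_range(tokens, sa, pattern):
    """Return [start, end) of suffix-array entries whose suffix begins with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first entry whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first entry whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count(tokens, sa, pattern):
    # Number of corpus positions where pattern occurs.
    start, end = sa_range(tokens, sa, pattern)
    return end - start


def soft_search(tokens, sa, query, similar):
    """Count every variant of query in which each token may be swapped for a
    'similar' token, pruning any prefix that never occurs in the corpus."""
    results = {}

    def extend(prefix, i):
        for cand in [query[i]] + similar.get(query[i], []):
            nxt = prefix + [cand]
            n = count(tokens, sa, nxt)
            if n == 0:
                continue  # prune: no extension of this prefix can occur either
            if i + 1 == len(query):
                results[tuple(nxt)] = n
            else:
                extend(nxt, i + 1)

    if query:
        extend([], 0)
    return results


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    sa = build_suffix_array(corpus)
    print(count(corpus, sa, ["sat", "on"]))  # prints 2
    print(soft_search(corpus, sa, ["cat", "sat"], {"cat": ["dog", "cow"]}))
    # prints {('cat', 'sat'): 1, ('dog', 'sat'): 1}
```

In the example, "cow" is discarded as soon as the one-token prefix ["cow"] turns out to have zero occurrences. None of its extensions are ever looked up. Pruning on prefixes that actually occur in the data is what keeps the candidate set from exploding as queries get longer.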
I thought the problem was the bone, but apparently it was the fascia (it got better after a glucose injection). Also, stretching. I can run a little now 😭
With my leg injured I can't run in the mountains, and my weekends feel truly empty.
Wait, I hadn't heard about Kimi K2.5. I want to try fine-tuning it.
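It's been a month since my left leg broke down from overload... The ADIZERO EVO SL is really great, but with my technique it was still too early to use it for training runs...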