Building AI architectures and models that autonomously and continually learn, evolve, and reason.

Joined June 2011
Pathway (www.pathway.com) retweeted
Exactly why I’m so bullish on @pathway_com. 🧠 Next-Gen AI ⚡️ Incredible performance 🔋 Ultra compute-efficient
Citadel Securities just put institutional weight behind what the AI bulls won't say out loud.
In a new macro note titled "Tokenomics," Citadel makes the argument plainly: even the most powerful technology on earth still has to pass through the boring discipline of cost curves, capacity limits, and marginal returns.
The evidence is piling up:
– Amazon removed its token usage leaderboard
– Microsoft cancelled Claude Code subscriptions
– Multiple companies are reporting unexpectedly massive token bills
Their conclusion is the part that matters. Adoption is no longer about what AI can do in principle. It's becoming about the price and scarcity of the inputs needed to run it at scale. Compute. Power. Cooling. Memory bandwidth. Inference budgets. All real, all binding constraints.
And here's the kicker from the chart. The Silicon Data LLM Token Expenditure Index, a benchmark for how much the market is actually spending on AI tokens, has started rolling over. Citadel reads it as a shift toward cheaper models: companies substituting away from expensive frontier AI toward "good enough" alternatives.
That's economics 101 doing what it always does. When the price of something rises, people use less of it, or find a cheaper version.
Citadel sees a bifurcation forming: frontier AI concentrated among a few firms with the balance sheets to absorb the cost, and everyone else quietly downgrading to simpler, cheaper models.
This is the part of every technology revolution the early narrative ignores. The technology being real was never the question. The question was always whether the economics could carry the valuations.
When one of the most sophisticated trading firms on earth starts writing about AI in the language of cost curves and rationing instead of limitless demand, the conversation has quietly changed.
The hype was about what AI could do. The reckoning is about what it costs.
Pathway (www.pathway.com) retweeted
There are co-founders. Then, there are friends. And then, there is @JChorowski. Jan Chorowski and his story. From working on attention at MILA, through speech to Google Brain, all the way to BDH.
Pathway (www.pathway.com) retweeted
This is probably the most entertaining way to understand one of AI’s hardest debates: Transformer vs Post-Transformer, argued by leading researchers inside a real physical boxing ring. Both technically deep and genuinely entertaining. I was glued for the entire 1 hour 20 minutes. So many super cool points to learn.
🥊 Transformers
- Transformers still own the present because they work at scale. They are simple, trainable, hardware-friendly, and already power the strongest AI systems we use today.
- The Transformer is basically a memory machine. It stores information as keys and values, then uses attention to pull back the most useful parts when answering.
- The real Transformer advantage is not just “attention.” The bigger advantage is that it fits modern hardware extremely well, so it can process huge batches of tokens fast.
- Scaling is still the brutal rule. If you give Transformers more compute, more data, and more parameters, they usually keep getting better. Any Post-Transformer architecture has to scale just as well, or better.
- It is not enough to look clever on small tests; the real question is whether a new architecture improves faster than Transformers when scaled up.
- A replacement cannot be only slightly better. Because the whole AI stack is already built around Transformers, the next architecture may need to be around 10x better to force everyone to switch.
- Transformers are powerful, but they may be brute force. A human does not need to read the entire internet many times to become smart, but current LLMs need enormous data and compute.
🥊 Post-Transformer
- Post-Transformer researchers are not saying Transformers are bad. They are saying Transformers may be the best current tool, not the final form of machine intelligence.
- The biggest Post-Transformer targets are native reasoning and continual learning. Today’s LLM reasoning often feels like text-based step-by-step work added on top, instead of thinking that happens naturally inside the model.
- Latent reasoning is one possible next step: the model reasons inside its own hidden internal space instead of writing every thought out as words.
- Continual learning is still a major weakness. Humans keep learning from experience, but most Transformer-based models are trained, frozen, and then adapt only inside the prompt.
- Long context is not the same as real memory. A model can read a huge prompt, but that is different from building a life history, learning from mistakes, and updating beliefs over time.
- The future may be hybrid, not a clean replacement. Transformers may stay as one building block while newer systems add better memory, better reasoning, and better learning loops.
- The most interesting possibility is that Transformers may help discover their own successor. AI agents are already getting better at research and coding, so the next architecture may come from AI-assisted architecture search.
---
- Benchmarks are a problem. Many public benchmarks are easy to game, so they may show leaderboard strength without proving deeper intelligence.
- Perplexity is still probably a great metric for evaluating frontier models, because it tests prediction quality.
---
Overall, Transformers continue to dominate, but the frontier is clearly widening. Pathway’s BDH (Dragon Hatchling, a brain-inspired reasoning architecture), Sakana AI’s CTMs (Continuous Thought Machines, models that think over time), and Liquid AI’s LFMs (Liquid Foundation Models, efficient multimodal foundation models) all show how the frontier is expanding.
---
From the “Pathway (pathway[.]com)” YouTube channel (link in comment) @zuzanna_pathway
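The perplexity point can be made concrete. Perplexity is the exponential of the average negative log-likelihood a model assigns to the tokens that actually occur next, so lower means better prediction; a model that spreads probability uniformly over 4 options scores exactly 4. A minimal Python sketch; the function name and the per-token probabilities for the two "models" are made up for illustration:

```python
import math
from typing import List


def perplexity(token_probs: List[float]) -> float:
    """exp(mean negative log-likelihood) over the probabilities a model
    assigned to each actual next token in a text."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical probabilities two models assign to the same 4-token text.
model_a = [0.5, 0.25, 0.5, 0.25]
model_b = [0.9, 0.6, 0.8, 0.7]

print(perplexity(model_a))  # 2 ** 1.5, about 2.83
print(perplexity(model_b))  # lower: model_b predicted this text better
```

Because it only needs next-token probabilities on held-out text, perplexity is harder to "game" than a fixed question set, which is roughly the argument made in the thread; it does not by itself measure reasoning.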
Pathway (www.pathway.com) retweeted
200K views and counting! The Transformer vs Post-Transformer debate, convened by @pathway_com Ft @lukaszkaiser, @adrian_pathway, @YesThisIsLion, @mlech26l, @dexhorthy, and me. Watch on YouTube: youtube.com/watch?v=hCjoMLuC… Follow along for the next one.
Pathway (www.pathway.com) retweeted
“We have not yet had a PageRank moment for intelligence.”
We’ve gotten so many comments and questions about this statement, delivered by @adrian_pathway during our recent Transformer vs Post-Transformer debate with @lukaszkaiser @YesThisIsLion @mlech26l. Thanks! Let’s dig into it.
In the 1990s, web search already existed. We could index information. AltaVista existed. The web was growing fast.
Then PageRank happened. That moment combined three things:
1. A simple but deep mathematical idea: treat the web as a giant graph and compute the stationary distribution of a *random walk* on that *graph*
2. A scalable implementation: large-scale graph computation on huge clusters
3. A company that integrated and scaled the idea end-to-end: Google
That combination gave search a much clearer center. It stopped being just a pile of heuristics and started to look more like: here is the mathematical object we need to compute; now let’s build the systems needed to compute it well.
Adrian asked Lukasz Kaiser directly whether he sees a PageRank-like idea inside the Transformer. Lukasz said no. For intelligence, we still do not have that kind of unifying operator or process. We do not yet have an agreed mathematical object that says: this is the core computation behind it.
That missing unifier is what Adrian meant by the absent “PageRank moment for intelligence.”
That is also the main idea behind our work on BDH, our Post-Transformer architecture: we are after that fundamental “platform discovery” for intelligence.
The full Transformer vs Post-Transformer debate is a good place to go deeper on these topics. Link below.
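Point 1, "the stationary distribution of a random walk on a graph," is compact enough to sketch. Below is a minimal power-iteration version of PageRank on a toy four-page graph: a random surfer follows a random outgoing link with probability `damping` and otherwise jumps to a uniformly random page. The graph, the function name, and the damping value 0.85 (the commonly cited default) are illustrative assumptions; this is not Google's production system:

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration for the stationary distribution of a damped random walk."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}  # random-jump mass
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page (no out-links): spread its mass uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page C ranks highest because three pages link to it, and D, with no incoming links, keeps only the random-jump share (1 − 0.85) / 4. The "scalable implementation" in point 2 is the same iteration distributed over a graph with billions of nodes.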
Here's a great starting point for understanding the Transformer vs Post-Transformer debate convened by @zuzanna_pathway! Credits: @rohanpaul_ai.
Pathway (www.pathway.com) retweeted
Last week’s Post-Transformer debate post raised one question: can long-term memory become part of the architecture?
It points to one promising mathematical idea behind Post-Transformer AI: linear attention in high dimension with persistent state.
In a standard Transformer, memory is handled by caching context. The model keeps previous keys and values in a small dimension d, then attends over them. But this is still token history.
BDH (Dragon Hatchling), one of the Post-Transformer architectures, takes a different route. The paper describes BDH’s state space as fixed and large, with the macro interpretation of an associative memory like the KV cache, but organized differently.
Each layer has a persistent state matrix:
ρₗ ∈ ℝⁿˣᵈ
where:
n = neuronal or concept dimension
d = low-rank synaptic dimension
d ≪ n
The key idea is that state is aligned to neurons, in a high-dimensional space (n on the order of billions). A Transformer stores token history, whereas BDH-GPU (a tensor-friendly version of the BDH architecture) evolves a state, similar to State-Space Models.
This is where the brain analogy becomes useful. The brain does not append every experience to a longer transcript. It has a large, bounded substrate of neurons and synapses, where experience changes connections sparsely and with high parallelism.
BDH-GPU expresses a related idea computationally: not memory as a longer context window, but memory as a large, evolving internal state.
Why it matters:
– no Transformer-style hard context window, practically enabling an unbounded context window in a reasoning model
– linear attention in a large neuronal dimension
– sparse positive activations
– persistent state instead of only token history
The deeper insight: long-horizon reasoning may not come from storing more tokens. It may very well come from better state dynamics.
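The contrast between "token history" and "persistent state" can be sketched in a few lines. The first class below is Transformer-style softmax attention over a cache that grows with every token; the second is plain linear attention, where each (key, value) pair is folded into a fixed n × d matrix by a rank-1 update and read back with a query. This is a generic illustration under stated assumptions, not BDH-GPU: it omits BDH's actual update rule, any decay or normalization, and the real dimensions. Class names, toy sizes, and the ReLU-of-sine keys (standing in for "sparse positive activations") are hypothetical, and both memories use the same n-dimensional keys only so the storage comparison is direct:

```python
import math
from typing import List

Vector = List[float]


def relu(x: Vector) -> Vector:
    """Keep only non-negative entries: a sparse, positive activation."""
    return [max(0.0, a) for a in x]


class KVCacheMemory:
    """Transformer-style memory: softmax attention over a growing (key, value) cache."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def write(self, k: Vector, v: Vector) -> None:
        self.keys.append(k)
        self.values.append(v)

    def read(self, q: Vector) -> Vector:
        scores = [sum(a * b for a, b in zip(q, k)) for k in self.keys]
        m = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        d = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) / total
                for j in range(d)]

    def size(self) -> int:
        """Stored floats: grows by len(k) + len(v) with every token."""
        return sum(len(k) + len(v) for k, v in zip(self.keys, self.values))


class LinearStateMemory:
    """Plain (unnormalized) linear attention with a persistent n x d state matrix."""

    def __init__(self, n: int, d: int) -> None:
        self.state = [[0.0] * d for _ in range(n)]

    def write(self, k: Vector, v: Vector) -> None:
        # Rank-1 update state += outer(k, v); zero entries of a sparse key skip their row.
        for i, ki in enumerate(k):
            if ki != 0.0:
                row = self.state[i]
                for j, vj in enumerate(v):
                    row[j] += ki * vj

    def read(self, q: Vector) -> Vector:
        # Output = state^T q, a d-dimensional vector.
        d = len(self.state[0])
        return [sum(qi * self.state[i][j] for i, qi in enumerate(q)) for j in range(d)]

    def size(self) -> int:
        """Stored floats: fixed at n * d no matter how many tokens were written."""
        return len(self.state) * len(self.state[0])


n, d = 8, 2  # toy sizes; the thread's point is n much larger than d
cache, state = KVCacheMemory(), LinearStateMemory(n, d)
for t in range(100):
    k = relu([math.sin(t + i) for i in range(n)])  # hypothetical sparse non-negative key
    v = [math.cos(t), math.sin(t)]
    cache.write(k, v)
    state.write(k, v)
print(cache.size(), state.size())  # 1000 16
```

After 100 tokens the cache holds 1,000 floats and keeps growing, while the state stays at n × d = 16. The trade-off is that the fixed state superimposes everything written into it, so retrieval is exact only when keys do not overlap, which is one reason a very large, sparse n matters.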
Pathway (www.pathway.com) retweeted
The full Transformer vs Post-Transformer debate is live.
80 minutes. Seven rounds. No slides. Real disagreement.
@lukaszkaiser came to defend the Transformer. @adrian_pathway, @YesThisIsLion, and @mlech26l made the case for what comes next.
00:00 Contenders enter the ring
06:30 Lukasz Kaiser defends the Transformer
10:08 Adrian Kosowski on BDH and the PageRank Moment for AI
17:35 Llion Jones: Why Transformers aren't the final architecture
29:50 Mathias Lechner on Liquid AI’s approach, Fast Weights, and Self-Replacing AI
40:28 Reasoning Beyond Language
44:15 Scaling Laws: Transformer vs Post Transformer
50:31 Benchmarks, Coding Models, and Perplexity
1:04:00 Continual Learning and Dynamic Weights
This is the ultimate source of truth on the subject.
Pathway (www.pathway.com) retweeted
One deep learning debate every AI researcher should care about: Transformers vs Post-Transformers.
On the surface, it sounds like an architecture fight. Mathematically, it is about scaling laws, memory, online learning in frontier models, and hardware limits.
That is what made the recent debate interesting. It featured @lukaszkaiser, @adrian_pathway, @YesThisIsLion, and @mlech26l, hosted by @zuzanna_pathway.
Transformers won the last era because multi-head self-attention scales empirically and fits the hardware ecosystem extremely well.
But the next bottleneck may be different. Full self-attention has O(n²) compute cost in sequence length n. Transformer LLMs do not natively have persistent long-term memory. RAG retrieves. Longer context conditions. Neither necessarily forms new reasoning patterns inside the model.
That is why continual learning is becoming central, a topic recently covered by @a16z.
The open questions:
– How can models learn after deployment without catastrophic forgetting?
– How can long-term memory become part of the architecture?
– How can models reason over longer horizons without paying ever-growing context costs?
– How can hardware and AI architectures co-evolve more efficiently?
– And are we chasing the right benchmarks with these goals in mind?
These questions were tackled head-on, with counters from @lukaszkaiser, Transformer co-inventor and core contributor to ChatGPT and GPT models.
The image below summarizes some notes from the 80-minute debate.
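The O(n²) compute pressure is easiest to see by counting query-key dot products. In causal self-attention, token t compares itself with all t positions up to and including itself, so a sequence of length L needs 1 + 2 + … + L = L(L+1)/2 scores, and a 10x longer context costs roughly 100x more. A minimal sketch of that arithmetic, contrasted with a hypothetical recurrent (linear-attention-style) model that does one fixed-size state update per token; the function names are illustrative, and constant factors such as head count, model width, and state size are deliberately ignored:

```python
def causal_attention_scores(seq_len: int) -> int:
    """Query-key dot products for causal self-attention over seq_len tokens.

    Token t (1-based) attends to t positions, so the total is 1 + 2 + ... + L.
    """
    return seq_len * (seq_len + 1) // 2


def linear_state_updates(seq_len: int) -> int:
    """Recurrent model: one fixed-size state update per token, linear in seq_len."""
    return seq_len


for L in (1_000, 10_000, 100_000):
    print(f"L={L:>7,}  attention scores={causal_attention_scores(L):>14,}  "
          f"state updates={linear_state_updates(L):>7,}")
```

The per-update cost of the recurrent model depends on its state size rather than on how many tokens came before, which is the property behind the "without paying ever-growing context costs" question.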
Pathway (www.pathway.com) retweeted
Computer Science & Engineering Department Chair Martín Farach-Colton hit the red carpet alongside his mentee @zuzanna_pathway, whose company @pathway_com was recognized as one of @FastCompany's Most Innovative Companies. #NYUTandonMade
Pathway (www.pathway.com) retweeted
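Llion Jones (@YesThisIsLion), co-founder and CTO of Sakana AI, took the stage at "Transformers vs Post-Transformers," a debate recently held in San Francisco. At the event, four speakers, including co-authors of the original paper, debated the Transformer, the architecture currently leading the AI field. They split into supporters of the Transformer and supporters of the "Post-Transformer," which looks beyond it with continual learning and reasoning in latent space, and discussed in depth which side will shape the future of AI. Although Llion is a co-author of the original Transformer paper, and while fully acknowledging the usefulness of today's Transformers, he deliberately took the Post-Transformer side and argued for the potential of the architectures that come next. Llion argued that the Transformer's current success owes less to the architecture itself than to the "brute force of compute" made possible by how well it fits hardware that excels at parallel processing (GPUs/TPUs). Alongside this, he stressed the importance of exploring architectures built on entirely different assumptions. He further urged the research community to free itself from existing benchmarks and the constraints of current hardware. "The next breakthrough architecture may, in its early stages, be slower than the Transformer and less accurate. But we should not fear that; we should explore systems built on entirely different assumptions," he said, calling for a change in the research mindset itself. Alongside its Transformer-based research and development, Sakana AI is also exploring next-generation architectures; the Continuous Thought Machine (CTM), a new architecture modeled on the biological brain that Llion himself works on, is one example. We sincerely thank the organizers for providing such a stimulating forum for discussion, and all of the speakers. You can watch the debate here: x.com/zuzanna_pathway/status… 🐟 @zuzanna_pathway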
Pathway (www.pathway.com) retweeted
This was so much fun!!
Pathway (www.pathway.com) retweeted
May 19
never has the ai research world encountered so much whimsy. great hanging with @zuzanna_pathway @adrian_pathway @YesThisIsLion @mlech26l @lukaszkaiser and learning about what comes after transformers youtube.com/watch?v=hCjoMLuC…
The conversation that anchored last week's debate is now public. Not blog posts. Not Twitter threads. Four researchers who wrote the foundational papers, making the case for what shapes the Post-Transformer era. 📍Transformer vs Post-Transformer: The Deciding Round | May 2026 | San Francisco
Pathway (www.pathway.com) retweeted
Transformers unlocked massive productivity gains for startups and enterprises alike. But bigger models and more compute aren't solving the 95% failure rate in enterprise AI. The problem? An architecture with no memory. Pathway’s BDH is rethinking the foundation, building a post-transformer approach to enterprise AI on AWS that learns and adapts over time.
What comes after the Transformer? Zuzanna Stamirowska puts the debate out in the open, with the very inventors of Transformer and Post-Transformer architectures! Watch the 5-minute highlights. Follow @zuzanna_pathway and hit the bell, full fight drops tomorrow.
Transformer vs Post-Transformer: The 5-minute KO compilation is live now. 🥊 @lukaszkaiser (co-invented Transformer & co-created ChatGPT) @adrian_pathway (invented BDH and is CSO of Pathway) @mlech26l (co-invented LNNs & is CTO of Liquid AI) @YesThisIsLion (co-invented Transformer with Łukasz, now CTO of Sakana AI) Moderated by @dexhorthy (CEO, HumanLayer) and me. Full debate drops soon. Turn on notifications to catch the complete fight. This is the ultimate source of truth on the subject.
Pathway (www.pathway.com) retweeted
I largely agree with @YesThisIsLion on this. The biggest mistake right now is expecting the first Post-Transformer models to beat Transformers on day one by delivering massive gains on irrelevant axes.
2 Transformer co-authors. 2 post-transformer inventors. 1 REAL boxing ring. 📍San Francisco
Said it would be a real fight. IT WAS. 🥊 @adrian_pathway: “Transformers think in language. They do not think in latent thought.” @mlech26l: “I am convinced that the Transformer will find its own replacement.” @YesThisIsLion: “Lukasz is going to be correct up until that day, and then he is going to be wrong forever.” @lukaszkaiser: “Do not be scared of being 50-times slower!! If you show me a model that is 50-times slower but on a better slope, you win.” Good thing I told them to keep it clean, look at them! 😂 Transformer Vs Post Transformer: Deciding Round, By @pathway_com
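Kaiser's "50-times slower but on a better slope" line is power-law arithmetic. Assume, purely for illustration, that loss falls as L(C) = A·C^(−α) with training compute C, and that a new architecture gets only 1/50 of the useful compute per unit of hardware but has a steeper exponent β > α. Setting A·C^(−α) = A·(C/50)^(−β) gives a crossover at C* = 50^(β/(β−α)); with the made-up values α = 0.05 and β = 0.06 that is 50⁶ ≈ 1.6 × 10¹⁰ compute units, and beyond it the slower model wins. The exponents, the shared constant A, and the function names below are hypothetical, not figures from the debate:

```python
def crossover_compute(slowdown: float, alpha: float, beta: float) -> float:
    """Compute budget where a slower model on a steeper power law catches up.

    Hypothetical losses: baseline  L_b(C) = A * C**(-alpha)
                         new model L_n(C) = A * (C / slowdown)**(-beta)
    Setting them equal gives C* = slowdown ** (beta / (beta - alpha)).
    """
    if beta <= alpha:
        raise ValueError("the slower model never catches up unless beta > alpha")
    return slowdown ** (beta / (beta - alpha))


def loss(compute: float, exponent: float, slowdown: float = 1.0, a: float = 10.0) -> float:
    """Hypothetical power-law loss a * (compute / slowdown) ** (-exponent)."""
    return a * (compute / slowdown) ** (-exponent)


c_star = crossover_compute(50.0, alpha=0.05, beta=0.06)  # about 50 ** 6
for c in (c_star / 100, c_star, c_star * 100):
    print(f"C={c:.2e}  baseline={loss(c, 0.05):.4f}  "
          f"slower model={loss(c, 0.06, slowdown=50.0):.4f}")
```

Below C* the baseline has lower loss; at C* the two match; above it the steeper curve is ahead. That is why the argument turns on measuring the slope at small scale rather than on day-one speed.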
Pathway (www.pathway.com) retweeted
The post-transformer era is here. We’ll shape it together. Heavens! What a fight!🥊 Last night in San Francisco we brought four of the inventors building today’s and tomorrow’s AI architectures into the ring for the deciding round. A year ago, when we said the post-transformer era was coming, most people saw it as just a research concept. Yesterday the room was packed with people who flew in because they felt the shift is real. Research peers from OpenAI, DeepMind, Anthropic, xAI, and NVIDIA. Leaders scaling AI at major financial and enterprise firms (Goldman, BlackRock, Visa, First Citizens, Merck) and at tech and internet companies (Google, Meta, Apple, Microsoft, LinkedIn, Salesforce, Walmart, Waymo). AI investors and diplomats. And founders of other deeptechs in the space. Beyond the stage, what stood out was the willingness to question defaults, surface real disagreements, and keep pushing toward better answers, for this field and for the people who will build on what we make. Thank you to the fighters who brought that energy: @adrian_pathway (Pathway), @lukaszkaiser (OpenAI), @mlech26l (Liquid AI), @YesThisIsLion (Sakana AI). You did give me a "good clean fight for the AI Champion Title" 🤣. And thanks to @dexhorthy for being the co-moderator this electric ring needed. Big thanks to our friends in the Bay for spreading the word. I’ll share more over the coming days; there's a lot worth pulling out of those few hours. 📍Transformer vs Post-Transformer: The Deciding Round