Google has published a paper that might end the transformer era.

For the last 7 years, every major AI, ChatGPT, Claude, Gemini, has been built on the exact same architecture: The Transformer. But Transformers have a fatal flaw. To remember context, they have to process every single word against every other word. It’s called quadratic complexity. As your prompt gets longer, the compute cost explodes.

The alternative is the old-school RNN (Recurrent Neural Network). RNNs are incredibly cheap and fast, but they have a fixed memory size. If you give them a long document, they get amnesia.

Until today. Google researchers published Memory Caching: RNNs with Growing Memory. And it fixes the biggest bottleneck in AI. Instead of an RNN having a fixed, rigid memory that constantly overwrites itself, Google gave it a "save" button. The technique allows the RNN to cache checkpoints of its hidden states as it reads. The memory capacity of the RNN can now dynamically grow as the sequence gets longer. They built four different variants, including sparse selective mechanisms where the AI actively chooses exactly which checkpoints matter most.

The results rewrite the rules of efficiency. On long-context understanding and recall-intensive tasks, these new Memory-Cached RNNs closed the gap with Transformers. They achieved competitive accuracy without the explosive, quadratic compute cost. It perfectly bridges the gap between the cheap efficiency of an RNN and the massive capability of a Transformer.

We have spent billions scaling Transformers because we thought they were the only way an AI could remember a long conversation. But Google just proved we don't need to process the whole history every single time. We just needed a smarter cache.
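A rough way to picture the "save button" idea from the post above is the toy sketch below. It is not the paper's method: the recurrent cell (`rnn_step`), the fixed checkpoint interval (`every`), and the dot-product softmax readout (`read_cache`) are all assumptions chosen for illustration. It only shows that the hidden state stays a fixed size while the list of cached checkpoints grows with the sequence.

```python
import math


def rnn_step(h, x, decay=0.5):
    """Toy recurrent update (an assumption): fixed-size state, older content fades."""
    return [math.tanh(decay * hi + xi) for hi, xi in zip(h, x)]


def run_with_cache(inputs, dim, every=4):
    """Run the toy RNN and save a copy of the hidden state every `every` steps."""
    h = [0.0] * dim
    cache = []  # grows with sequence length, unlike h
    for t, x in enumerate(inputs, start=1):
        h = rnn_step(h, x)
        if t % every == 0:
            cache.append(list(h))  # checkpoint of the current hidden state
    return h, cache


def read_cache(query, cache):
    """Softmax-weighted mix of checkpoints, scored by dot product with `query`."""
    if not cache:
        return [0.0] * len(query)
    scores = [sum(q * c for q, c in zip(query, ckpt)) for ckpt in cache]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * ckpt[i] for w, ckpt in zip(weights, cache)) / total
        for i in range(len(query))
    ]
```

For example, `run_with_cache([[0.1, -0.2]] * 40, dim=2, every=4)` returns a hidden state of length 2 and a cache of 10 checkpoints. A sparse selective variant, as the post describes it, would presumably read only a chosen subset of `cache` rather than all of it.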

Checkapp @EdwardMusinski

May 22

perplexity.ai/page/nearly-40…

Nearly 4,000 funds added AI infrastructure stocks in Q1, filings show

An analysis of more than 6,000 institutional investor filings revealed that the vast majority of hedge funds, pension funds, and endowments increased their...

perplexity.ai

Checkapp retweeted

AImetaall @metabookall

Apr 6

Self-learning AGI designer!

Nav Toor

@heynavtoor

Apr 5

🚨BREAKING: Researchers built an AI that designs better AI than humans can. It discovered 105 new architectures that beat human-designed models. Nobody guided it. It taught itself. The paper is called "ASI-Evolve: AI Accelerates AI." Published this week by researchers at Shanghai Jiao Tong University. Fully open-sourced. And what it demonstrates should stop every AI researcher cold. They built a system that runs the entire AI research loop on its own. It reads scientific papers. It forms hypotheses. It designs experiments. It runs them. It analyzes the results. Then it uses what it learned to design better experiments. Over and over. Without human intervention. They pointed it at neural architecture design first. Over 1,773 rounds of autonomous exploration, the system generated 1,350 candidate architectures. 105 of them beat the best human-designed model. The top architecture surpassed DeltaNet by 0.97 points. That is nearly 3 times the gain of the most recent human-designed state-of-the-art improvement. Humans spent years to get 0.34 points. The AI got 0.97 on its own. Then they pointed it at training data. The AI designed its own data curation strategies and improved average benchmark performance by 3.96 points. On MMLU, the most widely used knowledge benchmark, the improvement exceeded 18 points. Then they pointed it at learning algorithms. The AI invented novel reinforcement learning algorithms that outperformed the leading human-designed method GRPO by up to 12.5 points on competition math. Three pillars of AI development. Data. Architecture. Algorithms. The AI improved all three by itself. Then they tested whether what the AI built actually works in the real world. They applied an AI-discovered architecture to drug-target interaction prediction. It achieved a 6.94 point improvement in scenarios involving completely unseen drugs. The AI designed something that works better than human experts in biomedicine. 
This is the first system to demonstrate AI-driven discovery across all three foundational components of AI development in a single framework. The recursive loop is now closed. AI is building AI. And it is already better at it than we are.
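The loop described in the post above (propose, run, analyze, feed the result back) can be pictured with a toy sketch. This is not ASI-Evolve's code: `evaluate` is a hypothetical stand-in for running an experiment, candidates are plain numbers rather than architectures, and the mutation rule is an assumption.

```python
import random


def evaluate(candidate):
    """Hypothetical stand-in for 'run the experiment': higher is better, peak at 3.0."""
    return -(candidate - 3.0) ** 2


def evolve(baseline, rounds=200, step=0.5, seed=0):
    """Closed propose-evaluate-select loop with no human in the middle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    baseline_score = evaluate(baseline)
    best, best_score = baseline, baseline_score
    winners = []  # every candidate that beat the fixed baseline
    for _ in range(rounds):
        candidate = best + rng.uniform(-step, step)  # propose: perturb the current best
        score = evaluate(candidate)                  # run and analyze
        if score > baseline_score:
            winners.append(candidate)
        if score > best_score:                       # feed the result into the next round
            best, best_score = candidate, score
    return best, best_score, winners
```

Calling `evolve(0.0)` returns the best candidate found, its score, and the list of candidates that beat the baseline, which mirrors how the post counts architectures that beat the human-designed reference. As a check on the post's arithmetic, 0.97 / 0.34 is about 2.85, which is the "nearly 3 times" figure.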

Checkapp retweeted

Checkapp @EdwardMusinski

Apr 4

Safe AGI Alliance: bookmark & share this, protect friends worldmonitor-two-wine.vercel…

Conflict Monitor - Real-Time Geo & Conflict Dashboard

Real-time geo and conflict dashboard with live news streams, conflict tracking, sanctions, outages, weather alerts, and OSINT signals.

worldmonitor-two-wine.vercel.app

Checkapp retweeted

Alison Gopnik @AlisonGopnik

May 17

So pleased that our paper on empowerment and causal models is out and freely available as part of this impressive special issue on world models and AI, with Melanie Mitchell, Josh Tenenbaum, Tom Griffiths and many other stars. royalsocietypublishing.org/r…

World models, artificial general intelligence and the hard problems of life–mind continuity: toward...

Abstract. This special issue examines how natural and artificial intelligences (AIs) model the world, and what this modelling reveals about cognition and r

royalsocietypublishing.org

Checkapp retweeted

Haider.

@haider1

May 17

Yann LeCun says that within a year to 18 months, we'll have a general method for training hierarchical world models. These models would learn from video and real-world data, then help plan actions in robotics, healthcare, and other areas, "then scale them toward a universal world model."

Checkapp @EdwardMusinski

May 17

Michael Levin and Yann LeCun are making world models the next big thing in AI. Both attended AGI Alliance Davos and are invited to our next AI Academy UAE week. Saisum.world

saisum.world

Strong AI Summit Safe AGI for all

saisum.world

Adam Safron

@adamsafron

May 16

It is the deepest honor to have been joined by Michael Levin (@drmichaellevin), Victoria Klimaj, Zahra Sheikhbahaee (@zah_bah), Dalton Sakthivadivel (@DaltonSakthi), Adeel Razi (@adeelrazi), David Ha (@hardmaru), Nick Hay, Kevin Schmidt, Irina Rish (@irinarish), David Krakauer (@sfiscience), Melanie Mitchell (@MelMitchell1), Samuel Gershman (@gershbrain), and Joshua Tenenbaum in organizing this special issue of the Royal Society’s (@RSocPublishing) Philosophical Transactions A: “World models, A(G)I, and the Hard problems of life-mind continuity: Toward a unified understanding of natural and artificial intelligence” royalsocietypublishing.org/r… This collection was motivated by a question with far reaching implications, ranging from the fundamental nature(s) of mind to choices that may determine the future of our civilization/species: what kinds of world modeling capabilities are likely to be realized by which kinds of minds and what world might we be in with respect to increasingly advanced artificial intelligences? Will the scaling and refinement of present approaches result in AI with human-like (and beyond) cognitive abilities, or do we need radically different paradigms that more closely follow the principles of natural intelligence? Learning “world models” to predict/compress information may be how biological learners so efficiently learn (to learn) to achieve goals and generalize that knowledge across a broad range of task environments. World models may also be useful for reverse-engineering forms of “System 2” cognition, or the self-reflexive, deliberate, multi-step reasoning associated with cognitive capabilities that may be unique to humans. Predictive models that reflect how the world may be causally modified by actions allow agents to adaptively control their behavior with flexibility and context-sensitivity. 
Spatiotemporally and causally coherent models of the physical world may not only be the key for creating AIs that we can rely on for real-world deployment, but may even be the (dynamic) core of conscious cognition. The contributions to this special issue consider the varieties of world models worth modeling from diverse points of view: Douglas Hofstadter explores whether sufficiently coherent self-referential world modeling could ground meaning, consciousness, and a genuine “I” in future AI systems. David Krakauer (@sfiscience), Melanie Mitchell (@MelMitchell1), and John Krakauer (@blamlab) examine the principles of emergent intelligence from a complex systems perspective. Alexander Ku (@alex_y_ku), Declan Campbell, Xuechunzi Bai (@baixuechunzi), Jiayi Geng (@JiayiiGeng), Ryan Liu (@theryanliu), Raja Marjieh (@RajaMarjieh), R. Thomas McCoy (@RTomMcCoy), Andrew Nam, Ilia Sucholutsky (@sucholutsky), Liyi Zhang (@LiyiZhang_Leo), Jian-Qiao Zhu (@JQ_Zhu), and Thomas Griffiths (@cocosci_lab) argue for using the tools of cognitive science to understand and evaluate LLMs across multiple levels of analysis. Evelina Leivada (@EvelinaLeivada), Gary Marcus (@GaryMarcus), Fritz Günther, and Elliot Murphy (@ElliotMurphy91) test whether LLMs deeply understand language and the “world behind words,” or primarily learn surface statistical regularities. Pedro Tsividis (@ptsividis), João Loula, Jake Burga, Juan Pablo Rodriguez, Sergio Arnaud, Nate Foss (@_npfoss), Andres Campero, Ajay Subramanian (@ajaysub110), Thomas Pouncy, Samuel Gershman (@gershbrain), and Joshua Tenenbaum introduce a theory-based meta-learning architecture inspired by the remarkable flexibility and efficiency of human cognition. Eunice Yiu (@eunice_yiu_), Kelsey Allen, Shiry Ginosar (@shiryginosar), and Alison Gopnik (@AlisonGopnik) explore empowerment, controllability, and causal reasoning as means of understanding the remarkable learning abilities of both child and adult minds. 
Nadav Amir, Stas Tiomkin, and Angela Langdon investigate how goals shape the structure of experience and how the world modeling abilities of natural intelligences may be inseparable from values. Vickram Premakumar, Michael Vaiana, Florin Pop (@FlorinPop17), Judd Rosenblatt (@juddrosenblatt), Diogo Schwerz de Lucena, Kirsten Ziman, and Michael Graziano show unexpected benefits of self-modeling as an inductive bias and regularizer for training artificial agents. Hanlin Zhu, Baihe Huang, and Stuart Russell analyze why model-based reinforcement learning may fundamentally outperform model-free approaches in representational efficiency. Bradly Alicea (@balicea1), Morgan Hough (@mhough), Amanda Nelson, and Jesse Parent (@JesParent) revisit fundamental cybernetic principles of regulation, adaptation, and world modeling across a wide assortment of complex adaptive systems. Francesco Sacco (@FrancescoSacco1), Dalton Sakthivadivel (@DaltonSakthi), and Michael Levin explore topological constraints on self-organization and suggest that biological systems maintain long-range coherence in ways that are fundamentally different from current transformer architectures. Georg Northoff (@NorthoffL), Yasir Catal, and Samira Abbasi examine how biological intelligence may depend on capabilities for flexible “inner time” to ensure adaptive alignment between the dynamics of system and world. Nicolas Rouleau (@DrNRouleau) and Michael Levin explore whether theories of consciousness generalize beyond brains to unconventional embodiments and living systems more broadly. Benjamin Lyons and Michael Levin investigate economies and collective intelligence as systems coordinated by “cognitive glues” in the form of shared models of scarcity and value. Katherine Collins (@katie_m_collins), Umang Bhatt (@umangsbhatt), and Ilia Sucholutsky (@sucholutsky) consider “Rogers’ paradox” to demonstrate ways in which collective learning is impacted by different kinds of human-AI interactions. 
Ruairidh Battleday (@RMBattleday) and Samuel Gershman (@gershbrain) distinguish between the “easy” and “hard” problems of science, and describe how while current AI systems demonstrate powerful narrow forms of optimization with respect to well-defined inference-spaces, further developments are needed for achieving capabilities for novel scientific discovery. Fritz Breithaupt (@FritzBreithaupt) explores narrative world models and the roles of uncertainty and transformative experiences in natural intelligences, suggesting that coherent agency may depend on better understanding human-like meaning-making. Taken together, these diverse perspectives suggest that while LLMs can clearly learn powerful generative models of language, they likely do so without having world models of sufficient spatiotemporal and causal coherence to achieve human-like reasoning abilities, capacities for generating subjective conscious experiences, or pathways to realizing artificial general superintelligence. However, by further developing world modeling architectures, we may eventually be able to create forms of intelligence that recapitulate the remarkable flexibility and generality of human intelligence. Finally, enhanced (e.g. more coherent/integrated) world models may not only afford expanded capabilities, but could potentially help ensure that increasingly powerful AI systems achieve both inner and outer alignment with human(e) values.

Checkapp @EdwardMusinski

May 17

linkedin.com/feed/update/urn…

AI Market Reality Check: Beyond Illusion and Hype | Srini Pagidyala posted on the topic | LinkedIn

You don’t explain it to the crocodile. Because you can’t. You let reality exhaust it. That’s most of the AI market right now. The crocodile saw the illusion of a real deer and attacked. The AI market...

linkedin.com

Checkapp @EdwardMusinski

May 7

linkedin.com/posts/adambry_a…

Draper Software Release Boosts GPS Accuracy with In-House Solver | Adam Bry posted on the topic |...

An invisible but massive technology step forward is coming in our Draper software release. X10 will now use an in-house GNSS (GPS) solver. We’ll still use the same GPS hardware, but now the math and...

linkedin.com

Checkapp @EdwardMusinski

Apr 19

Hottest debates of the week

The AI Investor

@The_AI_Investor

Apr 16

"You’re not talking to someone who woke up a loser” - Jensen Huang Jensen nearly lost his composure during a heated debate about selling chips to China, despite showing tremendous patience in response to the pushback.

Checkapp @EdwardMusinski

Apr 9

I have known Masa as an ASI advocate since 2020; now he is ultra bullish. group.softbank/en/philosophy…

Message from Chairman & CEO | SoftBank Group Corp.

Message from Masayoshi Son, Chairman & CEO, SoftBank Group Corp. - Realizing Artificial Super Intelligence (ASI) for the Evolution of Humanity

group.softbank

Checkapp @EdwardMusinski

Apr 5

Binary AI skills building. Self-improvement (RSI): almost every lab now uses previous-generation models to build the next one. It's not fully automated yet; "what's missing is long-horizon planning and full automation."

Lex Fridman

@lexfridman

Apr 2

Replying to @karpathy

Same, I have a similar setup. A mix of Obsidian, Cursor (for md), and vibe-coded web terminals as front-end. Since I do a podcast, the number/diversity of research interests is very large. But the knowledge-base approach has been working great. For answers, I often have it generate dynamic html (with js) that allows me to sort/filter data and to tinker with visualizations interactively. Another useful thing is I have the system generate a temporary focused mini-knowledge-base for a particular topic that I then load into an LLM for voice-mode interaction on a long 7-10 mile run. So it becomes an interactive podcast while I run, where I ask it questions and listen to the answers to learn more. Anyway, heading out for a run now, thanks for the write-up 👊

Checkapp retweeted

Andrej Karpathy

@karpathy

Apr 2

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation and finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
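The "small and naive search engine over the wiki" mentioned above could look something like the sketch below. This is a guess at the general shape, not the author's tool: the function names (`tokenize`, `load_wiki`, `search`) and the term-count scoring are assumptions, and the only detail taken from the post is that the wiki is a directory tree of .md files.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_wiki(root):
    """Read every .md file under `root` into {relative_path: text}."""
    root = Path(root)
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.md"))
    }


def search(docs, query, top_k=5):
    """Rank documents by how often the query terms occur in them (naive scoring)."""
    terms = tokenize(query)
    results = []
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((score, name))
    # Highest score first; ties broken alphabetically by file name.
    results.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in results[:top_k]]
```

A thin command-line wrapper around `search(load_wiki("wiki"), "world models")`, with `"wiki"` as a placeholder path, would be one way to hand this to an LLM agent as a tool. At the ~100 article scale described, re-reading every file on each query is likely cheap enough that no index is needed.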

Checkapp retweeted

Michael Levin

@drmichaellevin

Apr 4

New #preprint: @BeneHartl @LPiolopez arxiv.org/abs/2604.01932 "BraiNCA: brain-inspired neural cellular automata and applications to morphogenesis and motor control" Abstract: Most of the Neural Cellular Automata (NCAs) defined in the literature have a common theme: they are based on regular grids with a Moore neighborhood (one-hop neighbour). They do not take into account long-range connections and more complex topologies as we can find in the brain. In this paper, we introduce BraiNCA, a brain-inspired NCA with an attention layer, long-range connections and complex topology. BraiNCAs shows better results in terms of robustness and speed of learning on the two tasks compared to Vanilla NCAs establishing that incorporating attention-based message selection together with explicit long-range edges can yield more sample-efficient and damage-tolerant self-organization than purely local, grid-based update rules. These results support the hypothesis that, for tasks requiring distributed coordination over extended spatial and temporal scales, the choice of interaction topology and the ability to dynamically route information will impact the robustness and speed of learning of an NCA. More broadly, BraiNCA provides brain-inspired NCA formulation that preserves the decentralized local update principle while better reflecting non-local connectivity patterns, making it a promising substrate for studying collective computation under biologically-realistic network structure and evolving cognitive substrates.

BraiNCA: brain-inspired neural cellular automata and applications...

Most of the Neural Cellular Automata (NCAs) defined in the literature have a common theme: they are based on regular grids with a Moore neighborhood (one-hop neighbour). They do not take into...

arxiv.org
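To make the abstract's contrast concrete, the sketch below compares a one-hop Moore neighborhood on a regular grid with the same neighborhood extended by explicit long-range edges. It is only an illustration of the topology difference: the function names and the `long_range_edges` dictionary are assumptions, and BraiNCA's attention layer and learned update rule are not modeled here.

```python
def moore_neighbors(r, c, rows, cols):
    """One-hop Moore neighborhood: up to 8 surrounding cells, clipped at the border."""
    return [
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def neighbors_with_long_range(r, c, rows, cols, long_range_edges):
    """Moore neighbors plus any explicit long-range edges attached to the cell."""
    local = moore_neighbors(r, c, rows, cols)
    extra = [
        cell
        for cell in long_range_edges.get((r, c), [])
        if cell not in local and cell != (r, c)  # skip duplicates and self-loops
    ]
    return local + extra
```

For instance, with `long_range_edges = {(0, 0): [(4, 4)]}` on a 5x5 grid, the corner cell `(0, 0)` sees its 3 local neighbors plus the far corner `(4, 4)`, so information can cross the grid in one update instead of four.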
