What makes an AI model good in 2026 is no longer its architecture. It's the part the lab won't publish. Everyone converged on the same linear hybrid, so the edge moved to data, RL, and tool use. My new piece: byin-cwi.github.io/MatrixWeb…#LLM
I updated my personal website:
byin-cwi.github.io/MatrixWeb…
I’ll use it to share research notes and personal reflections on learning algorithms, brain-inspired AI, LLMs, agents, scaling laws, memory, and neural network dynamics.
First posts are already online. More to come.
We never really knew how to train nonlinear RNNs well… BPTT struggled with vanishing grads (no long-range memory) and sequential rollout (hard to parallelizable).
What if instead an oracle told us the optimal memory state m_t at each step? Then the RNN could do one-step supervised learning on (m_t, x_{t 1}) → m_{t 1} labels.
We call this Supervised Memory Training (SMT): a replacement for BPTT that trains RNNs without unrolling them. SMT is time-parallelizable and solves vanishing gradients.
Website: akarshkumar.com/smt/
arXiv: arxiv.org/abs/2606.06479
How do beta and gamma rhythms impact synaptic integration in pyramidal neurons? Our latest paper that examines this question studied this using a biophysically detailed pyramidal neuron model. Here’s what we found. A 🧵below.
elifesciences.org/articles/9…
1/10
Our NeuroAI study made it onto the cover of Nature Machine Intelligence (@NatMachIntell) ❤️.
In it, we demonstrate that a developmentally-inspired visual diet can drastically improve the robustness of ANN-based vision systems.
open access, open code, open weights, open science
Be the scientist, not just the researcher.
AI has already replaced much of research labor.
BUT it hasn’t replaced taste, judgment, courage, or the ability to ask the question that changes everything.
The "model weights are unreadable" excuse just died in one paper.
Neural networks have billions of parameters.
Nobody really knows what each one does.
A new paper introduces adVersarial Parameter Decomposition.
The method splits a model's weights into small, single-purpose subcomponents.
Each piece handles one specific job.
Things like emoticon prediction or gender identification.
Only a tiny fraction fires on any given input.
How they pulled this off:
1. Break weights into simple parts
2. Ablate components adversarially during training
3. Keep only what preserves behavior
The breakthrough is attention.
The technique decomposes attention computations even when spread across multiple heads.
That problem went unsolved for three years.
It scales beyond toy networks to real four-layer language models.
The authors now consider it a serious competitor to sparse autoencoders.
You can trace information flow through attribution graphs.
You can hand-edit specific behaviors and predict the outcome.
VPD makes weights legible.
The idea is simple
Take a PageIndex-style document tree.
Turn it into an evidence graph.
Use query-conditioned Personalized PageRank to let relevance flow across the graph.
Not just top-k chunks.
Not just one routed tree path.
Connected evidence.
Code: github.com/byin-cwi/PageRank…
I ran a retrieval-only FinanceBench annotation-derived eval. Graph propagation helps recover evidence missed by local tree routing. So retrieval improved clearly.
Answer generation is the next bottleneck.
My current takeaway:
Standard RAG retrieves chunks.
PageIndex retrieves document structure.
PageRankRAG retrieves evidence paths.
For long-document RAG, structure may not be enough.
We may need relevance propagation over that structure.
Code: github.com/byin-cwi/PageRank…
The entire RAG stack may be missing one layer.
Vector RAG retrieves similar chunks.
PageIndex @PageIndexAI retrieves document tree nodes.
PageRankRAG retrieves connected evidence paths.
I built a graph propagation layer on top of PageIndex-style document trees.
The geometry of neural dynamics along the cortical attractor landscape directly reflects changes in attention, as large-scale brain activity shifts across its hills and valleys depending on the state.
nature.com/articles/s41467-0…
📣 Submit your work to HBAI 2026!
The 5th International Workshop on Human Brain and Artificial Intelligence, co-located with IJCAI 2026, invites papers on the frontier of AI and brain research.
🗓 Deadline: May 15
🔗 hbai2026.github.io#NeuroAI#BrainInspiredAI#IJCAI2026
Preprint out arguing that we should build the techology to translate (compile) molecularly annotated connectomes into dynamics. I think this is incredibly important. arxiv.org/abs/2603.25713
The weather's looking good for tomorrow's Artemis II launch, and our teams are getting the rocket ready for liftoff!
Read the latest updates on our mission around the Moon: go.nasa.gov/4tiFY4P
Google Turbo Quant running Locally in Atomic Chat
MacBook Air M4 16 GB
Model: QWEN3.5-9B
Context window: 50000
Summarising 20000 words in just seconds..
You can do 3x larger context window, processing 3x faster than before!
0:35
Community note
Atomic Chat is a rebranded copy of Jan (github.com/janhq/jan,41K stars) with git history deleted. 99% of code is unchanged.
TurboQuant support is from TheTom's fork, not original work. System prompt still says "You are Jan, trained by Menlo Research." No attribution given.
github.com/janhq/jangithub.com/TheTom/llama-c…