Mathematician. Mostly here to observe 🕳️🐇

Joined March 2017
Aleph-Null retweeted
Synchronization is all you need. Transformer attention emerges from Kuramoto oscillator dynamics, the same equations that sync fireflies and metronomes. No softmax, no exponentials. Just a physical network relaxing to equilibrium. Watch it write a story, one word at a time 👇🏽
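The attention construction in the thread is not reproduced here, but the Kuramoto model it builds on is standard: oscillator i has natural frequency ω_i and obeys dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i). Below is a minimal NumPy sketch. It assumes all-to-all coupling, Gaussian natural frequencies, forward-Euler integration and hypothetical parameter values. The order parameter r stays small without coupling and approaches 1 once the coupling K is well above its critical value:

```python
import numpy as np

def kuramoto(theta0, omega, coupling, dt=0.01, steps=5000):
    """Euler-integrate d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    theta = np.array(theta0, dtype=float)
    n = theta.size
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
        theta = theta + dt * (omega + coupling / n * np.sin(diff).sum(axis=1))
    return theta

def order_parameter(theta):
    """r = |mean_j exp(i * theta_j)|: small when incoherent, near 1 when synchronized."""
    return np.abs(np.exp(1j * theta).mean())

rng = np.random.default_rng(0)
n = 50
theta0 = rng.uniform(0.0, 2.0 * np.pi, n)  # random initial phases
omega = rng.normal(0.0, 0.5, n)            # hypothetical spread of natural frequencies

r_free = order_parameter(kuramoto(theta0, omega, coupling=0.0))
r_sync = order_parameter(kuramoto(theta0, omega, coupling=4.0))
print(f"K=0: r={r_free:.2f}   K=4: r={r_sync:.2f}")
```

With K = 0 the phases simply drift apart at their own rates. With K = 4 the oscillators lock to a common rhythm, a simple instance of the relaxation toward a synchronized state that the thread describes.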
Aleph-Null retweeted
Terrific release from @nvidia and my former PhD student @rohansawhney1: A GPU physics solver for fundamental problems like electrostatics and heat transfer, which handles extremely complex geometry without any mesh generation or basis approximation. Based on Monte Carlo walk on spheres methods developed by our group and others. See this page for lots of background info/tutorials: rohan-sawhney.github.io/mcgp…
Releasing Walk on Spheres Extensions (WoSX): a GPU-accelerated C++/Python library for Monte Carlo physics simulation on complex geometry. Think path tracing, but for physics beyond light transport: heat, electrostatics, potential flow, deformation & more! github.com/nv-tlabs/wosx
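The core walk-on-spheres idea fits in a few lines. The sketch below is plain NumPy, not the WoSX API, and assumes the simplest setup: Laplace's equation Δu = 0 on the unit disk with Dirichlet data g on the boundary. From the current point, jump to a uniformly random point on the largest circle that fits inside the domain (valid by the mean-value property of harmonic functions). Stop within ε of the boundary and read off g there. The boundary data g = x² − y² is a hypothetical choice made because it is itself harmonic, so the exact answer u(x, y) = x² − y² is known.

```python
import numpy as np

def walk_on_spheres_disk(x0, g, n_walks=20000, eps=1e-4, max_steps=1000, rng=None):
    """Monte Carlo estimate of u(x0), where Laplace(u) = 0 in the unit disk and u = g on its boundary."""
    rng = np.random.default_rng() if rng is None else rng
    pts = np.tile(np.asarray(x0, dtype=float), (n_walks, 1))
    active = np.ones(n_walks, dtype=bool)
    for _ in range(max_steps):
        dist = 1.0 - np.linalg.norm(pts, axis=1)  # radius of the largest circle around each point
        active &= dist > eps                       # walks within eps of the boundary stop
        if not active.any():
            break
        angles = rng.uniform(0.0, 2.0 * np.pi, size=int(active.sum()))
        jumps = dist[active][:, None] * np.column_stack((np.cos(angles), np.sin(angles)))
        pts[active] += jumps                       # jump to a uniform point on that circle
    # Snap each endpoint to the nearest boundary point and average the boundary data there
    boundary = pts / np.linalg.norm(pts, axis=1, keepdims=True)
    return g(boundary[:, 0], boundary[:, 1]).mean()

g = lambda x, y: x**2 - y**2  # harmonic, so the exact solution is u = x^2 - y^2
est = walk_on_spheres_disk((0.3, 0.2), g, rng=np.random.default_rng(0))
print(est, 0.3**2 - 0.2**2)
```

No mesh appears anywhere: the only geometric query is "distance to the boundary", which is why the method scales to very complex shapes. The estimate carries Monte Carlo noise that shrinks like 1/√(n_walks).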
Aleph-Null retweeted
Gradient descent on neural networks frequently drives the sharpest Hessian eigenvalue to exactly 2/learning_rate. This is the Edge of Stability. For five years, ML theory has failed to explain why this happens globally from any initialization. Until now. 🧵
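Why 2/learning_rate in particular? The classical stability argument is easiest to see on a quadratic, where the Hessian is constant. This sketch assumes f(x) = ½ xᵀHx with a diagonal H chosen for illustration. It shows only the threshold itself. It does not reproduce the Edge of Stability dynamics in the thread, where sharpness rises during training and then hovers around 2/η.

```python
import numpy as np

def gd_on_quadratic(hessian, x0, lr, steps):
    """Gradient descent on f(x) = 0.5 * x^T H x, whose gradient is H x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * hessian @ x  # each eigen-direction is scaled by (1 - lr * lambda)
    return x

lr = 0.1                         # stability threshold 2 / lr = 20
stable = np.diag([19.0, 1.0])    # sharpest eigenvalue just below 2 / lr
unstable = np.diag([21.0, 1.0])  # sharpest eigenvalue just above 2 / lr
x0 = [1.0, 1.0]

norm_stable = np.linalg.norm(gd_on_quadratic(stable, x0, lr, 200))
norm_unstable = np.linalg.norm(gd_on_quadratic(unstable, x0, lr, 200))
print(norm_stable, norm_unstable)
```

Each eigen-direction is multiplied by (1 − ηλ) per step, so it shrinks only if |1 − ηλ| < 1, i.e. 0 < λ < 2/η. The phenomenon in the thread is that real networks end up with λ_max sitting near this boundary rather than far below it.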
Aleph-Null retweeted
Wish I had found this one for you guys before today! If you're not a pure mathematician and you're struggling with functional analysis, check out this masterpiece titled "An Introduction to Functional Analysis for Science and Engineering" by David Miller (Stanford University). If you're looking for a non-formal primer, this is it. I'll keep it brief: go and check it out! 🔗👇
Aleph-Null retweeted
At the maximum likelihood estimate, observed Fisher information ≈ (expected) Fisher information, with equality for exponential families. From a 2nd-order Taylor expansion of the log-likelihood: - negative log-likelihood curvature = Fisher information. - radius of the osculating circle = variance of the MLE for large sample sizes.
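A quick numerical check of these identities, assuming a Bernoulli model with made-up counts (k = 120 successes in n = 400 trials). At the MLE p̂ = k/n, the observed information −ℓ''(p̂) equals the expected information n/(p̂(1 − p̂)). Because ℓ'(p̂) = 0, the osculating-circle radius (1 + ℓ'²)^{3/2}/|ℓ''| collapses to 1/I(p̂) = p̂(1 − p̂)/n, the variance of p̂.

```python
import numpy as np

def log_lik(p, k, n):
    """Bernoulli log-likelihood of k successes in n trials."""
    return k * np.log(p) + (n - k) * np.log(1.0 - p)

n, k = 400, 120   # hypothetical data
p_hat = k / n     # maximum likelihood estimate
h = 1e-4          # finite-difference step

# Central differences for l'(p_hat) and l''(p_hat)
l_minus, l_0, l_plus = (log_lik(p, k, n) for p in (p_hat - h, p_hat, p_hat + h))
d1 = (l_plus - l_minus) / (2 * h)
d2 = (l_plus - 2 * l_0 + l_minus) / h**2

observed_info = -d2
expected_info = n / (p_hat * (1.0 - p_hat))  # Fisher information of n Bernoulli trials at p_hat
radius = (1.0 + d1**2) ** 1.5 / abs(d2)      # radius of the osculating circle of l at p_hat

# Monte Carlo variance of the MLE when data are drawn with p = p_hat, for comparison
rng = np.random.default_rng(0)
mle_var = (rng.binomial(n, p_hat, size=100_000) / n).var()

print(observed_info, expected_info)
print(radius, mle_var)
```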
Aleph-Null retweeted
Replying to @yxy2168
The weight norm is actually much easier to understand than this. What is the maximum compression for a layer weight matrix? It is simply the point at which the covariance volume stops changing and the variance itself remains bounded. This can be formalized using the renormalization group method from theoretical physics/chemistry. Moreover, it can be tested empirically. Not only does it tell you the optimal state, but it tells you what the optimizer should do to achieve it.
Aleph-Null retweeted
A Chicago philosopher wrote one book in 1940 proving that 95% of the books you have read in your life, you didn't actually read, and Charlie Munger has been telling people to read it for 50 years.

His name was Mortimer Adler. He spent more than two decades at the University of Chicago, ran the editorial board of the Encyclopædia Britannica, and built his entire career on one uncomfortable observation about the people around him. Most adults who called themselves well-read had not actually read a book in the real sense even once. They had run their eyes over the pages, registered the words, formed a vague impression, and put it back on the shelf. The book had passed through them without ever entering them.

In 1940 he wrote How to Read a Book. It has stayed in print for 86 years. Charlie Munger recommends it. Naval Ravikant recommends it. Fareed Zakaria recommends it. Every serious thinker who builds a career on absorbing information eventually finds their way to this book, and the reason is that Adler had isolated something nobody else was naming clearly.

There are four levels of reading. Almost everyone is stuck on the second one. The fourth level is so different from what most people call reading that you have probably never done it in your entire life.

Level one is elementary. You learn it as a child. You decode the letters into words and the words into sentences. You finish the sentence and understand roughly what it said. This is reading the way a 7-year-old reads, and almost every adult on earth has stopped developing past this point in some quiet way.

Level two is inspectional. This is skimming. You move through a book quickly to figure out what it is broadly about. You read the back cover, scan the table of contents, glance at a few paragraphs, and form an opinion. Most adults who claim to have read 50 books a year are actually doing this. They are inspecting books, not reading them. They walk away with a vague sense of the argument and almost none of the evidence that supports it.

Level three is analytical. This is the level Adler said most people have never properly experienced. You take one book and you wrestle with it for as long as it takes. You identify the question the author is trying to answer. You map their argument from front to back. You write your disagreements in the margins. You force yourself to articulate, in your own words, what the author is claiming and why. The point is not to finish the book. The point is to argue with it as if the author were sitting across the table from you. Most people never do this once in their life, because it is exhausting and slow and feels nothing like the reading they were taught as children.

Level four is the one almost nobody knows exists. Adler called it syntopical reading. The word means "across topics," and the technique is something closer to running a small private research lab in your own head. You pick a single question that actually matters to you. How does power corrupt people? Why do civilizations collapse? What makes a marriage last? How does a person change their own mind? Then you assemble five or ten or twenty books from different authors, different centuries, different traditions, all of them taking a swing at the same question. You do not read any of them cover to cover. You move between them. You find the chapter in book three that addresses the same question as the chapter in book seven. You force those two authors to argue with each other inside your own head. The book stops being the unit of reading. The question becomes the unit. And the authors become voices in a conversation you are now hosting.

This is the level where reading stops being consumption and starts being construction. You are no longer absorbing what someone else thinks. You are building a position of your own out of the friction between people who disagreed. Adler argued that this is the only level of reading where you stop being a passive receiver of other people's ideas and start being someone who can produce ideas of their own.

The reason Charlie Munger has been recommending this book for 50 years is that this is exactly how Munger has always thought. He calls it building a latticework of mental models. The technique he is describing is just syntopical reading applied for a lifetime. You take the strongest insight from psychology, the strongest insight from biology, the strongest insight from economics, and you stack them against the same problem until something new falls out the bottom.

The reason most people never reach level four is not that it is intellectually difficult. It is that it is logistically uncomfortable. It requires you to keep multiple books open at once. It requires you to take notes that nobody is going to grade. It requires you to abandon the goal of finishing books and replace it with the goal of answering questions.

This is also why AI just changed everything Adler was teaching. NotebookLM, Claude, and tools like them let you do syntopical reading at a speed that would have looked like magic to a Chicago philosopher in 1940. You upload 10 books on the same question. You ask the AI to surface every place those authors agree and every place they contradict each other. The technique Adler said almost nobody on earth had reached can now be run on a Sunday afternoon by anyone with a laptop and one good question.

The technique was always the unlock. The bottleneck used to be time. The bottleneck is now curiosity.

Most people will keep reading the way they always have. A book at a time. Eyes over the pages. No question driving it. No other authors in the room. Adler called that level two for a reason. You are not behind on your reading list. You are behind on the level you are reading at.
Aleph-Null retweeted
preparing for the future of mathematics
Aleph-Null retweeted
Statistical Mechanics asks an extremely hard question: how do we track the macroscopic state of a gas of N particles when its full description lives in 6N-dimensional phase space? We don't track it directly. We replace the microscopic state with a probability density over phase space and work with its reduced projections instead.
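In symbols, using one common normalization convention (others place the factor of N differently) and assuming identical particles, so that ρ is symmetric under relabeling: each particle contributes 3 position and 3 momentum coordinates, giving the 6N-dimensional phase space. The simplest reduced projection is the one-particle distribution f_1, obtained by integrating out every particle but one. Macroscopic fields such as the number density n then follow from f_1 alone.

```latex
\begin{aligned}
&\rho(\mathbf{q}_1,\mathbf{p}_1,\dots,\mathbf{q}_N,\mathbf{p}_N;t)\ \text{on}\ \mathbb{R}^{6N},
\qquad \int \rho \, d^{3N}q \, d^{3N}p = 1, \\
&f_1(\mathbf{q},\mathbf{p};t) = N \int \rho(\mathbf{q},\mathbf{p},\mathbf{q}_2,\mathbf{p}_2,\dots,\mathbf{q}_N,\mathbf{p}_N;t)\,
  d^3q_2\,d^3p_2\cdots d^3q_N\,d^3p_N, \\
&n(\mathbf{q};t) = \int f_1(\mathbf{q},\mathbf{p};t)\, d^3p,
\qquad \int n(\mathbf{q};t)\, d^3q = N .
\end{aligned}
```

The point of the projection is dimensional: f_1 lives in 6 dimensions instead of 6N, at the cost of discarding correlations between particles.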
Aleph-Null retweeted
co-occurrence is a noisy signal that can be distorted by distribution shifts. In LLMs, this distortion can be cured by considering context; see, e.g., the figure from arxiv.org/abs/2602.15029 on time/space embeddings. "May" is distorted because it is both a month and a verb.
Aleph-Null retweeted
LLMs represent concepts as vectors. Strikingly, taxonomies (organism → animal → bird) appear as hierarchies in embedding space. Led by my student @AndresNava, we show this comes from co-occurrence statistics alone.📄 arxiv.org/abs/2605.23821
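To make "co-occurrence statistics alone" concrete, here is the classic count-based pipeline (positive PMI followed by a truncated SVD) on a hypothetical six-sentence corpus. This is a standard textbook technique, not the method of the paper above. It only shows how word vectors can be built from nothing but which words appear near which. Words with identical contexts, such as "sparrow" and "robin" here, end up with identical vectors.

```python
import numpy as np

corpus = [  # hypothetical toy corpus
    "the sparrow is a bird",
    "the robin is a bird",
    "a bird is an animal",
    "the dog is an animal",
    "the cat is an animal",
    "an animal is an organism",
]

def cooccurrence(sentences, window=2):
    """Symmetric word-by-word counts within +/- `window` positions."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    counts[index[w], index[words[j]]] += 1
    return vocab, index, counts

def ppmi(counts):
    """Positive pointwise mutual information: max(0, log P(w, c) / (P(w) P(c)))."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(counts * total / (row * col))  # log(0) = -inf for unseen pairs
    return np.maximum(pmi, 0.0)

def embed(m, dim=3):
    """Truncated SVD: rows of U_k * sqrt(S_k) serve as word vectors."""
    u, s, _ = np.linalg.svd(m)
    return u[:, :dim] * np.sqrt(s[:dim])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

vocab, index, counts = cooccurrence(corpus)
m = ppmi(counts)
vectors = embed(m)
print(cosine(vectors[index["sparrow"]], vectors[index["robin"]]))
print(cosine(vectors[index["sparrow"]], vectors[index["dog"]]))
```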
Aleph-Null retweeted
Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights. arxiv.org/abs/2605.10878
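The preprint's theorem concerns fixed-precision networks and Kolmogorov complexity, which no short snippet can reproduce. What is easy to show is a linear-model analogue of "weight decay picks the smallest-norm weights that fit the data". For an underdetermined least-squares problem, L2-regularized (ridge) regression with a vanishing penalty converges to the minimum-norm interpolant, and every other exact fit has strictly larger norm. The sketch assumes random data and a hypothetical penalty λ = 1e-8.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))  # 5 examples, 20 weights: infinitely many exact fits
y = rng.standard_normal(5)

# Minimum-L2-norm weights among all w with X @ w == y
w_min = np.linalg.pinv(X) @ y

def ridge(X, y, lam):
    """argmin_w ||X w - y||^2 + lam * ||w||^2, via the equivalent form X^T (X X^T + lam I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

w_decay = ridge(X, y, lam=1e-8)  # tiny weight decay

# Another exact fit: add a direction from the null space of X
v = rng.standard_normal(20)
v -= np.linalg.pinv(X) @ (X @ v)  # remove the row-space component, so X @ v is ~0
w_other = w_min + v

print(np.linalg.norm(w_min), np.linalg.norm(w_decay), np.linalg.norm(w_other))
```

Because w_min lies in the row space of X and v is orthogonal to it, ‖w_other‖² = ‖w_min‖² + ‖v‖², so any extra "unused" weight only increases the norm.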
Aleph-Null retweeted
Replying to @lauriewired
Programmers for the last 30 years: "EXCEL IS NOT A DATABASE"
Aleph-Null retweeted
From today, Algebrica’s content is open, free, downloadable in Markdown, and reusable by anyone. This is a step toward a university-level knowledge base that is freely accessible to everyone. Entries will be progressively released on GitHub in Algebrica’s public repository, and can be reused for non-commercial purposes. To increase transparency, I’m also documenting the editorial process and revising content to improve accuracy and reliability. On some pages, a quality indicator is now visible, including a GPTZero score (not affiliated), as an additional signal of transparency. I believe these changes move Algebrica toward something more open, more reliable, and more accessible. I’d also like to thank everyone for the unexpected response to the project, and for the many visits and thoughtful comments.
Aleph-Null retweeted
Replying to @__paleologo
a beautiful visualization of the diagonalizing flow x.com/gabrielpeyre/status/17…

Brockett’s flow progressively diagonalizes a symmetric matrix. hrl.harvard.edu/publications…
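Brockett's double-bracket flow is dH/dt = [H, [H, N]], with [A, B] = AB − BA and N a fixed diagonal matrix with distinct entries. The flow preserves the spectrum of H and increases tr(HN). It therefore drives H to a diagonal matrix whose entries are its eigenvalues, sorted in the same order as N's diagonal. The NumPy sketch below assumes a 4×4 example with known eigenvalues. It uses a Cayley-transform step, chosen here for illustration and not necessarily the integrator in the linked paper, so that each step is an exact orthogonal similarity.

```python
import numpy as np

def bracket(a, b):
    """Matrix commutator [A, B] = AB - BA."""
    return a @ b - b @ a

def brockett_flow(h0, n_diag, dt=0.01, steps=8000):
    """Integrate dH/dt = [H, [H, N]] with spectrum-preserving orthogonal steps."""
    h = np.array(h0, dtype=float)
    n = np.diag(n_diag)
    eye = np.eye(len(n_diag))
    for _ in range(steps):
        k = bracket(h, n)                       # skew-symmetric since H and N are symmetric
        a = 0.5 * dt * k
        q = np.linalg.solve(eye - a, eye + a)   # Cayley transform: orthogonal, approx expm(dt * K)
        h = q.T @ h @ q                         # approx expm(-dt K) H expm(dt K); d/dt = [H, K]
        h = 0.5 * (h + h.T)                     # re-symmetrize against rounding drift
    return h

rng = np.random.default_rng(0)
q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
h0 = q0 @ np.diag([3.0, -1.0, 2.0, 0.5]) @ q0.T  # dense symmetric matrix, known eigenvalues
h_final = brockett_flow(h0, n_diag=[1.0, 2.0, 3.0, 4.0])
print(np.round(h_final, 6))
```

Near the limit, each off-diagonal entry h_ij decays at rate (λ_i − λ_j)(n_i − n_j), so increasing diagonal entries in N select the ordering with increasing eigenvalues down the diagonal.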
Aleph-Null retweeted
I am dogshit at exams. I've always struggled with exam anxiety. I went back to grad school after 14 years. I had undergrad quantum mechanics 19 years ago. The first few semesters were online, so exams were e-proctored, sometimes open-book. Easy mode psychologically. But for the last few semesters it's been back to in-class exams: no equation sheets, no notes, just a calculator and a pencil. Just got back a 96% on a graduate applied quantum midterm from a few weeks ago. Like. I'm old. I've got a life. Full-time job. Hobby bullshit. If you've considered going back after many years but think you can't get the math back, can't get the technical juice flowing again, or that it's too late: if I can, you can. It's not too late.
Aleph-Null retweeted
From imitation to spatial reasoning. Learn2Fold is based on a simple idea: treat cloth folding as a robotics task, moving beyond imitation learning toward better generalization across deformable objects. We hope this is a real step toward reasoning.