@Harvard. Science of Intelligence

Joined May 2023
Core Francisco Park retweeted
Slow SSH connections causing keyboard lag? Try "slsh," it eliminates typing latency!
slsh is a:
- Drop-in SSH wrapper
- Latency compensation for all interactive TUIs
- Open-source w/ MIT license, written in Rust, built for Linux/Windows/Mac on x86/arm
slsh-rs.github.io
This is a great way to understand why bigger models are better!
We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.
Core Francisco Park retweeted
Very excited to have this paper out! We show that, because they have more parameters, larger models see reduced interference between updates. This lets them retain memories of rarely observed samples of a task and eventually learn even the tail end of the distribution. (1/3)
Super cool work showing how analogies work in models!
Our paper was accepted as an #ICML2026 Spotlight! Reasoning in LLMs has improved largely by chaining local steps. But is that the whole story? Humans occasionally make inferential "leaps" across domains, a faculty known as analogy. We design a synthetic task to show how small Transformers acquire analogical reasoning, and find that the same signatures appear in pretrained LLMs.
arxiv: arxiv.org/abs/2602.01992
code: github.com/gouki510/Analogy_…
Generative models don't always learn the distribution of latent variables properly. In other words, you don't always get five fingers on a hand (at least before heavy data augmentation). Here is some mechanistic insight into why that's the case!
🚨 New paper!
📃 Mechanisms of Misgeneralization in Physical Sequence Modeling
Planners for the physical world produce motions that look safe, but quietly change quantities the demonstrations are meant to control. When does this happen? Why? Can we predict it before training? 👇🧵
🚨 New Paper! (Part 1: Pretraining) Many recent works show beautiful representational geometry in neural networks. But what controls the geometry of world representations during pretraining? We decouple the world from data to study this in a controlled setup. 1/n
Huge thanks for following! This was a pretty open-ended research project, and the whole research process is available below. Stay tuned for part 2, where we fine-tune new cities into the network, an analogy for a new concept being introduced to the world!
Paper: arxiv.org/abs/2602.00533
Research Process: cfpark00.github.io/world-rep…
15/n