Kento Nishi｜🐔

Core Francisco Park retweeted

Kento Nishi｜🐔 @kento_nishi

Jun 2

Slow SSH connections causing keyboard lag? Try "slsh," it eliminates typing latency!

slsh is a:
- Drop-in SSH wrapper
- Latency compensation for all interactive TUIs
- Open-source w/ MIT license, written in Rust, built for Linux/Windows/Mac on x86/arm

slsh-rs.github.io

Core Francisco Park

@corefpark

Jun 2

This is a great way to understand why bigger models are better!

Christopher Potts

@ChrisGPotts

Jun 1

We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.

Image: Title card for a research paper. The title reads "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention." Authors listed: Jing Huang, Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Lampinen, Christopher Potts, and Ekdeep Singh Lubana. A Goodfire logo appears below the names. Author affiliations: Stanford University, Kempner Institute at Harvard University, MIT, and Anthropic.

Core Francisco Park retweeted

Ekdeep Singh Lubana @EkdeepL

Jun 1

Very excited to have this paper out! We show by having more parameters, larger models see reduced interference between updates. This allows them to retain memories of rarely observed samples of a task, eventually allowing them to learn even the tail-end of the distribution. (1/3)

Christopher Potts

@ChrisGPotts

Jun 1

Core Francisco Park

@corefpark

May 26

Super cool work showing how analogies work in models!

Gouki Minegishi

@GoukiMinegishi

May 26

Our paper was accepted as a #ICML2026 Spotlight!

Reasoning in LLMs has improved largely by chaining local steps. But is that the whole story? Humans occasionally make inferential "leaps" across domains, a faculty known as analogy. We design a synthetic task to show how small Transformers acquire analogical reasoning, and find that the same signatures appear in pretrained LLMs.

arxiv: arxiv.org/abs/2602.01992
code: github.com/gouki510/Analogy_…

Core Francisco Park

@corefpark

May 21

Generative models don't always learn the distribution of latent variables properly. In other words: You don't get five fingers on a hand. (Before they got strongly data augmented) Here is some mechanistic insight why that's the case!

Kento Nishi｜🐔 @kento_nishi

May 21

🚨New paper! 📃Mechanisms of Misgeneralization in Physical Sequence Modeling

Planners for the physical world produce motions that look safe, but quietly change quantities the demonstrations are meant to control. When does this happen? Why? Can we predict it before training?👇🧵

Core Francisco Park

@corefpark

May 20

🚨 New Paper! (Part 1: Pretraining) Many recent works show beautiful representational geometry in neural networks. But what controls the geometry of world representations during pretraining? We decouple the world from data to study this in a controlled setup. 1/n

Core Francisco Park

@corefpark

May 20

Huge thanks for following! This was a pretty open-ended research project, and the whole research procedure is available below. Stay tuned for part 2, where we fine-tune new cities into the network, an analogy of a new concept introduced to the world!

Paper: arxiv.org/abs/2602.00533
Research Process: cfpark00.github.io/world-rep…

15/n

Convergent World Representations and Divergent Tasks

While neural representations are central to modern deep learning, the conditions governing their geometry and their roles in downstream adaptability remain poorly understood. We develop a...

arxiv.org

Core Francisco Park

@corefpark

May 20

Tagging some who might be interested! @Hidenori8Tanaka @kento_nishi @a_jy_l @EkdeepL @thomas_fel_ @SophieLWang @apoorvkh @Michael_Lepori @AndrewLampinen @ZechenZhang5 @KihoPark_ 16/n