Jae Hee Lee

Jae Hee Lee @dschaehi

Apr 30

1/ Excited to share that our paper “The Expert Strikes Back” has been accepted to #ICML2026! We ask: does sparse routing in MoE language models make them easier to interpret? Paper: arxiv.org/abs/2604.02178

The Expert Strikes Back: Interpreting Mixture-of-Experts Language...

Mixture-of-Experts (MoE) architectures have become the dominant choice for scaling Large Language Models (LLMs), activating only a subset of parameters per token. While MoE architectures are...

arxiv.org

Jae Hee Lee @dschaehi

Apr 30

11/ Takeaway: MoE sparsity does more than save compute. It appears to make both neurons and experts more interpretable, making the expert level a promising unit for scalable mechanistic interpretability.

Jae Hee Lee @dschaehi

Apr 30

12/ We hope this helps build better tools for understanding sparse LLMs at scale. Paper: arxiv.org/abs/2604.02178 Code: github.com/jerryy33/MoE_anal… #ICML2026 #LLMs #MoE #MechanisticInterpretability

Jae Hee Lee retweeted

Steven Schockaert @StevenSchockae2

Apr 25

We're looking for a postdoctoral research associate to work on reasoning in multimodal generative models.
Deadline for applications: 20th May
Location: Cardiff, UK
Details: jobs.ac.uk/job/DRH404/resear… @Cardiff_NLP @cardiff_krr

Research Associate at Cardiff University

jobs.ac.uk

Jae Hee Lee @dschaehi

Apr 22

📢 Call for Papers: STRL 2026 — 5th International Workshop on Spatio-Temporal Reasoning and Learning @ IJCAI-ECAI 2026, Bremen, Germany.
🗓 Submission deadline: May 24, 2026 (AoE)
🔗 strl-workshop.github.io/strl… #STRL2026 #IJCAI #ECAI #AI #MachineLearning #NeuroSymbolicAI

STRL 2026 — 5th International Workshop on Spatio-Temporal Reasoning and Learning

STRL 2026 aims to strengthen the bridge between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML) communities through work on neuro-symbolic methods, commonsense reasoning,...

strl-workshop.github.io

Jae Hee Lee retweeted

Chris Hayduk @ChrisHayduk

Apr 11

After speaking with @kalomaze, I actually suspect that the main contribution of this paper (i.e., dynamic loop depth) is likely NOT used by Mythos. The theorem result I mentioned is still interesting imo, in that it suggests scaling up depth has substantial theoretical advantages over scaling up reasoning trace length. So overall I think this just suggests that Mythos is much deeper than previous models (whether that’s through full new layers or some form of weight tying/fixed-length looping is still an open question).

kalomaze @kalomaze

Apr 11

Replying to @ChrisHayduk

but then the entire point of the ByteDance paper (all the kludges they use to make dynamic loops work) is gone, and what you're actually doing becomes a glorified form of weight tying (re: parameter reuse), so you're back to square one basically

Jae Hee Lee retweeted

Zhengzhong Tu @_vztu

Mar 27

We are entering the second half of research. Here is my advice to every PhD student before starting a project:
1. Can Claude Code solve it in a day?
2. Will a Research Agent solve it soon?
3. Will scaling solve it anyway?
If the answer to all three is No, then maybe you have found a real research problem. Because in the age of AI, many things that looked like research are being revealed as delayed engineering. That does not make research less important. It makes problem selection more important than ever. The scarce resource is no longer intelligence. It is taste. It is originality. It is the ability to ask questions that survive automation. The first half of research was about solving hard problems. The second half is about knowing which problems are still worth solving. #research #academic #AI #GenAI #generativeai #airesearch #taste

Jae Hee Lee retweeted

Anne Lauscher (she/her) @anne_lauscher

Mar 20

📢 Join our team in Hamburg! The Trustworthy AI lab 🤝 is looking for a Research Associate for a novel DFG-funded project on ethical LLM-based multi-agent systems. 🤖
Full-time | EGR. 13 TV-L | Apply by 8 April 2026
🔗 uni-hamburg.de/en/stellenang… #TrustworthyAI #LLM #AcademicJobs

Job advertisement

uni-hamburg.de

Jae Hee Lee retweeted

Daniel Tan @DanielCHTan97

Mar 2

I rarely run my own experiments anymore. My days are spent managing mentees, writing proposals, connecting dots between other people's projects.

From a pure efficiency standpoint, this is probably optimal. The skills I have that are hardest to automate — taste, conceptual synthesis, writing — are the ones I'm exercising. The skills that are easiest to automate — writing scripts, running demos, implementing baselines — are the ones I've delegated.

Last week I had a conversation about a new research direction. We were both excited about it. At the end I offered to spend 1-2 days building a quick demo. "No need," they said. "I'll get an agent to do it." They were right. It would have been a poor use of my time. But something in me deflated.

There's a mental itch that used to get scratched by problem-solving — by sitting with a bug for an hour, by writing a training loop from scratch, by the quiet satisfaction of watching a loss curve bend. That itch isn't getting scratched anymore, and I think it's the root cause of a low-level dissatisfaction I've been carrying around for months.

Part of the problem is identity. I used to think of myself as someone who was good at coding and math. Those skills still exist somewhere in me, but they're atrophying from disuse. When the thing you built your self-concept around stops being the thing you do every day, you need a new story about who you are. I haven't written that story yet.

Part of the problem is fear. My research space is getting crowded. I'll mention an idea and someone will say, "oh, talk to so-and-so, they're already doing that." I worry about being scooped, about important conversations happening without me, about fading into irrelevance. The world feels increasingly fast-paced and I feel increasingly bogged down — by existing commitments, by structural friction, by my own indecision about what to prioritise.

I want to acknowledge that these fears are partly rational. The incentives in AI research right now are extreme. But I also suspect the world is less cutthroat than it feels at 11pm on a Tuesday. People with taste and energy tend to find something to succeed at. The anxiety is more about pace than about outcome.

The solution, I think, has two parts.

The first is professional: I need to accept the new shape of my role and get good at it, rather than mourning the old one. Being the person who sees connections, who mentors well, who writes the crisp proposal — that's genuinely valuable work. It's just not the work that scratches the itch.

The second is personal: I need something where it's just me. No agents, no augmentation, no delegation. Just my hands and my brain and some problem. Rock climbing fits. Dance fits. Making things with my hands — pottery, woodworking — sounds right. Something where the point is the struggle itself, where efficiency is beside the point, where no one is going to suggest I outsource it to an LLM.

Maybe what I'm really afraid of is inefficiency. That I'll waste time. That the weekend spent on a demo could have been spent on something more "leveraged." But I'm starting to suspect that the waste is the point. That the itch exists for a reason, and starving it in the name of optimality is its own kind of failure.

(co-authored with opus 4.6)

Jae Hee Lee retweeted

cherrie @cherrishkhera

Jan 25

every time i try to have an original research idea but Zhang et al. already published it 3 years ago

Jae Hee Lee retweeted

Nouha Dziri @nouhadziri

3 Feb 2025

To sum up, I'm still trying to wrap my head around this! Why do recent frontier LLMs struggle with simple math if results on extremely hard math problems show some "signs of reasoning"? Different hypotheses:
1⃣ CoT in data & RL are not enough to teach them proper search & backtracking
2⃣ Maybe it's not reasoning, it's just sophisticated pattern matching
3⃣ Their internal computational graphs differ from those of humans, so teaching them linearized algorithms that seem simple to us might not translate effectively for them.
