Joined December 2023
11/ Takeaway: MoE sparsity does more than save compute. It appears to make both neurons and experts more interpretable, making the expert level a promising unit for scalable mechanistic interpretability.
Jae Hee Lee retweeted
We're looking for a postdoctoral research associate to work on reasoning in multimodal generative models.
Deadline for applications: 20th May
Location: Cardiff, UK
Details: jobs.ac.uk/job/DRH404/resear… @Cardiff_NLP @cardiff_krr
Jae Hee Lee retweeted
After speaking with @kalomaze, I actually suspect that the main contribution of this paper (i.e. dynamic loop depth) is likely NOT used by Mythos. The theorem result I mentioned is still interesting imo, in that it suggests scaling up depth has substantial theoretical advantages over scaling up reasoning trace length, so I think overall this just suggests that Mythos is much deeper than previous models (whether that's through full new layers or some form of weight tying/fixed-length looping is still an open question).
Replying to @ChrisHayduk
but then the entire point of the ByteDance paper (all the kludges they use to make dynamic loops work) is gone, and what you're actually doing becomes a glorified form of weight tying (re: parameter reuse), so you're back to square one basically
Jae Hee Lee retweeted
We are entering the second half of research. Here is my advice to every PhD student before starting a project:
1. Can Claude Code solve it in a day?
2. Will a Research Agent solve it soon?
3. Will scaling solve it anyway?
If the answer to all three is No, then maybe you have found a real research problem. Because in the age of AI, many things that looked like research are being revealed as delayed engineering. That does not make research less important. It makes problem selection more important than ever. The scarce resource is no longer intelligence. It is taste. It is originality. It is the ability to ask questions that survive automation. The first half of research was about solving hard problems. The second half is about knowing which problems are still worth solving.
#research #academic #AI #GenAI #generativeai #airesearch #taste
Jae Hee Lee retweeted
📢 Join our team in Hamburg! The Trustworthy AI lab 🤝 is looking for a Research Associate for a novel DFG-funded project on ethical LLM-based multi-agent systems. 🤖 Full-time | EGR. 13 TV-L | Apply by 8 April 2026 🔗 uni-hamburg.de/en/stellenang… #TrustworthyAI #LLM #AcademicJobs
Jae Hee Lee retweeted
I rarely run my own experiments anymore. My days are spent managing mentees, writing proposals, connecting dots between other people's projects. From a pure efficiency standpoint, this is probably optimal. The skills I have that are hardest to automate (taste, conceptual synthesis, writing) are the ones I'm exercising. The skills that are easiest to automate (writing scripts, running demos, implementing baselines) are the ones I've delegated.

Last week I had a conversation about a new research direction. We were both excited about it. At the end I offered to spend 1-2 days building a quick demo. "No need," they said. "I'll get an agent to do it." They were right. It would have been a poor use of my time. But something in me deflated.

There's a mental itch that used to get scratched by problem-solving: by sitting with a bug for an hour, by writing a training loop from scratch, by the quiet satisfaction of watching a loss curve bend. That itch isn't getting scratched anymore, and I think it's the root cause of a low-level dissatisfaction I've been carrying around for months.

Part of the problem is identity. I used to think of myself as someone who was good at coding and math. Those skills still exist somewhere in me, but they're atrophying from disuse. When the thing you built your self-concept around stops being the thing you do every day, you need a new story about who you are. I haven't written that story yet.

Part of the problem is fear. My research space is getting crowded. I'll mention an idea and someone will say, "oh, talk to so-and-so, they're already doing that." I worry about being scooped, about important conversations happening without me, about fading into irrelevance. The world feels increasingly fast-paced and I feel increasingly bogged down by existing commitments, by structural friction, by my own indecision about what to prioritise. I want to acknowledge that these fears are partly rational. The incentives in AI research right now are extreme. But I also suspect the world is less cutthroat than it feels at 11pm on a Tuesday. People with taste and energy tend to find something to succeed at. The anxiety is more about pace than about outcome.

The solution, I think, has two parts. The first is professional: I need to accept the new shape of my role and get good at it, rather than mourning the old one. Being the person who sees connections, who mentors well, who writes the crisp proposal: that's genuinely valuable work. It's just not the work that scratches the itch.

The second is personal: I need something where it's just me. No agents, no augmentation, no delegation. Just my hands and my brain and some problem. Rock climbing fits. Dance fits. Making things with my hands (pottery, woodworking) sounds right. Something where the point is the struggle itself, where efficiency is beside the point, where no one is going to suggest I outsource it to an LLM.

Maybe what I'm really afraid of is inefficiency. That I'll waste time. That the weekend spent on a demo could have been spent on something more "leveraged." But I'm starting to suspect that the waste is the point. That the itch exists for a reason, and starving it in the name of optimality is its own kind of failure.

(co-authored with opus 4.6)
Jae Hee Lee retweeted
every time i try to have an original research idea but Zhang et al. already published it 3 years ago
Jae Hee Lee retweeted
To sum up, I'm still trying to wrap my head around this! Why do recent frontier LLMs struggle on simple math if results on extremely hard math problems show some "signs of reasoning"? Different hypotheses:
1⃣ CoT in data & RL are not enough to teach them proper search & backtracking
2⃣ Maybe it's not reasoning, it's just sophisticated pattern matching
3⃣ Their internal computational graphs differ from those of humans, so teaching them linearized algorithms that seem simple to us might not translate effectively for them.