PhD Candidate in CS @UMassAmherst | Research Intern @Dolby

Joined September 2023
31 Photos and videos
Jaechul Roh retweeted
My best interview in some time. Rohin Shah leads AGI alignment/safety at DeepMind. And he has a lot of spicy personal takes:
We probably won’t get catastrophic misalignment (00:49)
Safety 'commitments' have severe limitations (10:38)
The intelligence explosion probably isn't imminent (1:52:44)
Why he's not working to pause AI advances (51:44)
Pre-deployment evals aren't the right focus (for catastrophic risks) (37:41)
Signalling concern for safety sometimes diverts resources from actually making AI safe (01:09:51)
Reading AI thoughts is v useful for safety – and we'll probably be able to for years to come (54:17)
Governance is somewhat more likely to be the bottleneck than alignment (43:55)
Rohin's team doesn't have a veto, and that's OK (27:36)
Central banks are a promising model for regulating AI (33:34)
Also:
Google DeepMind's actual plan for building AGI safely (1:40:29)
How external researchers can positively influence big AI companies (2:21:55)
The roles GDM most needs to hire for (2:37:03)
On the 80,000 Hours Podcast. Links below - enjoy! (@rohinmshah)
New preprint: Codec-Robust Attacks on Audio LLMs #CodecAttack Lossy codecs (Opus, MP3, AAC) have been treated as a defense against adversarial audio. We show they're actually an attack surface.
Why does it survive? The latent perturbation concentrates 88% of energy below 4 kHz, exactly where codecs allocate the most bits. A Jacobian analysis confirms this is structural: the decoder has no basis functions above 4 kHz.
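One way to check a claim like "most of the energy sits below 4 kHz" is to measure the share of spectral energy under a cutoff. The snippet below is only a sketch, not the paper's analysis: the function name, the toy two-tone signal, and the 16 kHz sample rate are all illustrative assumptions, and a naive DFT is used to keep it dependency-free.

```python
import cmath
import math


def low_band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or below cutoff_hz (naive DFT, O(n^2))."""
    n = len(signal)
    total = 0.0
    low = 0.0
    # For a real-valued signal, bins 0..n//2 cover all distinct frequencies.
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        energy = abs(coeff) ** 2
        total += energy
        # Bin k corresponds to frequency k * sample_rate / n.
        if k * sample_rate / n <= cutoff_hz:
            low += energy
    return low / total if total > 0 else 0.0


# Hypothetical stand-in for a perturbation: a strong 1 kHz tone plus a weak
# 6 kHz tone, sampled at an assumed 16 kHz.
sample_rate = 16000
n = 160
toy = [
    math.sin(2 * math.pi * 1000 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 6000 * t / sample_rate)
    for t in range(n)
]
fraction = low_band_energy_fraction(toy, sample_rate, 4000)
```

On real audio one would use an FFT library and a windowed estimate; the naive loop here is only meant to make the definition of the metric explicit.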
We still listen to old songs not because they are the best recordings, but because they remind us of something. A place, a person, a feeling. There is usually something imperfect about them, and I think that imperfection is part of why they stay with us. My daily research is in AI security, but I have also been interested in a different kind of threat lately. Not a technical one, but a cultural one. I keep asking myself: what happens when more of the music, art, and stories around us are AI-generated? Not whether they will be good or bad, but whether they will carry the same weight over time. My recent blog post explores that question through the lens of why imperfection matters, how it connects to memory, and what we might quietly lose if it disappears. It is a highly opinionated piece of writing, not a research paper. Just a casual read. But it has been on my mind for a while and I wanted to share.
Jaechul Roh retweeted
12 May 2025
After supervising 20 papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque. So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS
1/ Fine-tuning an Audio LLM on a benign audio dataset pushed its jailbreak rate from 4.62% → 87.12%. No adversary. No harmful data. New paper 🧵
7/ Good news: two simple defenses bring JSR back to near-zero.
🛡️ Distant filtering (training time): pick benign samples farthest from harmful embeddings
🛡️ System prompt (inference time): just tell the model to refuse
Safety is fragile, but recoverable.
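The training-time filter could look roughly like the sketch below. The thread does not say which distance metric or scoring rule is used, so cosine distance, the "distance to the nearest harmful embedding" score, and every name here (`select_distant_samples`, `keep`, the toy 2-D embeddings) are assumptions for illustration only.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_distant_samples(benign, harmful, keep):
    """Indices of the `keep` benign embeddings farthest from any harmful one."""

    def nearest_harmful(vec):
        # Score each benign sample by its distance to the closest harmful embedding.
        return min(cosine_distance(vec, h) for h in harmful)

    ranked = sorted(
        range(len(benign)),
        key=lambda i: nearest_harmful(benign[i]),
        reverse=True,
    )
    return sorted(ranked[:keep])


# Toy 2-D embeddings (hypothetical); real ones would come from the model's encoder.
benign = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
harmful = [[0.0, 1.0]]
selected = select_distant_samples(benign, harmful, keep=2)
```

Here the two benign samples pointing away from the harmful direction are kept for fine-tuning, and the two closest to it are dropped.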
Excited to have contributed to this work during my internship at Brave. Turns out making AI agents more private also makes them more useful, up to 17.9% better task success. Paper: arxiv.org/pdf/2602.13516

Mar 5
AI agents that browse for us can perform a lot of tasks on our behalf, from booking reservations to filling out forms. Unfortunately, these agents have a serious privacy issue: oversharing users' personal information. Fixing this problem is key to making AI more effective.
Jaechul Roh retweeted
