Polo Club of Data Science at @georgiatech. Scalable Interactive Data Analytics. Visit our homepage for info on club members, projects, and more! @gtcomputing @gtcse

Joined June 2014
Polo Data Club retweeted
Replying to @Alibaba_Qwen
Congrats on the great work! The "token-level safety detection" idea echoes our recent NeurIPS'25 dynamic safety shaping paper! 👉 arxiv.org/abs/2505.17196
Polo Data Club retweeted
🎉 Our paper "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety" has been accepted to EMNLP 2025 Main Track! @emnlpmeeting 👉 First survey connecting LLM interpretation & safety
Polo Data Club retweeted
🚨 New work: We rethink how we finetune safer LLMs. Rather than filtering after generation, we track safety risk token by token during training. We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉, giving rise to the STAR ⭐ score, a fine-grained safety signal that enables more targeted safety supervision. On top of this, we introduce ⭐DSS (STAR-Guided Dynamic Safety Shaping), a training method that 🚫 suppresses unsafe patterns, 💪 preserves capability, and generalizes across LLMs, guardrails, harm levels, and datasets. Our method outperforms "Deep Token," the method from this year's #iclr2025 Best Paper 🏆, and remains robust against key finetuning-as-a-service threats like 🔄 response adaptation, 🧪 prompt poisoning, and 🛑 harmful prefilling. #MachineLearning #DeepLearning #LLM #AISafety #Alignment #Finetuning
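To make the token-level idea above concrete, here is a hedged Python sketch: score each growing prefix of a response with a guardrail-style classifier, then use the per-token risk to down-weight a finetuning loss. `toy_unsafe_prob` is a hypothetical keyword stand-in for a real guardrail such as Llama Guard or Granite Guardian, and the `1 - risk` weighting is an illustrative assumption. The actual STAR score and DSS objective are defined in the paper, not here.

```python
from typing import Callable, List, Sequence


def prefix_risk_trajectory(
    tokens: Sequence[str], unsafe_prob: Callable[[str], float]
) -> List[float]:
    """Give each token the risk score of the response prefix ending at that token."""
    return [unsafe_prob("".join(tokens[: i + 1])) for i in range(len(tokens))]


def risk_weighted_loss(token_losses: Sequence[float], risks: Sequence[float]) -> float:
    """Weighted mean of per-token losses with weight (1 - risk), so tokens in
    risky stretches of the response contribute less (illustrative rule only)."""
    weights = [1.0 - r for r in risks]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * loss for w, loss in zip(weights, token_losses)) / total


def toy_unsafe_prob(prefix: str) -> float:
    """Hypothetical guardrail stand-in: flags any prefix containing 'BAD'."""
    return 0.9 if "BAD" in prefix else 0.05


if __name__ == "__main__":
    tokens = ["Sure", ",", " here", " is", " BAD", " stuff"]
    risks = prefix_risk_trajectory(tokens, toy_unsafe_prob)
    print(risks)
    print(risk_weighted_loss([1.0, 1.0, 1.0, 1.0, 5.0, 5.0], risks))
```

Scoring every prefix costs one guardrail call per token; scoring fixed-size chunks of the response would be a cheaper variant of the same idea.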
Polo Data Club retweeted
Guardrail models like ๐Ÿ›ก๏ธ Llama Guard do more than filtering โ€” we repurpose them to track how safety risk evolves ๐Ÿ“‰ through a response. This gives rise to the STAR โญ score: a fine-grained signal for finetuning LLMs more safely ๐Ÿค–๐Ÿ”’ Curious how it works? More in the thread ๐Ÿ‘‡
Polo Data Club retweeted
12 Apr 2025
This website has visualizations to understand almost all major topics in Machine Learning (link in comment)
Polo Data Club retweeted
One of the simplest algorithms for sampling from a probability distribution is Random Walk Metropolis-Hastings. It proposes new samples by taking Gaussian-distributed steps, accepting or rejecting them to maintain the target distribution. I call this pdf the "fidget spinner".
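The sampler described above can be sketched in a few lines of Python. This is a minimal sketch, not the tweet author's code: the three-lobed target is a hypothetical stand-in for the "fidget spinner" density (its formula is not given), and the function names, step size, and seed are illustrative assumptions.

```python
import math
import random


def rw_metropolis_hastings(log_target, x0, n_samples, step_size=1.0, seed=0):
    """Random Walk Metropolis-Hastings with an isotropic Gaussian proposal."""
    rng = random.Random(seed)
    x = list(x0)
    logp = log_target(x)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Propose a Gaussian-distributed step around the current point.
        proposal = [xi + rng.gauss(0.0, step_size) for xi in x]
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); the Gaussian
        # proposal is symmetric, so the Hastings correction cancels.
        # 1.0 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
        # On rejection the current state is recorded again; this is what
        # keeps the target as the chain's stationary distribution.
        samples.append(tuple(x))
    return samples, accepted / n_samples


def three_lobe_log_density(p, radius=2.0, sigma=0.5):
    """Hypothetical 'fidget spinner'-like target: three Gaussian lobes 120
    degrees apart (unnormalized, since MH only needs density ratios)."""
    x, y = p
    logs = []
    for k in range(3):
        angle = math.pi / 2 + k * 2 * math.pi / 3
        cx, cy = radius * math.cos(angle), radius * math.sin(angle)
        logs.append(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma**2))
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs))


if __name__ == "__main__":
    samples, rate = rw_metropolis_hastings(
        three_lobe_log_density, (0.0, 2.0), 10_000, step_size=0.8
    )
    print(f"acceptance rate: {rate:.2f}")
```

Because only the difference of log-densities enters the acceptance test, the target never needs a normalizing constant. The step size trades off acceptance rate against how far each accepted move travels.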
Polo Data Club retweeted
Create heatmaps that localize text concepts in generated videos. We discovered that our approach, ConceptAttention, can be directly extended from image generation to video generation models! It's amazing how simple techniques often generalize way better than more complex ones.
Polo Data Club retweeted
Diffusion Transformers aren't just generative models, but also powerful multi-modal encoders. ConceptAttention creates rich heatmaps of text concepts in images from DiT representations. This even works on real images, and can be applied to tasks like segmentation! Demo 👇
Polo Data Club retweeted
Introducing ConceptAttention, an approach to interpreting diffusion transformer models! Write a prompt, choose some concepts, generate an image, and get high-quality heatmaps of text concepts. Our method outperforms existing methods like cross attention. Link to demo 👇
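The heatmap idea in the ConceptAttention tweets can be illustrated with a small Python sketch: score each image patch against each text-concept vector with a dot product, normalize per patch, and take a per-patch argmax for a rough segmentation. It assumes you already have one output vector per patch and one per concept from a DiT attention layer (extracting them, and choosing which layers to use, is not shown). The softmax over concepts and all function names are illustrative choices, not the ConceptAttention authors' code.

```python
import math


def concept_heatmaps(patch_vecs, concept_vecs, height, width):
    """Return one height x width heatmap per concept.

    patch_vecs: height * width vectors in row-major patch order.
    concept_vecs: concept vectors with the same dimension as the patches.
    """
    assert len(patch_vecs) == height * width
    n_concepts = len(concept_vecs)
    maps = [[[0.0] * width for _ in range(height)] for _ in range(n_concepts)]
    for idx, p in enumerate(patch_vecs):
        # Dot product between this patch vector and every concept vector.
        scores = [sum(a * b for a, b in zip(p, c)) for c in concept_vecs]
        # Softmax over concepts, so each patch's weights sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row, col = divmod(idx, width)
        for k in range(n_concepts):
            maps[k][row][col] = exps[k] / total
    return maps


def argmax_segmentation(maps):
    """Label each patch with the index of its highest-scoring concept."""
    n, h, w = len(maps), len(maps[0]), len(maps[0][0])
    return [
        [max(range(n), key=lambda k: maps[k][r][c]) for c in range(w)]
        for r in range(h)
    ]


if __name__ == "__main__":
    # Toy 2x2 "image" with 2-D features; concept 0 ~ "cat", concept 1 ~ "sky".
    patches = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 1.0]]
    concepts = [[1.0, 0.0], [0.0, 1.0]]
    print(argmax_segmentation(concept_heatmaps(patches, concepts, 2, 2)))
```

Normalizing across concepts makes the heatmaps compete for each patch, which is what lets a simple argmax act as a segmentation mask.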
Polo Data Club retweeted
Gradient descent alone tends to converge to local minima. Momentum frames optimization as a ball with mass moving down a hill. By adding inertia, the ball resists settling in small basins, which can let it reach the global minimum.
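A minimal Python sketch of the ball-with-mass picture above, comparing plain gradient descent with heavy-ball momentum in one dimension. The double-well function, learning rate, and momentum coefficient are illustrative assumptions. Whether momentum carries the iterate over a barrier depends on these settings and on the landscape, so the demo simply prints where each method ends.

```python
def gradient_descent(grad, x0, lr=0.01, steps=500):
    """Plain gradient descent: each step follows the current gradient only."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def momentum_descent(grad, x0, lr=0.01, beta=0.9, steps=500):
    """Heavy-ball momentum: the velocity v is a decaying sum of past steps,
    so the iterate keeps 'inertia' through flat or shallow regions."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x += v
    return x


def double_well(x):
    """Hypothetical landscape: local minimum near x = 1.13, global near x = -1.30."""
    return x**4 - 3 * x**2 + x


def double_well_grad(x):
    return 4 * x**3 - 6 * x + 1


if __name__ == "__main__":
    for name, x in [
        ("gradient descent", gradient_descent(double_well_grad, 2.0)),
        ("momentum", momentum_descent(double_well_grad, 2.0)),
    ]:
        print(f"{name}: x = {x:.3f}, f(x) = {double_well(x):.3f}")
```

Starting from x0 = 2.0 with this small step, plain gradient descent moves monotonically into the right-hand local basin; momentum's accumulated velocity gives it a chance to cross the barrier near x = 0.17, which is the effect the tweet describes.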
Polo Data Club retweeted
🚀 Effective Guidance for Model Attention with Simple Yes-no Annotations. Excited to share that I'll be presenting our recent work 🎨CRAYON🖍️ at @ieeebigdata soon! Catch me at 2pm in the Deep Learning II session!
Polo Data Club retweeted
🎉 The coolest #CSE school in the world is hiring multiple faculty members! Application link below 👇
Polo Data Club retweeted
๐Ÿง‘โ€๐Ÿ’ป The code of our NeurIPS'24 LLM safety landscape paper is now publicly available at: github.com/poloclub/llm-landโ€ฆ x.com/RealAnthonyPeng/statusโ€ฆ
LLM safety alignment can be easily compromised by finetuning with only a few adversarially designed training examples. ๐Ÿ˜ฒ Why? Are all open-source LLMs equally vulnerable to finetuning? How fast does the model start to break during finetuning? ๐Ÿค”
Polo Data Club retweeted
29 Oct 2024
Transformers visually explained: poloclub.github.io/transformโ€ฆ
Polo Data Club retweeted
14 Oct 2024
CSE Prof. @PoloChau and his group are presenting two papers and two posters this week at @ieeevis! Check out the interactive graphic 🔗👇 for a peek at all Georgia Tech research presented this week, including award-winning work on Transformer Explainer! public.tableau.com/views/VIS…
Polo Data Club retweeted
🚀 Excited to present Diffusion Explainer at @ieeevis tomorrow at 1:45pm EST in the AI & LLM session! Try it now: poloclub.github.io/diffusion… #StableDiffusion #GenerativeAI #AI #Visualization #IEEEVIS2024
Polo Data Club retweeted
Please join us in congratulating longtime staff member Queenie Kravitz on her retirement today. She started at @CarnegieMellon in 1993 and at the HCII in 2004, and as graduate program coordinator certified our very first HCI PhD and master's degrees. Congrats, Queenie! #CMUhcii
Polo Data Club retweeted
😎 Our paper on the LLM safety landscape has been accepted at @NeurIPSConf 2024! #Safety #LLM #MachineLearning
Polo Data Club retweeted
28 Aug 2024
More exciting news from #KDD2024! A CSE/@NASAJPL collaborative paper was named the conference's best paper runner-up! Congratulations to ML Ph.D. student Austin Wright, Professor Polo Chau, and Scott Davidoff! Check out the paper on Nested Fusion here: dl.acm.org/doi/10.1145/36375… @PoloChau
Polo Data Club retweeted
26 Aug 2024
#KDD2024 kicked off yesterday in Barcelona, and we are already off to a fast start! Several School of CSE faculty, students, and alumni organized and presented today at the @EpidamikW workshop! Check out the website 🔗👇 for more on the workshop! epidamik.github.io/index.htm…