Towards Knowledgeable Language Models @ ACL 2024 Workshop

Joined April 2024
KnowledgeLM Workshop retweeted
What is left for humans once coding agents are this powerful? Right now, we evaluate agents mostly on success rate. But if an agent fixes one simple issue by adding 2,000 lines of spaghetti code, is that a win? I see AI agents solve problems by endlessly adding new functions, growing a chaotic, million-line codebase that no human can manage. Top engineers, though, care about the elegant simplicity beneath the mess (hello, Occam's Razor). What is left for humans? It might be just this. I have become more and more excited about abstraction. This paper is only about abstracting and reusing skills, like macro functions, but it might be a baby-step start.
๐Ÿ“ Can LLMs discover, abstract, and reuse higher-level tool skills across tasks? Existing tool-use benchmarks test solving tasks with fixed tools. But real workflows contain recurring structures where efficiency comes from reusable tool compositions, not isolated calls. We introduce SkillCraft: 126 tasks across 6 domains designed to test whether LLM agents can acquire compositional skills, not just call atomic tools. We also propose Skill Mode, a lightweight protocol with four MCP primitives that let agents compose, verify, cache, and reuse tool chains at test time. Our Key findings across evaluating 8 SOTA models: โšกSkill Mode enables agents to self-discover and reuse skills, leading to higher success and efficiency than agents without it. The gains are larger for stronger models. ๐Ÿง  Stronger models (e.g., Claude) discover more generalizable skills, which transfer across tasks and even across models. ๐Ÿ” Deeper composition โ‰  better โ€” shallow, well-tested skills generalize best. ๐Ÿ”— Paper: arxiv.org/abs/2603.00718 ๐Ÿ’ป Code: github.com/shiqichen17/Skillโ€ฆ ๐Ÿ  Page: skillcraft-website.github.ioโ€ฆ (1/7)
KnowledgeLM Workshop retweeted
A failure mode of LLM agent RL training: reasoning shrinks, becoming shorter and more similar. "Diversity" has been key to making LLM agent RL training work, but I have always wondered how to define "diversity". RAGEN used entropy; RAGEN-v2 introduces mutual information (MI). The key insight comes from this decomposition: H(Z) = H(Z|X) + I(X;Z). So we can systematically classify four types of reasoning evolution patterns:
- diverse reasoning
- compression reasoning
- entropy collapse
- template collapse
Top-p filtering: the most fascinating thing is that we find top-p filtering using reward variance is simple but effective! We also try to explain this failure mode from gradient updates; check out more in @wzenus's thread 👇
In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (high entropy) that lose all meaningful connection to the input prompt (low mutual information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2. Here's how we define and fix such silent failure modes in Agent RL. 🧵
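The decomposition H(Z) = H(Z|X) + I(X;Z) can be checked numerically on a toy example. The sketch below is only an illustration, not RAGEN-v2 code: it assumes X is the input prompt and Z is a discrete "reasoning pattern", and the two joint distributions and all function names are made up for the example. Both cases have the same output entropy, but only one keeps any dependence on the input.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decompose(joint):
    """joint[x][z] = P(X=x, Z=z). Returns (H(Z), H(Z|X), I(X;Z)) in bits."""
    p_x = [sum(row) for row in joint]
    p_z = [sum(col) for col in zip(*joint)]
    h_z = entropy(p_z)
    # H(Z|X) = sum over x of P(x) * H(Z | X=x)
    h_z_given_x = sum(
        px * entropy([pxz / px for pxz in row])
        for px, row in zip(p_x, joint) if px > 0
    )
    # I(X;Z) computed separately from its definition, so the identity is a real check
    mi = sum(
        pxz * math.log2(pxz / (p_x[i] * p_z[j]))
        for i, row in enumerate(joint)
        for j, pxz in enumerate(row) if pxz > 0
    )
    return h_z, h_z_given_x, mi

# Hypothetical toy data: 2 prompts (rows), 4 reasoning patterns (columns).
# Input-agnostic: every prompt gets the same uniform mix of patterns
# (high entropy, zero mutual information, i.e. the "template collapse" regime).
input_agnostic = [[0.125] * 4, [0.125] * 4]
# Input-dependent: each prompt uses its own two patterns.
input_dependent = [[0.25, 0.25, 0.0, 0.0], [0.0, 0.0, 0.25, 0.25]]

for name, joint in [("input-agnostic", input_agnostic),
                    ("input-dependent", input_dependent)]:
    h_z, h_z_given_x, mi = decompose(joint)
    print(f"{name}: H(Z)={h_z:.2f}  H(Z|X)={h_z_given_x:.2f}  I(X;Z)={mi:.2f}")
```

Entropy alone cannot tell these two cases apart, which is why a mutual-information term is needed to separate them.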
KnowledgeLM Workshop retweeted
1. What is good exploration? More steps ≠ more information. Good exploration = prioritizing information gain per step, so as to form a complete internal map of the world. It is about knowing what you don't know, and choosing actions that reduce that uncertainty. We ask LLMs/VLMs for the best action to take next: not to solve a task, not to maximize a task reward, but to reduce spatial uncertainty, building an internal spatial belief of the world that can support future spatial reasoning.
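One common way to make "information gain per step" concrete is the expected reduction in entropy of a belief after taking an action. The sketch below is a generic illustration under that assumption, not the method from the thread: the four-room belief, the action names, and the observation models are all hypothetical.

```python
import math

def entropy(belief):
    """Shannon entropy (in bits) of a belief over hypotheses."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def expected_information_gain(belief, likelihood):
    """Expected entropy reduction from one action.

    belief[h]        = P(h), the current belief over hypotheses
    likelihood[h][o] = P(o | h), the observation model of this action
    """
    n_hyp, n_obs = len(belief), len(likelihood[0])
    gain = entropy(belief)
    for o in range(n_obs):
        p_o = sum(belief[h] * likelihood[h][o] for h in range(n_hyp))
        if p_o == 0:
            continue  # this observation can never occur
        posterior = [belief[h] * likelihood[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain

# Hypothetical setup: an object is in one of 4 rooms, with a uniform prior.
belief = [0.25, 0.25, 0.25, 0.25]
actions = {
    # Observation is the same whatever the truth is: nothing is learned.
    "stay": [[1.0, 0.0]] * 4,
    # Observation reveals whether the object is in rooms {0, 1} or {2, 3}.
    "look_left": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
}

gains = {name: expected_information_gain(belief, lik) for name, lik in actions.items()}
best_action = max(gains, key=gains.get)
print(gains, best_action)
```

Under this toy model, an agent that picks `best_action` at each step favors the action that shrinks its uncertainty the most, not the one that merely adds another step.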
RT @ManlingLi_: Huge congrats to @hengjinlp on being named an ACL Fellow! I still feel incredibly lucky to have been advised by her. Sub…
KnowledgeLM Workshop retweeted
3 Dec 2025
VAGEN poster at #NeurIPS: ⏲️ 11am-2pm Wed 📍 Exhibit Hall C,D,E #5502 We look forward to discussing with you:
1. MDP → POMDP
2. World modeling in agent internal belief
3. What is a good representation in agent internal belief for visual states?
4. How to use World Modeling to help reward shaping?
5. How to do turn-level critic learning?
Drop by if you are interested in related topics!
VAGEN poster ๐ญ๐จ๐ฆ๐จ๐ซ๐ซ๐จ๐ฐ at #NeurIPS! ๐ŸŽฎ๐Ÿง  - ๐Ÿ•š 11amโ€“2pm Wed - ๐Ÿ“ Exhibit Hall C,D,E #5502 We had much fun exploring: โ€ข How ๐ฐ๐จ๐ซ๐ฅ๐ ๐ฆ๐จ๐๐ž๐ฅ๐ข๐ง๐  helps VLM RL agents learn better policies โ€ข ๐Œ๐ฎ๐ฅ๐ญ๐ข-๐ญ๐ฎ๐ซ๐ง ๐๐๐Ž credit assignment via ๐ญ๐ฐ๐จ-๐ฅ๐ž๐ฏ๐ž๐ฅ ๐š๐๐ฏ๐š๐ง๐ญ๐š๐ ๐ž ๐ž๐ฌ๐ญ๐ข๐ฆ๐š๐ญ๐จ๐ซ (Bi-Level GAE) for turn-level and token-level critic learning Come chat about agents, RL, and world models ๐Ÿ‘€
KnowledgeLM Workshop retweeted
Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's eye. Embodied cognition tells us that intelligence isn't just watching – it's enacted through interaction. 👉 We introduce ENACT: a benchmark that tests if VLMs can track the evolution of a home-scale environment from a robot's egocentric view. 🌐 enact-embodied-cognition.git… 📄 enact-embodied-cognition.git… 1/N
Join her lab!
24 Nov 2025
We are looking for PhDs and Postdocs! So proud of my students for achieving so many amazing things during their "very first year". I have been asked many times how I like being faculty, especially with funding cuts. My answer is always "it is the perfect job for me"! Still deep in the honeymoon phase. The only reason is that the students are so amazing, making my transition so much easier. One year in, they have already collected paper awards, orals, spotlights, etc. What makes me proudest is that they are vividly alive: curious, playful, confident in their own weird way, lighting up when talking about ideas, and never afraid to explore "the thing might fail". Everyone is just… themselves. And somehow, that version of themselves keeps shipping amazing work. In today's anxious academic world, this kind of aliveness is what I will try my best to protect. Maybe the best part of being an advisor is that every student is so different and unique lol. Interestingly, coming into their second year, they've got their own passions, and I can't just plug my ideas into their heads. So when I get excited about something new, my first thought is: "Okay, time to find some fresh first-years who will be thrilled about this!" The MLL lab is 1 year old; we started in Oct 2024. We are growing and looking for more PhDs to join us! 1. Why our lab? (1/2) 2. Why @northwesterncs? (2/2) In 2025 alone, NU has 7 faculty named Sloan Fellows, plus a Nobel winner! Check out more below.
KnowledgeLM Workshop retweeted
🧵 Academic job market season is almost here! There's so much that is rarely discussed: nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! ⬇️ (1/N)
What is the difference between spatial reasoning and text-based reasoning?
30 Jun 2025
Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐 mll-lab-nu.github.io/mind-cu… 📰 arxiv.org/pdf/2506.21458 🤗 huggingface.co/datasets/MLL-… 👩‍💻 github.com/mll-lab-nu/MindCu…
KnowledgeLM Workshop retweeted
22 May 2024
[KnowledgeLM @ ACL24] @lm_knowledge 🚨 Update: We've extended the paper submission deadline to May 30 to accommodate the release of COLM reviews. 📢 We welcome submissions of Findings papers to present at our workshop! We have lined up wonderful speakers, and we are eager to engage with you in Thailand! Meet our organizers: @ZoeyLi20 @hengjinlp @megamor2 @eunsolc @mjqzhang @peterbhase @mohitban47 @preslav_nakov @Meng_CS @JiaweiHan Website: knowledgeable-lm.github.io/
🚀 Knowledgeable Language Model Workshop at ACL24 @aclmeeting Are you ever curious about how much LLMs know? Do you ever wish that LLMs could become smarter with more knowledge? Or maybe you are thinking about removing certain facts from their memory? knowledgeable-lm.github.io/
If you feel captivated by these problems, come join us at the Knowledgeable Language Model Workshop at ACL!
We will have a Best Paper Award, supported by @amazon. Appreciate it!!