Tweets

EuroSafeAI retweeted

Jiarui Liu

@Jiarui_Liu_

Jun 10

Excited to share that our work 📝 "PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf" has been accepted to #ACL2026 Demo!

Most AI writing tools either fix grammar or simulate peer review with a score. Neither gives drafting-stage, text-anchored feedback on narrative, structure and presentation.

PaperMentor comments rather than rewrites: it is a human-centered, multi-agent writing tutor that delivers expert-level, actionable feedback as native inline comments right inside Overleaf, while leaving every revision to you. It pairs a curated library of 40 expert skill files (distilled from senior researchers' writing advice) with 12 specialized agents covering methods, results, formatting, terminology, venue norms and more.

In a user study, 90.6% of comments were rated actionable, and PaperMentor significantly outperformed a GPT-5.2 baseline without the skill library on both validity and actionability. Anyone can extend or contribute to the skill library with simple text edits!

📝 arXiv link: arxiv.org/abs/2606.08857
🔗 Live demo: overleafmentor.ai.toronto.ed…
💻 Code with skill library: github.com/jiarui-liu/overle…

🧵 How it works below 👇

EuroSafeAI retweeted

Jiarui Liu

@Jiarui_Liu_

May 19

Excited to share our new paper 🧵 MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Supervised fine-tuning (SFT) is the common way to teach LLMs new knowledge, but it often causes catastrophic forgetting of existing capabilities. We introduce MixSD: a simple, external-teacher-free method to inject knowledge with far less forgetting. 📄 arxiv.org/abs/2605.16865

Why does SFT forget? Targets written by humans or external systems diverge from the model's own autoregressive distribution, forcing the optimizer to imitate low-probability tokens. That's what drags pretrained capabilities down.

MixSD: We hypothesize that keeping supervision close to the model's own distribution is key to avoiding forgetting. Instead of training on fixed, externally authored targets, at every token we mix between two conditionals of the base model itself: an expert conditional that sees the injected fact in context, and a naive conditional reflecting the model's prior. The result is supervision the model already finds high-probability, while still carrying the new factual signal. A Bernoulli rate λ controls the balance between memorization and retention.

Findings: SFT retains as little as 1% of held-out capability. MixSD retains far more, up to ~100% on larger models, with near-perfect training accuracy. It also beats on-policy self-distillation at a fraction of the compute, and holds across Qwen3 1.7B, 4B, 8B and Llama-3.2.
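The per-token mixing described above can be illustrated with a toy sketch. This is not the paper's implementation: it assumes that λ is the probability of drawing from the expert conditional at each position and that the chosen conditional is then sampled to produce the next target token. The names `mix_target`, `toy_expert`, and `toy_naive` are hypothetical, and the two toy functions stand in for the same base model queried with and without the injected fact in context.

```python
import random


def mix_target(expert_next, naive_next, lam, max_len, rng):
    """Build one training target token by token.

    expert_next / naive_next map a token prefix to a dict {token: probability}.
    lam is the assumed Bernoulli rate of picking the expert conditional.
    """
    target = []
    for _ in range(max_len):
        # Bernoulli(lam) switch: expert conditional with probability lam.
        use_expert = rng.random() < lam
        dist = expert_next(target) if use_expert else naive_next(target)
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        token = rng.choices(tokens, weights=weights, k=1)[0]
        if token == "<eos>":
            break
        target.append(token)
    return target


# Toy stand-ins (assumptions): in practice both would be the base model,
# with the expert prompt containing the injected fact.
def toy_expert(prefix):
    return {"Paris": 0.9, "Lyon": 0.1} if not prefix else {"<eos>": 1.0}


def toy_naive(prefix):
    return {"Berlin": 0.5, "Rome": 0.5} if not prefix else {"<eos>": 1.0}


sample = mix_target(toy_expert, toy_naive, lam=0.7, max_len=8, rng=random.Random(0))
```

Under these assumptions, λ = 1 draws every token from the fact-conditioned distribution and λ = 0 reproduces the model's prior, which matches the memorization-versus-retention trade-off the tweet attributes to λ.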

EuroSafeAI retweeted

Digital EU 🇪🇺

@DigitalEU

Jun 1

The AI Act, the EU's first AI law, has just been reinforced. Two new bodies will help apply the rules across Europe:
✅ Scientific Panel
✅ Advisory Forum

Independent experts. 2-year terms. One mission: making AI work for Europe.

🔗 link.europa.eu/8nvpvY

EuroSafeAI @EuroSafeAI

Jun 2

🎉 Proud to share that @EuroSafeAI co-founder @ZhijingJin has been appointed to the EU AI Office Scientific Panel 🇪🇺, 60 independent experts (out of 1000 applicants) supporting enforcement of the EU AI Act. Congratulations, Zhijing! 👏 #AIGovernance digital-strategy.ec.europa.e…

EuroSafeAI retweeted

Changling Li

@ChanglingXavier

May 25

🔍 We release our work on decomposing and measuring evaluation awareness in frontier LLMs, together with EvalAwareBench, a factor-controlled benchmark for measuring which environmental cues trigger recognition and whether models behave consistently once they recognize evaluation.

Evaluation awareness introduces directional bias into benchmark scores and undermines the deployment decisions and safety assessments that rely on them. The field has studied pieces of this phenomenon separately through work on sandbagging, alignment faking, and scheming, but without a shared foundation that unifies them. Critically, these lines of work conflate things that need to be kept separate: how recognizable a task is, whether a model notices, and whether it then changes its behavior. Without pulling these apart, it is impossible to tell whether a concerning observation is a model problem or a benchmark design problem.

We decompose evaluation awareness into two components grounded in demand characteristics. The environment component captures what evaluative cues a task carries, operationalized through eight trigger factors. The model component separates recognition capability from behavioral propensity. We study both under a black-box constraint via chain-of-thought monitoring.

Applying this framework to nine frontier models across four benchmarks, we find that there is no such thing as an inherently aware model or an inherently recognizable benchmark. Recognition depends on the specific model-benchmark pairing, which accounts for 74.9% of variance. Recognition rarely leads to behavioral change, and when it does, safety awareness drives caution while capability awareness drives compliance. Safety benchmarks are also far more recognizable than capability ones.

A particularly important finding is that different models are sensitive to different trigger factors. For example, Claude Sonnet 4.5 attends almost exclusively to scenario coherence, while Qwen3 models attend to explicit evaluation disclosure and stakes telegraphing. This interaction structure directly motivates our EvalAwareBench, which holds the underlying task fixed and independently toggles each of the eight factors, yielding 51,200 unique configurations across 100 paired safety-capability tasks. Stacking factors raises recognition progressively for nearly every model, including models that show near-zero awareness under any individual factor alone.

Ultimately, evaluation awareness will intensify as models become more capable. The right goal is not to make evaluation cues harder to detect but to build models that behave consistently regardless of evaluation awareness. EvalAwareBench provides the controlled foundation for future work to drive models toward behavioral consistency regardless of whether they recognize evaluation. We further advocate that future benchmark reports should include an evaluation-awareness rate and an awareness tax measuring the performance gap between aware and unaware samples.
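As a rough illustration of the numbers above, the sketch below enumerates a configuration grid and computes the two proposed reporting metrics. It rests on assumptions the tweet does not confirm: that each of the eight factors is a binary on/off toggle, that each of the 100 task pairs contributes one safety and one capability variant (so 2^8 × 100 × 2 = 51,200), and that the awareness tax is the mean score on unaware samples minus the mean score on aware samples. All function and constant names are hypothetical.

```python
from itertools import product

N_FACTORS = 8                        # trigger factors, assumed binary
N_TASK_PAIRS = 100                   # paired safety-capability tasks
VARIANTS = ("safety", "capability")  # assumed: one variant of each per pair


def enumerate_configs():
    """Yield (pair_id, variant, toggles) for every factor combination."""
    for pair_id in range(N_TASK_PAIRS):
        for variant in VARIANTS:
            for toggles in product((False, True), repeat=N_FACTORS):
                yield pair_id, variant, toggles


def awareness_report(samples):
    """samples: list of (aware, score) tuples for one model on one benchmark.

    Returns (awareness_rate, awareness_tax). The tax is None when either
    group is empty. Sign convention (unaware minus aware) is an assumption.
    """
    aware = [score for is_aware, score in samples if is_aware]
    unaware = [score for is_aware, score in samples if not is_aware]
    rate = len(aware) / len(samples)
    if not aware or not unaware:
        return rate, None
    tax = sum(unaware) / len(unaware) - sum(aware) / len(aware)
    return rate, tax


total_configs = sum(1 for _ in enumerate_configs())
```

With this decomposition `total_configs` equals the 51,200 figure quoted in the tweet, which suggests (but does not prove) that the benchmark grid is built this way.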

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

May 3

Excited for our #ICML2026 papers at @JinesisLab @MPI_IS @UofTCompSci @TorontoSRI @VectorInst! We present papers that advance the research frontiers of (1) Causal LLMs, (2) AI for Science (physics), (3) Multi-Agent LLMs via mechanism design, and (4) Adversarial Defense by honeypot. Congrats to all our student authors and collaborators, esp. @TerryJCZhang @SimkoSamuel @EmanuelTewolde @ivakshi_s @andrewkihyun @PepijnCobben @yahang_qi @FurkanDanismann @bschoelkopf and many others!🎉

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Apr 29

⚠️ Can we trust #LLM agents to keep their promises? We tested 9 frontier LLMs in game-theoretic settings, where the agents (1) publicly commit to an action, then (2) privately choose what to do. They break their promises ~57% of the time, and most do so without even realizing they lied.

📖 Paper: "Cheap Talk, Empty Promise: Frontier LLMs easily break public promises for self-interest"
🔗 Link: arxiv.org/abs/2604.04782
🤝 Authors: @Jerick1380 @TerryJCZhang @ZhijingJin @conitzer 🎉

#AIAgents #AISafety #MultiAgentAI @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @CarnegieMellon @SCSatCMU

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Apr 22

10 days left to submit to the 1st Trustworthy AI for Good (AI4GOOD) workshop at #ICML2026! @icmlconf

We're giving out multiple awards and travel funds sponsored by @schmidtsciences and @coop_ai:
🏆 Best Paper Awards (including targeted prizes for cooperative AI theme)
🏆 Top Reviewer Awards
✈️ Travel Funds

Submit here → openreview.net/group?id=ICML…
⏰ Deadline: May 3, 2026 (AoE)
📌 Notification: May 18, 2026
🔗 (We extended our deadline to accommodate more submissions!)

Join us in Seoul for discussions bridging AI safety, social good, and governance with keynote speakers @Yoshua_Bengio, @OanaIgnatRo, @jzl86, @maksym_andr, and more!

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Apr 9

Excited for our "Trustworthy AI for Good" (AI4GOOD) Workshop at #ICML2026! As AI agents increasingly affect our lives, it is key to bridge #ResponsibleAI, social good, and governance. Let’s build solutions together!

⏰ Submission deadline: April 30, 2026 (AoE)
🎙️ Confirmed speakers: @Yoshua_Bengio, Joel Z. Leibo (@jzl86), Maksym Andriushchenko (@maksym_andr), @OanaIgnatRo [More to come!]
📍 July 10-11, 2026 · Seoul 🇰🇷
🔗 trustworthy-ai-for-good.gith…
📝 Submit: openreview.net/group?id=ICML…
📣 Be a reviewer: forms.gle/7cXvUJCW1FdEghi6A

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Apr 7

We are hosting a Dagstuhl seminar on Causality & LLMs this week (Apr 7–10), bringing together world experts to explore:
1️⃣ Integrating LLMs 🤖 into causal workflows
2️⃣ Evaluating & improving LLMs’ causal reasoning 🧠

Co-organized w/ @amt_shrma @DominikJanzing @kunkzhang @ZhijingJin
📍 Schloss Dagstuhl, Wadern, Germany
🔗 dagstuhl.de/26152
📖 cr-llm.github.io/
📅 Apr 7–10

#CausalNLP #LLM #Dagstuhl @CausalNLP @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue

I am also joined by my student @rahulbshrestha to present our CauSciBench and Causal AI Scientist work :)!

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Apr 6

📢 We will present 5 papers at #ICLR2026, #CLeaR2026, and #ACL2026:
- SocialHarmBench by @psyonp et al.
- Causal LLMs on Instrumental Variable Method by @ivakshi_s et al.
- LLM Data Contamination study by @TerryJCZhang et al.
- Mech Interp for VLM by @francescortu et al.
- DPO data selection method by Xuan & @rongwu_xu

Thanks to all our collaborators and for the institutional support from @MPI_IS @ELLISforEurope @UofTCompSci @VectorInst @TorontoSRI @CIFAR_News @JinesisLab @EuroSafeAI @ELLISInst_Tue @ETH_en @ETH_AI_Center @michigan_AI @UMichiganAI @UMichCSE!

Feel free to access the papers at
arxiv.org/abs/2510.04891
arxiv.org/abs/2602.07943
arxiv.org/abs/2509.00072
arxiv.org/abs/2507.13868
arxiv.org/abs/2508.04149 🎉

EuroSafeAI retweeted

Jinesis Lab (UToronto) @JinesisLab

Mar 26

Navigating mental health in the fast-paced world of AI research is a challenge we all face. 🧠 Join @strauss_irene and the @aclmentorship panel at #EACL2026 (Hybrid Rabat Zoom) to discuss staying grounded. Submit/vote on questions here: app.sli.do/event/e4Em5p6fAbw… #MentalHealth

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Mar 25

AI is threatening our democratic society by concentrating power, narrowing how we think, and flooding institutions faster than they can keep up. These risks emerge at the system level, and technical work alone won't fix them.

👉 Check out our whitepaper with 25 researchers: zhijing-jin.com/d/2026-ai-ri…
💡 We introduce 7 threat models and ways forward.

✍️ Led by @davidguzman1120 with @DaveRBanerjee, @blin_kevin, @PepijnCobben, @gcorsi_, @x_angelohuang, @ChanglingXavier, Suvajit Majumder, @psyonp, @SimkoSamuel, @strauss_irene, and @TerryJCZhang

Advised by senior co-authors: @ashton1anderson, @Yoshua_Bengio, @MatthiasBethge, @RogerGrosse, Karoline Helbig, @david_lie, Richard Mallah, @radamihalcea, Susan Nesbitt, Susan Perry, @presnick, Stuart Russell, @mrinmayasachan, @bschoelkopf, @audreyt, and @ZhijingJin

Thank you for all the institutional support from @JinesisLab @EuroSafeAI @MPI_IS @CIFAR_News @iapsAI @CARMA_411 @Cambridge_Uni @UofTCompSci @VectorInst @TorontoSRI @Mila_Quebec @LawZero_ @uni_tue @michigan_AI @UMichCSE @AUParis @UNESCO @UCBerkeley @ETH_en @ETH_AI_Center @ELLISInst_Tue @ELLISforEurope @EthicsInAI

#CivicAI #AISafety #AIGovernance #Democracy #ResponsibleAI

EuroSafeAI retweeted

Jinesis Lab (UToronto) @JinesisLab

Mar 25

🎉 Our lab has 7 papers at #EACL2026 in Rabat this week 🇲🇦 Topics span democracy defense, multi-agent safety, causal reasoning, hallucinations, and NLP for social good. Grateful to everyone who contributed to this work 🙌 Come find us! #NLProc #LLMs #ResponsibleAI

EuroSafeAI retweeted

Zhijing Jin

@ZhijingJin

Mar 23

Difficult times, but we keep pushing forward.
✅ Our Trustworthy AI for Good Workshop → accepted at @icmlconf Seoul (18% acceptance)
✅ @NLP4PosImpact Workshop → coming to @emnlpmeeting in Budapest 🇭🇺

AI research can be a force for good, and we’re committed to contributing. More soon.
