Khoa D. Doan

Khoa D. Doan

33 Photos and videos

Tweets

Khoa D. Doan

@khoaddoan

Jun 14

Conventional wisdom tells us that (Reasoning) LLMs do have self-critique. Why? Because they often revise their intermediate steps with backtracking or self-verification. But how strong is this self-critique ability? And more importantly, can we control it during inference to improve reasoning performance?

more replies

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Jun 14

8/ On standard reasoning benchmarks such as AIME24, AIME25, GSM8K, and MATH500, positive steering also leads to performance gains across various models.

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Jun 14

For more information, please take a read: Hoang Phan, Quang H. Nguyen, Hung T. Q. Le, Xiusi Chen, Heng Ji, Khoa D. Doan. Decoding the Critique Mechanism in Large Reasoning Models. Paper: arxiv.org/abs/2603.16331 Code: github.com/mail-research/lrm…

Decoding the Critique Mechanism in Large Reasoning Models

Large Reasoning Models (LRMs) exhibit backtracking and self-verification mechanisms that enable them to revise intermediate steps and reach correct solutions, yielding strong performance on...

arxiv.org

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Jun 6

🤖 Are LLMs sufficient reviewers to evaluate scientific work and, critically, are they better at identifying gaps in a paper than human reviewers who increasingly work under time constraints and review overload? 📚 Our latest work, PRISM, attempts to answer this question through the creation of a structured and multidimensional evaluation of reviews across four dimensions: Depth of Analysis, Novelty Assessment, Flaw Identification & Issue Prioritization, and Constructiveness. 🔎 What are the findings?

216

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Jun 6

[2/3] 🧠 Some LLM reviewers match human analytical depth, but some fall into a surface-level trap, over-indexing on presentation anomalies. 💡 Some LLM reviewers outperform human reviewers on grounded novelty verification; however, some systems exhibit measurable novelty hallucination. 🎯 LLM reviewers broadly achieve near-perfect critical issue prioritization, demonstrating a cognitive alignment comparable to human reviewers. 📝 Some LLM reviewers produce the most actionable feedback, but a constructiveness gap relative to human reviewers persists across all systems.

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Jun 6

[3/3] ⚖️ In summary, no single system dominates across all four dimensions: each excels in a distinct niche while leaving structured gaps invisible to aggregate metrics. Ngoc Phan Phuoc Loc, La Viet, Toan Huynh, Thanh Tran Khanh, Duy Nguyen, Tuan Anh Nguyen Pham, Thanh Nguyen, Nitesh Chawla, Wray Buntine, Kok-Seng Wong, Binh T Nguyen. PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers. 🌐 Demo: lnkd.in/gHnUxJHc 📄 Paper: lnkd.in/gk_YdX9Q 🔬 Project Page: lnkd.in/gVkMMQaY 💻 Code: lnkd.in/gDZHgiEE

This link will take you to a page that’s not on LinkedIn

lnkd.in

156

Parshin Shojaee

Khoa D. Doan retweeted

Parshin Shojaee

@ParshinShojaee

May 21

Exciting new paper! 🚨 Why Do Reasoning Models Lose Coverage? We revisit this open question with a data-centric lens 🔍 , and show how “forks in the road” situations in the post-training data can shape—and shrink—model coverage ⤵️

4,554

Khoa D. Doan

Khoa D. Doan

@khoaddoan

May 21

We are pleased to announce the launch of the TRIDENT Grand Challenge @ ACM Multimedia 2026 in Rio de Janeiro. As generative AI continues to advance, deepfakes are rapidly evolving from simple image manipulations into highly realistic multimodal content spanning images, videos, and cloned audio. The rapid progress raises an important challenge for current detection systems: while many models can successfully identify fake content, they often remain black-box systems, offering limited insight into the evidence or reasoning behind their decisions. TRIDENT is designed to address this gap by encouraging more interpretable and evidence-grounded multimodal deepfake analysis. 🔥 The challenge is now in Phase 2, with test sets released and the online leaderboard open for submissions. 📌Challenge Tracks • Image deepfake analysis • Video deepfake analysis • Audio deepfake analysis Participants may choose to compete in one or multiple tracks, depending on their research interests. 📌Core Capabilities The challenge focuses on three key abilities: • Deepfake detection • Fine-grained artifact perception • Evidence-based reasoning without hallucination Participants will engage with structured and open-ended tasks designed to evaluate both forensic understanding and reasoning quality in multimodal settings. 🏆Top-performing teams from each track will be invited to submit a technical paper to the ACM MM 2026 main conference proceedings. 📅 Important Dates (AOE): • Registration Deadline: June 1, 2026 • Final Submission Deadline: June 10, 2026 • Winner Announcement: June 15, 2026 --- More details and participation information are available here: tridentatmm26mgc.github.io/t… We warmly welcome the research community to join us in advancing trustworthy multimodal AI forensics!

105

NeurIPS Conference

Khoa D. Doan retweeted

NeurIPS Conference

@NeurIPSConf

Apr 29

NeurIPS 2026 is accepting submissions for Workshops!   If you are interested in hosting a workshop, read the Call for Workshops neurips.cc/Conferences/2026/… on how to submit a proposal, and be sure to follow the submission guidelines for proposals, with important changes this year neurips.cc/Conferences/2026/…

105

23,523

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Apr 22

NeurIPS'26 Call for Workshops is officially live at neurips.cc/Conferences/2026/…. Important dates: Workshop Application Deadline: June 06, 2026, AoE Workshop Acceptance Notification: July 11, 2026, AoE This year, workshops can be hosted in 3 different locations: Sydney, Paris, and Atlanta @NeurIPSConf

103

NeurIPS Conference

Khoa D. Doan retweeted

NeurIPS Conference

@NeurIPSConf

Mar 27

We want to speak directly to the concern many of you have expressed, and we owe you a clear explanation of what happened, why it happened, and where we stand now. We understand this situation caused genuine alarm and we take that seriously. In preparing the NeurIPS 2026 handbook, we included a link to a US government sanctions tool that covers a significantly broader set of restrictions than those NeurIPS is actually required to follow. This error was due to miscommunication between the NeurIPS Foundation and our legal team; there was never an intention to restrict participation beyond our mandatory compliance obligations. The responsibility for that error is ours as an organization, and we deeply apologize for the alarm and impact this miscommunication had on our community. We have updated the link and clarified the text of our policy, which is consistent with that of ACM and IEEE, as well as other international conferences and NeurIPS in the past. As in previous years, NeurIPS welcomes submissions from all compliant institutions and individuals. We want to reiterate that NeurIPS is a community-driven event, created by and for the community, and strives to be inclusive. The NeurIPS 2026 organizing committee was particularly saddened to learn of this institutional miscommunication. The organizing committee has taken on the responsibility of running the conference this year with the goal of fostering open communication, knowledge sharing, and global scientific discourse. We thank the community for bringing this issue to our attention and working with us through this situation.

264

127

503

498,073

Khoa D. Doan

Khoa D. Doan

@khoaddoan

Mar 27

Official @NeurIPSConf statement (from website): The NeurIPS Foundation, like any entity operating within U.S. legal jurisdiction, is required by law to comply with U.S. sanctions and trade restrictions. Under these regulations, providing 'services' (which would include peer review, editing, and publishing) to Specially Designated Nationals and Blocked Persons, or "SDNs", is strictly prohibited. Consequently, we are unable to accept or publish submissions from any SDN, or any individual or institution that NeurIPS reasonably believes represents or is affiliated with an SDN. The US State Department's Office of Foreign Asset Controls maintains a searchable website that lists all SDNs. NeurIPS will consider submissions from institutions and individuals that the US State Department has categorized as "Non-SDN". sanctionslist.ofac.treas.gov…

NeurIPS Conference

@NeurIPSConf

Mar 26

NeurIPS is aware of the community's concerns regarding the list of sanctions. NeurIPS is an inclusive community focused on free scientific discourse. We deeply value the research that comes from everyone in our community. The present concerns are not about science or academic freedom. They are about legal requirements that apply to the NeurIPS Foundation, which is responsible for complying with sanctions. We are actively consulting legal counsel to fully understand the legal constraints and we will update the NeurIPS community as soon as we have reliable guidance from our lawyers.

536