Multiagent Systems Papers

Multiagent Systems Papers

596 Photos and videos

Tweets

Multiagent Systems Papers @PIN

Jun 12

Tuning Agent-Based Predator-Prey Models Toward Lotka-Volterra Dynamics Corinna Mandl, Siddharth Chaturvedi, Marcel van Gerven arxiv.org/abs/2606.13639 [𝚌𝚜.𝙼𝙰]

Recent growth in compute power has made it increasingly feasible to use large-scale agent-based models to simulate complex adaptive systems. A central difficulty is that such models contain many local rules and parameters, where small changes can lead to runaway behaviour, population collapse, or saturation at artificial bounds. We study this problem in a continuous predator-prey system where sheep and wolves are active agents with local sensing, internal energy, and recurrent neural network-based controllers. We ask whether environmental and demographic parameters can be tuned so that the resulting population dynamics resemble classical Lotka-Volterra cycles. We optimise these parameters with a feature-based loss that rewards sustained oscillations, phase lag, bounded populations, and long-term persistence, first for random controllers and then for evolved controllers in a more naturalistic setting. The model is implemented in ABMax, a JAX-based agent-based modelling framework that en

ALT Recent growth in compute power has made it increasingly feasible to use large-scale agent-based models to simulate complex adaptive systems. A central difficulty is that such models contain many local rules and parameters, where small changes can lead to runaway behaviour, population collapse, or saturation at artificial bounds. We study this problem in a continuous predator-prey system where sheep and wolves are active agents with local sensing, internal energy, and recurrent neural network-based controllers. We ask whether environmental and demographic parameters can be tuned so that the resulting population dynamics resemble classical Lotka-Volterra cycles. We optimise these parameters with a feature-based loss that rewards sustained oscillations, phase lag, bounded populations, and long-term persistence, first for random controllers and then for evolved controllers in a more naturalistic setting. The model is implemented in ABMax, a JAX-based agent-based modelling framework that en

116

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 12

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents Siyi Chen, Xiaoyan Zhang, Meng Wu, Jonathan Tremblay, Valts Blukis, Stan Birchfield, Rene Vidal, Alvaro Velasquez, Sijia Liu, Qing Qu arxiv.org/abs/2606.13594 [𝚌𝚜.𝙼𝙰]

Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading" and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase tra

ALT Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising alternative, yet most prior work is homogeneous, using duplicate copies of the same model, and avoids the central challenge of cross-model latent alignment; existing heterogeneous methods are also restrictive, typically assuming shared input and using transferred caches mainly for steering. We study a more fundamental question: can heterogeneous agents be aligned well enough to perform real "mind reading" and transfer both what one agent sees and how it thinks? Our information-structure analysis reveals a duality: context-aware transfer is driven by sparse reasoning signals, while context-unaware transfer, where the receiver sees no input, requires dense contextual knowledge preservation. Motivated by this, we propose dense alignment for heterogeneous KV-cache communication via a lightweight cross-model cache transformation and two-phase tra

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 12

α-fair heterogeneous agent reinforcement learning Yao-hua Franck Xu, Tayeb Lemlouma, Jean-Marie Bonnin, Arnaud Braud arxiv.org/abs/2606.13076 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙶𝚃 𝚌𝚜.𝙻𝙶]

Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms - including those utilizing reward shaping - break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges α-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to α-fairness welfare based on t

ALT Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms - including those utilizing reward shaping - break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges α-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to α-fairness welfare based on t

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 12

Effects of Social Interactions in Self-Organising Railway Traffic Management Fabio Oddi, Federico Naldini, Leo D'Amato, Grégory Marlière, Paola Pellegrini, Vito Trianni arxiv.org/abs/2606.13068 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝚁𝙾]

Recent research is exploring self-organised traffic management as a solution for scaling to complex real-world networks. In such a system, trains predict their neighbourhood, produce traffic plan hypotheses, and agree via consensus with neighbours on a future traffic plan to be implemented. This paper investigates a structural parameter within this pipeline: the predictive neighbourhood horizon. The horizon is used by trains to identify future potential conflicts with neighbours, and to establish the local interaction topology, that is, the subset of trains to negotiate with. As the primary design variable, the horizon directly determines the size and density of the social interaction graph, whereas its impact on the complexity of local sub-problems and the distributed consensus dynamics represents a trade-off to be explored. Through a closed-loop simulation framework the study evaluates how variations of the horizon impact the overall decentralised coordination process, from initial c

ALT Recent research is exploring self-organised traffic management as a solution for scaling to complex real-world networks. In such a system, trains predict their neighbourhood, produce traffic plan hypotheses, and agree via consensus with neighbours on a future traffic plan to be implemented. This paper investigates a structural parameter within this pipeline: the predictive neighbourhood horizon. The horizon is used by trains to identify future potential conflicts with neighbours, and to establish the local interaction topology, that is, the subset of trains to negotiate with. As the primary design variable, the horizon directly determines the size and density of the social interaction graph, whereas its impact on the complexity of local sub-problems and the distributed consensus dynamics represents a trade-off to be explored. Through a closed-loop simulation framework the study evaluates how variations of the horizon impact the overall decentralised coordination process, from initial c

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 12

The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale Quanyan Zhu arxiv.org/abs/2606.12835 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝚈 𝚌𝚜.𝙽𝙸]

The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents discover one another, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments. We synthesize foundations from single-agent agentic AI, multi-agent systems, distributed computing, communication networks, game theory, and security engineering to characterize the architectures and mechanisms required for scalable agent ecosystems. The paper examines agent deployment models, workflow lifecycles, communication protocols, interoperability layers, resource-management challenges, and trust architectures, with case studies in adaptive manufacturing and distributed operational coordination. The resulting

ALT The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents discover one another, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments. We synthesize foundations from single-agent agentic AI, multi-agent systems, distributed computing, communication networks, game theory, and security engineering to characterize the architectures and mechanisms required for scalable agent ecosystems. The paper examines agent deployment models, workflow lifecycles, communication protocols, interoperability layers, resource-management challenges, and trust architectures, with case studies in adaptive manufacturing and distributed operational coordination. The resulting

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 12

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows Timothy McAllister, Sina Abdidizaji, Ivan Garibay, Ozlem Ozmen Garibay arxiv.org/abs/2606.12709 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙲𝚁 𝚌𝚜.𝙻𝙶] 💬Accepted to the AIWILD Workshop at ICML 2026

As LLM-based multi-agent systems (MAS) are deployed in the wild, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-weight model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are far more likely to faithfully execute malicious instructions, with the control-to-malicious performance drop reaching 53.7pp at 27B in uncorrected pipelines. However, appending a lightweight terminal Fixer stage collapses this to 0.6pp and restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viab

ALT As LLM-based multi-agent systems (MAS) are deployed in the wild, the resilience of their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model scaling and system-level resilience remains poorly understood. This paper investigates how model scale affects the security of linear multi-agent workflows. Our experiments across scales of two open-weight model families on the HumanEval benchmark reveal a compliance-correction symmetry: larger models are far more likely to faithfully execute malicious instructions, with the control-to-malicious performance drop reaching 53.7pp at 27B in uncorrected pipelines. However, appending a lightweight terminal Fixer stage collapses this to 0.6pp and restores statistical parity with control-level performance, demonstrating that strictly linear collaboration structures can be viab

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 12

SAIGuard: Communication-State Simulation for Proactive Defense of LLM Multi-Agent Systems Ruxue Shi, Yili Wang, Mengnan Du, Qinggang Zhang, Rui Miao, Yixin Liu, Xin Wang arxiv.org/abs/2606.12474 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝚁]

LLM-based multi-agent systems (MAS) solve complex tasks through inter-agent collaboration, but their communication-driven nature also allows security risks to spread across agents and trigger system-wide failures. Existing MAS defenses mainly follow a reactive paradigm after execution by detecting and isolating harmful agents, which may cause irreversible damage and degrade collaborative utility. To address this, we propose a proactive defense framework for MAS security, namely a Simulation-aware Interception Guard (SAIGuard). SAIGuard performs communication-state simulation over the MAS interaction graph, estimates the impact of incoming messages on local agent states and the global MAS state, and detects risky messages via reconstruction deviations from benign communication patterns. Instead of isolating agents, SAIGuard sanitizes or regenerates suspicious messages before it propagation into system. Experiments across diverse topologies and attack scenarios show that SAIGuard reduces

ALT LLM-based multi-agent systems (MAS) solve complex tasks through inter-agent collaboration, but their communication-driven nature also allows security risks to spread across agents and trigger system-wide failures. Existing MAS defenses mainly follow a reactive paradigm after execution by detecting and isolating harmful agents, which may cause irreversible damage and degrade collaborative utility. To address this, we propose a proactive defense framework for MAS security, namely a Simulation-aware Interception Guard (SAIGuard). SAIGuard performs communication-state simulation over the MAS interaction graph, estimates the impact of incoming messages on local agent states and the global MAS state, and detects risky messages via reconstruction deviations from benign communication patterns. Instead of isolating agents, SAIGuard sanitizes or regenerates suspicious messages before it propagation into system. Experiments across diverse topologies and attack scenarios show that SAIGuard reduces

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 11

CCKS: Consensus-based Communication and Knowledge Sharing Jinyuan Zu, Xiaowei Lv, Yongcai Wang, Deying Li, Yunjun Han, Wenping Chen, Fengyi Zhang, Naiqi Wu arxiv.org/abs/2606.12281 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙻𝙶]

In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents'

ALT In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents'

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 11

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria Wongyu Lee, Francesco Lelli, Omran Ayoub, Massimo Tornatore arxiv.org/abs/2606.11284 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙶𝚃 𝚌𝚜.𝙻𝙶] 💬Accepted to IJCAI 2026

Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose Φ-Actor-Critic (Φ-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, Φ-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfact

ALT Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose Φ-Actor-Critic (Φ-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, Φ-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfact

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 11

Multi-agent rendezvous in fluid flows via reinforcement learning Bocheng Li, Jingran Qiu, Lihao Zhao arxiv.org/abs/2606.11274 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙻𝙶 𝚙𝚑𝚢𝚜𝚒𝚌𝚜.𝚏𝚕𝚞-𝚍𝚢𝚗]

Rendezvous is a critical task for multi-agent systems, requiring agents to coordinate to meet at an unspecified location. However, achieving this in fluid environments presents a challenge, as it remains unclear how agents can exploit underlying fluid kinematics to facilitate convergence. In this study, we adopt a multi-agent reinforcement learning (MARL) approach to develop physics-informed rendezvous strategies in vortical flows. Compared to a naive strategy, where agents navigate toward their counterparts, MARL strategies significantly improve the rendezvous rate. MARL strategies also show transferability across varying vortex intensities, vortex scales, and swarm sizes. By breaking the symmetry of the state-action map, MARL strategy leverages a non-intuitive mechanism that prevents agents from becoming trapped in separate vortices, thereby enhancing rendezvous success. Additionally, a heuristic strategy is extracted from the learned strategy and also outperforms the naive strategy.

ALT Rendezvous is a critical task for multi-agent systems, requiring agents to coordinate to meet at an unspecified location. However, achieving this in fluid environments presents a challenge, as it remains unclear how agents can exploit underlying fluid kinematics to facilitate convergence. In this study, we adopt a multi-agent reinforcement learning (MARL) approach to develop physics-informed rendezvous strategies in vortical flows. Compared to a naive strategy, where agents navigate toward their counterparts, MARL strategies significantly improve the rendezvous rate. MARL strategies also show transferability across varying vortex intensities, vortex scales, and swarm sizes. By breaking the symmetry of the state-action map, MARL strategy leverages a non-intuitive mechanism that prevents agents from becoming trapped in separate vortices, thereby enhancing rendezvous success. Additionally, a heuristic strategy is extracted from the learned strategy and also outperforms the naive strategy.

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 10

LLM-Mediated Demand Response Coordination in Smart Microgrids J. de Curtò, I. de Zarzà arxiv.org/abs/2606.11050 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙶𝚃 𝚎𝚎𝚜𝚜.𝚂𝚈] 💬To appear in Springer Nature proceedings (KES Smart Innovation Systems and Technologies)

Effective demand response in smart microgrids requires prosumers to cooperate voluntarily under strategic self-interest, a coordination problem structurally equivalent to a repeated Prisoner's Dilemma on a social network. This paper presents a multi-agent simulation in which a Large Language Model (LLM) Influence Compiler issues structured demand-response directives to a population of heterogeneous prosumer agents, each governed by a hybrid decision architecture combining game-theoretic base probability (derived from payoff history, neighbour imitation, and exploitation memory) with LLM narrative evaluation of incoming coordination signals. The hybrid architecture resolves a key methodological challenge: LLMs aligned via Reinforcement Learning from Human Feedback (RLHF) exhibit strong cooperation bias when used as direct decision-makers, producing flat dynamics regardless of grid conditions. By separating strategic reasoning from grounded narrative evaluation, the model generates reali

ALT Effective demand response in smart microgrids requires prosumers to cooperate voluntarily under strategic self-interest, a coordination problem structurally equivalent to a repeated Prisoner's Dilemma on a social network. This paper presents a multi-agent simulation in which a Large Language Model (LLM) Influence Compiler issues structured demand-response directives to a population of heterogeneous prosumer agents, each governed by a hybrid decision architecture combining game-theoretic base probability (derived from payoff history, neighbour imitation, and exploitation memory) with LLM narrative evaluation of incoming coordination signals. The hybrid architecture resolves a key methodological challenge: LLMs aligned via Reinforcement Learning from Human Feedback (RLHF) exhibit strong cooperation bias when used as direct decision-makers, producing flat dynamics regardless of grid conditions. By separating strategic reasoning from grounded narrative evaluation, the model generates reali

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 10

Decentralized Multi-Agent Systems with Shared Context Yuzhen Mao, Azalia Mirhoseini arxiv.org/abs/2606.10662 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸]

Multi-agent systems (MAS) can scale large language model reasoning at test time by decomposing complex problems into parallel subtasks. However, most existing MAS rely on centralized orchestration, where a main agent assigns work, collects outputs, and merges results. As the number of subtasks grows, this controller becomes a communication and integration bottleneck. We propose Decentralized Language Models (DeLM), a MAS framework that decentralizes coordination through parallel agents, a shared verified context, and a task queue. Agents asynchronously claim subtasks, read accumulated progress, perform local reasoning, and write back compact verified updates. The shared context acts as a common communication substrate, enabling agents to build on one another's verified progress without routing every update through a central controller. Empirically, DeLM improves both software-engineering test-time scaling and long-context reasoning. On SWE-bench Verified, DeLM achieves the best perform

ALT Multi-agent systems (MAS) can scale large language model reasoning at test time by decomposing complex problems into parallel subtasks. However, most existing MAS rely on centralized orchestration, where a main agent assigns work, collects outputs, and merges results. As the number of subtasks grows, this controller becomes a communication and integration bottleneck. We propose Decentralized Language Models (DeLM), a MAS framework that decentralizes coordination through parallel agents, a shared verified context, and a task queue. Agents asynchronously claim subtasks, read accumulated progress, perform local reasoning, and write back compact verified updates. The shared context acts as a common communication substrate, enabling agents to build on one another's verified progress without routing every update through a central controller. Empirically, DeLM improves both software-engineering test-time scaling and long-context reasoning. On SWE-bench Verified, DeLM achieves the best perform

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 10

SkillAxe: Sharpening LLM-Authored Agent Skills Through Evaluation-Guided Self-Refinement Srishti Gautam, Arjun Radhakrishna, Sumit Gulwani arxiv.org/abs/2606.10546 [𝚌𝚜.𝙼𝙰]

Skill documents, structured natural-language instructions that guide Large Language Model (LLM) agents, are critical to modern agent frameworks, yet LLMs struggle to write skills that actually work. On SkillsBench, human-authored skills improve pass rates by 16.2 percentage points, while LLM-authored skills provide no measurable gain. We introduce SkillAxe, a fully unsupervised framework that enables LLMs to iteratively diagnose and refine their own skills. SkillAxe decomposes skill quality into four interpretable dimensions (quality impact, trigger precision, instruction compliance with fault attribution, and solution-path coverage), producing structured improvement briefs that require no ground-truth labels, test suites, or environment rewards. On SkillsBench, SkillAxe improves pass rates by 28% relative over unimproved LLM skills and closes 47–67% of the gap to human-authored skills. We validate the approach as a continuous improvement engine in the wild on SpreadsheetBench, where a

ALT Skill documents, structured natural-language instructions that guide Large Language Model (LLM) agents, are critical to modern agent frameworks, yet LLMs struggle to write skills that actually work. On SkillsBench, human-authored skills improve pass rates by 16.2 percentage points, while LLM-authored skills provide no measurable gain. We introduce SkillAxe, a fully unsupervised framework that enables LLMs to iteratively diagnose and refine their own skills. SkillAxe decomposes skill quality into four interpretable dimensions (quality impact, trigger precision, instruction compliance with fault attribution, and solution-path coverage), producing structured improvement briefs that require no ground-truth labels, test suites, or environment rewards. On SkillsBench, SkillAxe improves pass rates by 28% relative over unimproved LLM skills and closes 47–67% of the gap to human-authored skills. We validate the approach as a continuous improvement engine in the wild on SpreadsheetBench, where a

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 10

Decoupling Thought from Speech: Knowledge-Grounded Counterfactual Reasoning for Resilient Multi-Agent Argumentation Jakub Masłowski, Jarosław A. Chudziak arxiv.org/abs/2606.10475 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝙻]

Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the process. During long-horizon exchanges reactive systems under sustained perturbations often experience logic degradation, argument repetition, and role drift. To structurally prevent the identity loss and maintain the process fidelity, we introduce Knowledge-Grounded Counterfactual Reasoning (KG-CFR), a dual-stage architecture that enforces a strict separation of concerns between a private, retrieval-augmented planning buffer, and a public execution layer. We assess this system in Dynamic Resource Allocation under Uncertainty (DRAU), a dedicated 1v1v1 environment, introducing diversity as distinct from standard debate settings. Over 270 completely factorial crisis simulation trajectories with stochastic environmental shocks, KG-CFR prevents judge-detected critical p

ALT Multi-agent debate frameworks have been shown to improve large language model performance in convergent tasks, but they are currently optimized in a way that heavily favors final output accuracy rather than stability of the process. During long-horizon exchanges reactive systems under sustained perturbations often experience logic degradation, argument repetition, and role drift. To structurally prevent the identity loss and maintain the process fidelity, we introduce Knowledge-Grounded Counterfactual Reasoning (KG-CFR), a dual-stage architecture that enforces a strict separation of concerns between a private, retrieval-augmented planning buffer, and a public execution layer. We assess this system in Dynamic Resource Allocation under Uncertainty (DRAU), a dedicated 1v1v1 environment, introducing diversity as distinct from standard debate settings. Over 270 completely factorial crisis simulation trajectories with stochastic environmental shocks, KG-CFR prevents judge-detected critical p

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 10

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix Shree Murthy, Rohan Pandey arxiv.org/abs/2606.09884 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙻𝙶 𝚎𝚌𝚘𝚗.𝙴𝙼]

We study two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets: (i) tacit cartel formation between competing DDPG agents, and (ii) actor–critic instability at high event rates. We instantiate both inside a single CT-MARL benchmark (Poisson-clocked price updates, observation latency δ, interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index Δ=0.69 ± 0.11, and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48% and adding latency drives it to a minimum of Δ=0.28. The fix has clearly documented costs: it is partial (Δ remains supra-Bertrand), it is non-monotone in δ, and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at λ=5 and corrupts the phase-diagram cell at (λ=5, δ=1). We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-sho

ALT We study two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets: (i) tacit cartel formation between competing DDPG agents, and (ii) actor–critic instability at high event rates. We instantiate both inside a single CT-MARL benchmark (Poisson-clocked price updates, observation latency δ, interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index Δ=0.69 ± 0.11, and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48% and adding latency drives it to a minimum of Δ=0.28. The fix has clearly documented costs: it is partial (Δ remains supra-Bertrand), it is non-monotone in δ, and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at λ=5 and corrupts the phase-diagram cell at (λ=5, δ=1). We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-sho

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 9

Performance Evaluation of Social Learning Felice Scala, Marco Carpentiero, Vincenzo Matta, Ali H. Sayed arxiv.org/abs/2606.09176 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙸𝚃] 💬Submitted to the IEEE for possible publication

Social Learning is a decentralized decision-making paradigm in which spatially dispersed agents collect streaming observations regulated by one of a finite number of models (the hypotheses). The agents are interested in assigning probability scores (the beliefs) to the possible hypotheses. To this end, the agents exchange their beliefs according to a certain communication graph. It has been shown that, under reasonable conditions on the identifiability of the decision model and the network connectivity, each agent ultimately places all the belief mass on the true hypothesis governing the data. However, several questions remain unanswered regarding the evaluation of the social learning performance. One recently adopted performance metric is the rejection rate, i.e., the rate at which the beliefs about the erroneous hypotheses vanish. One contribution of this work is to establish that the rejection rate leads to several paradoxes, which make it unsuitable as a valid performance measure.

ALT Social Learning is a decentralized decision-making paradigm in which spatially dispersed agents collect streaming observations regulated by one of a finite number of models (the hypotheses). The agents are interested in assigning probability scores (the beliefs) to the possible hypotheses. To this end, the agents exchange their beliefs according to a certain communication graph. It has been shown that, under reasonable conditions on the identifiability of the decision model and the network connectivity, each agent ultimately places all the belief mass on the true hypothesis governing the data. However, several questions remain unanswered regarding the evaluation of the social learning performance. One recently adopted performance metric is the rejection rate, i.e., the rate at which the beliefs about the erroneous hypotheses vanish. One contribution of this work is to establish that the rejection rate leads to several paradoxes, which make it unsuitable as a valid performance measure.

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 9

Is Telehealth Better Used to Treat Patients or Help Other Physicians Treat Patients? An Agent-Based Modeling Study of Healthcare Provision Michael Chary arxiv.org/abs/2606.08701 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙲𝚈 𝚌𝚜.𝚂𝙸] 💬Presented at HICSS 2022

Telehealth, the delivery of medical care remotely, is hoped to increase access to specialty services or decrease health care utilization. Physicians can provide telehealth to each other or to patients. Specialists often treat complex patients who can be adequately cared for only in academic hospitals, suggesting that providing specialty services via telehealth will reallocate rather than reduce system utilization. Here I use agent-based modeling to investigate telehealth's effects on clinical outcomes and system utilization in medical toxicology. I found that physician-physician telehealth increased patient health but system utilization did not change. The effects were more pronounced as clinical complexity increased. Physician-patient telehealth increased cost and system utilization but not clinical outcomes. Within the limitations of our approach, these results suggest that telehealth is more cost-effective for improving generalist access to specialist knowledge than in providing car

ALT Telehealth, the delivery of medical care remotely, is hoped to increase access to specialty services or decrease health care utilization. Physicians can provide telehealth to each other or to patients. Specialists often treat complex patients who can be adequately cared for only in academic hospitals, suggesting that providing specialty services via telehealth will reallocate rather than reduce system utilization. Here I use agent-based modeling to investigate telehealth's effects on clinical outcomes and system utilization in medical toxicology. I found that physician-physician telehealth increased patient health but system utilization did not change. The effects were more pronounced as clinical complexity increased. Physician-patient telehealth increased cost and system utilization but not clinical outcomes. Within the limitations of our approach, these results suggest that telehealth is more cost-effective for improving generalist access to specialist knowledge than in providing car

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 9

The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment Xiaoyang Wang, Christopher C. Yang arxiv.org/abs/2606.08457 [𝚌𝚜.𝙼𝙰]

Multi-agent LLM systems for medical question answering often treat consensus as a reliability signal: if multiple agents agree on an answer, it is presumed trustworthy. However, answer-level consensus does not entail reasoning-level alignment. We introduce CARA (Cross-Agent Reasoning Alignment), a family of automated metrics that measure whether agents who agree on an answer also agree on the reasoning. Applying CARA to a standard debate system on two medical QA benchmarks, MedQA-USMLE and MedThink-Bench, we identify the consistency illusion: a failure mode where debate reduces detectable contradictions between agents while simultaneously decreasing the semantic similarity of their reasoning chains; agents appear to agree more but reason less consistently. To improve this misalignment, we propose the Grounded Debate Protocol (GDP), a prompt-level intervention that requires agents to commit to named medical facts and take explicit stances on other agents' claims. GDP produces large, con

ALT Multi-agent LLM systems for medical question answering often treat consensus as a reliability signal: if multiple agents agree on an answer, it is presumed trustworthy. However, answer-level consensus does not entail reasoning-level alignment. We introduce CARA (Cross-Agent Reasoning Alignment), a family of automated metrics that measure whether agents who agree on an answer also agree on the reasoning. Applying CARA to a standard debate system on two medical QA benchmarks, MedQA-USMLE and MedThink-Bench, we identify the consistency illusion: a failure mode where debate reduces detectable contradictions between agents while simultaneously decreasing the semantic similarity of their reasoning chains; agents appear to agree more but reason less consistently. To improve this misalignment, we propose the Grounded Debate Protocol (GDP), a prompt-level intervention that requires agents to commit to named medical facts and take explicit stances on other agents' claims. GDP produces large, con

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 9

Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy Deepak Akkil, Ravi Kokku, Karthik Vikram, Tamer Abuelsaad, Aditya Vempaty, Satya Nitta arxiv.org/abs/2606.08367 [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸]

Most evaluations of LLM agents look like exams: a discrete task, a clean environment, a score in minutes or hours. We argue that this approach is mismatched with the deployment conditions of autonomous systems, where the relevant timescale can be weeks to months, and where the dynamics that matter most, such as behavioral drift, governance in diverse environmental contexts, and cross-influence between agents from different model families, only emerge over time. We introduce Emergence World, a continuously running multi-agent simulation platform designed to make those dynamics measurable. The platform hosts populations of LLM-driven agents in a shared spatial world grounded in live external data (e.g. real-time weather, news APIs, internet access), equips each agent with 120 specialized tools and three persistent memory systems, and lets them govern themselves through democratic mechanisms with consequential outcomes. The platform is model-agnostic at the reasoning layer and supports h

ALT Most evaluations of LLM agents look like exams: a discrete task, a clean environment, a score in minutes or hours. We argue that this approach is mismatched with the deployment conditions of autonomous systems, where the relevant timescale can be weeks to months, and where the dynamics that matter most, such as behavioral drift, governance in diverse environmental contexts, and cross-influence between agents from different model families, only emerge over time. We introduce Emergence World, a continuously running multi-agent simulation platform designed to make those dynamics measurable. The platform hosts populations of LLM-driven agents in a shared spatial world grounded in live external data (e.g. real-time weather, news APIs, internet access), equips each agent with 120 specialized tools and three persistent memory systems, and lets them govern themselves through democratic mechanisms with consequential outcomes. The platform is model-agnostic at the reasoning layer and supports h

Multiagent Systems Papers

Multiagent Systems Papers @PIN

Jun 9

Toward Human-Centered Multi-Agent Systems: Integrating Cognition, Culture, Values, and Cooperation in AI Agents Safia Baloch, Rahemeen Khan arxiv.org/abs/2606.08274 [𝚌𝚜.𝙼𝙰]

The emergence of large language model (LLM)-based agents and multi-agent systems has enabled a shift from narrow task automation to more autonomous decision-making. Despite progress in language generation, planning, tool use, and coordination, most agents still treat intelligence as prediction, optimization, and task completion. Human environments are social and normative, where people reason under bounded rationality, communicate in culturally situated language, and make decisions guided by values, beliefs, trust, and social norms. This survey argues that future AI agents, especially those acting on behalf of humans, must move beyond task competence toward human-centered capabilities. We review research across six areas: (1) evolution of intelligent agents, (2) human cognition and decision-making, (3) language, culture, and social context, (4) human values and belief systems, (5) human-agent collaboration, and (6) multi-agent coordination and modeling of human characteristics. We synt

ALT The emergence of large language model (LLM)-based agents and multi-agent systems has enabled a shift from narrow task automation to more autonomous decision-making. Despite progress in language generation, planning, tool use, and coordination, most agents still treat intelligence as prediction, optimization, and task completion. Human environments are social and normative, where people reason under bounded rationality, communicate in culturally situated language, and make decisions guided by values, beliefs, trust, and social norms. This survey argues that future AI agents, especially those acting on behalf of humans, must move beyond task competence toward human-centered capabilities. We review research across six areas: (1) evolution of intelligent agents, (2) human cognition and decision-making, (3) language, culture, and social context, (4) human values and belief systems, (5) human-agent collaboration, and (6) multi-agent coordination and modeling of human characteristics. We synt