G-Designer: Architecting Multi-agent Communication
Topologies via Graph Neural Networks
Abstract.
Recent advancements in large language model (LLM)-based agents have demonstrated that collective intelligence can significantly surpass the capabilities of individual agents, primarily due to well-crafted inter-agent communication topologies. Despite the diverse and high-performing designs available, practitioners often face confusion when selecting the most effective pipeline for their specific task: Which topology is the best choice for my task, avoiding unnecessary communication token overhead while ensuring high-quality solutions? In response to this dilemma, we introduce G-Designer, an adaptive, efficient, and robust solution for multi-agent deployment, which dynamically designs task-aware, customized communication topologies. Specifically, G-Designer models the multi-agent system as a multi-agent network, leveraging a variational graph auto-encoder to encode both the nodes (agents) and a task-specific virtual node, and decodes a task-adaptive and high-performing communication topology. Extensive experiments on six benchmarks showcase that G-Designer is: (1) high-performing, achieving superior accuracy on MMLU and pass@1 on HumanEval; (2) task-adaptive, architecting communication protocols tailored to task difficulty and reducing token consumption on HumanEval; and (3) adversarially robust, defending against agent adversarial attacks with merely a marginal accuracy drop. The code is anonymously available at https://anonymous.4open.science/r/GDesigner-3063.
Keywords: multi-agent networks, graph machine learning, LLM-based agents
1. Introduction
Web data, as a naturally occurring data structure, prevails in social networks (Sun et al., 2023a; Greene et al., 2010), trade networks (Serrano and Boguná, 2003; Fagiolo et al., 2010; Garlaschelli et al., 2007), transportation systems (Bell et al., 1997; Farahani et al., 2013), and recommendation platforms (Fan et al., 2019; Wang et al., 2019), etc. Web data can inherently be represented as graphs, where nodes and edges capture the topological relationships between numerous instances. Recently, there has been a surge of interest in the academic community toward optimizing the topology design for Large Language Model-based multi-agent (LLM-MA) systems, essentially, how to weave the web of agents (Chen et al., 2024c).
An LLM-based agent, which integrates the language generation capabilities of LLMs with decision-making and action-execution functionalities (Richards et al., 2023; Nakajima, 2023; Reworkd, 2023), has exhibited impressive performance across a wide range of tasks, from reasoning (Yao et al., 2023b) and code generation (Shinn et al., 2023) to even more complex applications like video gaming (Wang et al., 2023) and autonomous driving (Jin et al., 2023). Even more excitingly, researchers have discovered that combining multiple LLM-based agents, whether implicitly or explicitly, into a team can outperform individual agents when tackling complex tasks (Du et al., 2023; Liang et al., 2023; Wang et al., 2023a; Jiang et al., 2023; Shinn et al., 2023; Zheng et al., 2023; Wu et al., 2023), demonstrating a form of collaborative intelligence reminiscent of human teamwork in multi-agent systems (Zhang et al., 2023a). This emergence of human-esque collective intelligence is fundamentally driven by the design of their topology, i.e., how multi-agents are connected, and how they transmit, exchange, and assimilate information reciprocally.

Figure 1. Existing practices in communication topology design for LLM-based multi-agent systems.
In practice, prior research has extensively explored how multiple instances of LLMs, referred to as agents (Wang et al., 2024; Xi et al., 2023; Gao et al., 2023; Cheng et al., 2024; Ma et al., 2024), should be structured and organized to converse, collaborate, debate, or even compete. Various topological designs have been investigated, such as chain (Wei et al., 2022; Hong et al., 2023), tree (Yao et al., 2023a; Wu et al., 2023), star (Wu et al., 2023), complete graphs (Qian et al., 2024), random graphs (Qian et al., 2024), optimizable graphs (Zhuge et al., 2024; Zhang et al., 2024), and LLM-based networks (Hao et al., 2023; Liu et al., 2023). These elaborately designed communication topologies have demonstrated remarkable performance with minimal human supervision, bridging the gap between individual intelligence and collective intelligence. Faced with numerous structures available, an inquisitive practitioner might ask: how should I select or design a topology that best suits my task at hand?
The question posed above is non-trivial and, at times, perplexing. A piece of experimental evidence is presented in Figure 2, where we evaluated the performance of different multi-agent structures on the MMLU dataset (Hendrycks et al., 2021), a collection of multiple-choice questions across various subjects. The results reveal that even within the same dataset, the suitability of different communication topologies varies. ❶ Simpler Case: In the simpler "High School Biology" subset, the chain structure performs comparably to the complex GPTSwarm, while consuming significantly fewer tokens (0.5k versus 7.8k). In this case, the chain structure is clearly a more economical choice. ❷ Harder Case: However, for the more challenging "College Mathematics" subset, GPTSwarm outperforms the chain structure, primarily owing to its intricate topology and prompt optimization. In summary, practitioners often find it challenging to effortlessly identify the most efficient and complexity-adaptive multi-agent topology for a given task.

Figure 2. Token consumption and accuracy of different multi-agent protocols with four gpt-4-based agents on two MMLU subsets, "High School Biology" and "College Mathematics".
In light of this dilemma, we propose the LLM-based Multi-agent Communication Protocol (MACP), which aims to establish standardized guidance for future LLM-MA topology design:
Multi-agent Communication Protocol (MACP): Given a task/query $\mathcal{Q}$, an optimal LLM-MA communication topology $\mathcal{G}^{\ast}$ for $\mathcal{Q}$ should satisfy the following protocol logics: (1) Effectiveness: the communication structure must effectively produce a qualified solution for task $\mathcal{Q}$; (2) Complexity-adaptiveness: the topology should dynamically adjust to the complexity of the task, minimizing communication overhead; (3) Adversarial robustness: the topology should remain reliable under adversarial attacks.
The formal definition of MACP is provided in Section 3.3. To design a communication topology that ideally adheres to the MACP principles, we propose an effective, adaptive, and robust LLM-powered multi-agent communication graph designer, termed G-Designer. Technically, G-Designer first architects a multi-agent graph, where each agent, along with its specific properties (e.g., profile (Li et al., 2023a), external API tools (Zhuang et al., 2023), or knowledge base (Chen et al., 2024b)), is represented as a node, and communication between agents forms the edges. G-Designer employs a variational graph auto-encoder to encode the nodes (agents) along with task-specific information, and to decode the resulting collaboration network between agents. This input-dependent paradigm allows G-Designer to
design task-adaptive, high-performing communication topology, which is, at the same time, assured of efficiency and robustness with sparsity regularization. Unlike previous LLM-based multi-agent topology designs, which rely on a static structure for all queries/tasks, G-Designer adaptively crafts customized topologies for different domains and tasks, serving as a fully autonomous and flexible assistant for multi-agent system establishment and deployments.
Our contributions can be summarized as follows:
- ❶ Protocol Proposal. We propose the first communication protocol tailored for LLM-powered multi-agent systems, MACP, which comprehensively regulates multi-agent topology design across three dimensions: performance, adaptability, and robustness, and incisively highlights the shortcomings of existing designs.
- ❷ Practical Solution. We present G-Designer, an effective, adaptive, and robust designer of LLM-powered multi-agent communication graphs. By leveraging a variational graph auto-encoder to construct and process the multi-agent network, G-Designer decodes task-adaptive and high-performing agent communication, which is also equipped with strong robustness against agent-rooted adversarial attacks via dynamic topology adjustment.
- ❸ Experimental Validation. Extensive experiments across six benchmarks show that G-Designer is: (1) high-performing, achieving superior accuracy on MMLU and pass@1 on HumanEval, surpassing state-of-the-art topologies; (2) task-adaptive, dynamically adjusting topology complexity with task awareness and outperforming state-of-the-art methods on MMLU at a fraction of their cost, substantially reducing token consumption; and (3) adversarially robust, defending against agent adversarial attacks with merely a marginal accuracy drop.
2. Related Works
2.1. LLM-agent Collaboration
While the academic community has widely recognized the success of single LLM-based agents in reasoning (Wei et al., 2022; Yao et al., 2023a; Besta et al., 2023) and planning (Huang et al., 2022; Sun et al., 2023b; Ruan et al., 2023), collaboration among multiple LLM-based agents has swiftly emerged as a powerful approach for integrating the specialized capabilities of different agents, even exceeding the performance of individual LLMs (Park et al., 2023; Chen et al., 2023a; Chan et al., 2023; Chen et al., 2023b; Cohen et al., 2023; Hua et al., 2023). A basic form of collaboration is majority voting (Chen et al., 2024a), where agents operate independently. However, more effective multi-agent collaboration should construct an interconnected system and iterative topology that encourages interdependent interactions and deliberate decision-making (Piatti et al., 2024; Chen et al., 2024a). Building on this insight, pioneering research has explored various multi-agent communication topologies, including: (1) Non-interactive, where agents operate independently without inter-agent communication, as employed in systems like LATM (Zhang et al., 2023b) and LLM-Debate (Du et al., 2023); (2) Chain, where agents are arranged in a sequential structure, each receiving the output from its predecessor and passing information to its successor, utilized by ChatDev (Qian et al., 2023), MetaGPT (Hong et al., 2023), and L2MAC (Holt et al., 2024); (3) Star, where a central administrative agent (often referred to as a commander, manager, teacher, etc.) directs subordinate agents, seen in AutoGen (Wu et al., 2023), SecurityBot (Yan et al., 2024), and MiniGrid (Zhou et al., 2023); (4) Tree, where a root or parent agent hierarchically manages multiple child agents, as in SoA (Ishibashi and Nishimura, 2024); and (5) Graph, encompassing complete graphs (Qian et al., 2024; Zhuge et al., 2024), layered graphs (Liu et al., 2023; Hao et al., 2023; Qian et al., 2024), and random graphs (Qian et al., 2024), among others.
2.2. Multi-agents as Graphs
Graphs, as a fundamental data structure for organizing and representing relationships between entities (Bondy et al., 1976; Zhang and Chartrand, 2006), are widely adopted in the pre-LLM era as a powerful tool to facilitate effective communication in multi-agent reinforcement learning (MARL) (Pesce and Montana, 2023; Hu et al., 2024; Liu et al., 2022). With the rise of LLMs and the proliferation of LLM-based agents (Park et al., 2023; Chen et al., 2023a; Chan et al., 2023; Chen et al., 2023b; Cohen et al., 2023; Hua et al., 2023), researchers have similarly recognized that interactions among multiple agents can naturally be modeled from a graph-based perspective (Chen et al., 2023a; Zhuge et al., 2024; Qian et al., 2024; Liu et al., 2023). Early attempts are implicit, where ChatEval (Chan et al., 2023) and AutoGen (Wu et al., 2023) employ fixed graphs to facilitate information exchange among agents. Subsequent works, such as STOP (Zelikman et al., 2023) and DSPy (Khattab et al., 2023), further explore joint optimization of prompts and inference structures. More recent practices including ChatLLM (Hao et al., 2023), DyLAN (Liu et al., 2023), GPTSwarm (Zhuge et al., 2024), and MacNet (Qian et al., 2024), have explicitly represented the organization of multiple agents as a graph. Specifically, both ChatLLM (Hao et al., 2023) and DyLAN (Liu et al., 2023) utilize a multilayer perceptron (MLP)-like layered graph, while MacNet (Qian et al., 2024) systematically evaluates various predefined topologies. GPTSwarm (Zhuge et al., 2024) parameterizes and optimizes the fully-connected graph distribution. However, all these attempts, whether predefined static topologies or those iteratively optimized, remain input-independent. Consequently, they fail to be task-aware and adaptively design topologies that suit the complexity of the specific task.
3. Formalization
This section establishes the notation, formalizes key concepts in multi-agent systems from a topology perspective, and formally defines our proposed multi-agent communication protocol.
3.1. Topology Structure
We model the multi-agent system as a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ represents the set of nodes (with $|\mathcal{V}| = N$) and $\mathcal{E}$ denotes the set of edges. Each node $v_i \in \mathcal{V}$ corresponds to an agent, which can be formalized as:
$$v_i = \{\mathrm{Base}_i, \mathrm{Role}_i, \mathrm{State}_i, \mathrm{Plugin}_i\} \qquad (1)$$
where each agent $v_i$ is composed of four key elements: (1) $\mathrm{Base}_i$, the language model instance powering $v_i$; (2) $\mathrm{Role}_i$, the agent's pre-assigned role or function; (3) $\mathrm{State}_i$, representing the agent's accumulated knowledge and interaction history; and (4) $\mathrm{Plugin}_i$, a set of external tools or plugins available to $v_i$, such as web searchers (Ma et al., 2023), code compilers (Richards et al., 2023; Wu et al., 2023; Hong et al., 2023; Bouzenia et al., 2024; Ishibashi and Nishimura, 2024), or file readers (Zhuge et al., 2024; Richards et al., 2023). Each LLM-based agent $v_i$ receives a prompt and generates a response $\mathcal{R}_i$:
$$\mathcal{R}_i = v_i\big(\mathcal{P}_i^{\mathrm{sys}}, \mathcal{P}_i^{\mathrm{usr}}\big) \qquad (2)$$
where $\mathcal{P}_i^{\mathrm{sys}}$ represents the system prompt encompassing its role and state, and $\mathcal{P}_i^{\mathrm{usr}}$ denotes the user prompt, which possibly includes the given task, responses/instructions from other agents, and externally retrieved knowledge.
The connectivity of $\mathcal{G}$ can also be characterized by a (non-symmetric) adjacency matrix $\mathbf{A} \in \{0, 1\}^{N \times N}$, where $\mathbf{A}[i, j] = 1$ if $e_{ij} \in \mathcal{E}$ and $\mathbf{A}[i, j] = 0$ otherwise. Each edge $e_{ij}$ represents the flow of information from agent $v_i$ to agent $v_j$.
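For concreteness, the following is a minimal Python sketch of these data structures; the Agent class and its field names are illustrative stand-ins for the formalization in Equation 1, not part of our released implementation.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Agent:
    """One node v_i = {Base_i, Role_i, State_i, Plugin_i} of the multi-agent graph."""
    base: str                                     # backing LLM, e.g. "gpt-4-1106-preview"
    role: str                                     # pre-assigned persona, e.g. "Programmer"
    state: list = field(default_factory=list)     # accumulated knowledge / dialogue history
    plugins: list = field(default_factory=list)   # external tools, e.g. ["python_compiler"]

# A directed topology over N agents: A[i, j] = 1 iff e_ij exists,
# i.e., agent v_i sends information to agent v_j.
agents = [
    Agent("gpt-4", "Algorithm Designer"),
    Agent("gpt-4", "Programmer", plugins=["python_compiler"]),
    Agent("gpt-4", "Test Analyst"),
]
N = len(agents)
A = np.zeros((N, N), dtype=int)
A[0, 1] = A[1, 2] = 1  # chain: Designer -> Programmer -> Test Analyst
```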
3.2. Communication Pipeline
Given a query/problem $\mathcal{Q}$, the multi-agent system engages in $K$ rounds of interactive utterances, which collaboratively drive the agents toward producing the final solution $a^{(K)}$ based on their cumulative dialogue exchanges. At the beginning of the $t$-th dialogue round, a mapping function $\sigma$ is applied to determine the execution index for each agent:
$$\sigma: \mathcal{V} \to \{1, \dots, N\}, \quad \text{s.t.} \ \ \sigma(v_i) < \sigma(v_j), \ \ \forall \, v_i \in \mathcal{N}_{\mathrm{in}}(v_j) \qquad (3)$$
where the ordering induced by $\sigma$ is the execution sequence of agents, $\mathcal{N}_{\mathrm{in}}(v_j)$ denotes the in-neighborhood of $v_j$, and the constraint ensures that an agent can only execute after any agent from which it receives information. Once the execution order is determined, each agent proceeds to perform input-output operations sequentially:
$$\mathcal{R}_i^{(t)} = v_i\Big(\mathcal{P}_i^{\mathrm{sys}}, \, \{\mathcal{Q}\} \cup \big\{\mathcal{R}_j^{(t)}\big\}_{v_j \in \mathcal{N}_{\mathrm{in}}(v_i)}\Big) \qquad (4)$$
where $\mathcal{R}_i^{(t)}$ represents the output of agent $v_i$ at round $t$, which could be a rationale, an answer, or a partial solution, depending on the specific context. The output is generated based on the system prompt $\mathcal{P}_i^{\mathrm{sys}}$ and the context prompt, consisting of the query $\mathcal{Q}$ and messages from other agents. At the end of each dialogue round, an aggregation function $\mathcal{A}$ is adopted to generate the answer/solution $a^{(t)}$ based on the dialogue history:
$$a^{(t)} = \mathcal{A}\big(\mathcal{R}_1^{(t)}, \mathcal{R}_2^{(t)}, \dots, \mathcal{R}_N^{(t)}\big) \qquad (5)$$
The implementation of the function $\mathcal{A}$ is flexible, with possible options including majority voting (Chen et al., 2024a; Zhuge et al., 2024; Li et al., 2024), aggregating all agents' responses and delegating one agent to provide the final answer (Wu et al., 2023; Jiang et al., 2023; Liu et al., 2023; Zhang et al., 2024), or simply using the output of the last agent $v_N$ (Qian et al., 2024). Through $K$ rounds of utterances, either predefined (Qian et al., 2024) or determined by an early-stopping mechanism (Liu et al., 2023), the overall system produces the final answer $a^{(K)}$ in response to $\mathcal{Q}$.
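For illustration, the following sketch implements one dialogue round of this pipeline, reusing the Agent and adjacency structures sketched in Section 3.1: agents execute in a topological order satisfying Equation 3, each conditions on its in-neighbors' messages (Equation 4), and a summarizer-style aggregation instantiates $\mathcal{A}$ (Equation 5). The `call_llm` helper is a hypothetical placeholder for an actual LLM API call.

```python
from graphlib import TopologicalSorter

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical placeholder for an LLM API call (e.g., a chat completion)."""
    raise NotImplementedError

def dialogue_round(agents, A, query: str) -> str:
    N = len(agents)
    # In-neighborhoods: agent v_j depends on every v_i with A[i, j] = 1.
    deps = {j: {i for i in range(N) if A[i, j]} for j in range(N)}
    order = list(TopologicalSorter(deps).static_order())  # execution sequence (Eq. 3)

    responses = {}
    for j in order:  # sequential input-output operations (Eq. 4)
        incoming = "\n".join(responses[i] for i in deps[j])
        responses[j] = call_llm(
            system_prompt=f"You are a {agents[j].role}.",
            user_prompt=f"Task: {query}\nMessages from collaborators:\n{incoming}",
        )
    # Aggregation (Eq. 5): here, delegate a summarizer over all agents' responses.
    return call_llm(
        system_prompt="You are a summarizer. Produce the final solution.",
        user_prompt=query + "\n" + "\n".join(responses[j] for j in order),
    )
```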
3.3. MACP Protocol
We give the formal definition of the MACP protocol as follows:
Definition 1 (Multi-agent Communication Protocol).
Given an LLM-based multi-agent system $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, we establish the following objective as an optimization principle or protocol:
$$\mathcal{G}^{\ast} = \arg\max_{\mathcal{G} \in \mathbb{G}} \ \phi\big(\mathcal{G}(\mathcal{Q})\big) \; - \; \lambda \, \mathcal{C}(\mathcal{G}) \; - \; \mu \, \big\| \mathcal{G}(\mathcal{Q}) - \tilde{\mathcal{G}}(\tilde{\mathcal{Q}}) \big\| \qquad (6)$$
where $\mathbb{G}$ represents the feasible parameter space of $\mathcal{G}$, $\phi(\cdot)$ is the utility evaluator, $\mathcal{C}(\cdot)$ measures the computational and communication overhead of the entire graph, $\lambda$ and $\mu$ are trade-off coefficients, and $\tilde{\mathcal{Q}}$ and $\tilde{\mathcal{G}}$ denote the query description and the multi-agent system after adversarial perturbation, respectively. The first term in Equation 6 corresponds to high performance, aiming to maximize the utility of the system's output; the second term addresses task-adaptiveness, seeking to minimize system complexity to reduce power consumption and economic cost; and the third term focuses on robustness, constraining the deviation of the system output under adversarial attacks.
表示 、 、 的可行参数空间, 是效用评估器, 衡量整个图的计算和通信开销, 和 分别表示对抗扰动后的查询描述和多智能体系统。方程 6 中的第一项对应高性能,旨在最大化系统输出的效用;第二项关注任务适应性,旨在最小化系统复杂性以降低功耗和经济效益;第三项侧重于鲁棒性,限制系统输出在对抗攻击下的偏差。
定义 0(多智能体通信协议)。给定基于LLM的多智能体系统 ,我们建立以下目标作为优化原则或协议:

Figure 3. The design workflow of our proposed G-Designer.
4. G-Designer
Figure 3 illustrates how G-Designer adaptively designs communication topologies for any given query. Specifically, the process begins with a few "raw materials": the input query $\mathcal{Q}$, the agent set $\mathcal{V}$, the profile pool, and the toolset. In the Construct stage, G-Designer leverages a node encoder to construct a multi-agent network along with a task-specific virtual node. In the Design stage, a graph auto-encoder is employed to decode the communication graph topology $\mathcal{G}_{\mathrm{com}}$, which is leveraged for multi-round inter-agent collaboration in the Optimize stage.
4.1. Multi-agent Network Construction
Given an input query $\mathcal{Q}$ and a set of LLM-agents $\mathcal{V} = \{v_1, \dots, v_N\}$, G-Designer aims to design a task-adaptive and effective communication topology $\mathcal{G}_{\mathrm{com}}$. We begin by assigning each agent a unique role and profile, as previous research (Wang et al., 2023a) has shown that assigning distinct personas or roles to LLM-based agents can enhance cognitive synergy. Based on these roles, different external tools are allocated to the agents (e.g., Mathematica for a math analyst, a Python compiler for a programmer). Thus, we successfully initialize each agent as $v_i = \{\mathrm{Base}_i, \mathrm{Role}_i, \mathrm{State}_i, \mathrm{Plugin}_i\}$, as defined in Equation 1.
We proceed to construct a structured multi-agent network as input to G-Designer, represented as $\mathcal{G} = (\mathbf{X}, \mathbf{A})$, where $\mathbf{X} \in \mathbb{R}^{N \times D}$ is the node (agent) feature matrix and $\mathbf{A}$ represents the connectivity matrix. For the feature matrix $\mathbf{X}$, we employ a node encoder to transform each agent's unique profile into a fixed-length embedding representation:
$$\mathbf{x}_i = \mathrm{Enc}\big(\mathrm{Desc}(v_i)\big), \qquad \mathbf{X} = \big[\mathbf{x}_1; \mathbf{x}_2; \dots; \mathbf{x}_N\big] \qquad (7)$$
where $\mathrm{Desc}(\cdot)$ extracts the textual description of the agent's LLM backbone and its assigned plugins, and $\mathrm{Enc}(\cdot)$ can be realized using small and lightweight text embedding models such as SentenceBERT (Reimers, 2019) or MiniLM (Wang et al., 2020). After encoding the individual agents, we aim to ensure that the multi-agent network incorporates information related to the query $\mathcal{Q}$, as this query-dependent approach enables G-Designer to be task-aware and adaptive. To this end, we introduce an additional task-specific virtual global node $v_T$, which is bidirectionally connected to all agent nodes, enabling a global "storage sink" and facilitating smoother information flow among agents (Shirzad et al., 2023; Tan et al., 2023; Rosenbluth et al., 2024). This task node is encoded by the node encoder as follows: $\mathbf{x}_T = \mathrm{Enc}(\mathcal{Q})$.
After obtaining the agent node features $\mathbf{X}$ and the task-specific embedding $\mathbf{x}_T$, we provide a simple anchor topology $\mathbf{A}^{\mathrm{anc}}$, which serves as a starting point for G-Designer's topology design process. For instance, given a code generation task with three agents, manager/programmer/code reviewer, the anchor topology could be configured as a chain structure, i.e., "manager → programmer → reviewer", reflecting the typical workflow of code completion. The anchor topology, being either user-defined or automatically generated by LLMs, is often simple and sub-optimal. However, it provides a foundational reference and prior knowledge for G-Designer's subsequent optimization process. We incorporate the task-specific vertex $v_T$ and its corresponding edges and obtain $\mathbf{A}' \in \{0, 1\}^{(N+1) \times (N+1)}$. Consequently, we establish a task-specific multi-agent network $\mathcal{G}'$:
$$\mathbf{X}' = \begin{bmatrix} \mathbf{X} \\ \mathbf{x}_T \end{bmatrix}, \qquad \mathbf{A}' = \begin{bmatrix} \mathbf{A}^{\mathrm{anc}} & \mathbf{1} \\ \mathbf{1}^{\top} & 0 \end{bmatrix} \qquad (8)$$
where $\mathcal{G}'$ can also be jointly denoted as $\mathcal{G}' = \{\mathbf{X}', \mathbf{A}'\}$.
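As a concrete illustration of this construction, the sketch below embeds agent profiles and the query with a lightweight sentence encoder (all-MiniLM-L6-v2, as noted in our implementation details) and appends the bidirectionally connected virtual task node to the anchor topology. The profile string format and the function name `build_network` are illustrative assumptions, not our released code.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # lightweight text embedder, cf. Sec. 5.1.3

def build_network(agents, query: str, anchor_A: np.ndarray):
    """Construct the task-specific multi-agent network G' = (X', A') of Eq. 8."""
    # Eq. 7: encode each agent's profile (backbone + role + plugins) into a vector.
    profiles = [f"{a.role}; backbone: {a.base}; plugins: {a.plugins}" for a in agents]
    X = encoder.encode(profiles)                 # shape (N, D)
    x_task = encoder.encode([query])             # task-specific virtual node v_T
    X_prime = np.vstack([X, x_task])             # shape (N + 1, D)

    # Extend the anchor topology: v_T is bidirectionally connected to every
    # agent node, acting as a global "storage sink".
    N = len(agents)
    A_prime = np.zeros((N + 1, N + 1))
    A_prime[:N, :N] = anchor_A
    A_prime[:N, N] = A_prime[N, :N] = 1.0
    return X_prime, A_prime
```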
4.2. Designing Communication Topology
Building upon the task-specific multi-agent network $\mathcal{G}'$, G-Designer seeks to establish a more fine-grained and precise communication topology $\mathcal{G}_{\mathrm{com}}$. Drawing inspiration from the variational graph auto-encoder (VGAE) framework (Kipf and Welling, 2016; Zhao and Zhang, 2024), G-Designer employs a VGAE-based encoder-decoder $f_{\Theta} = \mathcal{D} \circ \mathcal{E}$ to generate the multi-agent interaction topology, which is formulated as:
$$\mathcal{G}_{\mathrm{com}} = f_{\Theta}(\mathcal{G}') = \mathcal{D}\big(\mathcal{E}(\mathbf{X}', \mathbf{A}')\big) \qquad (9)$$
where $f_{\Theta}$ is the encoder-decoder architecture with parameters $\Theta$, $\mathcal{E}$ is the encoder module, and $\mathcal{D}$ is the decoder module. The encoder utilizes posterior probabilities to encode the node embeddings $\mathbf{X}'$ into low-dimensional latent vector representations $\mathbf{Z}$, which can be formulated as:
$$q(\mathbf{Z} \mid \mathbf{X}', \mathbf{A}') = \prod_{i=1}^{N+1} q(\mathbf{z}_i \mid \mathbf{X}', \mathbf{A}'), \qquad q(\mathbf{z}_i \mid \mathbf{X}', \mathbf{A}') = \mathcal{N}\big(\mathbf{z}_i \mid \boldsymbol{\mu}_i, \operatorname{diag}(\boldsymbol{\sigma}_i^2)\big) \qquad (10)$$
where $\boldsymbol{\mu} = \mathrm{GCN}_{\mu}(\mathbf{X}', \mathbf{A}')$ is the matrix of mean vectors $\boldsymbol{\mu}_i$; similarly, $\log \boldsymbol{\sigma} = \mathrm{GCN}_{\sigma}(\mathbf{X}', \mathbf{A}')$. The choice of GNN backbone can be customized as needed; here, we utilize a simple two-layer GCN (Kipf and Welling, 2017). $\boldsymbol{\mu}_i$, $\boldsymbol{\sigma}_i$, and $\mathbf{z}_i$ denote the $i$-th column of $\boldsymbol{\mu}$, $\boldsymbol{\sigma}$, and $\mathbf{Z}$, respectively. The encoder $\mathcal{E}$ is parameterized by $\Theta_{\mathcal{E}}$. Following the encoding phase, the decoder employs the latent representations to generate a comprehensive blueprint for multi-agent communication. More specifically, the decoder $\mathcal{D}$ first constructs a parameterized, sketched graph $\mathbf{S}$, which is then refined into the final multi-agent communication topology:
$$\mathcal{D}(\mathbf{Z}) = \mathcal{D}_{\mathrm{ref}}\big(\mathcal{D}_{\mathrm{ske}}(\mathbf{Z})\big) \qquad (11)$$
At the first step, $\mathcal{D}_{\mathrm{ske}}$ constructs the fully-connected sketched adjacency matrix $\mathbf{S}$ from the latent representation $\mathbf{Z}$:
$$\mathbf{S} = \mathcal{D}_{\mathrm{ske}}(\mathbf{Z}) \in [0, 1]^{(N+1) \times (N+1)} \qquad (12)$$
whose detailed derivation is as follows:
$$\mathbf{S}[i, j] = \operatorname{Sigmoid}\Big(\big(\log \delta - \log(1 - \delta) + \omega_{ij}\big) / \tau\Big), \qquad \delta \sim \operatorname{Uniform}(0, 1) \qquad (13)$$
where $\omega_{ij} = \mathrm{MLP}_{\phi}\big([\mathbf{z}_i \,\Vert\, \mathbf{z}_j]\big)$ with $\mathrm{MLP}_{\phi}$ parameterized by $\phi$, $\delta \sim \operatorname{Uniform}(0, 1)$, and $\tau$ denotes the temperature coefficient. When $\tau$ approaches zero, Equation 13 essentially returns the Bernoulli sampling result for each edge. The resulting matrix $\mathbf{S}$ represents a densely-connected, non-negative graph distribution, indicating an overly complex and resource-intensive pair-wise communication structure, which is not yet suitable for guiding multi-agent collaboration. To align with G-Designer's objectives of task adaptiveness and minimizing costs, we apply a refinement decoder $\mathcal{D}_{\mathrm{ref}}$ to refine the sketched $\mathbf{S}$ into a compact, sparse, and highly informative communication graph, instantiated by a regularization objective:
$$\min_{\mathbf{W}} \ \big\|\tilde{\mathbf{A}} - \mathbf{S}\big\|_F^2 + \big\|\tilde{\mathbf{A}} - \mathbf{A}'\big\|_F^2 + \alpha \, \big\|\tilde{\mathbf{A}}\big\|_{*}, \qquad \tilde{\mathbf{A}} = \mathbf{U}_k \mathbf{W} \qquad (14)$$
where $\mathbf{U}_k$ is the top-$k$ columns of the left singular matrix $\mathbf{U}$ of $\mathbf{S}$, $\alpha$ is a coefficient hyperparameter, $\mathbf{W}$ is an optimizable weight matrix, $\|\cdot\|_F$ denotes the Frobenius norm, and $\|\tilde{\mathbf{A}}\|_{*} = \sum_i \sigma_i(\tilde{\mathbf{A}})$, where $\sigma_i(\tilde{\mathbf{A}})$ is the $i$-th singular value of $\tilde{\mathbf{A}}$. $\tilde{\mathbf{A}}$ is the desired sparse topology, which is decomposed as $\tilde{\mathbf{A}} = \mathbf{U}_k \mathbf{W}$. In Equation 14, the first and second terms are jointly denoted as anchor regularization, which encourages the learned $\tilde{\mathbf{A}}$ to maintain similarity with both the original $\mathbf{S}$ and the anchor topology. The third term, denoted as sparsity regularization, minimizes the nuclear norm of $\tilde{\mathbf{A}}$, which in effect sparsifies $\tilde{\mathbf{A}}$ owing to the low-rank decomposition $\tilde{\mathbf{A}} = \mathbf{U}_k \mathbf{W}$. Therefore, Equation 14 achieves two key goals: (1) producing a sparse, refined communication topology, and (2) constraining the design to remain grounded in practical intuition. The resulting communication design can be represented as follows:
$$\mathcal{G}_{\mathrm{com}} = \big(\mathbf{X}', \tilde{\mathbf{A}}\big) \qquad (15)$$
At this stage, we have successfully distilled a lightweight and informative collaboration network $\mathcal{G}_{\mathrm{com}}$ from the roughly constructed task-specific multi-agent network $\mathcal{G}'$, which is now ready to guide inter-agent message passing in the following process.
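A minimal PyTorch sketch of this encode-decode pipeline follows: a two-layer GCN produces the variational parameters (Equation 10), the reparameterized latents are decoded into a dense sketched adjacency through a temperature-scaled sigmoid (a relaxed form of Equations 12-13 using inner-product logits instead of an MLP), and a simplified stand-in for the refinement objective of Equation 14 combines an anchor term with a nuclear-norm sparsity term. The module and hyperparameter names are illustrative, and the explicit low-rank decomposition $\tilde{\mathbf{A}} = \mathbf{U}_k \mathbf{W}$ is omitted for brevity.

```python
import torch
import torch.nn as nn

def gcn_norm(A: torch.Tensor) -> torch.Tensor:
    """Symmetrically normalized adjacency D^{-1/2}(A + I)D^{-1/2} used by GCN."""
    A_hat = A + torch.eye(A.size(0))
    d = A_hat.sum(dim=1).pow(-0.5)
    return d.unsqueeze(1) * A_hat * d.unsqueeze(0)

class VGAEDesigner(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, lat_dim: int, tau: float = 1.0):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hid_dim, bias=False)         # shared first GCN layer
        self.w_mu = nn.Linear(hid_dim, lat_dim, bias=False)      # GCN_mu head
        self.w_logvar = nn.Linear(hid_dim, lat_dim, bias=False)  # GCN_sigma head
        self.tau = tau                                           # edge-sampling temperature

    def encode(self, X, A):
        """Two-layer GCN encoder with reparameterized latents Z (Eq. 10)."""
        A_hat = gcn_norm(A)
        h = torch.relu(A_hat @ self.w0(X))
        mu, logvar = A_hat @ self.w_mu(h), A_hat @ self.w_logvar(h)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def decode(self, Z):
        """Dense sketched adjacency S: inner-product edge logits through a
        temperature-scaled sigmoid; as tau -> 0 this sharpens toward 0/1 edges."""
        return torch.sigmoid(Z @ Z.t() / self.tau)

def refine_loss(S, A_anchor, alpha: float = 1.0, beta: float = 0.1):
    """Simplified stand-in for Eq. 14: stay close to the anchor topology while
    penalizing the nuclear norm, which sparsifies the refined graph."""
    anchor_reg = torch.norm(S - A_anchor, p="fro") ** 2
    sparsity_reg = torch.linalg.matrix_norm(S, ord="nuc")
    return alpha * anchor_reg + beta * sparsity_reg
```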
4.3. Optimizing G-Designer
Upon obtaining $\mathcal{G}_{\mathrm{com}}$, the multi-agent utterances and dialogues can proceed as usual using $\tilde{\mathbf{A}}$, as detailed in Section 3.2. After $K$ rounds of interaction, the agents converge to a final solution $a^{(K)}$. We then give the following optimization objective:
$$\max_{\Theta \in \boldsymbol{\Theta}} \ \mathbb{E}_{\mathcal{G}_{\mathrm{com}} \sim f_{\Theta}(\mathcal{G}')} \Big[ \phi\big(a^{(K)}\big) \Big], \qquad \Theta = \{\Theta_{\mathcal{E}}, \Theta_{\mathcal{D}}\} \qquad (16)$$
where $\Theta_{\mathcal{E}}$ and $\Theta_{\mathcal{D}}$ are the parameters of the encoder $\mathcal{E}$ and decoder $\mathcal{D}$, respectively, $\boldsymbol{\Theta}$ is the parameter space, and $\mathbb{E}[\cdot]$ denotes the mathematical expectation. Equation 16 aims to maximize the utility of the generated solution, but it is inherently intractable and non-differentiable, as $\phi(\cdot)$ often depends on external API calls (Li et al., 2023b; Hendrycks et al., 2021). To address this, following standard approaches in multi-agent structure design (Zhuge et al., 2024; Zhang et al., 2024), we apply policy gradient (Williams, 1992) to approximate and optimize Equation 16:
$$\nabla_{\Theta} \, \mathbb{E}\big[\phi(a^{(K)})\big] \approx \frac{1}{M} \sum_{m=1}^{M} \phi\big(a^{(K)}_{m}\big) \, \nabla_{\Theta} \log p_{\Theta}\big(\mathcal{G}_{\mathrm{com}}^{(m)}\big) \qquad (17)$$
where $\mathcal{G}_{\mathrm{com}}^{(1)}, \dots, \mathcal{G}_{\mathrm{com}}^{(M)}$ are independently sampled from $f_{\Theta}(\mathcal{G}')$, and $a^{(K)}_{1}, \dots, a^{(K)}_{M}$ are the corresponding outputs. $p_{\Theta}(\mathcal{G}_{\mathrm{com}}^{(m)})$ calculates the probability of $\mathcal{G}_{\mathrm{com}}^{(m)}$ being sampled, which can be expressed as $p_{\Theta}(\mathcal{G}_{\mathrm{com}}^{(m)}) = \prod_{i,j} \mathbf{S}[i,j]^{\,\mathbb{1}[e_{ij} \in \mathcal{G}_{\mathrm{com}}^{(m)}]} \big(1 - \mathbf{S}[i,j]\big)^{\,\mathbb{1}[e_{ij} \notin \mathcal{G}_{\mathrm{com}}^{(m)}]}$. Through iterative optimization guided by Equations 14 and 16 over a limited set of queries as the "training set", G-Designer efficiently develops task-awareness and the capability to strategically design the agent network, achieving truly task-customized multi-agent topology design.
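For concreteness, a REINFORCE-style sketch of one update approximating Equation 17 is given below. The `run_agents` and `evaluate` callables are hypothetical placeholders: the former executes the dialogue of Section 3.2 under a sampled topology and returns the final answer, and the latter is the non-differentiable utility $\phi$ (e.g., benchmark accuracy), typically backed by external API calls.

```python
import torch

def policy_gradient_step(designer, optimizer, X, A_prime, query,
                         run_agents, evaluate, M: int = 4):
    """One REINFORCE update: sample M topologies, score each with the utility,
    and weight the log-probability gradient of each sample accordingly (Eq. 17)."""
    S = designer.decode(designer.encode(X, A_prime))   # edge probabilities
    loss = 0.0
    for _ in range(M):
        G = torch.bernoulli(S)                         # sample a discrete topology
        with torch.no_grad():
            utility = evaluate(run_agents(G, query), query)
        # log p_Theta(G): independent Bernoulli edges under S
        log_p = (G * (S + 1e-8).log() + (1 - G) * (1 - S + 1e-8).log()).sum()
        loss = loss - utility * log_p / M
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```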
Table 1. Performance comparison with three categories of baselines: single-agent execution, spatial communication, and temporal communication. The best results are in bold, and runner-ups are underlined. All methods except the single-agent category use five gpt-4-based agents. "Mul.", "Ada.", and "Rob." indicate whether a method supports the multi-agent setting, is task-adaptive, and is adversarially robust, respectively; ✗, ✓✗, and ✓ denote no/partial/full support.
| Method | Mul. | Ada. | Rob. | MMLU | GSM8K | MultiArith | SVAMP | AQuA | HumanEval | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Vanilla | ✗ | ✗ | ✗ | 82.14 | 85.40 | 93.15 | 87.18 | 70.34 | 71.68 | 81.65 |
| CoT | ✗ | ✗ | ✗ | 82.65↑0.51 | 87.17↑1.77 | 94.79↑1.64 | 88.32↑1.14 | 73.91↑3.57 | 75.52↑3.84 | 83.73 |
| ComplexCoT | ✗ | ✗ | ✗ | 83.78↑1.64 | 87.62↑2.22 | 95.86↑2.71 | 90.17↑2.99 | 77.58↑7.24 | 74.94↑3.26 | 84.99 |
| SC (CoT) | ✗ | ✗ | ✗ | 82.66↑0.52 | 87.93↑2.53 | 96.88↑3.73 | 88.69↑1.51 | 75.08↑4.74 | 77.30↑5.62 | 84.75 |
| SC (ComplexCoT) | ✗ | ✗ | ✗ | 83.65↑1.51 | 86.14↓0.74 | 96.94↑3.79 | 89.72↑2.54 | 77.69↑7.35 | 77.94↑6.26 | 85.35 |
| PHP | ✓ | ✗ | ✗ | 83.45↑1.31 | 95.50↑10.1 | 98.10↑2.84 | 90.02↑3.44 | 79.00↑8.66 | 82.96↑11.36 | 88.17 |
| Chain | ✓ | ✗ | ✗ | 82.35↑0.21 | 85.57↑0.17 | 94.38↑1.23 | 83.41↓3.77 | 70.94↑0.60 | 80.88↑9.20 | 82.92 |
| Star | ✓ | ✗ | ✗ | 80.79↓1.35 | 85.55↑0.15 | 93.79↓0.64 | 88.09↑0.91 | 68.57↓1.77 | 75.65↑3.97 | 82.07 |
| Tree | ✓ | ✗ | ✗ | 81.89↓0.25 | 84.56↓0.84 | 94.60↑1.45 | 89.25↑2.07 | 72.84↑2.50 | 77.38↑5.70 | 83.42 |
| Complete Graph | ✓ | ✗ | ✗ | 83.15↑1.01 | 86.49↑1.09 | 97.20↑4.05 | 89.48↑2.30 | 79.21↑8.87 | 83.75↑12.07 | 86.55 |
| Random Graph | ✓ | ✗ | ✗ | 83.76↑1.62 | 86.14↑0.74 | 95.46↑2.31 | 85.41↓1.77 | 74.07↑3.73 | 82.66↑10.98 | 84.58 |
| AutoGen | ✓ | ✗ | ✗ | 82.13↓0.01 | 90.06↑7.92 | 93.80↑0.65 | 88.44↓1.26 | 73.65↑3.31 | 85.41↑13.73 | 85.58 |
| MetaGPT | ✓ | ✗ | ✗ | - | - | - | - | - | 85.90↑14.22 | 84.90 |
| LLM-Blender | ✓ | ✗ | ✗ | 81.22↓0.92 | 89.17↑3.77 | 94.27↑1.12 | 88.77↑1.59 | 77.05↑6.71 | - | 86.09 |
| LLM-Debate | ✓ | ✗ | ✓ | 83.69↑1.55 | 90.23↑4.83 | 96.27↑3.12 | 90.56↑3.38 | 77.52↑7.18 | 83.79↑12.11 | 87.01 |
| DyLAN | ✓ | ✓✗ | ✓ | 80.16↓1.98 | 88.16↑2.76 | 94.27↑1.12 | 87.40↑0.22 | 74.16↑3.82 | 89.70↑18.02 | 85.64 |
| GPTSwarm | ✓ | ✓✗ | ✓ | 83.98↑1.84 | 89.74↑4.34 | 97.84↑4.69 | 86.42↓0.76 | 78.16↑7.82 | 88.49↑16.81 | 87.32 |
Optimization configuration
The overall training objective of our method is formulated as $\mathcal{L} = \mathcal{L}_{\mathrm{util}} + \mathcal{L}_{\mathrm{reg}} + \mathcal{L}_{\mathrm{anc}}$, where $\mathcal{L}_{\mathrm{util}}$ represents the optimization target from Equation 16, $\mathcal{L}_{\mathrm{reg}}$ corresponds to the first and third terms in Equation 14, and $\mathcal{L}_{\mathrm{anc}}$ is the second term. Given a benchmark $\mathcal{B}$ consisting of $|\mathcal{B}|$ queries, G-Designer begins by optimizing with a small subset of queries and then fixes the learned parameters for testing on the remaining queries. The whole algorithm workflow of G-Designer is depicted in Algorithm 1.
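Putting the pieces together, the following sketch mirrors this workflow using the helpers sketched in earlier sections (`build_network`, `VGAEDesigner`, `refine_loss`): optimize the combined objective on a small query subset, then freeze the learned parameters for the remaining test queries. The epoch count and learning rate are illustrative, not our tuned values.

```python
def train_g_designer(designer, agents, train_queries, anchor_A,
                     run_agents, evaluate, epochs: int = 10, lr: float = 1e-2):
    """Optimize L = L_util + regularizers (Eqs. 14, 16) on a small training subset."""
    optimizer = torch.optim.Adam(designer.parameters(), lr=lr)
    for _ in range(epochs):
        for query in train_queries:
            X, A = build_network(agents, query, anchor_A)       # Sec. 4.1
            X = torch.as_tensor(X, dtype=torch.float32)
            A = torch.as_tensor(A, dtype=torch.float32)
            S = designer.decode(designer.encode(X, A))          # Sec. 4.2
            G = torch.bernoulli(S)                              # sampled topology
            with torch.no_grad():
                utility = evaluate(run_agents(G, query), query) # external API call
            log_p = (G * (S + 1e-8).log() + (1 - G) * (1 - S + 1e-8).log()).sum()
            loss = -utility * log_p + refine_loss(S, A)         # Eq. 16 + Eq. 14
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    designer.eval()  # parameters are fixed at test time (Sec. 4.3)
```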
5. Experiments
In this section, we conduct extensive experiments to answer the following research questions:
- (RQ1) Can G-Designer design effective and high-performing multi-agent communication topologies?
- (RQ2) Can G-Designer generate more task-adaptive topologies, resulting in less token consumption?
- (RQ3) Is G-Designer more robust against adversarial attacks?
- (RQ4) How sensitive is the proposed G-Designer to its key components and parameters?
5.1. Experimental Setup
5.1.1. Datasets and Metrics
We evaluate G-Designer’s ability to enhance LLM-MA collaborative intelligence using three major categories of datasets:
General Reasoning: We utilize MMLU (Hendrycks et al., 2021), which provides a comprehensive set of logical reasoning assessments across diverse subjects in the form of multiple-choice questions. The performance is evaluated using accuracy on the generated solutions.
Mathematical Reasoning: To assess mathematical reasoning capabilities, we use GSM8K (Cobbe et al., 2021), MultiArith (Roy and Roth, 2016), SVAMP (Patel et al., 2021), and AQuA (Ling et al., 2017), with accuracy as the evaluation metric across all datasets. Code Generation: We opt for HumanEval (Chen et al., 2021), a widely recognized benchmark for function-level code generation designed to evaluate fundamental programming skills. Performance is measured using pass@1, which reflects the correctness of generated functions across multiple test cases.
5.1.2. Baselines
We comprehensively select representative baselines from both single-agent enhancement and multi-agent collaboration methods. For single-agent approaches, we select:

Figure 4. Case study of communication topologies designed by G-Designer on the HumanEval and GSM8K benchmarks.
- CoT (Wei et al., 2022) equips an individual LLM with the capability to generate coherent intermediate reasoning steps.
- ComplexCoT (Fu et al., 2022) is built on CoT, introducing a complexity-based criterion that spans both the prompting (input selection) and decoding (output selection) stages.
- Self-Consistency (Wang et al., 2023b) is a reasoning-chain ensemble method, working in conjunction with CoT or ComplexCoT to ensemble multiple reasoning chains. We ensemble multiple sampled reasoning chains in our experiments.
- PHP (Zheng et al., 2023) is a progressive-hint prompting technique and a current state-of-the-art single-agent reasoning plugin.
For multi-agent collaboration topologies, we select the following:
- Chain, Star, and Tree are simple and straightforward topological configurations, formally defined in (Qian et al., 2024).
- Complete Graph and Random Graph are also delineated in (Qian et al., 2024); their execution order is defined by topological ordering.
- AutoGen (Wu et al., 2023) is one of the earliest frameworks for multi-agent collaboration. We primarily employ "A1: Math Problem Solving" for mathematical tasks, "A5: Dynamic Group Chat" for general reasoning, and "A4: Multi-Agent Coding" for code generation.
- MetaGPT (Hong et al., 2023) is a pioneering multi-agent framework specifically designed for software engineering. Therefore, we report its performance only on code generation tasks.
- LLM-Debate (Du et al., 2023) enhances system reasoning capabilities by facilitating debates among multiple agents.
- LLM-Blender (Jiang et al., 2023) utilizes a GenFuser agent to aggregate solutions from independently operating agents.
- DyLAN (Liu et al., 2023) optimizes agent teams by dynamically assessing agent importance scores and selecting the most valuable ones.
- GPTSwarm (Zhuge et al., 2024) conceptualizes a swarm of LLM agents as computational graphs and continuously optimizes its distribution, albeit at a relatively high training cost.
5.1.3. Implementation Details
We access the GPT models via the OpenAI API, and mainly test on gpt-4-1106-preview (gpt-4). We set the temperature to 0 for the single-execution and single-agent baselines and to 1 for multi-agent methods. For decision-making in the multi-agent system, we set a summarizer agent to aggregate the dialogue history and produce the final solution $a^{(K)}$, with the same setting across all experiments. The node encoder $\mathrm{Enc}(\cdot)$ is implemented using all-MiniLM-L6-v2 (Wang et al., 2020) with a fixed embedding dimension. The anchor topology $\mathbf{A}^{\mathrm{anc}}$ is predefined as a simple chain structure. The sampling times $M$ and the remaining hyperparameters are fixed for all experiments. We provide explicit agent profiling for multi-agent methods, following the classical configurations in LLM-MA systems (Sun et al., 2023c; Liu et al., 2023; Zhuge et al., 2024; Yin et al., 2023), and use gpt-4 to generate agent profile pools. For all benchmarks, we use only a small subset of queries for the optimization process.
5.2. Performance Evaluation (RQ1)
To assess the effectiveness of G-Designer in designing powerful LLM-MA topologies, we conducted evaluations using five instances of gpt-4, with results outlined in Table 1. We can draw two key observations (Obs.):
Obs. ❶ Meticulously designed multi-agent topology is crucial for collective intelligence. As demonstrated in Table 1, not all multi-agent topologies outperform single-agent reasoning approaches. In some cases, such as the Star and Tree structures, performance even falls short of vanilla gpt-4 on the MMLU dataset. However, more customized and adaptive topology designs, like AutoGen, DyLAN, and GPTSwarm, consistently show improvements on HumanEval, significantly outperforming single-agent baselines such as PHP and CoT. This clearly demonstrates the emergent power of collective intelligence.
Obs. ❷ G-Designer is effective in designing powerful LLM-MA topologies. As shown in Table 1, G-Designer achieves the best performance in five out of six benchmarks, and on the GSM8K benchmark, it trails only PHP. On the HumanEval benchmark, G-Designer surpasses MetaGPT, a specialized multi-agent code generation framework, at pass@1, and outperforms state-of-the-art multi-agent collaboration frameworks like GPTSwarm and DyLAN by clear margins. Overall, G-Designer demonstrates exceptional performance in topology design across a wide range of tasks.
5.3. Adaptiveness Evaluation (RQ2)

Figure 5. Visualization of performance metrics and prompt token consumption of different multi-agent communication topologies on the MMLU, HumanEval, GSM8K, and SVAMP benchmarks. The diameter of each point is proportional to its y-axis value.
Task-adaptive multi-agent network design not only improves task performance but also regulates the complexity of the topology according to the task's difficulty. A key benefit of this adaptivity is that it prevents the use of overly complex structures for simple tasks, thus minimizing unnecessary communication costs; in the case of LLM-MA, this means reducing token consumption. Figure 4 visualizes the different topologies designed by G-Designer for varying query difficulties on the HumanEval and GSM8K benchmarks, while Figure 5 compares the token consumption of G-Designer against various baselines. We summarize several interesting observations:
Obs. ❸ G-Designer is highly task-aware. As shown in Figure 4, the multi-agent topologies generated by G-Designer are highly dependent on the specific task context and its difficulty. In Case a, despite having five gpt-4 agents available as design resources, G-Designer identified the task of designing a strlen(string) function as relatively simple. It streamlined the topology by removing unnecessary agents, such as "bug fixer" and "test analyst", and retained only a minimal "Algorithm Designer → Programmer" structure to solve the problem. In contrast, for the more complex Case c and Case e, G-Designer crafted a more intricate communication graph. These cases highlight the strong task-aware and task-adaptive capabilities of G-Designer.
Obs. ❹ G-Designer is a token-saving and economical assistant. Figure 5 illustrates the differences in prompt token consumption between G-Designer and several representative multi-agent designs. We observe that simpler topologies, such as complete graphs and random graphs, consume fewer tokens but show significantly weaker performance. More complex communication structures, like GPTSwarm and DyLAN, achieve superior performance, albeit at the cost of excessive token consumption. For instance, DyLAN's cost on GSM8K is a large multiple of that of the random graph. In contrast, G-Designer elegantly balances both efficiency and task performance, achieving the highest performance across all four benchmarks while maintaining the lowest token cost. For example, on SVAMP, G-Designer surpasses DyLAN while using only a fraction of DyLAN's token consumption.
5.4. Robustness Verification (RQ3)

Figure 6. Accuracy (%) of various multi-agent frameworks on MMLU before and after a prompt attack.
In this section, we compare the robustness of different topology designs when subjected to adversarial attacks. Following (Zhuge et al., 2024), we simulate a system prompt attack on one of the five agents, with the results shown in Figure 6. We observe the following:
Obs. ❺ G-Designer is a robust defender against adversarial attacks. As seen in Figure 6, many trivial structures, such as the chain or complete graph, experience significant performance degradation under partial system attacks. Among more sophisticated structures, GPTSwarm, benefiting from its specialized node optimization mechanism, suffers only a minor accuracy decline. However, other methods fare less well, with DyLAN and AutoGen showing larger accuracy drops. Remarkably, G-Designer demonstrates exceptional robustness against adversarial attacks, maintaining nearly identical performance pre- and post-attack. This resilience can be attributed to its agent encoding capability, which, during optimization, can detect malicious inputs and prune the corresponding edges.
5.5. Ablation Study & Sensitivity Analysis (RQ4)

Figure 7. (Left) Sensitivity analysis on the number of agents. (Right) Ablation study on the two regularizations under clean and adversarial-attack settings, tested on the MMLU benchmark.
Sensitivity Analysis
We compare the performance of the chain structure, GPTSwarm, and G-Designer across varying numbers of agents $N$. As shown in Figure 7 (Left), as the agent count increases, the simple chain-style structure exhibits marginal performance improvements and poor scaling capacity. In contrast, G-Designer demonstrates a stronger emergent capability, where the involvement of more agents leads to notable performance gains.
Ablation Study
We report results for two variants of our method: (1) G-Designer w/o SR, which removes the sparsity regularization in Equation 14, and (2) G-Designer w/o Anchor, which excludes the anchor structure $\mathbf{A}^{\mathrm{anc}}$. As shown in Figure 7 (Right), the removal of the anchor structure consistently leads to performance degradation, while the absence of sparsity regularization makes the system more vulnerable to adversarial attacks.
6. Conclusion
In this paper, we first present the LLM-based Multi-agent Communication Protocol (MACP), which aims to provide insightful guidance for designing complex multi-agent systems. Furthermore, we propose an effective, adaptive, and robust LLM-powered multi-agent communication graph designer, termed G-Designer, to facilitate the automated design of collaborative AI systems. G-Designer is highly task-aware, dynamically crafting compact and robust communication topologies based on the complexity of the task at hand. We hope that G-Designer will inspire future research on the emergence of self-organizing and self-evolving collective intelligence.
References
- Bell et al. (1997) Michael GH Bell, Yasunori Iida, et al. 1997. Transportation network analysis. (1997).
- Besta et al. (2023) Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler. 2023. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv preprint arXiv:2308.09687 (2023).
- Bondy et al. (1976) John Adrian Bondy, Uppaluri Siva Ramachandra Murty, et al. 1976. Graph theory with applications. Vol. 290. Macmillan London.
- Bouzenia et al. (2024) Islem Bouzenia, Premkumar Devanbu, and Michael Pradel. 2024. Repairagent: An autonomous, llm-based agent for program repair. arXiv preprint arXiv:2403.17134 (2024).
- Chan et al. (2023) Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2023. ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate. arXiv preprint arXiv:2308.07201 (2023).
- Chen et al. (2023b) Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, and Haoyang Zhang. 2023b. Gamegpt: Multi-agent collaborative framework for game development. arXiv preprint arXiv:2310.08067 (2023).
- Chen et al. (2024b) Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024b. Benchmarking large language models in retrieval-augmented generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 17754–17762.
- Chen et al. (2024a) Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, and James Zou. 2024a. Are more llm calls all you need? towards scaling laws of compound inference systems. arXiv preprint arXiv:2403.02419 (2024).
- Chen et al. (2021) Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374 (2021).
- Chen et al. (2023a) Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chen Qian, Chi-Min Chan, Yujia Qin, Yaxi Lu, Ruobing Xie, Zhiyuan Liu, Maosong Sun, and Jie Zhou. 2023a. AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents. arXiv preprint arXiv:2308.10848 (2023).
- Chen et al. (2024c) Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2024c. Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence. arXiv preprint arXiv:2407.07061 (2024).
- Cheng et al. (2024) Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang, Zekai Wang, Feng Yin, Junhua Zhao, and Xiuqiang He. 2024. Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects. CoRR abs/2401.03428 (2024).
- Cobbe et al. (2021) Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv prepring abs/2110.14168 (2021).
- Cohen et al. (2023) Roi Cohen, May Hamri, Mor Geva, and Amir Globerson. 2023. Lm vs lm: Detecting factual errors via cross examination. arXiv preprint arXiv:2305.13281 (2023).
- Du et al. (2023) Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. 2023. Improving Factuality and Reasoning in Language Models through Multiagent Debate. CoRR abs/2305.14325 (2023).
- Fagiolo et al. (2010) Giorgio Fagiolo, Javier Reyes, and Stefano Schiavo. 2010. The evolution of the world trade web: a weighted-network analysis. Journal of Evolutionary Economics 20 (2010), 479–514.
- Fan et al. (2019) Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social recommendation. In The world wide web conference. 417–426.
- Farahani et al. (2013) Reza Zanjirani Farahani, Elnaz Miandoabchi, Wai Yuen Szeto, and Hannaneh Rashidi. 2013. A review of urban transportation network design problems. European journal of operational research 229, 2 (2013), 281–302.
- Fu et al. (2022) Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. 2022. Complexity-based prompting for multi-step reasoning. In The Eleventh International Conference on Learning Representations.
- Gao et al. (2023) Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. 2023. Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives. CoRR abs/2312.11970 (2023).
- Garlaschelli et al. (2007) Diego Garlaschelli, Ticiana Di Matteo, Tomaso Aste, Guido Caldarelli, and Maria I Loffredo. 2007. Interplay between topology and dynamics in the World Trade Web. The European Physical Journal B 57 (2007), 159–164.
- Greene et al. (2010) Derek Greene, Donal Doyle, and Padraig Cunningham. 2010. Tracking the evolution of communities in dynamic social networks. In 2010 international conference on advances in social networks analysis and mining. IEEE, 176–183.
- Hao et al. (2023) Rui Hao, Linmei Hu, Weijian Qi, Qingliu Wu, Yirui Zhang, and Liqiang Nie. 2023. ChatLLM Network: More brains, More intelligence. arXiv preprint arXiv:2304.12998 (2023).
- Hendrycks et al. (2021) Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding. Proceedings of the International Conference on Learning Representations (ICLR) (2021).
- Holt et al. (2024) Samuel Holt, Max Ruiz Luyten, and Mihaela van der Schaar. 2024. L2MAC: Large Language Model Automatic Computer for Extensive Code Generation. In The Twelfth International Conference on Learning Representations.
- Hong et al. (2023) Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, and Chenglin Wu. 2023. MetaGPT: Meta Programming for Multi-Agent Collaborative Framework. arXiv preprint arXiv:2308.00352 (2023).
- Hu et al. (2024) Shengchao Hu, Li Shen, Ya Zhang, and Dacheng Tao. 2024. Learning multi-agent communication from graph modeling perspective. arXiv preprint arXiv:2405.08550 (2024).
- Hua et al. (2023) Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. 2023. War and peace (waragent): Large language model-based multi-agent simulation of world wars. arXiv preprint arXiv:2311.17227 (2023).
- Huang et al. (2022) Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. 2022. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International conference on machine learning. PMLR, 9118–9147.
- Ishibashi and Nishimura (2024) Yoichi Ishibashi and Yoshimasa Nishimura. 2024. Self-organized agents: A llm multi-agent framework toward ultra large-scale code generation and optimization. arXiv preprint arXiv:2404.02183 (2024).
- Jiang et al. (2023) Dongfu Jiang, Xiang Ren, and Bill Yuchen Lin. 2023. LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 14165–14178.
- Jin et al. (2023) Ye Jin, Xiaoxi Shen, Huiling Peng, Xiaoan Liu, Jingli Qin, Jiayang Li, Jintao Xie, Peizhong Gao, Guyue Zhou, and Jiangtao Gong. 2023. SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model. arXiv preprint arXiv:2309.13193 (2023).
- Khattab et al. (2023) Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, et al. 2023. Dspy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714 (2023).
- Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
- Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations. OpenReview.net.
- Li et al. (2023a) Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023a. CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. In NeurIPS.
- Li et al. (2024) Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, and Deheng Ye. 2024. More Agents Is All You Need. arXiv preprint arXiv:2402.05120 (2024).
- Li et al. (2023b) Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, and Yongbin Li. 2023b. Api-bank: A comprehensive benchmark for tool-augmented llms. arXiv preprint arXiv:2304.08244 (2023).
- Liang et al. (2023) Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. 2023. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate. arXiv preprint arXiv:2305.19118 (2023).
- Ling et al. (2017) Wang Ling, Dani Yogatama, Chris Dyer, and Phil Blunsom. 2017. Program induction by rationale generation: Learning to solve and explain algebraic word problems. arXiv preprint arXiv:1705.04146 (2017).
- Liu et al. (2022) Yuntao Liu, Yong Dou, Yuan Li, Xinhai Xu, and Donghong Liu. 2022. Temporal dynamic weighted graph convolution for multi-agent reinforcement learning. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 44.
- Liu et al. (2023) Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. 2023. Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization. arXiv preprint arXiv:2310.02170 (2023).
- Ma et al. (2023) Kaixin Ma, Hongming Zhang, Hongwei Wang, Xiaoman Pan, Wenhao Yu, and Dong Yu. 2023. Laser: Llm agent with state-space exploration for web navigation. arXiv preprint arXiv:2309.08172 (2023).
- Ma et al. (2024) Qun Ma, Xiao Xue, Deyu Zhou, Xiangning Yu, Donghua Liu, Xuwen Zhang, Zihan Zhao, Yifan Shen, Peilin Ji, Juanjuan Li, Gang Wang, and Wanpeng Ma. 2024. Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective. arXiv preprint arXiv:2402.00262 (2024).
- Nakajima (2023) Yohei Nakajima. 2023. BabyAGI. https://github.com/yoheinakajima/babyagi.
- Park et al. (2023) Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. In UIST. ACM, 2:1–2:22.
- Patel et al. (2021) Arkil Patel, Satwik Bhattamishra, and Navin Goyal. 2021. Are NLP models really able to solve simple math word problems? arXiv preprint arXiv:2103.07191 (2021).
- Pesce and Montana (2023) Emanuele Pesce and Giovanni Montana. 2023. Learning multi-agent coordination through connectivity-driven communication. Machine Learning 112, 2 (2023), 483–514.
- Piatti et al. (2024) Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, and Rada Mihalcea. 2024. Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents. arXiv preprint arXiv:2404.16698 (2024).
- Qian et al. (2023) Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. 2023. Communicative Agents for Software Development. arXiv preprint arXiv:2307.07924 (2023).
- Qian et al. (2024) Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, and Maosong Sun. 2024. Scaling Large-Language-Model-based Multi-Agent Collaboration. arXiv preprint arXiv:2406.07155 (2024).
- Reimers (2019) Nils Reimers. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint arXiv:1908.10084 (2019).
- Reworkd (2023) Reworkd. 2023. AgentGPT. https://github.com/reworkd/AgentGPT.
- Richards and et al. (2023) Toran Bruce Richards et al. 2023. Auto-GPT: An Autonomous GPT-4 Experiment. https://github.com/Significant-Gravitas/Auto-GPT.
- Rosenbluth et al. (2024) Eran Rosenbluth, Jan Tönshoff, Martin Ritzert, Berke Kisin, and Martin Grohe. 2024. Distinguished In Uniform: Self Attention Vs. Virtual Nodes. arXiv preprint arXiv:2405.11951 (2024).
- Roy and Roth (2016) Subhro Roy and Dan Roth. 2016. Solving general arithmetic word problems. arXiv preprint arXiv:1608.01413 (2016).
- Ruan et al. (2023) Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao, et al. 2023. Tptu: Task planning and tool usage of large language model-based ai agents. In NeurIPS 2023 Foundation Models for Decision Making Workshop.
- Serrano and Boguná (2003) M. Ángeles Serrano and Marián Boguná. 2003. Topology of the world trade web. Physical Review E 68, 1 (2003), 015101.
- Shinn et al. (2023) Noah Shinn, Beck Labash, and Ashwin Gopinath. 2023. Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366 (2023).
- Shirzad et al. (2023) Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J Sutherland, and Ali Kemal Sinop. 2023. Exphormer: Sparse transformers for graphs. In International Conference on Machine Learning. PMLR, 31613–31632.
- Sun et al. (2023c) Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, and Lingpeng Kong. 2023c. Corex: Pushing the boundaries of complex reasoning through multi-model collaboration. arXiv preprint arXiv:2310.00280 (2023).
- Sun et al. (2023b) Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, and Mohit Iyyer. 2023b. Pearl: Prompting large language models to plan and execute actions over long documents. arXiv preprint arXiv:2305.14564 (2023).
- Sun et al. (2023a) Xiangguo Sun, Hong Cheng, Bo Liu, Jia Li, Hongyang Chen, Guandong Xu, and Hongzhi Yin. 2023a. Self-supervised hypergraph representation learning for sociological analysis. IEEE Transactions on Knowledge and Data Engineering 35, 11 (2023), 11860–11871.
- Tan et al. (2023) Zhen Tan, Ruocheng Guo, Kaize Ding, and Huan Liu. 2023. Virtual node tuning for few-shot node classification. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2177–2188.
- Wang et al. (2023) Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv preprint arXiv:2305.16291 (2023).
- Wang et al. (2024) Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. 2024. A Survey on Large Language Model based Autonomous Agents. Front. Comput. Sci. 18 (2024).
- Wang et al. (2020) Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in Neural Information Processing Systems 33 (2020), 5776–5788.
- Wang et al. (2019) Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. Kgat: Knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 950–958.
- Wang et al. (2023b) Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023b. Self-Consistency Improves Chain of Thought Reasoning in Language Models. In The Eleventh International Conference on Learning Representations.
- Wang et al. (2023a) Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, and Heng Ji. 2023a. Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration. arXiv preprint arXiv:2307.05300 (2023).
- Wei et al. (2022) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems.
- Williams (1992) Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8 (1992), 229–256.
- Wu et al. (2023) Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework. arXiv preprint arXiv:2308.08155 (2023).
- Xi et al. (2023) Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, and Tao Gui. 2023. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv preprint arXiv:2309.07864 (2023).
- Yan et al. (2024) Yikuan Yan, Yaolun Zhang, and Keman Huang. 2024. Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games. arXiv preprint arXiv:2403.17674 (2024).
- Yao et al. (2023a) Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023a. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In NeurIPS.
- Yao et al. (2023b) Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2023b. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations.
- Yin et al. (2023) Zhangyue Yin, Qiushi Sun, Cheng Chang, Qipeng Guo, Junqi Dai, Xuan-Jing Huang, and Xipeng Qiu. 2023. Exchange-of-thought: Enhancing large language model capabilities through cross-model communication. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 15135–15153.
- Zelikman et al. (2023) Eric Zelikman, Eliana Lorch, Lester Mackey, and Adam Tauman Kalai. 2023. Self-taught optimizer (stop): Recursively self-improving code generation. arXiv preprint arXiv:2310.02304 (2023).
- Zhang et al. (2024) Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, and Tianlong Chen. 2024. Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems. arXiv preprint arXiv:2410.02506 (2024).
- Zhang et al. (2023a) Jintian Zhang, Xin Xu, and Shumin Deng. 2023a. Exploring collaboration mechanisms for llm agents: A social psychology view. arXiv preprint arXiv:2310.02124 (2023).
- Zhang and Chartrand (2006) Ping Zhang and Gary Chartrand. 2006. Introduction to graph theory. Tata McGraw-Hill (2006).
- Zhao and Zhang (2024) Kesen Zhao and Liang Zhang. 2024. Causality-Inspired Spatial-Temporal Explanations for Dynamic Graph Neural Networks. In The Twelfth International Conference on Learning Representations.
- Zheng et al. (2023) Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, and Yu Li. 2023. Progressive-Hint Prompting Improves Reasoning in Large Language Models. arXiv preprint arXiv:2304.09797 (2023).
- Zhou et al. (2023) Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, and Bin Liu. 2023. Large language model as a policy teacher for training reinforcement learning agents. arXiv preprint arXiv:2311.13373 (2023).
- Zhuang et al. (2023) Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A Rossi, Somdeb Sarkhel, and Chao Zhang. 2023. Toolchain*: Efficient action space navigation in large language models with a* search. arXiv preprint arXiv:2310.13227 (2023).
- Zhuge et al. (2024) Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. 2024. GPTSwarm: Language Agents as Optimizable Graphs. In Forty-first International Conference on Machine Learning.