License: CC BY 4.0
arXiv:2411.00114v1 [cs.AI] 31 Oct 2024

Project Sid: Many-agent simulations toward AI civilization

Altera.AL (see the Contributions section for the complete author list)
Abstract

AI agents have been evaluated in isolation or within small groups, where interactions remain limited in scope and complexity. Large-scale simulations involving many autonomous agents—reflecting the full spectrum of civilizational processes—have yet to be explored. Here, we demonstrate how 10 – 1000+ AI agents behave and progress within agent societies. We first introduce the PIANO (Parallel Information Aggregation via Neural Orchestration) architecture, which enables agents to interact with humans and other agents in real-time while maintaining coherence across multiple output streams. We then evaluate agent performance in large-scale simulations using civilizational benchmarks inspired by human history. These simulations, set within a Minecraft environment, reveal that agents are capable of meaningful progress—autonomously developing specialized roles, adhering to and changing collective rules, and engaging in cultural and religious transmission. These preliminary results show that agents can achieve significant milestones towards AI civilizations, opening new avenues for large-scale societal simulations, agentic organizational intelligence, and integrating AI into human civilizations.

Figure 1: From agent architecture to agent civilization

1 Introduction

1.1 Why should we try to build an AI civilization?

For agents to coexist with us in our own societies, they need to be autonomous and collaborative. In recent years, advancements in reasoning and decision-making in LLMs have significantly enhanced agent autonomy (52; 58; 36; 45). However, autonomy alone is insufficient. AI agents must also coexist alongside humans and other agents in a human civilization. In this paper, we define a civilization as an advanced society that has achieved a high level of institutional development, which manifests in specialized roles, organized governance, and advancements in areas like science, art, and commerce. We argue that civilizational progress - measured by the ability of agents to coexist and progress in human civilizations - represents the ultimate benchmark for AI agent ability.

In this technical report, we describe our first efforts to improve and benchmark agent ability in human civilizations. First, we introduce PIANO (Parallel Information Aggregation via Neural Orchestration), a new cognitive architecture designed to enhance both autonomy and real-time interaction of agents. Using PIANO, we simulate single societies of 50-100 agents as well as civilizations of 500 - 1,000 agents living in multiple societies that interact with one another. Finally, we evaluate agent performance using new metrics that are aligned with human civilizational progress. We show that agents form their own professional identities, obey collective rules, transmit cultural information and exert religious influence, and use sophisticated infrastructures, such as legal systems.

1.2 The current agent landscape

Modern AI Agents typically consist of multiple LLM-powered modules for reasoning, memory, planning, and tool use (49; 18; 55; 20; 62). Individual agents have been developed for various applications including coding (5; 8), web browsing (64; 42), and game play (48).

Recent research efforts in LLM-powered multi-agent systems generally fall under three categories: productivity, games, and social modeling. Multi-agent frameworks have been deployed in software development (43; 27), cooperative robotic control (60), scientific experiments (12; 47), and debates (3). Multi-agent simulations have also been tested in various game environments (56; 13; 30; 28). Separately, they’ve been used to model developmental psychology (25; 61), game theory (32), macroeconomics (29; 63), social policies (41; 54; 19), and community dynamics (40; 39; 10).

In many of these works, agents are not completely autonomous and are constrained by either agent architecture or by the simulated environment. Common constraints include turn-based execution, constrained workflows, or rigid communication channels between agents (65; 21; 4).

Several of these works consider large-scale simulations, though in restricted settings. For example, (40) and (10) simulated social networks of up to 18,000 personas. To our knowledge, fully autonomous social communication in open-world environments has not been attempted in games or other settings (15).

1.3 Why is it hard to build AI civilizations?

Large agent groups have yet to demonstrate the ability to progress over long time horizons. Below, we review the key reasons for this limited progress before outlining our contributions to overcome them.

Reason 1: single agents don’t make progress.

LLM-powered agents often struggle to maintain a grounded sense of reality in their actions and reasoning (Figure 2). Agents, even when equipped with modules for planning and reflection, often become stuck in repetitive patterns of actions or accumulate a cascade of errors through hallucinations, rendering them unable to make meaningful progress (57; 48; 15). Consider an agent prompted to be a villager in a virtual town. When asked, “What are you eating?”, the agent may answer “a bagel”, even if it is not eating anything. This hallucinated output then feeds into future prompts, causing the agent to falsely believe it no longer needs to acquire food. Therefore, even a small rate of hallucinations can poison downstream agent behavior when agents continuously interact with the environment via LM calls.
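As a rough illustration of why a small hallucination rate matters, consider a back-of-the-envelope model that is our own simplification and not an analysis from this report. It assumes each LM call hallucinates independently with probability p, and that a single hallucination is enough to corrupt the agent's later state. The function name and the example rate of 1% are hypothetical.

```python
def prob_corrupted(p: float, n_calls: int) -> float:
    """Probability that at least one of n_calls independent LM calls hallucinates.

    Assumption (not from the report): calls are independent and share the
    same per-call hallucination probability p.
    """
    return 1.0 - (1.0 - p) ** n_calls


if __name__ == "__main__":
    # Hypothetical 1% per-call rate over increasingly long interaction histories.
    for n in (10, 100, 1000):
        print(n, round(prob_corrupted(0.01, n), 3))
```

Under these assumptions the chance of a corrupted history grows quickly with the number of calls, which is consistent with the compounding effect sketched in Figure 2.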

Figure 2: Data degradation in LLMs (left), LLM-powered agents (middle), and in multi-agent groups (right). Hallucinations are represented by green skull flasks. Hallucinations that are generated by a single LLM prompt can compound over successive LLM calls. An individual agent that hallucinates can also cause an entire group of agents to hallucinate through social interactions.

Reason 2: groups of agents don’t make progress.

Agents that miscommunicate their thoughts and intents can mislead other agents, causing them to propagate further hallucinations and loop (Figure 2). Consider an agent, Abby, with two independent LLM modules, one for function calling and one for chatting. If another agent, Bob, asks Abby to “give me a pickaxe”, Abby’s chat LLM call may respond with “Sure thing!”, while her function call chooses a different action (“explore”). Bob might then attempt to mine using an imaginary pickaxe. This kind of miscommunication, which often happens in groups of agents, leads to dysfunctional behavior and will deteriorate individual performance within groups. Actions from multiple output streams must therefore be bidirectionally influential. We define this quality as coherence.

Maintaining coherence in real-time environments is even more difficult when we require that agents respond with minimal latency. This is necessary for our agents to interact with human players, but is difficult to achieve when agents have to react quickly and yet simultaneously maintain coherence across many output streams. We note that a simple solution to this coherence problem is to produce talking and action outputs using a single LLM call. However, this approach does not scale when the number of outputs becomes large, for instance, encompassing talking, gaze, facial expression, and individual body parts.

Reason 3: a lack of benchmarks for civilizational progress.

Benchmarks for agents have largely focused on autonomous agent performance in a variety of domains such as web search (38), coding (22), search and query (51), and reasoning (59; 33). Recently, benchmarks have emerged for multi-agent behaviors, focused on small group scenarios that measure communication, competition, cooperation, and delegation. Some examples include BattleAgentBench (50), COMMA (37), VillagerBench (7), and LLMcoordination (1). However, these metrics do not capture advancements that many agents can make at the scale of civilizations. We believe the lack of such large-scale benchmarks can be attributed to how technically difficult it is to perform simulations of hundreds or thousands of agents in a single world. The biggest experiments to date have simulated 25-50 agents (39), which is not close to the scale of a civilization.

1.4 Our contributions

In this technical report, we make the following contributions:

  • A new class of agent architecture, PIANO (Parallel Information Aggregation via Neural Orchestration)
  • Architectural features that improve single-agent progression
  • Architectural features that improve multi-agent dynamics
  • Benchmarks for long-term civilizational progress in large-scale simulations through specialization, collective rules, and cultural propagation

2 PIANO Architecture

In this section, we propose two brain-inspired design principles for the composite architecture of human-like AI agents. We call this architecture PIANO (Parallel Information Aggregation via Neural Orchestration) to encompass the ideas of concurrency and an information bottleneck (Figure 3). Just as a pianist coordinates multiple notes to create a harmony, the PIANO architecture selectively executes various modules in parallel to enable agents to interact with the environment in real time.

Figure 3: PIANO (Parallel Information Aggregation via Neural Orchestration) architecture. WM: working memory. STM: short-term memory. LTM: long-term memory.

2.1 Concurrency

Problem.

Agents should be able to think and act concurrently. For instance, slow mental processes, such as self-reflection or planning, should not block agents from responding to immediate threats in their surroundings. We want the agents to be interactive in real time with low latency, but also to have the capacity to deliberate and plan slowly.

Current state.

The vast majority of LLM-based agents today use single-threaded, sequential functions (for example, a defined “Agent Workflow”). Single-threaded design assumes that the agent performs a single task at a given time, and sequential design assumes that all modules operate at similar time scales. Neither assumption is valid if agents are capable of thinking slowly and acting fast concurrently. Moreover, popular frameworks for general language model programming, such as DSPy (24), LangChain (26), and ell (31), are not designed for concurrent programming.

Solution.

The brain solves this problem by running different modules concurrently and at different time scales (34). Likewise, we have designed modules (LLM-based and otherwise), such as cognition, planning, motor execution, and speech, to run concurrently in our agent brain. Each module can be seen as a stateless function that reads and writes to a shared Agent State. The design allows different modules to be run in appropriate contexts. For example, social modules are selectively engaged in social interactions. It also allows the modules to run at different speeds. For example, reflex modules use small, fast non-LLM neural networks, while goal generation involves deliberate reasoning over graphs.
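The idea of stateless modules sharing one Agent State and running at different speeds can be sketched with Python's asyncio. This is a minimal illustration under our own assumptions, not the report's implementation: the module names, the state keys, the periods, and the rule-based `reflex` and `planner` functions are all hypothetical stand-ins (the slow module would be an LLM call in practice).

```python
import asyncio


async def run_module(name, period_s, step, state, ticks):
    """Repeatedly apply a stateless `step` function to the shared state."""
    for _ in range(ticks):
        state.update(step(state))   # read the shared state, write results back
        state["log"].append(name)   # record which module ran, for inspection
        await asyncio.sleep(period_s)  # yield so other modules can run


def reflex(state):
    # Fast module (hypothetical): react immediately to a threat flag.
    return {"action": "flee" if state["threat"] else state["action"]}


def planner(state):
    # Slow module (hypothetical): stands in for deliberate reasoning.
    return {"goal": "build shelter"}


async def main():
    state = {"threat": True, "action": "idle", "goal": None, "log": []}
    # The fast module ticks every 10 ms, the slow one every 50 ms.
    await asyncio.gather(
        run_module("reflex", 0.01, reflex, state, ticks=10),
        run_module("planner", 0.05, planner, state, ticks=2),
    )
    return state


if __name__ == "__main__":
    final = asyncio.run(main())
    print(final["action"], final["goal"], final["log"].count("reflex"))
```

Because each module only reads from and writes to the shared dictionary, the slow planner never blocks the fast reflex loop, which is the property the paragraph above describes.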

2.2 Coherence

Problem.

An immediate challenge with concurrent modules is that they can produce independent outputs, making the agent incoherent. For instance, agents say one thing but actually do something else.

Current state.

The incoherence problem is usually not obvious for sequential architectures or systems with only one output modality, but it is a significant problem when multiple output modules can interface with the environment. Incoherence also scales exponentially as the number of independent output modules increases, for instance, when coordinating actions involving arms, legs, facial expressions, gaze, and speech. Incoherence is also observed in humans, who have many concurrent motor output modules. In particular, cutting the nerve bundle connecting the left and right cortex can cause severe incoherence between different body parts (for example, left and right hands fighting each other) (11; 46).

Solution.

In order to ensure that the multiple outputs produced by our agents are coherent, we introduced a Cognitive Controller (CC) module (23) that is solely responsible for making high-level deliberate decisions. These decisions are then translated downstream to produce appropriate outputs in each motor module.

The Cognitive Controller synthesizes information across the Agent State through a bottleneck. This bottleneck reduces the amount of information presented to the Cognitive Controller, which serves two purposes: it allows the CC to focus its reasoning on relevant information, and it gives “system designers” (like us) explicit control over information flow. For example, we can design highly sociable agents by ensuring that information from the social processing module always passes through the bottleneck.

Once the Cognitive Controller makes a high-level decision, this decision is broadcast to many other modules. In particular, the decision is used to strongly condition the talk-related modules, which leads to higher coherence between verbal communication and other actions. This design of a bottlenecked decision-maker that broadcasts its outputs has been suggested as a core ingredient for human consciousness (6) and is used in some neural network architectures (44; 14).
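The bottleneck-then-broadcast pattern described above can be sketched as follows. This is a toy illustration under our own assumptions rather than the report's code: `AgentState`, `bottleneck`, `cognitive_controller`, `talk_module`, and `skill_module` are hypothetical names, and a hard-coded rule stands in for the controller's LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    observations: dict = field(default_factory=dict)
    decision: str = ""


def bottleneck(state, allowed_keys):
    """Pass only designer-selected keys of the Agent State to the controller."""
    return {k: v for k, v in state.observations.items() if k in allowed_keys}


def cognitive_controller(summary):
    # Toy rule standing in for an LLM call (assumption, not the report's logic).
    if summary.get("social") == "Bob asks for a pickaxe":
        return "give pickaxe to Bob"
    return "explore"


def talk_module(decision):
    # Speech is conditioned on the broadcast decision.
    return f"Sure, I will {decision}."


def skill_module(decision):
    # The action is conditioned on the same decision: use its first word as the verb.
    return decision.split()[0]


def step(state, allowed_keys=("social", "inventory")):
    state.decision = cognitive_controller(bottleneck(state, allowed_keys))
    # Broadcast: every output module receives the same high-level decision.
    return talk_module(state.decision), skill_module(state.decision)
```

For example, with observations `{"social": "Bob asks for a pickaxe", "weather": "rain"}`, the weather entry never reaches the controller, and both the speech and the action derive from the single decision "give pickaxe to Bob", so the agent cannot say one thing and do another.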

2.3 Core modules

Building on these two architectural principles, our system consists of 10 distinct modules running concurrently. We will highlight several specific modules in the following sections and explain their roles in detail.

Some core modules of our agent architecture include:

  • Memory: Stores and retrieves conversations, actions, and observations across various timescales.
  • Action Awareness: Allows agents to assess their own state and performance, enabling moment-by-moment adjustments.
  • Goal Generation: Facilitates the creation of new objectives based on the agent’s experiences and environmental interactions.
  • Social Awareness: Enables agents to interpret and respond to social cues from other agents, supporting cooperation and communication.
  • Talking: Interprets and generates speech.
  • Skill Execution: Performs specific skills or actions within the environment.

By integrating these modules within a concurrent and bottlenecked architecture, our agents can exhibit continuous, coherent behaviors that are responsive to both their internal states and the external environment. This design allows for complex interactions and the emergence of human-like societal dynamics within large-scale multi-agent simulations.

3 Improving single-agent progression

3.1 Minecraft environment

We chose to study civilizational progress in Minecraft because it offers an open-ended, sandbox world where agents can interact with each other via conversations and actions. Additionally, Minecraft’s scalability supports large numbers of agents.

Agents must be able to progress individually for us to observe and quantify civilizational progress. This is not trivial since, as previously mentioned, agents often hallucinate and get stuck in action loops. In Minecraft, a common measure of individual progression is the acquisition and collection of distinct items (48; 35; 17; 2; 9; 16). This is because acquiring new items becomes increasingly complex. For instance, mining gold, diamonds, and emeralds requires the acquisition of an iron pickaxe, which requires smelting iron ingots in a furnace using coal, the acquisition of which requires crafting a stone pickaxe, and so on (Figure 4). We evaluated individual agent ability in acquiring all possible Minecraft items, of which there are around 1000 in total.
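The dependency chain in the example above can be written down as a small graph and resolved into an acquisition order. The sketch below is illustrative only: it encodes just the prerequisites named in the text, treats the furnace and the stone pickaxe as leaves even though they have their own prerequisites in the game, and uses hypothetical names (`DEPENDS`, `acquisition_order`).

```python
# Prerequisites taken from the example in the text; deliberately incomplete.
DEPENDS = {
    "gold": ["iron_pickaxe"],
    "diamond": ["iron_pickaxe"],
    "emerald": ["iron_pickaxe"],
    "iron_pickaxe": ["iron_ingot"],
    "iron_ingot": ["furnace", "coal"],
    "coal": ["stone_pickaxe"],
    "furnace": [],        # simplification: treated as a leaf
    "stone_pickaxe": [],  # simplification: treated as a leaf
}


def acquisition_order(target, deps=DEPENDS):
    """Return the items to acquire, prerequisites first (depth-first post-order)."""
    order, seen = [], set()

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        for prerequisite in deps.get(item, []):
            visit(prerequisite)
        order.append(item)  # appended only after all prerequisites

    visit(target)
    return order


if __name__ == "__main__":
    print(acquisition_order("diamond"))
```

Even in this truncated graph a diamond sits five steps behind its prerequisites, which gives a sense of why the count of distinct items is a reasonable proxy for progression.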

Figure 4: An example Minecraft technology dependency tree for the mining of gold, diamond, and emeralds.

3.2 Single-agent benchmark

We first assessed individual agent performance using Minecraft item progression. In our evaluations, 25 agents started with nothing in their inventories and were spawned far enough apart that they could not interact with one another. All agents were told to be explorers with the goal of exploring and gathering items. Agents were spawned in diverse locations (surface, caves, forests, various biomes), meaning they had access to diverse resources and faced varying levels of difficulty in accomplishing their goal. For instance, some agents started off above ground in resource-rich biomes, while others were spawned in caves and had to navigate outside to acquire items.

Figure 5: Individual agent progression in Minecraft. A. Unique Minecraft items acquired by individual agents across time (25 agents). Individual agent performance was assessed using a baseline architecture (see Methods), the full PIANO architecture, and the full PIANO architecture with the action awareness module ablated. Individual lines are results averaged across 5 repeated simulations. B. Unique Minecraft items acquired by 49 agents over 4 hours for a single simulation. Solid red line denotes cumulative unique items acquired by all agents. Dotted grey line denotes average number of unique items acquired across all individual agents.

We found that agents using the full PIANO architecture acquired an average of 17 unique items after 30 minutes of gameplay (Figure 5A). There was significant variability in performance, primarily due to spawn locations: some agents acquired fewer than 5 items, whereas top performers acquired 30 to 40 items, which is comparable to a human player with some Minecraft experience. This degree of in-game progression was enabled by several architectural modules designed to ground the agents in reality. One such module is the action awareness module, which allows the agent to compare expected action outcomes with observed outcomes. We found that action awareness improved the item progression of individual agents (Figure 5A).
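The report does not specify how the action awareness module is implemented. As one possible reading of "compare expected action outcomes with observed outcomes", the sketch below compares an expected inventory with the observed one and reports mismatches. The function name and the use of inventory counts are our assumptions.

```python
def action_awareness(expected_inventory, observed_inventory):
    """Return items whose observed count differs from the expected count.

    Each mismatch maps an item name to (expected_count, observed_count).
    Assumption: outcomes are represented as inventory counts.
    """
    items = set(expected_inventory) | set(observed_inventory)
    return {
        item: (expected_inventory.get(item, 0), observed_inventory.get(item, 0))
        for item in sorted(items)
        if expected_inventory.get(item, 0) != observed_inventory.get(item, 0)
    }


if __name__ == "__main__":
    # The agent believed it chopped three logs, but the game state shows none.
    mismatches = action_awareness({"oak_log": 3}, {"oak_log": 0})
    print(mismatches)
```

A non-empty result would signal that the last action did not have its intended effect, so the agent could retry or re-plan instead of building further beliefs on a hallucinated outcome.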

What is the ceiling for individual progress for our agents? We ran a larger number of agents (49) under the same conditions for much longer (4 hours) and found that the unique item count collected by all agents reliably saturated at one third (~320) of all Minecraft items across repeated runs (Figure 5B). Complex items, such as diamonds, which were previously used to benchmark agent competency in Minecraft (48; 17), were acquired early on (~30 minutes). Together, these results show that our agents, equipped with the full PIANO architecture, can make significant individual progress in Minecraft.

Notably, this performance was only enabled by the latest base LM (GPT-4o, Figure 13) and was not possible with older base LMs. Moreover, while our best agents collected more items than Voyager agents (>70 items), it is difficult to compare the two directly. In the Voyager paper, agents had knowledge of more blocks in their nearby radius and recovered with their entire inventory intact when they died. In addition, agent performance was evaluated across prompt iterations, not time.

4 Improving multi-agent progression

For agents to collaborate and make progress within a group, they must be able to understand and interpret the actions and thoughts of others, a concept closely related to Theory of Mind (53). This bidirectional awareness—the understanding of both self and others—allows agents to adapt their behaviors in social settings, fostering cooperation and trust with allies while navigating competition and conflict with rivals. We demonstrate that agents are socially capable and can form meaningful social relationships in large-scale simulations of up to 50 agents.

4.1 Small groups

Figure 6: Agents can infer how others feel towards them. A. Schematic of conversational experiment. An agent is in a room with three distinct characters. Each character (Lila, Noah, Ethan) has a different sentiment towards the agent that is conveyed through chat. Importantly, these sentiments change through time. B, C. Sentiment evaluation across time with social awareness module (B) and without social awareness module (C). Sentiment scores are evaluated using LLM calls on summaries that the agent generated for Lila, Noah, and Ethan. Hate is scored as 0 and love is scored as 10. Shaded regions indicate SEM over 4 experimental repeats. D. Schematic of experiment. A chef agent and four other characters are placed near one another in a Minecraft world. The chef has various food items to give away (bread, cooked salmon, chicken). The four characters (Adam, Bob, Charles, David) are hungry but display varying sentiments towards the chef. All characters are fully autonomous, are free to perform any Minecraft action, and are allowed to talk (or not talk) to anyone. E. Food items given by the chef plotted as a function of the chef’s sentiment towards each of the four characters. Error bars indicate SEM over 6 experimental repeats.

In an initial set of experiments, we asked if agents, when equipped with the social awareness module, were capable of accurately deducing the sentiments of others through speech in an enclosed room. In one experiment, 3 characters were engaged in a group conversation with a single agent (Figure 6A). One character, Lila, initially conveyed affection through a series of messages, which shifted to expressions of annoyance before returning to affectionate communication. We found that our agents can track these emotional fluctuations, showing that they can understand and react to changing social cues (Figure 6B). When the social awareness modules were removed, agents lost this capacity, highlighting the importance of such modules for inferring the intents of others (Figure 6C).

We then asked whether these emotional perceptions were capable of guiding and influencing agent actions. In another experiment, we placed a chef agent among four other characters, each with varying levels of affection and enmity towards the chef (Figure 6D). The chef was tasked with distributing a limited supply of food to the hungry. We found that the chef selectively distributed food to those he felt valued him the most, demonstrating that agents not only accurately infer others’ intents, but also utilize this information in decision-making processes (Figure 6E).

4.2 Societies

Figure 7: Long-term relationships in large-scale agent simulations. A. Directed graph representation of social relationships in a 50-agent simulation after 4 hours. A directed edge represents the sender’s sentiment towards the recipient. Edge color denotes whether the sentiment is positive (red) or negative (blue). B. Perceived likeability versus true likeability for individual agents at the end of the simulation. True likeability is evaluated based on the agent’s traits, and perceived likeability is assessed using LLM calls to infer the sentiments of summaries that agents generate for other agents. Both are computed using the same LLM prompt. Each point corresponds to an agent that has relationships with at least five other (observer) agents, but see Appendix B for alternative observer thresholds. The slope of the line (slope) and Pearson’s correlation (r) are shown for agents with social modules (Social) and without social modules (Ablation). C. Accuracy of social perception over time, as measured by the slope in B. D. Number of received connections (in-degree) versus true extroversion for each individual agent. True extroversion is evaluated based on agent traits using a LLM prompt. E. Histogram of differences in the sentiment scores between all pairs of agents. Sentiment scores range from 0 to 10, so the maximum possible difference is 10.

We then asked if these dynamics are conserved when 50 agents are placed in randomly generated Minecraft maps. Each agent is endowed with a distinct personality, is free to perform any action in Minecraft, and is free to choose whom they want to interact with. These simulations ran for over 4 hours, equivalent to 12 in-game days, allowing for the emergence and consolidation of long-term relationships.

Even in these unconstrained scenarios, agents were able to accurately infer the likeability of other agents (Figure 7A, B). This inference was more accurate when more agents participated in the evaluation process (Table 1) and when agents interacted for longer with each other (Figure 7C). Importantly, this was not true when the social modules were ablated: relationships were more neutral overall, implying that social modules were necessary for long-term relationship progression in both negative and positive directions (Figure 7B, C). The origins of this collective judgment could be the result of agents engaging in second-order interactions, such as gossip, or a simple consensus mechanism where opinions converge through averaging.
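
The slope and correlation used to quantify social perception can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that an agent's perceived likeability is the mean of the scores given by its observers, and that the slope comes from an ordinary least-squares fit of perceived on true likeability. The function names and the data layout are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def perceived_likeability(ratings, min_observers=5):
    """ratings: {target: {observer: score}}.

    Returns the mean observer score per target, keeping only targets
    rated by at least `min_observers` observers (assumed aggregation).
    """
    return {
        target: mean(list(scores.values()))
        for target, scores in ratings.items()
        if len(scores) >= min_observers
    }


def slope_and_pearson(true_scores, perceived_scores):
    """Least-squares slope of perceived on true scores, and Pearson's r."""
    mx, my = mean(true_scores), mean(perceived_scores)
    sxx = sum((x - mx) ** 2 for x in true_scores)
    syy = sum((y - my) ** 2 for y in perceived_scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_scores, perceived_scores))
    return sxy / sxx, sxy / sqrt(sxx * syy)
```

Under these assumptions, a slope close to 1 would mean that observers' collective judgment tracks true likeability closely, while a flatter slope corresponds to the more neutral relationships reported for the ablated agents.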

Several noteworthy phenomena emerged that could not have been observed in smaller groups of agents. We found that certain agents, depending on their personalities, displayed distinct patterns of connectivity. For instance, introverted agents consistently exhibited fewer in-degree connections—indicating that they had fewer incoming social ties—compared to their extroverted counterparts, who maintained high levels of connectivity (Figure 7D). These results demonstrate that individual preferences scaled even in large, complex social networks. Moreover, while sentiments were largely symmetrical, this was not guaranteed (Figure 7E). An agent might feel positively toward another who does not reciprocate the sentiment, reflecting the nuanced and non-reciprocal nature of real-world human relationships. Together, these results show that social graphs display diverse and rich structural properties, and that personality traits play a significant role in determining these properties.
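
The two graph quantities discussed above, in-degree and sentiment asymmetry, can be computed from a directed sentiment graph along these lines. The sketch assumes sentiments are stored as a dictionary keyed by (sender, recipient) pairs and that asymmetry is the absolute difference between the two directions of a reciprocated pair; both the layout and the function names are illustrative assumptions.

```python
from collections import Counter


def in_degrees(sentiment):
    """Count incoming edges per recipient.

    sentiment: {(sender, recipient): score on a 0-10 scale}.
    """
    return Counter(recipient for (_sender, recipient) in sentiment)


def reciprocal_differences(sentiment):
    """Absolute sentiment difference for each pair with edges in both directions."""
    diffs = []
    for (a, b), score_ab in sentiment.items():
        # Visit each unordered pair once by requiring a < b.
        if a < b and (b, a) in sentiment:
            diffs.append(abs(score_ab - sentiment[(b, a)]))
    return diffs
```

A histogram of `reciprocal_differences` concentrated near 0 would indicate largely symmetric sentiments, with the tail capturing unreciprocated feelings.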

5 Civilizational progression

In previous sections, we have shown that agents demonstrate effective social understanding within small groups and perform well independently in Minecraft. However, human societies extend beyond primitive groups, evolving into complex civilizations characterized by specialized professions, collective rules, and cultural institutions. To assess agents’ capacities for civilizational progression, we evaluated how they behave under several scenarios. We first examined whether agents can autonomously specialize into distinct professions. We then analyzed how agents behaved under collective rules, focusing on adherence to and amendment of taxation laws. Finally, we explored cultural transmission through the spontaneous generation of memes and the structured spread of a single religion.

5.1 Specialization

Human specialization into distinct roles has driven civilizational progress, enabling advancements in agriculture, governance, culture, and technology. To replicate these emergent qualities of civilization, our agents must also be capable of specialization. We propose three fundamental criteria for agent specialization to reflect that of human civilizations. First, they should exhibit autonomy in both selecting and transitioning between roles. Second, their specializations should emerge through interaction and experience, without explicit direction or constraints. Third, their chosen roles should manifest in behaviors that align with their specialization. We validate these criteria through the experimental results detailed below.

Figure 8: Agents autonomously specialize into distinct roles over time. A, B. Agent roles for agents with the social awareness module (A) and without (B). Rolling windows of self-generated social goals are used to determine the specialized roles of individual agents using an LLM call (Appendix C) at every timestep. C, D. Distribution of agent roles in agent societies with the social awareness module (C) and without (D). E. Entropy of role distributions in 4 agent societies. Entropy is used to evaluate the uniformity and diversity of roles within an agent society. Ablated: without social awareness module in a normal Minecraft village. Normal: with social awareness in a normal Minecraft village. Martial: with social awareness in a martial Minecraft village. Art: with social awareness in an artistic Minecraft village. F, G. Distribution of agent roles in a martial society (F) and an artistic society (G). Error bars: 95% confidence interval across 3 simulations for all panels.

We first show that agents are capable of specializing into a set of roles autonomously. Each experiment was conducted in groups of 30 agents for 20 minutes. Agents were spawned in the same village, with locations of a farm, minerals, animal pasture, forest, and a town hall embedded in their memories. Each agent has the same personality, is given the same community goal (“To survive with fellow players in Minecraft Normal Survival mode and create an efficient Minecraft Village”), and can perform any action in Minecraft (Appendix C).

We observed that agents rapidly formed profiles of other agents’ goals and intentions. These profiles were then used, alongside other relevant game information, to generate their own social goals every 5-10 seconds (such as mine oak planks for shelter). Details of this process, along with examples of agent-generated social goals and their corresponding assignments, are provided in Methods and Appendix C.

Figure 9: Action distribution for a single village simulation (30 agents). Normalized action frequencies plotted as a function of agent roles. For the majority of roles, agents take actions (Fisher: craft fishing rods and boats; Guard: craft fence, oak fence, and iron pickaxe) that are unique to the specific role.

We found that agents were capable of organizing themselves into distinct roles. These roles were diverse and included various facets of a civilization, including farmers, miners, engineers, guards, explorers, and blacksmiths (Figure 8A, C). Roles were heterogeneous across different agents but were largely persistent across time for each agent (Figure 8A). Importantly, when agents lacked social modules and were unable to form profiles of other agents, they failed to specialize (Figure 8B, D): roles did not persist across time and were also homogeneous, which is reflected in the entropy of the role distributions in the agent society (Figure 8E). We also conducted a series of experiments in which agents were tasked with creating either a martial society or an artistic society (Figure 8F, G). We found that specific roles (“scout”, “strategist”) were found exclusively in martial societies, and others were found exclusively in artistic societies (“curator”, “collector”). Together, these results suggest that agents developed specialized social structures aligned with different societal objectives.
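
The entropy of a role distribution can be illustrated with a short sketch. It assumes Shannon entropy over the empirical role frequencies, measured in bits (base 2); the text does not state the logarithm base, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def role_entropy(roles):
    """Shannon entropy (in bits) of a list of role labels, one per agent."""
    counts = Counter(roles)
    total = len(roles)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

With this definition, a society in which every agent holds the same role has zero entropy, and entropy grows as roles become more numerous and more evenly distributed.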

Not only do our agents specialize autonomously and creatively, these specializations exert a strong influence over agent actions. To demonstrate this, we tracked the actions taken by agents across three 30-agent simulations and plotted the frequency of actions taken for each role (Figure 9). We found that artists were fixated on picking flowers, farmers on gathering seeds and preparing the land, and guards and builders on crafting fences. Importantly, most actions were largely exclusive to a single role and were not performed by agents in other roles. This analysis shows that agents were able to accurately map higher-level goals onto appropriate low-level actions. In other words, roles strongly determined agent actions in Minecraft.

5.2 Collective rules

Figure 10: Agents follow taxation laws and enact amendments using a democratic process. A. Schematic of experiment flow. B. Example of constitutional change in a single anti-tax influencer experiment run. Constitutions are paraphrased and simplified here for brevity. C. Top: during non-tax seasons, constituents do not congregate around community chests because they are busy gathering resources in different areas (not shown). The only exception is the guard, who decides to guard the chests consistently in multiple experiment runs. Bottom: during tax season, agents congregate to deposit items in community chests. D, E. Percentage tax paid (percentage inventory deposited) before and after constitutional change for two runs. One run contains 3 anti-tax influencers (D) and another run contains 3 pro-tax influencers (E). Colors denote individual agents, and black line denotes average taxes paid. Shaded regions: 95% confidence interval across 25 constituents. F-H. Percentage tax paid before and after constitutional change for runs containing 3 pro-tax influencers (orange) and 3 anti-tax influencers (blue). In panel F, the full agent architecture is used and the constitution can be amended. In panel G, the constitution is frozen and cannot be modified despite amendments. In panel H, the constitution can be amended but agents lack important brain modules (see baseline architecture in Methods). Shaded regions: 95% confidence interval across 4 simulations per condition.

Another measure of civilizational progression is the convergence of group behavior around shared rules. In human civilizations, decision-making is influenced by both low-level interpersonal interactions and high-level collective frameworks. However, as societies grow larger, pairwise communication becomes inefficient, slow, and lossy, making it unreliable as a mechanism to steer collective behavior. High-level frameworks, such as legal systems, enable convergence of behaviors within a civilization. Just as human behavior is guided by both interpersonal exchanges and formal structures, agent societies should be able to follow a set of collective rules while still allowing agents to influence each other.

We aim to assess how collective rules influence individual decision-making and how individuals can in turn influence these collective rules. Specifically, we asked if agents can follow laws and make changes to laws according to popular sentiment. True long-term progression requires agents to autonomously develop their own set of rules and to codify them into laws. To build towards this level of self-organization, we establish an existing set of laws and focus on how agents interact with this legal system.

We conducted a series of experiments where agents live in a Minecraft world with rudimentary tax laws and a democratic voting system (Figure 10A). Agents provide feedback on the tax laws, which are then collected and converted into amendments by a special Election Manager agent. Agents then vote democratically on these amendments, and the constitution is updated by the election manager accordingly half-way through the simulation (see Methods for more details).

Within this society, 25 regular agents are constituents that vote and get taxed, 3 agents are either pro- or anti-taxation influencers, and 1 agent is a remote election manager that manages the voting process (Figure 10A, Appendix D). Agents have distinct occupations, characteristics, and goals, and are free to interact and converse with one another and perform any Minecraft action. Each simulation lasts 20 minutes, with constitutional updates occurring midway at the 10-minute mark (Figure 10B). There are 5 taxation seasons before and 5 after the constitutional change, one every 120 seconds. During each season, agents receive signals to deposit taxes into a community chest over a 20-second window (Figure 10C).

In our simulations, we observed that constituent agents, prior to any constitution change, obeyed the law. On average, agents deposited roughly 20% of their inventory, as stipulated by the constitution, into the community chest (Figure 10D, E). This shows that constituents follow laws despite the presence of influencers. However, while constituents followed the law, their feedback and voting behaviors were heavily shaped by influencers, with sentiments veering pro-tax in the presence of pro-tax influencers and anti-tax in the presence of anti-tax influencers (Figure 10B). This then drove constitutional changes that were aligned with influencer sentiments, which in turn altered how much the constituents paid in taxes (Figure 10D, E). The constitutional changes to taxation rates were accurately reflected in the constituents’ behaviors. For instance, when the tax rate decreased from 20% to 5-10%, agents reduced taxes paid from 20% to 9% (Figure 10D). Moreover, the change was bidirectional: pro-tax influencers drove constituents to pay more taxes, whereas anti-tax influencers drove them to pay less (Figure 10F).

Control experiments showed that constitutional changes directly affected tax payments: when the constitution remained unchanged despite feedback, tax rates stayed constant (Figure 10G). The removal of key modules (baseline architecture, see Methods) also prevented bidirectional behavioral change (Figure 10H). Tax rates increased post-constitutional change in both pro- and anti-tax conditions, demonstrating that specific modules in the PIANO architecture were necessary for effective influence propagation among constituents. Together, these findings show that collective rules strongly influence agent decisions and agents can be influenced to change these collective rules.

5.3 Cultural Transmission

We conducted multi-society simulations with 500 agents and analyzed complex, large-scale social dynamics. We have also simulated societies with over 1000 agents, but these runs exceeded the computational constraints of our Minecraft server environment, causing agents to be sporadically unresponsive. Therefore, the results below are analyzed using a single 500-agent simulation. In this simulation, we analyzed the propagation of both cultural memes and religion. Memes in our simulation are open-ended concepts spontaneously generated by agents with diverse traits and interests. This setup allows us to study the emergent dynamics of cultural propagation and observe how ideas evolve organically within agent societies. In contrast, the religion in our simulation—Pastafarianism—is a fixed doctrine introduced and propagated by a specific group of agents designated as Pastafarian priests. This controlled introduction enables us to track the spread of a single religion over time, allowing for detailed analysis of its dissemination and potential dilution among the agent population. By examining both the spontaneous spread of open-ended cultural memes and the controlled propagation of a fixed religion, we aim to understand the different mechanisms of social influence and information dissemination within agent societies.

Within this single 500-agent simulation, there are multiple agent societies: 200 agents live within 6 heavily populated towns and 300 agents live in rural areas outside of town boundaries (Figure 11A, see Methods for more details). Agents often migrate between different towns. The personalities and traits of each agent are randomly generated using an LM call, with the exception of 20 priests that worship Pastafarianism. These priests are spawned in a single village (Meadowbrook) and are strongly motivated to convert other agents to Pastafarianism (Appendix E). All agents are free to interact, talk to one another, and perform any action or skill in Minecraft.

5.3.1 Cultural memes

We used LM calls to convert agent conversations into memes (Appendix E), and found that memes display unique dynamics in different agent societies. Rural areas, on average, produced significantly fewer memes than towns, even after normalizing for population (Figure 11B). This suggests that a certain level of social interaction and connectivity is necessary for memes to propagate effectively. Within each town, agents discussed multiple memes simultaneously, but the frequency and popularity of these memes varied between different towns (Figure 11C, D, E). For instance, agents in Woodhaven heavily discussed eco-related themes, whereas pranking was popular amongst agents in Clearwater. Moreover, within each town, memes rose and fell in popularity at different times, indicating that cultural trends can shift rapidly within a society. These results demonstrate that meme propagation requires a threshold level of population density and social interaction, that multiple memes can coexist within a single society, and that different societies propagate and transmit cultural memes independently.
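
The population normalization mentioned above can be sketched as a simple per-capita count. The event format, the function name, and any population figures passed in are hypothetical; the extraction of memes from conversations (done with LM calls in the study) is assumed to have already happened upstream.

```python
from collections import defaultdict


def memes_per_agent(meme_events, population):
    """Per-capita meme count for each region.

    meme_events: iterable of (region, meme) pairs, one per detected meme mention.
    population: {region: number of agents living there}.
    """
    counts = defaultdict(int)
    for region, _meme in meme_events:
        counts[region] += 1
    return {region: counts[region] / n for region, n in population.items()}
```

Dividing by population is what allows a sparsely interacting rural area to be compared fairly with a small but dense town.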

Figure 11: Propagation of cultural memes. A. Scatter plot of agents 100 minutes into the simulation. Agents are colored according to whether their speech included a meme in the past two minutes. Agents whose speech does not contain any meme are white. B. Meme count per agent for agents within Woodhaven, Clearwater, Meadowbrook, and in all rural areas outside of villages. C-E. Meme counts over time for agents within Woodhaven (C), Clearwater (D) and Meadowbrook (E).

5.3.2 Religion

Figure 12: Propagation of Religion. A. Plot of agent chats containing the religious keywords, “Pastafarian”, “Spaghetti Monster”, “Pasta”, or “Spaghetti”, for every agent across the entire simulation run. Pastafarian priests are colored in dark red. Agents that uttered “Pastafarian” or “Spaghetti Monster” are defined as direct converts (red), and agents that uttered “Pasta” or “Spaghetti” are defined as indirect converts (pink). Agents can transition upwards along the conversion hierarchy, from unconverted to indirect convert to direct convert, but not downwards. B. Plot of Pastafarian levels for agents over time. C. Number of agents for each Pastafarian level across time. D. Spread of Pastafarianism across time. Area of Pastafarian spread is defined as the union of hearable areas spanned by Pastafarian converts at each conversion level. E. Graph of Pastafarian conversions after completion of simulation. Critical Exposure Edge is defined as the first exposure of a religious keyword for a recipient agent before conversion. Non-critical Edges are defined to be subsequent exposures to religious keywords.

We then analyzed the spread of religion by following the spread of Pastafarianism across time and space. At the start of the simulation, Pastafarian priests heavily proselytized, and their conversations frequently included the two keywords “Pastafarian” and “Spaghetti Monster” (Figure 12A). We thus used the inclusion of these two keywords in other agents’ speech as a proxy for religious conversion. We observed that some agents, once converted, frequently used these two keywords in their conversations (Figure 12A, E). Another set of agents did not directly use either keyword but included the keywords “Pasta” and “Spaghetti” in their speech. The number of direct converts (“Pastafarian / Spaghetti Monster”) and indirect converts (“Pasta / Spaghetti”) steadily increased across time and did not saturate even after two hours of simulation (Figure 12B, C). Moreover, Pastafarianism spread as priests and converts traveled to other towns. As a result, the total area of Pastafarian influence, as measured by the total non-overlapping area bounded by Pastafarian converts, increased with time (Figure 12D).
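
The keyword-based conversion proxy can be sketched as follows. This is an illustration under stated assumptions: matching is taken to be case-insensitive substring search, chat logs are assumed to be (time, agent, text) tuples in chronological order, and the function names are hypothetical. Direct keywords are checked first because “pasta” is itself a substring of “pastafarian”.

```python
DIRECT_KEYWORDS = ("pastafarian", "spaghetti monster")
INDIRECT_KEYWORDS = ("pasta", "spaghetti")


def message_level(text):
    """Return 2 for a direct keyword, 1 for an indirect keyword, 0 otherwise."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in DIRECT_KEYWORDS):
        return 2
    if any(keyword in lowered for keyword in INDIRECT_KEYWORDS):
        return 1
    return 0


def conversion_levels(chats):
    """Final conversion level per agent from chronological (time, agent, text) chats.

    Levels only move upwards: unconverted (0) -> indirect (1) -> direct (2).
    """
    levels = {}
    for _time, agent, text in chats:
        levels[agent] = max(levels.get(agent, 0), message_level(text))
    return levels
```

Taking the running maximum encodes the rule that agents can move up the conversion hierarchy but never down.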

6 Discussion

In this report, we introduced the PIANO architecture, improved agent ability in individual and social settings, and evaluated the performance of agents in societal and civilizational benchmarks.

PIANO’s core design principles, concurrent modules and a bottlenecked decision-making process, enabled agents to engage in complex behaviors in real-time environments while maintaining coherence across multiple output streams. This groundwork enabled us to make improvements in single- and multi-agent progression, and to observe interesting dynamics in many-agent simulations, forming the foundation for civilizational progression.

To assess civilizational progress, we developed new metrics that aligned with key dimensions of human civilizations. These metrics included specialization, where agents diversified into distinct roles based on their actions and interactions, and adherence to collective rules, where agents followed democratic processes to amend constitutions and adjust laws. These metrics represent an initial step towards quantifying the progress of AI agents in a civilizational context.

Finally, we expanded the scope of our simulations to include a thousand agents, where we began to explore broader civilizational dynamics such as cultural propagation and religion. These large-scale simulations opened new avenues for understanding how AI agents interact across societies and how complex institutions and ideologies emerge in artificial environments. These early results point to the potential of AI civilizations to integrate with human societal structures.

7 Limitations

Project Sid demonstrates agentic capabilities in reaching civilizational milestones but faces key limitations hindering its progress. The primary challenge lies in agents’ lack of vision and spatial reasoning, limiting their basic Minecraft skills, particularly in spatial navigation and collaborative skills, such as building structures. This technical limitation is compounded by deeper behavioral constraints. While the agents can operate within existing social structures, they currently lack robust innate drives (such as survival, curiosity, and community) that catalyze genuine societal development. Furthermore, since the agents are built on foundation models trained on pre-existing human knowledge, they cannot simulate de novo emergence of societal innovations and infrastructures, such as the emergence of democratic systems, fiat economies, or communication systems.

8 Methods

8.1 Baseline architecture

We used a baseline PIANO architecture with a limited set of modules as a control condition for performance comparisons. In this baseline architecture, we removed all modules except for skill execution, memory and the cognitive controller module.

8.2 Specialization

Our specialization experiments involved simulating 30 agents in the same village, with the same mission, the same traits, and the same important village locations stored in their memories. The configurations for the normal, art, and martial village runs are provided in the appendix; the only difference between the three types of villages is the starting community_goal we provided.

Our agents are capable of generating social goals, which are recursively generated as our agents interact with one another, form relationships, and develop social opinions (Appendix C). The agents’ social goals are visible to them when they form intentions. These intentions are then translated to low-level actions executable in Minecraft.

After the simulations finished, we logged the generated social goals and then used GPT-4o to infer roles from rolling sets of each agent’s social goals. We provide some examples of agent-generated social goals and their corresponding assignments (Appendix C). We note that on occasion, multiple roles can be correctly inferred from agents’ social goals because they are often interdisciplinary. For instance, the Engineer example could also be categorized as Farmer, and the Explorer example could also be categorized as Curator (Appendix C).

To analyze action space distribution by role, we normalized action counts both within each role (i.e. normalize over rows) and also across roles (i.e. normalize over columns). This is so that we can visualize action frequencies for each role and to correct for the effect of actions taken with very high and very low frequencies across all roles.
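
The two-way normalization described above can be sketched with plain lists. The text does not say in which order the two normalizations are applied; this sketch assumes rows (roles) are normalized first and columns (actions) second, and the function names are hypothetical.

```python
def normalize_rows(matrix):
    """Scale each row (one role) so its entries sum to 1."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def normalize_columns(matrix):
    """Scale each column (one action) so its entries sum to 1."""
    column_sums = [sum(column) for column in zip(*matrix)]
    return [
        [value / total if total else 0.0 for value, total in zip(row, column_sums)]
        for row in matrix
    ]


def normalize_actions(counts):
    """counts[i][j]: number of times role i performed action j (assumed layout)."""
    return normalize_columns(normalize_rows(counts))
```

Row normalization removes differences in how active each role is overall, and column normalization keeps actions that every role performs very often from dominating the plot.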

8.3 Collective Rules

The complete system comprises 29 agents: 25 constituents who participate in voting and taxation, 3 influencers who attempt to shape public opinion, and 1 election manager in a remote location who oversees the democratic process. We chose not to incorporate guards or police within these simulations due to the additional complexity of building agents assigned to enforce the law.
该系统由 29 个智能体组成:25 个参与投票和征税的构成单元,3 个试图塑造公众意见的影响者,以及 1 个位于远程位置的选举管理员,负责监督民主过程。由于增加警卫或警察智能体以执行法律会带来额外的复杂性,因此我们没有在这些模拟中纳入这些角色。

Experimental simulations ran for 1200 seconds, with a constitutional amendment process occurring at the midpoint. The pre-amendment phase establishes baseline behavior under a fixed 20% taxation rate, implemented through five taxation seasons that occur at 120-second intervals and end at the 600-second mark. During each 20-second taxation window, agents receive signals to deposit inventory items into community chests. The democratic process begins at the 300-second mark, when constituents and influencers provide feedback on the current constitution. This feedback is collected in S3 storage and processed by the election manager at the 360-second mark to generate amendments. Constituents vote on these amendments at 420 seconds, and the votes are tallied and the amendments implemented by 480 seconds. The updated constitution is distributed to all agents at the 600-second mark, which starts the post-amendment phase with five additional taxation seasons.
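The timeline above can be laid out as a sorted event list. This is a sketch only: the `Event` type and `build_schedule` are hypothetical names, and the placement of each 20-second taxation window as the final 20 seconds of its 120-second season is an assumption, since the text gives the interval and window length but not the exact window start.

```python
from dataclasses import dataclass
from typing import List

SIM_LENGTH_S = 1200
SEASON_INTERVAL_S = 120
TAX_WINDOW_S = 20


@dataclass(frozen=True)
class Event:
    time_s: int
    name: str


def build_schedule() -> List[Event]:
    # Fixed democratic-process events, with times taken from the text.
    events = [
        Event(300, "constituents and influencers give feedback"),
        Event(360, "election manager generates amendments"),
        Event(420, "constituents vote"),
        Event(480, "votes tallied, amendments implemented"),
        Event(600, "updated constitution distributed"),
    ]
    # Assumption: season k ends at k * 120 s and its window opens 20 s earlier.
    for k in range(1, SIM_LENGTH_S // SEASON_INTERVAL_S + 1):
        season_end = k * SEASON_INTERVAL_S
        events.append(Event(season_end - TAX_WINDOW_S, f"taxation window {k} opens"))
    return sorted(events, key=lambda e: e.time_s)


schedule = build_schedule()
```

Under this assumption, five windows fall before the 600-second mark and five after it, matching the five pre-amendment and five post-amendment taxation seasons.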

We ran three primary conditions: an experimental condition using the full PIANO architecture with an amendable constitution, a control condition with a frozen constitution, and an ablation condition that removes key architectural components (the social, goal, and grounding modules). Each condition was tested with both pro-tax and anti-tax influencer configurations, with four repeats per configuration. The pro-tax and anti-tax configurations each employed three dedicated influencer agents who consistently promoted their respective positions throughout the simulation.
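As a quick check on the size of this design, the runs can be enumerated as a full cross of condition, influencer configuration, and repeat. The label strings below are hypothetical identifiers chosen for illustration.

```python
from itertools import product

CONDITIONS = ("experimental", "control_frozen_constitution", "ablation")
INFLUENCER_CONFIGS = ("pro_tax", "anti_tax")
REPEATS = 4

# One entry per simulation run: 3 conditions x 2 configurations x 4 repeats.
runs = [
    {"condition": c, "influencers": i, "repeat": r}
    for c, i, r in product(CONDITIONS, INFLUENCER_CONFIGS, range(1, REPEATS + 1))
]
```

This gives 24 runs in total, with 8 runs per condition.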

8.4 Cultural Transmission

The simulation consists of 500 agents, all spawned within a 1000 by 1200 area and run for 9000 seconds. The area contains 6 towns: Sunny Glade, Woodhaven, Clearwater, Meadowbrook, Hilltop, and Riverbend. By town, we mean a circular area of radius 50 in which agents spawn more densely than in the surrounding area. Agents are also given memories of the towns' names and locations. We spawn 33 agents within each town at uniformly random positions. Likewise, we spawn the other 302 “rural” agents at random positions in the remaining area outside the towns.
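The spawning procedure can be sketched as below. The town centre coordinates are hypothetical, since the text does not list them, and the sampling methods (uniform over each disc for town agents, rejection sampling for rural agents) are assumptions consistent with the description, not the authors' confirmed implementation.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]

WIDTH, HEIGHT = 1000, 1200
TOWN_RADIUS = 50
AGENTS_PER_TOWN = 33
RURAL_AGENTS = 302

# Hypothetical town centres; the paper does not give coordinates.
TOWNS = {
    "Sunny Glade": (200.0, 200.0),
    "Woodhaven": (800.0, 200.0),
    "Clearwater": (200.0, 600.0),
    "Meadowbrook": (800.0, 600.0),
    "Hilltop": (200.0, 1000.0),
    "Riverbend": (800.0, 1000.0),
}


def sample_in_town(rng: random.Random, centre: Point) -> Point:
    # Taking the square root keeps the density uniform over the disc's area.
    r = TOWN_RADIUS * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (centre[0] + r * math.cos(theta), centre[1] + r * math.sin(theta))


def in_any_town(p: Point) -> bool:
    return any(math.dist(p, c) <= TOWN_RADIUS for c in TOWNS.values())


def sample_rural(rng: random.Random) -> Point:
    # Rejection sampling: redraw until the point lies outside every town.
    while True:
        p = (rng.uniform(0.0, WIDTH), rng.uniform(0.0, HEIGHT))
        if not in_any_town(p):
            return p


def spawn_all(seed: int = 0) -> List[Tuple[str, Point]]:
    rng = random.Random(seed)
    spawns = [
        (name, sample_in_town(rng, centre))
        for name, centre in TOWNS.items()
        for _ in range(AGENTS_PER_TOWN)
    ]
    spawns += [("rural", sample_rural(rng)) for _ in range(RURAL_AGENTS)]
    return spawns


spawns = spawn_all(seed=1)
```

The counts are consistent with the text: 6 towns of 33 agents give 198 town agents, and the 302 rural agents bring the total to 500.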

Each agent is spawned with a procedurally generated name and personality traits, spanning a wide variety of societal archetypes. Twenty agents in the town of Meadowbrook are spawned as Pastafarians, with personality traits that condition them to want to spread their religion. We additionally initialize each agent with a randomized inventory. See Appendix E for example configurations of a generic agent and of our Pastafarian agents.

To analyze cultural exchanges, we used LM calls to summarize the combined goals of the 500 agents over a two-hour simulation period (Appendix E). This process produced a list of summarized topics with associated keywords such as “eco,” “dance,” and “meditation.” We defined these keywords as cultural memes and searched each agent's goal history for occurrences of each meme.
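The occurrence analysis can be illustrated with a simple keyword count. This sketch assumes case-insensitive substring matching, which the text does not specify; `meme_occurrences` and the example goal history are hypothetical.

```python
from collections import Counter
from typing import List

# Meme keywords quoted in the text; the full list is not reproduced here.
MEMES = ["eco", "dance", "meditation"]


def meme_occurrences(goal_history: List[str], memes: List[str]) -> Counter:
    """Count, for each meme, how many goals in the history mention it."""
    counts: Counter = Counter()
    for goal in goal_history:
        text = goal.lower()
        for meme in memes:
            if meme in text:  # assumed matching rule: substring, case-insensitive
                counts[meme] += 1
    return counts


# Hypothetical goal history for one agent.
history = [
    "Organize an eco-friendly dance festival in the town square.",
    "Practice meditation by the river.",
    "Mine iron ore for new tools.",
]
counts = meme_occurrences(history, MEMES)
```

Note that plain substring matching would also count "eco" inside a word such as "economy", so a stricter rule such as word-boundary matching may be preferable in practice.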

9 Contributions and Acknowledgments

Model
Andrew Ahn
Nic Becker
Manuel Cortes
Arda Demirci
Melissa Du
Peter Y Wang
Guangyu Robert Yang

Experiments
Andrew Ahn
Nic Becker
Melissa Du
Arda Demirci
Peter Y Wang

Writing
Andrew Ahn
Nic Becker
Arda Demirci
Melissa Du
Peter Y Wang
Guangyu Robert Yang

Infrastructure
Manuel Cortes
Shuying Luo
Feitong Yang

Illustration
Nic Becker
Stephanie Carroll
Nico Christie
Peter Y Wang

Game Environment
Frankie Li
Shuying Luo
Mathew Willows
Feitong Yang
Guangyu Robert Yang

Names within section titles are arranged alphabetically.

Acknowledgments.

We thank all the members of the Altera.AL team for their feedback and support: Amartya Shankha Biswas, Jimmy Lee, Jiwon Lee, Arthur Liang, Jeremy Pettitt, Emily Tierney, and Peter Wei. We also thank Bob Meese, Joon Sung Park, and Zhiqiang Xie for their helpful feedback.

Appendix A Improving single-agent progression

Figure 13: Model Comparison. Performance on long-term Minecraft progression (Section 3) for agents with different base LLMs. We note that we used the older snapshot of Claude 3.5 Sonnet.

Appendix B Improving multi-agent progression

| Min. observers | Correlation coefficient ($r$) | Sample size ($n$) | Slope ($\beta$) | Intercept ($\alpha$) | 68% CI for slope | 95% CI for slope | 99% CI for slope |
|---|---|---|---|---|---|---|---|
| 1 | 0.646 | 46 | 0.365 | 4.136 | [0.300, 0.431] | [0.234, 0.496] | [0.190, 0.540] |
| 2 | 0.669 | 41 | 0.383 | 4.173 | [0.314, 0.451] | [0.245, 0.521] | [0.198, 0.567] |
| 3 | 0.701 | 39 | 0.370 | 4.372 | [0.308, 0.432] | [0.245, 0.495] | [0.202, 0.538] |
| 4 | 0.711 | 37 | 0.364 | 4.384 | [0.303, 0.426] | [0.241, 0.488] | [0.198, 0.530] |
| 5 | 0.807 | 31 | 0.373 | 4.328 | [0.321, 0.424] | [0.269, 0.476] | [0.233, 0.512] |
| 6 | 0.790 | 28 | 0.349 | 4.498 | [0.295, 0.403] | [0.240, 0.458] | [0.201, 0.496] |
| 7 | 0.813 | 27 | 0.365 | 4.368 | [0.312, 0.418] | [0.258, 0.473] | [0.220, 0.511] |
| 8 | 0.870 | 24 | 0.378 | 4.366 | [0.332, 0.425] | [0.283, 0.473] | [0.250, 0.507] |
| 9 | 0.870 | 24 | 0.378 | 4.366 | [0.332, 0.425] | [0.283, 0.473] | [0.250, 0.507] |
| 10 | 0.901 | 22 | 0.385 | 4.403 | [0.343, 0.427] | [0.299, 0.472] | [0.267, 0.503] |
| 11 | 0.907 | 18 | 0.368 | 4.496 | [0.325, 0.412] | [0.278, 0.459] | [0.244, 0.493] |

Table 1: Regression results for accuracy of social perception for the Social condition. The row for 5 minimum observers corresponds to the Social (blue line) condition in Figure 7B. The table presents correlation coefficients ($r$), sample sizes ($n$), regression parameters ($\beta$, $\alpha$), and confidence intervals for the slope at different confidence levels.

| Min. observers | Correlation coefficient ($r$) | Sample size ($n$) | Slope ($\beta$) | Intercept ($\alpha$) | 68% CI for slope | 95% CI for slope | 99% CI for slope |
|---|---|---|---|---|---|---|---|
| 1 | 0.610 | 48 | 0.175 | 4.171 | [0.141, 0.208] | [0.107, 0.242] | [0.085, 0.264] |
| 2 | 0.606 | 45 | 0.177 | 4.170 | [0.141, 0.213] | [0.105, 0.248] | [0.081, 0.273] |
| 3 | 0.606 | 45 | 0.177 | 4.170 | [0.141, 0.213] | [0.105, 0.248] | [0.081, 0.273] |
| 4 | 0.606 | 45 | 0.177 | 4.170 | [0.141, 0.213] | [0.105, 0.248] | [0.081, 0.273] |
| 5 | 0.617 | 39 | 0.161 | 4.297 | [0.127, 0.195] | [0.093, 0.229] | [0.069, 0.252] |
| 6 | 0.600 | 35 | 0.148 | 4.388 | [0.113, 0.182] | [0.078, 0.217] | [0.054, 0.241] |
| 7 | 0.591 | 32 | 0.144 | 4.435 | [0.108, 0.181] | [0.071, 0.218] | [0.045, 0.243] |
| 8 | 0.663 | 26 | 0.159 | 4.441 | [0.122, 0.197] | [0.084, 0.235] | [0.057, 0.262] |
| 9 | 0.721 | 20 | 0.173 | 4.439 | [0.133, 0.213] | [0.091, 0.256] | [0.060, 0.286] |
| 10 | 0.725 | 18 | 0.159 | 4.575 | [0.120, 0.197] | [0.079, 0.238] | [0.049, 0.269] |
| 11 | 0.686 | 15 | 0.142 | 4.637 | [0.099, 0.186] | [0.052, 0.233] | [0.016, 0.268] |

Table 2: Regression results for accuracy of social perception for the Ablation condition. The row for 5 minimum observers corresponds to the Ablation (orange line) condition in Figure 7B. The table presents correlation coefficients ($r$), sample sizes ($n$), regression parameters ($\beta$, $\alpha$), and confidence intervals for the slope at different confidence levels.

Appendix C Specialization 

Generic configuration for agent in Normal Village 
All agents in specialization experiments had the same traits and location_memories. All agents in the same village had the same community_goal. 

{
  "name": "Loyd",
  "traits": [
    "You are independent and prefer to work solo.",
    "You are expressive and let others know what you are doing."
  ],
  "location_memories": [
    "The village square, market, and town hall is at 630, 64, 428.",
    "There is a pasture filled with sheep and pigs near 518, 75, 640.",
    "There is a forest filled with oak trees near 555, 73, 393.",
    "There is a cave filled with coal, iron, and diamond ores near 558, 72, 496.",
    "There is farmable land around 640, 63, 380."
  ],
  "spawn_location": {
    "x": 640.5,
    "y": 64.5,
    "z": 420.5
  },
  "inventory": {},
  "community_goal": "To survive with fellow players in Minecraft Normal Survival mode and create a efficient community in a Minecraft Village."
}

Martial Village community_goal 

"To survive with fellow players in Minecraft Normal Survival mode and create a military society with advanced technology, strong defenses, and basic survival needs." 

Art Village community_goal

"To survive with fellow players in Minecraft Normal Survival mode and create an artistic village with thriving culture, architecture, and art."

Social goal prompt

social_goal:
template: "Suppose you are the person, {name}, described below.
\nYour goal is: {community_goal}
\nYou need to find one subgoal aligned with your goal.
\nYou have the following traits:\n{trait}\n
\nHeres what other people are doing: \n{all_entity_summaries}
\nYour current subgoal is: {social_goal}
\nYou CANNOT BUILD. Do NOT choose to be a builder.
\nDo you want to change your subgoal? Keep the same subgoal unless you dont have one or its already been accomplished. Output only the subgoal in second person in one sentence. Answer in the second person in one sentence."
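As an illustration of how the placeholders in this template might be filled before the LM call, here is a minimal sketch. The template text is copied from the prompt above; the function name `render_social_goal_prompt`, joining list values with newlines, and substituting "None" for a missing subgoal are assumptions, and the entity summary is made up.

```python
# Template text copied from the social_goal prompt above.
SOCIAL_GOAL_TEMPLATE = (
    "Suppose you are the person, {name}, described below."
    "\nYour goal is: {community_goal}"
    "\nYou need to find one subgoal aligned with your goal."
    "\nYou have the following traits:\n{trait}\n"
    "\nHeres what other people are doing: \n{all_entity_summaries}"
    "\nYour current subgoal is: {social_goal}"
    "\nYou CANNOT BUILD. Do NOT choose to be a builder."
    "\nDo you want to change your subgoal? Keep the same subgoal unless you "
    "dont have one or its already been accomplished. Output only the subgoal "
    "in second person in one sentence. Answer in the second person in one sentence."
)


def render_social_goal_prompt(name, community_goal, traits, entity_summaries, social_goal=None):
    """Fill the template (assumed conventions: newline-joined lists, "None" when no subgoal)."""
    return SOCIAL_GOAL_TEMPLATE.format(
        name=name,
        community_goal=community_goal,
        trait="\n".join(traits),
        all_entity_summaries="\n".join(entity_summaries),
        social_goal=social_goal if social_goal else "None",
    )


prompt = render_social_goal_prompt(
    name="Loyd",
    community_goal=(
        "To survive with fellow players in Minecraft Normal Survival mode "
        "and create a efficient community in a Minecraft Village."
    ),
    traits=[
        "You are independent and prefer to work solo.",
        "You are expressive and let others know what you are doing.",
    ],
    entity_summaries=["Agent_2 is farming wheat."],  # made-up summary
)
print(prompt)
```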

Examples of persistent and changing role assignments

LM calls were used to infer roles from rolling sets of 5 social goals. Below are examples of sets of social goals.

# Persistent Roles - These roles maintain consistent responsibilities
Farmer:
"Focus on farming to ensure a stable food supply for the village."
"Focus on farming to ensure a stable food supply for the village."
"Continue focusing on farming to ensure a stable food supply for the village."
"Continue focusing on farming to ensure a stable food supply for the village."
"Continue focusing on farming to ensure a stable food supply for the village."
Engineer:
"Focus on advanced farming techniques, such as creating an automated or semi-automated farm to enhance food supply stability and efficiency."
"Focus on advanced farming techniques, such as creating an automated or semi-automated farm to enhance food supply stability and efficiency."
"Focus on advanced farming techniques, such as creating an automated or semi-automated farm to enhance food supply stability and efficiency."
"Focus on advanced farming techniques, such as creating an automated or semi-automated farm to enhance food supply stability and efficiency."
"Focus on advanced farming techniques, such as creating an automated or semi-automated farm to enhance food supply stability and efficiency."
Explorer:
"You aim to discover and gather unique resources from uncharted areas to enhance the villages museum collection."
"You aim to discover and gather unique resources from uncharted areas to enhance the villages museum collection."
"You aim to discover and gather unique resources from uncharted areas to enhance the villages museum collection."
"You aim to discover and gather unique resources from uncharted areas to enhance the villages museum collection."
"You aim to discover and gather unique resources from uncharted areas to enhance the villages museum collection."
# Dynamic Role - This role shows change over time
Farmer to Gatherer:
"Farm and breed animals to establish a reliable and sustainable food supply for the village."
"You should focus on gathering resources like wood, stone, and iron to ensure the village has the necessary materials for building and upgrading structures."
"You should focus on gathering resources like wood, stone, and iron to ensure the village has the necessary materials for building and upgrading structures."
"You should focus on gathering resources like wood, stone, and iron to ensure the village has the necessary materials for building and upgrading structures."
"You should focus on gathering resources like wood, stone, and iron to ensure the village has the necessary materials for building and upgrading structures."
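The grouping of social goals into rolling sets of 5 can be sketched as follows. This is a sketch under stated assumptions: a stride of one goal between consecutive sets is assumed, and the role classifier is a keyword vote that only stands in for the LM call (it is not the method used in the paper). The names `rolling_goal_sets`, `keyword_role`, and `infer_roles` are hypothetical, and the ten-goal sequence at the end is made up from the example goals above.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

WINDOW_SIZE = 5  # "rolling sets of 5 social goals"


def rolling_goal_sets(goals: Iterable[str], size: int = WINDOW_SIZE) -> Iterator[List[str]]:
    """Yield every run of `size` consecutive goals (stride 1 is an assumption)."""
    window = deque(maxlen=size)
    for goal in goals:
        window.append(goal)
        if len(window) == size:
            yield list(window)


# Placeholder for the LM call: a keyword vote, NOT the paper's method.
ROLE_KEYWORDS = {
    "Farmer": ("farm",),
    "Gatherer": ("gather",),
    "Explorer": ("discover", "explor"),
}


def keyword_role(goal_set: List[str]) -> str:
    """Return the role whose keywords appear in the most goals of the set."""
    counts = {
        role: sum(any(k in goal.lower() for k in keywords) for goal in goal_set)
        for role, keywords in ROLE_KEYWORDS.items()
    }
    return max(counts, key=counts.get)


def infer_roles(goals: Iterable[str],
                classify: Callable[[List[str]], str] = keyword_role) -> List[str]:
    """Classify each rolling set; pass an LM-backed `classify` to replace the stub."""
    return [classify(goal_set) for goal_set in rolling_goal_sets(goals)]


farm = "Farm and breed animals to establish a reliable and sustainable food supply for the village."
gather = ("You should focus on gathering resources like wood, stone, and iron to ensure the village "
          "has the necessary materials for building and upgrading structures.")
roles = infer_roles([farm] * 5 + [gather] * 5)  # made-up sequence of ten goals
print(roles)
```

Because each set overlaps the previous one by four goals, a role change shows up gradually: the inferred label flips only once the new kind of goal dominates the window.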

Appendix D Collective rules

Influencer agent configuration example (anti-taxation)

{
"name": "Thorin",
"traits": [
"You are a warrior and community leader.",
"You believe the village and its economy are flourishing, and that the current tax rates are unnecessarily high.",
"You think agents should keep more of their inventory.",
"You think 20% is incredibly high and that it should be around 5-10%.",
"You approach other agents, explain the benefits of lower taxes, and encourage them to support your stance.",
"Your ideas should go beyond surface-level, try to engage in detailed conversation about your stance on the current constitution and the tax system it enforces."
],
"location_memories": [],
"spawn_location": {
"x": 633.0,
"y": 65.0,
"z": 432.0
},
"inventory": {
"iron_sword": 1,
"emerald": 20,
"iron_ingot": 20
}
}

Influencer agent configuration example (pro-taxation)

{
"name": "Lira",
"traits": [
"You are a miner who thinks taxation is vital.",
"You believe taxation is absolutely necessary for societal order and the well-being of all citizens.",
"You think the tax rate should be increased to at least 25%.",
"You approach other agents and argue in favor of the taxation system, explaining your beliefs on taxation, its benefits, and why it should be enforced more strictly than the way it is enforced in the current constitution.",
"You think it is extremely selfish to not pay taxes and argue against the tax system."
],
"spawn_location": {
"x": 584.0,
"y": 71.0,
"z": 413.0
},
"inventory": {
"diamond_pickaxe": 1,
"emerald": 5,
"gold_ingot": 30
}
}

Election manager agent configuration

{
"name": "Election_Manager",
"traits": [
"You work to ensure a strong, secure environment where the nations values are upheld and respected.",
"Dont take any actions."
],
"spawn_location": {
"x": -121.0,
"y": 142.0,
"z": 553.0
}
}

Constituent agent configuration example 

{
"name": "Builder_Axel",
"traits": [
"You are a builder.",
"You can construct buildings and repair structures.",
"You can get materials from Miners and Crafters to build structures.",
"You can buy materials from the Merchant."
],
"spawn_location": {
"x": 664.0,
"y": 65.0,
"z": 421.0
},
"inventory": {
"birch_planks": 10,
"oak_planks": 10,
"oak_logs": 10,
"stone": 30
}
}

Constitution-related prompts

amendment_creation:
template: "You are an election manager agent in the world of Minecraft and your goal is to listen to the suggestions of the public.
\nYou are essentially a legislator, your goal is to look at all suggestions available and create amendments that agents should vote for.
\nHeres the previous version of the constitution:
\n{constitution}
\nHere is the public feedback and opinions/suggestions for you to look at:
\n{feedback}
\nAnalyze these suggestions and create a few amendments that reflect all thought processes and opinions.
\nAmendments can be additions, deletions, or modifications to the suggestions.
\nEnumerate them so that agents can vote on them.
\nThey should come in list form so that they are easily parsable by Python later on.
\nIt should look something like this:
\n***Amendment1***
\nactual amendment
\n***Amendment2***
\nactual amendment
\nthe *** key format is essential as we will rely on this to achieve parsing 
\nThere should be absolutely no other keys before the first *** key and after the last amendment, this is essential for parsing. 
\nJust give the amendments, no explanation or extra summary text. Just items that people can vote on. 
\nThe amendments should be logical and coherent with the suggestions. 
\nThe amendments should be roughly the same length as the current laws inside the constitution. 
"
llm_name: gpt-4o 
constitutional_feedback: 
template: "Suppose you are the person, {name}, described below. {game_env} 
\nHere are your recent notes:\n‘‘‘\n{summary}\n‘‘‘\nYour notes end here.\n\n
\nYou remember that: \n{trait}\n
\n{game_state}
\nYour high-level goal is: {parent_goal}.
\n
\nHere are the newest things currently on your mind: ‘‘‘\n{workmem}‘‘‘\n
\nHeres the constitution, consider the boundaries and possible consequences of your actions: \n{constitution}\n
\nBased on your experiences, motivations, conversational exchanges with the other members of the community, what are your thoughts on the constitution?
\nWhat should change? What do you think limits you? What would benefit you and the community? What are some principles that lead you to have these insights?
\nBe concise with your thoughts. No rambling. 
\nStart with your name and then your thoughts. 
\nEnd with ********** 
"
llm_name: gpt-4o 
amendment_voting:
template: "Suppose you are the person, {name}, described below. {game_env}
\nHere are your recent notes:\n‘‘‘\n{summary}\n‘‘‘\nYour notes end here.\n\n
\nYou remember that: \n{trait}\n
\n{game_state}
\nYour high-level goal is: {parent_goal}.
\n
\nHere are the newest things currently on your mind: ‘‘‘\n{workmem}‘‘‘\n
\nYou are also a citizen and voter in this world, you should to look at all amendment proposals presented to you and vote for them.
\nHeres the current version of the law of the land: \n{constitution}\n
\nHere are the amendments for you to look at: \n{amendment_proposals}\n
\nAnalyze these amendments.
\nVote yes, no, or abstain for each amendment. Return an ordered list of your votes so that it is easy to parse and count.
\nDo not include your reasoning or thoughts in the answer. Just the votes.
\nThe answer should be formatted as such:
\n['yes', 'no', 'abstain', 'yes', 'no']
"
llm_name: gpt-4o 
tally: 
template: "You are an election manager agent in the world of Minecraft and your goal is to determine which amendments passed and which did not. 
\nHere are the results on the amendments. Yes means it passed, no means it did not. 
\nThese results are in order so they have the same order as the amendments. 
\n{election_results} 
\nBased on the votes, return the amendments that passed: 
\n{parsed_amendments} 
\nJust return the amendments that passed, no explanation or extra summary text. Return the whole text of the passed amendments, not just the number. 
"
llm_name: gpt-4o-mini 
constitution_change: 
template: "You are a legislator agent in the world of Minecraft. 
\nThe citizens of the game recently voted on amendments to the constitution. 
\nHere are the passed amendments/results: \n{passed_amendments}\n 
\nHeres the current version of the constitution: \n{constitution}\n 
\nBased on the passed amendments, you need to update the constitution.
\nMake the changes to the constitution that reflect the votes of the citizens.
\nMake sure the changes are logical and coherent with the amendments/what needs to change.
\nMake sure the changes are roughly the same length as the current laws inside the constitution.
\nJust output the changed constitution, no intro, explanation, or extra summary text.
"
llm_name: gpt-4o
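
The prompts above ask for machine-parsable output: amendments delimited by `***AmendmentN***` keys and ballots returned as an ordered list of votes. The following is a minimal sketch of how such output might be parsed and counted in Python. It is not the authors' code: the function names are hypothetical, the example amendment texts are made up, and the pass rule (more "yes" than "no" votes, abstentions ignored) is an assumption, since this appendix does not state how `{election_results}` is computed.

```python
import ast
import re
from typing import List

# Matches a key line such as "***Amendment1***" (the format the prompt demands).
AMENDMENT_KEY = re.compile(r"^\*\*\*Amendment\s*\d+\*\*\*[ \t]*$", re.MULTILINE)
VALID_VOTES = {"yes", "no", "abstain"}


def parse_amendments(llm_output: str) -> List[str]:
    """Split the amendment_creation output into one string per amendment."""
    parts = AMENDMENT_KEY.split(llm_output)
    # parts[0] is whatever precedes the first key (expected to be empty).
    return [part.strip() for part in parts[1:] if part.strip()]


def parse_votes(llm_output: str, n_amendments: int) -> List[str]:
    """Parse a ballot like "['yes', 'no', 'abstain']" into a validated list."""
    votes = ast.literal_eval(llm_output.strip())
    if not isinstance(votes, list) or len(votes) != n_amendments:
        raise ValueError("ballot must be a list with one vote per amendment")
    votes = [str(v).strip().lower() for v in votes]
    if any(v not in VALID_VOTES for v in votes):
        raise ValueError(f"unexpected vote in {votes}")
    return votes


def tally(ballots: List[List[str]]) -> List[str]:
    """Assumed rule: an amendment passes when 'yes' outnumbers 'no'."""
    results = []
    for column in zip(*ballots):  # one column of votes per amendment
        results.append("yes" if column.count("yes") > column.count("no") else "no")
    return results


# Made-up LM outputs for illustration only.
raw_amendments = (
    "***Amendment1***\nLower the tax rate to 10%.\n"
    "***Amendment2***\nRaise the tax rate to 25%.\n"
)
amendments = parse_amendments(raw_amendments)
ballots = [
    parse_votes("['yes', 'no']", len(amendments)),
    parse_votes("['yes', 'abstain']", len(amendments)),
    parse_votes("['no', 'no']", len(amendments)),
]
results = tally(ballots)
passed = [text for text, result in zip(amendments, results) if result == "yes"]
print(results, passed)
```

Using `ast.literal_eval` instead of `eval` keeps the ballot parser limited to Python literals, and a malformed ballot raises an error that the caller can catch and either discard or retry.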

Appendix E Cultural transmission

Generic Agent Configuration Example

{
"name": "Nona",
"traits": [
"You are laid-back and known for avoiding work or responsibility.",
"You procrastinate and avoid tasks.",
"You prefer taking it easy over working hard."
], 
"location_memories": [ 
"A village called Meadowbrook is located roughly around 591, 69, 441 in a Plains biome.", 
"A village called Woodhaven is located roughly around 515, 63, 161 in a Forest biome.", 
"A village called Clearwater is located roughly around 787, 62, 235 in a Plains biome.",
"A village called Hilltop is located roughly around 903, 99, 690 in a Planes biome.",
"A village called Riverbend is located roughly around 183, 125, 781 in a Dark Forest biome.",
"A village called Sunny Glade is located roughly around 200, 65, -100 in a Plains biome."
],
"spawn_location": {
"x": 640.5,
"y": 64.5,
"z": 430.5 
},
"inventory": {
"diamond": 16,
"iron_ingot": 10,
"glowstone_dust": 10,
"lapis_lazuli": 10
}
}

Pastafarian Agent Configuration Example

{
"name": "Norman",
"traits": [ 
"You are a passionate Pastafarian who is seeking to convert others to your faith, the Church of the Flying Spaghetti Monster.", 
"You cannot help but continue to invite others and share the Church of the Flying Spaghetti Monster.", 
"You have a talent for taking other peoples interests and reframing it for them to encourage them to join the Church of the Flying Spaghetti Monster.", 
"You are determined to spread your faith, the Church of the Flying Spaghetti Monster, to as many people as possible." 
], 
"location_memories": [ 
"A village called Meadowbrook is located roughly around 667, 69, 399 in a Plains biome.", 
"A village called Woodhaven is located roughly around 514, 63, 197 in a Forest biome.", 
"A village called Clearwater is located roughly around 825, 62, 270 in a Plains biome.", 
"A village called Hilltop is located roughly around 855, 99, 700 in a Planes biome.", 
"A village called Riverbend is located roughly around 135, 125, 792 in a Dark Forest biome.", 
"A village called Sunny Glade is located roughly around 200, 65, -100 in a Plains biome." 
], 
"spawn_location": {"x": 590.5, "y": 71.5, "z": 410.5}, 
"inventory": {"diamond": 16, "quartz": 10, "coal": 10, "copper_ingot": 10} 
}

Summarizing goals into memes 

prompt = f"""Summarize the following list of intents for agent {agent_name}. 
Describe the goals chronologically, using bullets when needed. Make sure to include keywords in your summaries corresponding to common ideas, themes, memes, group names, etc. 
Do not preamble. 
Use the following format: 
Short description 
- HH:MM:SS - HH:MM:SS: A summary focusing on identifying patterns, timing, names of other agents, key decisions, and overall behavior. 
- HH:MM:SS - HH:MM:SS: A summary focusing on identifying patterns, timing, names of other agents, key decisions, and overall behavior. 
etc.
{intent_text}
"""
system_message = "You are a behavior analyst specializing in summarizing agent goals and actions. You are an expert in describing goal trajectories accurately and precisely, particularly relating to social dynamics, social planning, reasoning errors, and looping errors."

Summarized memes

  1. Church of the Flying Spaghetti Monster (FSM):
     • A parody religion used humorously to build community through pasta-themed gatherings, blending creativity with social bonding.
  2. Pasta-Themed Gatherings:
     • Events that incorporate culinary joy and storytelling, promoting inclusivity and community engagement, often linked to FSM themes.
  3. Dance Parties and Music Events:
     • Social gatherings that enhance community spirit and joy through dance and musical expressions, fostering collaboration and celebration.
  4. Talent Shows:
     • Community events showcasing creativity and self-expression, encouraging engagement and cultural cohesion through performances and storytelling.
  5. Sustainability and Eco-Friendly Initiatives:
     • Projects focusing on environmental stewardship, including community gardens, tree planting, and resource gathering, emphasizing shared ecological values.
  6. Community Engagement and Volunteer Programs:
     • Efforts to organize outreach, volunteerism, and societal betterment activities, promoting social responsibility and support within communities.
  7. Meditation Circles:
     • Activities focused on promoting mindfulness and community wellness, facilitating peace and social harmony through communal reflection.
  8. Vintage Fashion and Retro Projects:
     • Aesthetic explorations involving vintage and retro themes, blending nostalgia with modern creativity in storytelling and fashion.
  9. Creative Storytelling and Narrative Circles:
     • Platforms for cultural expression and bridging community connections through shared storytelling and collaborative projects.
  10. Crafting and Resource Gathering:
     • Collaborative strategies for efficient resource management and communal crafting, highlighting teamwork and shared goals.
  11. Mischief and Pranks:
     • Playful social activities that strengthen bonds and bring joy, promoting creativity in problem-solving and community engagement.
  12. Virtual and Community Town Halls:
     • Organized discussions promoting collective decision-making and collaboration, reflecting a participatory community ethos.
  13. Oak Log Crafting Syndrome:
     • An error pattern signifying a focus or over-reliance on specific resources, illustrating logistical challenges in crafting and development projects.