Generative Agents: Interactive Simulacra of Human Behavior
Joon Sung Park, Stanford University, Stanford, USA, joonspk@stanford.edu
Meredith Ringel Morris, Google DeepMind, Seattle, WA, USA, merrie@google.com
Joseph C. O'Brien, Stanford University, Stanford, USA, jobrien3@stanford.edu
Carrie J. Cai, Google Research, Mountain View, CA, USA, cjcai@google.com
Michael S. Bernstein, Stanford University, Stanford, USA, msb@cs.stanford.edu
Figure 1: Generative agents are believable simulacra of human behavior for interactive applications. In this work, we demonstrate generative agents by populating a sandbox environment, reminiscent of The Sims, with twenty-five agents. Users can observe and intervene as agents plan their days, share news, form relationships, and coordinate group activities.
authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors. For example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture (observation, planning, and reflection) each contribute critically to the believability of agent behavior. By fusing large language models with computational interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
CCS CONCEPTS
• Human-centered computing → Interactive systems and tools; • Computing methodologies → Natural language processing.
KEYWORDS
Human-AI interaction, agents, generative AI, large language models
ACM Reference Format:
Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. In The 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23), October 29-November 01, 2023, San Francisco, CA, USA. ACM, New York, NY, USA, 22 pages. https://doi.org/10.1145/3586183.3606763
1 INTRODUCTION
How might we craft an interactive artificial society that reflects believable human behavior? From sandbox games such as The Sims to applications such as cognitive models [23] and virtual environments [10, 59], for over four decades, researchers and practitioners have envisioned computational agents that can serve as believable proxies of human behavior. In these visions, computationally powered agents act consistently with their past experiences and react believably to their environments. Such simulations of human behavior could populate virtual spaces and communities with realistic social phenomena [27, 80], train people on how to handle rare yet difficult interpersonal situations [44, 52, 94], test social science theories [12, 46], craft model human processors for theory and usability testing, power ubiquitous computing applications [31] and social robots [10, 14], and underpin non-playable game characters [59, 85] that can navigate complex human relationships in an open world.
However, the space of human behavior is vast and complex [85, 108]. Despite striking progress in large language models [18] that can simulate human behavior at a single time point [39, 80], fully general agents that ensure long-term coherence would be better suited by architectures that manage constantly-growing memories as new interactions, conflicts, and events arise and fade over time while handling cascading social dynamics that unfold between multiple agents. Success requires an approach that can retrieve relevant events and interactions over a long period, reflect on those memories to generalize and draw higher-level inferences, and apply that reasoning to create plans and reactions that make sense in the moment and in the longer-term arc of the agent's behavior.
In this paper, we introduce generative agents, agents that draw on generative models to simulate believable human behavior, and demonstrate that they produce believable simulacra of both individual and emergent group behavior. Generative agents draw a wide variety of inferences about themselves, other agents, and their environment; they create daily plans that reflect their characteristics and experiences, act out those plans, react, and re-plan when appropriate; they respond when the end user changes their environment or commands them in natural language. For instance, generative agents turn off the stove when they see that their breakfast is burning, wait outside the bathroom if it is occupied, and stop to chat when they meet another agent they want to talk to. A society full of generative agents is marked by emergent social dynamics where new relationships are formed, information diffuses, and coordination arises across agents.
To enable generative agents, we describe an agent architecture that stores, synthesizes, and applies relevant memories to generate believable behavior using a large language model. Our architecture comprises three main components. The first is the memory stream, a long-term memory module that records, in natural language, a comprehensive list of the agent's experiences. A memory retrieval model combines relevance, recency, and importance to surface the records needed to inform the agent's moment-to-moment behavior. The second is reflection, which synthesizes memories into higher-level inferences over time, enabling the agent to draw conclusions about itself and others to better guide its behavior. The third is planning, which translates those conclusions and the current environment into high-level action plans and then recursively into detailed behaviors for action and reaction. These reflections and plans are fed back into the memory stream to influence the agent's future behavior.
This architecture suggests applications in multiple domains, from role-play and social prototyping to virtual worlds and games. In social role-play scenarios (e.g., interview preparation), a user could safely rehearse difficult, conflict-laden conversations. When prototyping social platforms, a designer could go beyond temporary personas to prototype dynamic, complex interactions that unfold over time. For this paper, we focus on the ability to create a small, interactive society of agents inspired by games such as The Sims. By connecting our architecture to the ChatGPT large language model [77], we manifest a society of twenty-five agents in a game environment. End users can observe and interact with these agents. If an end user or developer wanted the town to host an in-game Valentine's Day party, for example, traditional game environments would require scripting tens of characters' behavior manually. We demonstrate that, with generative agents, it is sufficient to simply tell one agent that she wants to throw a party. Despite many potential points of failure (the party planner must remember to invite other agents to the party, attendees must remember the invitation, those who remember must decide to actually show up, and more), our agents succeed. They spread the word about the party and then show up, with one agent even asking another on a date to the party, all from a single user-generated seed suggestion.
We conducted two evaluations of generative agents: a controlled evaluation to test whether the agents produce believable individual behaviors in isolation, and an end-to-end evaluation where the agents interacted with each other in open-ended ways over two days of game time to understand their stability and emergent social behaviors. In the technical evaluation, we leverage a methodological opportunity to evaluate an agent's knowledge and behavior by "interviewing" it in natural language to probe the agents' ability to stay in character, remember, plan, react, and reflect accurately. We compared several ablations that limit agents' access to memory, reflection, and planning. We observe that each of these components is critical to strong performance across these interview tasks. Across the technical and end-to-end evaluations, the most common errors arose when the agent failed to retrieve relevant memories, fabricated embellishments to the agent's memory, or inherited overly formal speech or behavior from the language model.
In sum, this paper makes the following contributions:
• Generative agents, believable simulacra of human behavior that are dynamically conditioned on agents' changing experiences and environment.
• A novel architecture that makes it possible for generative agents to remember, retrieve, reflect, interact with other agents, and plan through dynamically evolving circumstances. The architecture leverages the powerful prompting capabilities of large language models and supplements those capabilities to support longer-term agent coherence, the management of dynamically evolving memory, and the recursive production of higher-level reflections.
• Two evaluations, a controlled evaluation and an end-to-end evaluation, that establish causal effects of the importance of components of the architecture, as well as identify breakdowns arising from, e.g., improper memory retrieval.
• Discussion of the opportunities and ethical and societal risks of generative agents in interactive systems. We argue that these agents should be tuned to mitigate the risk of users forming parasocial relationships, logged to mitigate risks stemming from deepfakes and tailored persuasion, and applied in ways that complement rather than replace human stakeholders in design processes.
2 RELATED WORK
In this section, we reflect on the prior literature in human-AI interaction and situate, within its canon, the agenda of building believable proxies of human behavior. This agenda, once hailed as a north star in the interaction, game, and artificial intelligence communities, has remained challenging due to the complexity of human behavior [17, 108]. We synthesize this research to suggest that large language models, though not sufficient by themselves, open up a new angle for creating believable agents when leveraged using the appropriate architecture.
2.1 Human-AI Interaction
Interactive artificial intelligence systems aim to combine human insights and capabilities in computational artifacts that can augment their users [4, 30]. A long line of work has explored ways to enable users to interactively specify model behavior. For instance, Crayons demonstrated an early vision of interactive machine learning, allowing non-expert users to train classifiers [30]. Further work helped to articulate how end users might describe their classification goals to the system through examples [34] or demonstration [32]. Recent advancements have extended these explorations to deep learning [63] and prompt-based authoring [50, 67, 106].
Meanwhile, a persistent thread of research has advanced the case for language- and agent-based interaction in human-computer interaction. Formative work such as SHRDLU [103] and ELIZA [102] demonstrated the opportunities and the risks associated with natural language interaction with computing systems. As research progressed, it became evident that autonomous agents could offer new metaphors for delegation and interaction [68], but the boundaries of delegation between humans and agents have remained the subject of ongoing debate and refinement [47, 89, 90]. Recently, this technology has reached a level of stability that enables agents to interact via natural language in large and complex online social environments (e.g., [55]). Natural language interaction provides a novel modality that can enhance user abilities in domains such as photo editing and code editing [88].
We convene these threads of work to show that we can now create agents that proxy human behavior for interactive systems, and interact with them using natural language. In doing so, this work reopens the door to examining foundational human-computer interaction questions around cognitive models such as GOMS and Keystroke-Level Model (KLM) [22, 23], around prototyping tools [80], and around ubiquitous computing applications [26, 31, 101].
2.2 Believable Proxies of Human Behavior
Prior literature has described believability, or believable agents, as a central design and engineering goal. Believable agents are designed to provide an illusion of life and present a facade of realism in the way they appear to make decisions and act on their own volition, similar to the characters in Disney movies [10, 96]. These agents can populate and perceive an open world environment like the one we inhabit [10, 59], and strive to behave in ways that exhibit emergent behaviors grounded in social interactions with users or other agents with the aim of becoming believable proxies of our behavior in hypothetical simulations of individuals and communities. Historically, these agents were developed in the context of intelligent game non-player characters (NPCs) [59, 85]. Creating NPCs with believable behavior, if possible, could enhance player experiences in games and interactive fictions by enabling emergent narratives and social interactions with the agents [109]. However, more importantly, game worlds provide increasingly realistic representations of real-world affordances, and as observed by Laird and van Lent in 2001, these simulated worlds offer accessible testbeds for developers of believable agents to finesse the agents' cognitive capabilities without worrying about implementing robotics in the real world or creating simulation environments from scratch.
A diverse set of approaches to creating believable agents emerged over the past four decades. In implementation, however, these approaches often simplified the environment or dimensions of agent behavior to make the effort more manageable [17, 73]. Rule-based approaches, such as finite-state machines [91, 97] and behavior trees [41, 54, 82], account for the brute force approach of human-authoring the agent's behavior [71]. They provide a straightforward way of creating simple agents that is still the most dominant approach today, and can even handle rudimentary social interactions, as shown in games such as the Mass Effect [13] and The Sims [7] series. Nonetheless, manually crafting behavior that can comprehensively address the breadth of possible interactions in an open world is untenable. This means that the resulting agent behaviors may not fully represent the consequences of their interactions [70-72], and cannot perform new procedures that were not hard-coded in their script. On the other hand, prevalent learning-based approaches for creating believable agents, such as reinforcement learning, have overcome the challenge of manual authoring by letting the agents learn their behavior, and have achieved superhuman performance in recent years, as with AlphaStar for StarCraft [99] and OpenAI Five for Dota 2 [11]. However, their success has largely taken place in adversarial games with readily definable rewards that a learning algorithm can optimize for. They have not yet addressed the challenge of creating believable agents in an open world [40, 74, 91].
Cognitive architectures in computation, pioneered by Newell, aimed to build the infrastructure for supporting a comprehensive set of cognitive functions [76] that suited the all-encompassing nature of believable agents held in its original vision. They fueled some of the earliest examples of believable agents. For instance, Quakebot-SOAR [60] and ICARUS [25, 64] generated NPCs in first-person shooter games, while TacAir-SOAR [81] generated pilots in aerial combat training simulations. The architectures used by these agents differed (Quakebot- and TacAir-SOAR relied on SOAR [61], while ICARUS relied on its own variation that was inspired by SOAR and ACT-R [6]), but they shared the same underlying principle [62]. They maintained short-term and long-term memories, filled these memories with symbolic structures, and operated in perceive-plan-act cycles, dynamically perceiving the environment and matching it with one of the manually crafted action procedures [58, 97]. Agents created using cognitive architectures aimed to be generalizable to most, if not all, open world contexts and exhibited robust behavior for their time. However, their space of action was limited to manually crafted procedural knowledge, and they did not offer a mechanism through which the agents could be inspired to seek new behavior. As such, these agents were deployed mostly in non-open world contexts such as first-person shooter games [25, 60] or blocks worlds [64].
Today, creating believable agents as described in its original definition remains an open problem [85, 108]. Many have moved on, arguing that although current approaches for creating believable agents might be cumbersome and limited, they are good enough to support existing gameplay and interactions [24, 75, 108]. Our argument is that large language models offer an opportunity to re-examine these questions, provided that we can craft an effective architecture to synthesize memories into believable behavior. We offer a step toward such an architecture in this paper.
2.3 Large Language Models and Human Behavior
Generative agents leverage a large language model to power their behavior. The key observation is that large language models encode a wide range of human behavior from their training data [15, 18]. If prompted with a narrowly defined context, the models can be used to generate believable behavior. Recent work has demonstrated the efficacy of this approach. For instance, social simulacra used a large language model to generate users that would populate new social computing systems to prototype their emergent social dynamics [80]. This approach used a prompt chain to generate short natural language descriptions of personas and their behaviors as they appear in the system being prototyped. Other empirical studies have used large language models to replicate existing social science studies [46] and political surveys [92], and to generate synthetic data [39]. Large language models have also been used to generate interactive human behavior for users to engage with. In gaming, for instance, these models have been employed to create interactive fiction [37] and text adventure games [21]. With their ability to generate and decompose action sequences, large language models have also been used in planning robotics tasks [48]. For example, when presented with a task, such as picking up a bottle, the model is prompted to break down the task into smaller action sequences, such as heading to the table where the bottle is located and picking it up.
We posit that, based on the work summarized above, large language models can become a key ingredient for creating believable agents. The existing literature largely relies on what could be considered first-order templates that employ few-shot prompts or chain-of-thought prompts [100]. These templates are effective in generating behavior that is conditioned solely on the agent's current environment (e.g., how would a troll respond to a given post, what actions would a robot need to take to enter a room given that there is a door). However, believable agents require conditioning not only on their current environment but also on a vast amount of past experience, for which first-order prompting is a poor fit (and, as of today, impossible due to the underlying models' limited context window). Recent studies have attempted to go beyond first-order prompting by augmenting language models with a static knowledge base and an information retrieval scheme [53] or with a simple summarization scheme [104]. This paper extends these ideas to craft an agent architecture that handles retrieval where past experience is dynamically updated at each time step and mixed with agents' current context and plans, which may either reinforce or contradict each other.
3 GENERATIVE AGENT BEHAVIOR AND INTERACTION
To illustrate the affordances of generative agents, we instantiate them as characters in a simple sandbox world reminiscent of The Sims [7]. This sprite-based sandbox game world, Smallville, evokes a small town environment. In this section, we will walk through the affordances and interactions with generative agents in Smallville and describe how the agents behave within it. Then, in Section 4, we will introduce our generative agent architecture that powers these affordances and interactions. In Section 5, we will describe the implementation of the sandbox environment and how the agents interact with the underlying engine of the sandbox world.
Figure 2: The Smallville sandbox world, with areas labeled. The root node describes the entire world, children describe areas (e.g., houses, cafe, stores), and leaf nodes describe objects (e.g., table, bookshelf). Agents remember a subgraph that reflects the parts of the world they have seen, maintaining the state of those parts as they observed them.
3.1 Agent Avatar and Communication
A community of 25 unique agents inhabits Smallville. Each agent is represented by a simple sprite avatar. We authored one paragraph of natural language description to depict each agent's identity, including their occupation and relationship with other agents, as seed memories. For example, John Lin has the following description:
John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers; John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory; John Lin loves his family very much; John Lin has known the old couple next-door, Sam Moore and Jennifer Moore, for a few years; John Lin thinks Sam Moore is a kind and nice man; John Lin knows his neighbor, Yuriko Yamamoto, well; John Lin knows of his neighbors, Tamara Taylor and Carmen Ortiz, but has not met them before; John Lin and Tom Moreno are colleagues at The Willows Market and Pharmacy; John Lin and Tom Moreno are friends and like to discuss local politics together; John Lin knows the Moreno family somewhat well - the husband Tom Moreno and the wife Jane Moreno.
Each semicolon-delimited phrase is entered as a separate memory in the agent's initial memory at the start of the simulation.
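A minimal sketch of this seeding step, assuming the description is held as a single string: the function below splits it on semicolons and drops empty fragments. The name `seed_memories` is hypothetical, and the example string is an abbreviated excerpt of John Lin's description, not the full paragraph.

```python
def seed_memories(description: str) -> list[str]:
    """Split a semicolon-delimited description into individual seed memories."""
    return [phrase.strip() for phrase in description.split(";") if phrase.strip()]


# Abbreviated excerpt of the description above, for illustration only.
description = (
    "John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy "
    "who loves to help people; "
    "John Lin loves his family very much; "
    "John Lin knows his neighbor, Yuriko Yamamoto, well"
)

memories = seed_memories(description)
for memory in memories:
    print(memory)
```

Each resulting string would then be stored as one record, so that later steps can surface individual facts (for example, a single relationship) without pulling in the whole paragraph.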
3.1.1 Inter-Agent Communication. The agents interact with the world by their actions, and with each other through natural language. At each time step of the sandbox engine, the agents output a natural language statement describing their current action, such as "Isabella Rodriguez is writing in her journal", "Isabella Rodriguez is checking her emails", "Isabella Rodriguez is talking with her family on the phone", or "Isabella Rodriguez is getting ready for bed." This statement is then translated into concrete movements that affect the sandbox world. The action is displayed on the sandbox interface as a set of emojis, providing an abstract representation of the action from an overhead view. To achieve this, the system utilizes a language model to translate the action into a set of emojis, which appear above each avatar's head in a speech bubble. For example, "Isabella Rodriguez is writing in her journal" is displayed as [emoji], while "Isabella Rodriguez is checking her emails" appears as [emoji]. The complete natural language description of the action can be accessed by clicking on the agent's avatar.
Agents communicate with each other in full natural language. They are aware of other agents in their local area, and the generative agent architecture determines whether they walk by or engage in conversation. Here is a sample from the middle of a conversation between the agents Isabella Rodriguez and Tom Moreno about the upcoming election:
Isabella: I'm still weighing my options, but I've been discussing the election with Sam Moore. What are your thoughts on him?
Tom: To be honest, I don't like Sam Moore. I think he's out of touch with the community and doesn't have our best interests at heart.
3.1.2 User Controls. The user communicates with the agent through natural language by specifying a persona that the agent should perceive them as. For example, if the user specifies that they are a news "reporter" and asks about the upcoming election by saying, "Who is running for office?", the John agent replies:
John: My friends Yuriko, Tom and I have been talking about the upcoming election and discussing the candidate Sam Moore. We have all agreed to vote for him because we like his platform.
To directly command one of the agents, the user takes on the persona of the agent's "inner voice"; this makes the agent more likely to treat the statement as a directive. For instance, when told "You are going to run against Sam in the upcoming election" by a user as John's inner voice, John decides to run in the election and shares his candidacy with his wife and son.
3.2 Environmental Interaction
Smallville features the common affordances of a small village, including a cafe, bar, park, school, dorm, houses, and stores. It also defines subareas and objects that make those spaces functional, such as a kitchen in a house and a stove in the kitchen (Figure 2). All spaces serving as agents' primary living quarters feature a bed, desk, closet, shelf, as well as a bathroom and a kitchen.
Agents move around Smallville as one would in a simple video game, entering and leaving buildings, navigating its map, and approaching other agents. Agent movements are directed by the generative agent architecture and the sandbox game engine: when the model dictates that the agent will move to a location, we calculate a walking path to the destination in the Smallville environment, and the agent begins moving. In addition, users can also enter the sandbox world of Smallville as an agent operating within it. The agent that the user embodies can be an agent already present in the world, such as Isabella and John, or it can be an outside visitor with no prior history in Smallville. The inhabitants of Smallville will treat the user-controlled agent no differently than they treat each other. They recognize its presence, initiate interactions, and remember its behavior before forming opinions about it.
Users and agents can influence the state of the objects in this world, much like in sandbox games such as The Sims. For example, a bed can be occupied when an agent is sleeping, and a refrigerator can be empty when an agent uses up the ingredients to make breakfast. End users can also reshape an agent's environment in Smallville by rewriting the status of objects surrounding the agent in natural language. For instance, when Isabella is making breakfast in the morning, the user can change the status of the kitchen stove from "turned on" to "burning" by inputting a command to the system that chooses the object and illustrates its new status, like this: "<Isabella's apartment: kitchen: stove> is burning." Isabella will notice this in the next moment and go to turn off the stove and remake her breakfast. Likewise, if the user sets the status of Isabella's shower to "leaking water" when she enters the bathroom, she will gather tools from her living room and try to fix the leak.
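The status-rewriting command described above can be sketched as a small parser over a nested dictionary. This is a minimal illustration only: the world layout, the function name `set_object_status`, and the regular expression are assumptions made for this example and are not taken from the authors' system.

```python
import re

# Hypothetical world tree: area -> subarea -> object -> status (illustrative values).
world = {
    "Isabella's apartment": {
        "kitchen": {"stove": "turned on", "refrigerator": "empty"},
        "bathroom": {"shower": "idle"},
    }
}

# Assumed command syntax: "<area: subarea: object> is <new status>"
COMMAND = re.compile(r"^<(?P<path>[^>]+)>\s+is\s+(?P<status>.+?)\.?$")


def set_object_status(world: dict, command: str) -> str:
    """Parse a user command and overwrite the addressed object's status."""
    match = COMMAND.match(command.strip())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    *areas, obj = [part.strip() for part in match["path"].split(":")]
    node = world
    for area in areas:
        node = node[area]  # raises KeyError if the path does not exist
    if obj not in node:
        raise KeyError(obj)
    node[obj] = match["status"]
    return node[obj]


set_object_status(world, "<Isabella's apartment: kitchen: stove> is burning.")
```

After the call, the stove entry in the hypothetical `world` tree holds the new status, which an agent's perception step could then pick up as an observation.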
3.3 Example "Day in the Life"
Starting from the single-paragraph description, generative agents begin planning their days. As time passes in the sandbox world, their behaviors evolve as these agents interact with each other and the world, building memories and relationships, and coordinating joint activities.
We demonstrate the behavior of generative agents by tracing the output of our system over the course of one day for the agent John Lin (Figure 3). In the Lin family, John is the first to wake up at 7 a.m. He brushes his teeth, takes a shower, gets dressed, eats breakfast, and checks the news at the dining table in the living room. At 8 a.m., Eddy follows suit, rushing out of bed to prepare for his classes. He catches up with John just as he is about to head out:
John: Good morning Eddy. Did you sleep well?
Eddy: Good morning dad. Yeah, I slept great.
John: That's good. What are you working on today?
Eddy: I'm working on a new music composition for
my class. It's due this week, so I'm trying to get it
finished. But I'm having so much fun with it!
John: That sounds great!
Soon after Eddy heads out, Mei wakes up and joins John. Mei asks about her son, and John recalls the conversation they just had:
Mei: Did Eddy already leave for school?
John: Yes, he just left. He's working on a music
composition for his class.
Mei: Oh, that's wonderful! Tell me about it.
John: I think he's really enjoying it! He said he's
having a lot of fun with it.
Mei: That's great! I'm so proud of him.
As they finish their conversation, Mei and John pack up. Mei goes on to teach and work on her research paper, while John opens his pharmacy counter at the Willow Market and Pharmacy by 9 a.m.
3.4 Emergent Social Behaviors
By interacting with each other, generative agents in Smallville exchange information, form new relationships, and coordinate joint activities. Extending prior work [80], these social behaviors are emergent rather than pre-programmed.
3.4.1 Information Diffusion. As agents notice each other, they may engage in dialogue; as they do so, information can spread from agent to agent. For instance, in a conversation between Sam and Tom at the grocery store, Sam tells Tom about his candidacy in the local election:
Sam: Hey Tom, how's it going?
Tom: Good, thanks. What's up?
Sam: Well, I wanted to talk to you about something. I'm actually running for mayor in the upcoming local election.
Tom: Really? That's great news! Why are you running?
Sam: I've been involved in local politics for years now, and I really feel like I can make a difference in our community. Plus, I want to bring new ideas and perspectives to the table.
Figure 3: A morning in the life of a generative agent, John Lin. John wakes up around 6 am and completes his morning routine, which includes brushing his teeth, taking a shower, and eating breakfast. He briefly catches up with his wife, Mei, and son, Eddy, before heading out to begin his workday.
Figure 4: At the beginning of the simulation, one agent is initialized with an intent to organize a Valentine's Day party. Despite many possible points of failure in the ensuing chain of events (agents might not act on that intent, might forget to tell others, might not remember to show up), the Valentine's Day party does, in fact, occur, with a number of agents gathering and interacting.
Later that day, after Sam left, Tom and John, who heard the news from another source, discuss Sam's chances of winning the election:
Gradually, Sam's candidacy becomes the talk of the town, with some supporting him and others remaining undecided.
3.4.2 Relationship Memory. Agents in Smallville form new relationships over time and remember their interactions with other agents. For example, at the start, Sam does not know Latoya Williams. While taking a walk in Johnson Park, Sam runs into Latoya, and they introduce themselves. Latoya mentions that she is working on a photography project: "I'm here to take some photos for a project I'm working on." In a later encounter, Sam's conversation with Latoya shows that he remembers that interaction, as he asks "Hi, Latoya. How is your project going?" and she replies "Hi, Sam. It's going well!"
3.4.3 Coordination. Generative agents coordinate with each other. Isabella Rodriguez, at Hobbs Cafe, is initialized with an intent to plan a Valentine's Day party from 5 to 7 p.m. on February 14th. From this seed, the agent proceeds to invite friends and customers when she sees them at Hobbs Cafe or elsewhere. Isabella then spends the afternoon of the 13th decorating the cafe for the occasion. Maria, a frequent customer and close friend of Isabella's, arrives at the cafe. Isabella asks for Maria's help in decorating for the party, and Maria agrees. Maria's character description mentions that she has a crush on Klaus. That night, Maria invites Klaus, her secret crush, to join her at the party, and he gladly accepts.
On Valentine's Day, five agents, including Klaus and Maria, show up at Hobbs Cafe at 5 pm, and they enjoy the festivities (Figure 4). In this scenario, the end user only set Isabella's initial intent to throw a party and Maria's crush on Klaus: the social behaviors of spreading the word, decorating, asking each other out, arriving at the party, and interacting with each other at the party were initiated by the agent architecture.
Figure 5: Our generative agent architecture. Agents perceive their environment, and all perceptions are saved in a comprehensive record of the agent's experiences called the memory stream. Based on their perceptions, the architecture retrieves relevant memories and uses those retrieved memories to determine an action. These retrieved memories are also used to form longer-term plans and create higher-level reflections, both of which are entered into the memory stream for future use.
4 GENERATIVE AGENT ARCHITECTURE
Generative agents aim to provide a framework for behavior in an open world: one that can engage in interactions with other agents and react to changes in the environment. Generative agents take their current environment and past experiences as input and generate behavior as output. Underlying this behavior is a novel agent architecture that combines a large language model with mechanisms for synthesizing and retrieving relevant information to condition the language model's output. Without these mechanisms, large language models can output behavior, but the resulting agents may not react based on the agent's past experiences, may not make important inferences, and may not maintain long-term coherence. Challenges with long-term planning and coherence remain [19] even with today's most performant models such as GPT-4. Because generative agents produce large streams of events and memories that must be retained, a core challenge of our architecture is to ensure that the most relevant pieces of the agent's memory are retrieved and synthesized when needed.
At the center of our architecture is the memory stream, a database that maintains a comprehensive record of an agent's experience. From the memory stream, records are retrieved as relevant to plan the agent's actions and react appropriately to the environment. Records are recursively synthesized into higher- and higher-level reflections that guide behavior. Everything in the architecture is recorded and reasoned over as a natural language description, allowing the architecture to leverage a large language model.
Our current implementation utilizes the gpt3.5-turbo version of ChatGPT [77]. We expect that the architectural basics of generative agents (memory, planning, and reflection) will likely remain the same as language models improve. Newer language models (e.g., GPT-4) will continue to expand the expressive power and performance of the prompts that underpin generative agents. As of writing, however, GPT-4's API was invitation-only, so our agents use ChatGPT.
4.1 Memory and Retrieval
Challenge: Creating generative agents that can simulate human behavior requires reasoning about a set of experiences that is far larger than what should be described in a prompt, as the full memory stream can distract the model and does not even currently fit into the limited context window. Consider the Isabella agent answering the question, "What are you passionate about these days?" Summarizing all of Isabella's experiences to fit in the limited context window of the language model produces an uninformative response, where Isabella discusses topics such as collaborations for events and projects and cleanliness and organization in a cafe. Instead of summarizing, the memory stream described below surfaces relevant memories, resulting in a more informative and specific response that mentions Isabella's passion for making people feel welcome and included, planning events and creating an atmosphere that people can enjoy, such as the Valentine's Day party.
Approach: The memory stream maintains a comprehensive record of the agent's experience. It is a list of memory objects, where each object contains a natural language description, a creation timestamp, and a most recent access timestamp. The most basic element of the memory stream is an observation, which is an event directly perceived by an agent. Common observations include behaviors performed by the agent themselves or behaviors that agents perceive being performed by other agents or non-agent objects. For instance, Isabella Rodriguez, who works at a coffee shop, might accrue the following observations over time: (1) Isabella Rodriguez is setting out the pastries, (2) Maria Lopez is studying for a Chemistry test while drinking coffee, (3) Isabella Rodriguez and Maria Lopez are conversing about planning a Valentine's day party at Hobbs Cafe, (4) The refrigerator is empty.
Our architecture implements a retrieval function that takes the agent's current situation as input and returns a subset of the memory stream to pass on to the language model. There are many possible implementations of a retrieval function, depending on what is important for the agent to consider when deciding how to act.
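The memory object described above (a description plus two timestamps) can be sketched as a small data class. The class names `MemoryObject` and `MemoryStream`, the `touch` helper, and the example dates are assumptions for illustration; the text does not specify a concrete schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MemoryObject:
    description: str            # natural language description
    created_at: datetime        # creation timestamp
    last_accessed_at: datetime  # most recent access timestamp


class MemoryStream:
    """Append-only list of memory objects (hypothetical helper class)."""

    def __init__(self) -> None:
        self.records: List[MemoryObject] = []

    def add(self, description: str, now: datetime) -> MemoryObject:
        # A new memory counts as accessed at the moment it is created.
        record = MemoryObject(description, created_at=now, last_accessed_at=now)
        self.records.append(record)
        return record

    def touch(self, record: MemoryObject, now: datetime) -> None:
        # Called when a record is retrieved, so recency can be computed later.
        record.last_accessed_at = now


stream = MemoryStream()
first = stream.add("Isabella Rodriguez is setting out the pastries", datetime(2023, 2, 13, 8, 0))
stream.add("The refrigerator is empty", datetime(2023, 2, 13, 8, 30))
stream.touch(first, datetime(2023, 2, 13, 9, 0))
```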
Q. What are you looking forward to the most right now?
Figure 6: The memory stream comprises a large number of observations that are relevant and irrelevant to the agent's current situation. Retrieval identifies a subset of these observations that should be passed to the language model to condition its response to the situation.
In our context, we focus on three main components that, together, produce effective results.
Recency assigns a higher score to memory objects that were recently accessed, so that events from a moment ago or this morning are likely to remain in the agent's attentional sphere. In our implementation, we treat recency as an exponential decay function over the number of sandbox game hours since the memory was last retrieved. Our decay factor is 0.995.
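A minimal sketch of the recency score, assuming the exponential decay takes the form decay factor raised to the number of elapsed sandbox hours. The text gives the decay factor (0.995) but not the exact formula, so the functional form and the name `recency_score` are assumptions.

```python
DECAY_FACTOR = 0.995  # decay factor stated in the text


def recency_score(hours_since_last_access: float, decay: float = DECAY_FACTOR) -> float:
    """Assumed form: decay ** hours, so a just-accessed memory scores 1.0."""
    if hours_since_last_access < 0:
        raise ValueError("elapsed time cannot be negative")
    return decay ** hours_since_last_access
```

Under this assumed form the score falls slowly: a memory last retrieved a full sandbox day ago still keeps most of its recency, which matches the stated goal of keeping events from "this morning" within the agent's attention.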
Importance distinguishes mundane from core memories by assigning a higher score to memory objects that the agent believes to be important. For instance, a mundane event, such as eating breakfast in one's room, would yield a low importance score, whereas a breakup with one's significant other would yield a high score. There are many possible implementations of an importance score; we find that directly asking the language model to output an integer score is effective. The full prompt appears below:
On the scale of 1 to 10, where 1 is purely mundane
(e.g., brushing teeth, making bed) and 10 is
extremely poignant (e.g., a break up, college
acceptance), rate the likely poignancy of the
following piece of memory.
Memory: buying groceries at The Willows Market
and Pharmacy
Rating: <fill in>
This prompt returns an integer value of 2 for "cleaning up the room" and 8 for "asking your crush out on a date." The importance score is generated at the time the memory object is created.
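The importance prompt can be templated and its reply parsed as sketched below. The language model call itself is omitted; `build_importance_prompt` and `parse_importance` are hypothetical helpers, and clamping the reply to the 1 to 10 range is an added safeguard that the text does not mention.

```python
import re

# Prompt text as quoted above, with the memory description left as a template slot.
IMPORTANCE_PROMPT = (
    "On the scale of 1 to 10, where 1 is purely mundane "
    "(e.g., brushing teeth, making bed) and 10 is "
    "extremely poignant (e.g., a break up, college "
    "acceptance), rate the likely poignancy of the "
    "following piece of memory.\n"
    "Memory: {memory}\n"
    "Rating: <fill in>"
)


def build_importance_prompt(memory: str) -> str:
    return IMPORTANCE_PROMPT.format(memory=memory)


def parse_importance(reply: str) -> int:
    """Extract the first integer from a model reply and clamp it to 1..10."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no integer found in reply: {reply!r}")
    return max(1, min(10, int(match.group())))
```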
Relevance assigns a higher score to memory objects that are related to the current situation. What is relevant depends on the answer to, "Relevant to what?", so we condition relevance on a query memory. If the query, for example, is that a student is discussing what to study for a chemistry test with a classmate, memory objects about their breakfast should have low relevance, whereas memory objects about the teacher and schoolwork should have high relevance. In our implementation, we use the language model to generate an embedding vector of the text description of each memory. Then, we calculate relevance as the cosine similarity between the memory's embedding vector and the query memory's embedding vector.
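Cosine similarity between two embedding vectors can be computed as follows. The embedding step is not shown, since the text does not name the embedding interface; the vectors in the usage note are toy values, and returning 0.0 for a zero vector is a convention chosen for this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector is unrelated to everything
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([1.0, 0.0], [0.0, 1.0])` is 0.0 for orthogonal vectors, while identical directions give 1.0.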
To calculate the final retrieval score, we normalize the recency, relevance, and importance scores to the range of $[0, 1]$ using min-max scaling. The retrieval function scores all memories as a weighted combination of the three elements: $score = \alpha_{recency} \cdot recency + \alpha_{importance} \cdot importance + \alpha_{relevance} \cdot relevance$. In our implementation, all $\alpha$s are set to 1. The top-ranked memories that fit within the language model's context window are included in the prompt.
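The normalization and weighted combination can be sketched as below. The function names are hypothetical, the input scores in the usage note are made-up values, and mapping a constant column to 0.0 is an assumption, since the text does not say how that degenerate case is handled.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 0.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_memories(
    recency: Sequence[float],
    importance: Sequence[float],
    relevance: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # all weights 1, as in the text
) -> Tuple[List[int], List[float]]:
    """Return memory indices sorted by descending score, plus the scores."""
    a_rec, a_imp, a_rel = weights
    columns = zip(min_max(recency), min_max(importance), min_max(relevance))
    scores = [a_rec * r + a_imp * i + a_rel * v for r, i, v in columns]
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return order, scores


order, scores = rank_memories(
    recency=[0.9, 0.5, 0.1],
    importance=[2, 8, 5],
    relevance=[0.2, 0.9, 0.4],
)
```

A caller would then walk `order` from the top and add memories to the prompt until the context window is full.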
4.2 Reflection
Challenge: Generative agents, when equipped with only raw observational memory, struggle to generalize or make inferences. Consider a scenario in which Klaus Mueller is asked by the user: "If you had to choose one person of those you know to spend an hour with, who would it be?" With access to only observational memory, the agent simply chooses the person with whom Klaus has had the most frequent interactions: Wolfgang, his college dorm neighbor. Unfortunately, Wolfgang and Klaus only ever see each other in passing, and do not have deep interactions. A more desirable response requires that the agent generalize from memories of Klaus spending hours on a research project to generate a higher-level reflection that Klaus is passionate about research, and likewise recognize Maria putting in effort into her own research (albeit in a different field), enabling a reflection that they share a common interest. With the approach below, when Klaus is asked who to spend time with, Klaus chooses Maria instead of Wolfgang.
Figure 7: A reflection tree for Klaus Mueller. The agent's observations of the world, represented in the leaf nodes, are recursively synthesized to derive Klaus's self-notion that he is highly dedicated to his research.
Approach: We introduce a second type of memory, which we call a reflection. Reflections are higher-level, more abstract thoughts generated by the agent. Because they are a type of memory, they are included alongside other observations when retrieval occurs. Reflections are generated periodically; in our implementation, we generate reflections when the sum of the importance scores for the latest events perceived by the agents exceeds a threshold (150 in our implementation). In practice, our agents reflected roughly two or three times a day.
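The reflection trigger can be sketched as a running sum of importance scores compared against the threshold. The class name `ReflectionTrigger` is hypothetical, and resetting the sum after each reflection is an assumption; the text states only the threshold of 150.

```python
REFLECTION_THRESHOLD = 150  # threshold stated in the text


class ReflectionTrigger:
    """Accumulates importance scores and signals when a reflection is due."""

    def __init__(self, threshold: int = REFLECTION_THRESHOLD) -> None:
        self.threshold = threshold
        self.accumulated = 0

    def observe(self, importance: int) -> bool:
        """Add one event's importance; return True when the sum exceeds the threshold."""
        self.accumulated += importance
        if self.accumulated > self.threshold:
            self.accumulated = 0  # assumption: the counter restarts after reflecting
            return True
        return False
```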
The first step in reflection is for the agent to determine what to reflect on, by identifying questions that can be asked given the agent's recent experiences. We query the large language model with the 100 most recent records in the agent's memory stream (e.g., "Klaus Mueller is reading a book on gentrification", "Klaus Mueller is conversing with a librarian about his research project", "desk at the library is currently unoccupied") and prompt the language model, "Given only the information above, what are 3 most salient high-level questions we can answer about the subjects in the statements?" The model's response generates candidate questions: for example, What topic is Klaus Mueller passionate about? and What is the relationship between Klaus Mueller and Maria Lopez? We use these generated questions as queries for retrieval, and gather relevant memories (including other reflections) for each question. Then we prompt the language model to extract insights and cite the particular records that served as evidence for the insights. The full prompt is as follows:
Statements about Klaus Mueller
Klaus Mueller is writing a research paper
Klaus Mueller enjoys reading a book on gentrification
Klaus Mueller is conversing with Ayesha Khan about exercising [...]
What 5 high-level insights can you infer from the above statements? (example format: insight (because of ))
This process generates statements such as Klaus Mueller is dedicated to his research on gentrification (because of ). We parse and store the statement as a reflection in the memory stream, including pointers to the memory objects that were cited.
Reflection explicitly allows the agents to reflect not only on their observations but also on other reflections: for example, the second statement about Klaus Mueller above is a reflection that Klaus previously had, not an observation from his environment. As a result, agents generate trees of reflections: the leaf nodes of the tree represent the base observations, and the non-leaf nodes represent thoughts that become more abstract and higher-level the higher up the tree they are.
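A reflection tree of this kind can be represented with nodes that hold pointers to the memories cited as evidence. The sketch below is illustrative only: the `Node` class, the `abstraction_level` helper, and the particular evidence links between the example statements are assumptions, not the authors' data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    description: str
    # Pointers to the memories cited as evidence; empty for base observations.
    evidence: List["Node"] = field(default_factory=list)


def abstraction_level(node: Node) -> int:
    """0 for a leaf observation; otherwise 1 + the deepest evidence level."""
    if not node.evidence:
        return 0
    return 1 + max(abstraction_level(child) for child in node.evidence)


# Illustrative tree; the evidence links here are assumed for the example.
writing = Node("Klaus Mueller is writing a research paper")
reading = Node("Klaus Mueller is reading a book on gentrification")
enjoys = Node("Klaus Mueller enjoys reading a book on gentrification", [reading])
dedicated = Node(
    "Klaus Mueller is dedicated to his research on gentrification",
    [writing, enjoys],
)
```

Because reflections are stored as ordinary memories, a node such as `enjoys` can itself be cited by a later reflection, which is what makes the structure a tree rather than a flat list.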
4.3 Planning and Reacting
Challenge: While a large language model can generate plausible behavior in response to situational information (e.g., [46, 80]), agents need to plan over a longer time horizon to ensure that their sequence of actions is coherent and believable. If we prompt a language model with Klaus's background, describe the time, and ask what action he ought to take at the given moment, Klaus would eat lunch at 12 pm, but then again at 12:30 pm and 1 pm, despite having already eaten his lunch twice. Optimizing for believability in the moment sacrifices believability over time. To overcome this issue, planning is essential. With the approach described below, Klaus's afternoon plan is less gluttonous: he has lunch at Hobbs Cafe while reading at 12 pm, works on his research paper at the school library at 1 pm, and takes a break for a walk in the park at 3 pm.
Approach: Plans describe a future sequence of actions for the agent, and help keep the agent's behavior consistent over time. A plan includes a location, a starting time, and a duration. For instance, Klaus Mueller, who is dedicated to his research and has an impending deadline, may choose to spend his day working at his desk drafting his research paper. An entry in a plan might state, for example: for 180 minutes from 9am, February 12th, 2023, at Oak Hill College Dorm: Klaus Mueller's room: desk, read and take notes for research paper. Like reflections, plans are stored in the memory stream and are included in the retrieval process. This allows the agent to consider observations, reflections, and plans all together when deciding how to behave. Agents may change their plans midstream if needed.
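A plan entry with the three stated fields (location, starting time, duration) plus the activity text can be sketched as a data class. The name `PlanEntry` and the derived `end` property are assumptions for illustration; the example values come from the entry quoted above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PlanEntry:
    location: str          # colon-separated path, as in the example entry
    start: datetime        # starting time
    duration_minutes: int  # duration
    activity: str          # natural language description of the action

    @property
    def end(self) -> datetime:
        """End time derived from start and duration (convenience, not in the text)."""
        return self.start + timedelta(minutes=self.duration_minutes)


entry = PlanEntry(
    location="Oak Hill College Dorm: Klaus Mueller's room: desk",
    start=datetime(2023, 2, 12, 9, 0),
    duration_minutes=180,
    activity="read and take notes for research paper",
)
```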
It would be unrealistic and uninteresting for an artist agent to plan on painting while sitting at a pharmacy counter for four hours without moving. A more desirable plan would involve the agent taking the necessary time to gather materials, mix paint, take breaks, and clean up during the four-hour period in their home studio. To create such plans, our approach starts top-down and then recursively generates more detail. The first step is to create a plan that outlines the day's agenda in broad strokes. To create the initial plan, we prompt the language model with the agent's summary description (e.g., name, traits, and a summary of their recent experiences) and a summary of their previous day. A full example prompt is below, which is unfinished at the bottom for the language model to complete:
Name: Eddy Lin (age: 19)
Innate traits: friendly, outgoing, hospitable
Eddy Lin is a student at Oak Hill College studying
music theory and composition. He loves to explore
different musical styles and is always looking for
ways to expand his knowledge. Eddy Lin is working
on a composition project for his college class. He
is taking classes to learn more about music theory
Eddy Lin is excited about the new composition he
is working on but he wants to dedicate more hours
in the day to work on it in the coming days
On Tuesday February 12, Eddy 1) woke up and
completed the morning routine at 7:00 am, [. . . ]
6) got ready to sleep around 10 pm.
Today is Wednesday February 13. Here is Eddy's
plan today in broad strokes: 1)
This generates a rough sketch of the agent's plan for a day, divided into five to eight chunks: "1) wake up and complete the morning routine at 8:00 am, 2) go to Oak Hill College to take classes starting 10:00 am, [...] 5) work on his new music composition from 1:00 pm to 5:00 pm, 6) have dinner at 5:30 pm, 7) finish school assignments and go to bed by 11:00 pm."
The agent saves this plan in the memory stream and then recursively decomposes it to create finer-grained actions, first into hour-long chunks of actions. Eddy's plan to work on his new music composition from 1:00 pm to 5:00 pm becomes 1:00 pm: start by brainstorming some ideas for his music composition [...] 4:00 pm: take a quick break and recharge his creative energy before reviewing and polishing his composition. We then recursively decompose this again into minute chunks: e.g., 4:00 pm: grab a light snack, such as a piece of fruit, a granola bar, or some nuts. 4:05 pm: take a short walk around his workspace [...] 4:50 pm: take a few minutes to clean up his workspace. This process can be adjusted to match the desired granularity.
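The top-down decomposition can be sketched as a recursive split of a time interval at successively finer granularities. This sketch covers only the time bookkeeping: in the described system the language model writes the activity for each chunk, which is omitted here. The function names and the 5-minute fine granularity are assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Interval = Tuple[datetime, datetime]


def split_interval(start: datetime, end: datetime, chunk_minutes: int) -> List[Interval]:
    """Cut [start, end) into consecutive chunks of at most chunk_minutes."""
    if chunk_minutes <= 0:
        raise ValueError("chunk_minutes must be positive")
    step = timedelta(minutes=chunk_minutes)
    chunks = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


def decompose(start: datetime, end: datetime, granularities: Sequence[int]) -> List[Interval]:
    """Recursively split an interval, coarsest granularity first."""
    if not granularities:
        return [(start, end)]
    first, *rest = granularities
    leaves = []
    for chunk_start, chunk_end in split_interval(start, end, first):
        leaves.extend(decompose(chunk_start, chunk_end, rest))
    return leaves


# Eddy's 1:00 pm to 5:00 pm block: hour-long chunks, then assumed 5-minute chunks.
leaves = decompose(datetime(2023, 2, 13, 13, 0), datetime(2023, 2, 13, 17, 0), [60, 5])
```

Changing the `granularities` list is one way to express the adjustable granularity mentioned above.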
4.3.1 Reacting and Updating Plans. Generative agents operate in an action loop where, at each time step, they perceive the world around them and those perceived observations are stored in their memory stream. We prompt the language model with these observations to decide whether the agent should continue with their existing plan, or react. Standing at an easel and painting, for example, might trigger an observation of the easel, but this is unlikely to prompt a reaction. However, if Eddy's father John records that he sees Eddy taking a short walk in the house garden, the outcome is different. The prompt is below, with [Agent's Summary Description] standing in for a dynamically-generated, paragraph-long summary of the agent's overall goals and disposition, which is described in Appendix A:
[Agent's Summary Description]
It is February 13, 2023, 4:56 pm.
John Lin's status: John is back home early from work.
Observation: John saw Eddy taking a short walk around his workplace.
Summary of relevant context from John's memory: Eddy Lin is John Lin's son. Eddy Lin has been working on a music composition for his class. Eddy Lin likes to walk around the garden when he is thinking about or listening to music.
Should John react to the observation, and if so, what would be an appropriate reaction?
The context summary is generated through two prompts that retrieve memories via the queries "What is [observer]'s relationship with the [observed entity]?" and "[Observed entity] is [action status of the observed entity]", and their answers summarized together. The output suggests that John could consider asking Eddy about his music composition project. We then regenerate the agent's existing plan starting from the time when the reaction takes place. Finally, if the action indicates an interaction between agents, we generate their dialogue.
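As a rough sketch of how such a prompt might be assembled from its parts, the helper below fills the template shown above. The function names and argument layout are assumptions made for illustration; only the wording of the template lines and of the two retrieval queries comes from the text.

```python
def context_queries(observer: str, observed: str, action_status: str) -> list:
    """The two retrieval queries used to build the context summary.

    The article "the" before the observed entity is dropped here because the
    entity is a named agent; that is a small liberty taken in this sketch.
    """
    return [
        f"What is {observer}'s relationship with {observed}?",
        f"{observed} is {action_status}",
    ]


def build_reaction_prompt(summary: str, now: str, name: str, status: str,
                          observation: str, context: str) -> str:
    """Fill the reaction template; every argument is plain text."""
    first_name = name.split()[0]
    lines = [
        summary,
        f"It is {now}.",
        f"{name}'s status: {status}",
        f"Observation: {observation}",
        f"Summary of relevant context from {first_name}'s memory: {context}",
        f"Should {first_name} react to the observation, and if so, "
        "what would be an appropriate reaction?",
    ]
    return "\n".join(lines)
```

The returned string would then be sent to the language model, whose answer decides whether the current plan is kept or regenerated.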
4.3.2 Dialogue. Agents converse as they interact with each other. We generate agents' dialogue by conditioning their utterances on their memories about each other. For example, when John initiates his conversation with Eddy, we generate John's first utterance by using his summarized memory about Eddy and the intended reaction when he decided to ask Eddy about his composition project:
[Agent's Summary Description]
It is February 13, 2023, 4:56 pm.
John Lin's status: John is back home early from work.
Observation: John saw Eddy taking a short walk around his workplace.
Summary of relevant context from John's memory: Eddy Lin is John Lin's son. Eddy Lin has been working on a music composition for his class. Eddy Lin likes to walk around the garden when he is thinking about or listening to music.
John is asking Eddy about his music composition project. What would he say to Eddy?
The result: "Hey Eddy, how's the music composition project for your class coming along?" From Eddy's perspective, John initiating the dialogue is seen as an event to which he may want to react. So, just as John did, Eddy retrieves and summarizes his memory about his relationship with John, as well as his memory that may be related to John's last utterance in the dialogue. If he decides to respond, we generate Eddy's utterance using his summarized memory and the current dialogue history:
[Agent's Summary Description]
It is February 13, 2023, 4:56 pm.
Eddy Lin's status: Eddy is taking a short walk
around his workplace.
Observation: John is initiating a conversation
with Eddy.
Summary of relevant context from Eddy's memory:
John Lin is Eddy Lin's father. John Lin is caring
and is interested to learn more about Eddy Lin's
school work. John Lin knows that Eddy Lin is
working on a music composition.
Here is the dialogue history:
John: Hey Eddy, how's the music composition project
for your class coming along?
How would Eddy respond to John?
This generates Eddy's response: "Hey Dad, it's going well. I've been taking walks around the garden to clear my head and get some inspiration." The continuation of this dialogue is generated using the same mechanism until one of the two agents decides to end the dialogue.
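The turn-taking mechanism can be sketched as a simple loop. Here each speaker is a hypothetical callable standing in for the prompt-and-generate step: it receives the dialogue history and returns the next utterance, or `None` when the agent decides to end the dialogue. The `max_turns` cap is an added safety assumption, not something the text specifies.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]
# A speaker maps the dialogue history to its next utterance, or None to stop.
Speaker = Callable[[History], Optional[str]]


def run_dialogue(first: Tuple[str, Speaker], second: Tuple[str, Speaker],
                 max_turns: int = 20) -> History:
    """Alternate between two agents until one of them ends the dialogue."""
    history: History = []
    turn_order = [first, second]
    for i in range(max_turns):
        name, speak = turn_order[i % 2]
        utterance = speak(history)
        if utterance is None:  # this agent decided to end the dialogue
            break
        history.append((name, utterance))
    return history
```

In a fuller implementation each speaker would rebuild its prompt from the agent's summarized memory and the current history before every turn.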
5 SANDBOX ENVIRONMENT IMPLEMENTATION
The Smallville sandbox game environment is built using the Phaser web game development framework [57]. The visual environment sprites, including agent avatars, as well as an environment map and collision map that we authored, are imported into Phaser.
We supplement the sandbox development framework with a server that makes the sandbox information available to generative agents and enables generative agents to move and influence the sandbox environment. The server maintains a JSON data structure that contains information about each agent in the sandbox world, including their current location, a description of their current action, and the sandbox object they are interacting with. At each sandbox time step, the sandbox server parses the JSON for any changes coming from the generative agents, moves the agents to their new positions, and updates the status of any sandbox objects that the agents are interacting with (e.g., changing the status of the coffee machine from "idle" to "brewing coffee" if an agent's action is "making espresso for a customer @ Hobbs Cafe: counter: coffee machine"). The sandbox server is also responsible for sending all agents and objects that are within a preset visual range for each agent to that agent's memory, so the agent can react appropriately. The agent's output action then updates the JSON, and the process loops for the next time step.
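One time step of such a server might look like the sketch below. The JSON layout (`agents`, `objects`, and the per-agent `location`, `action`, `object` keys) is a hypothetical schema invented for this example, and `new_object_state` is a stand-in for whatever decides the object's new status; only the "description @ area: subarea: object" action format comes from the text.

```python
import json


def parse_action(action: str):
    """Split 'description @ area: subarea: object' into (description, path)."""
    description, _, address = action.partition("@")
    path = [part.strip() for part in address.split(":") if part.strip()]
    return description.strip(), path


def step(world: dict, agent_updates_json: str, new_object_state) -> dict:
    """Apply one time step of agent updates to the world state in place.

    `world` follows a hypothetical schema: {"agents": {...}, "objects": {...}}.
    `new_object_state(description, object_key)` is a stand-in callable.
    """
    updates = json.loads(agent_updates_json)
    for name, update in updates.items():
        description, path = parse_action(update["action"])
        agent = world["agents"][name]
        agent["action"] = description
        agent["location"] = path[:-1]          # area and subarea
        agent["object"] = path[-1] if path else None
        if path:
            key = ": ".join(path)
            world["objects"][key] = new_object_state(description, key)
    return world
```

Pathfinding, rendering, and the perception pass that sends nearby agents and objects to each agent's memory are left out of this sketch.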
End users initialize a new agent with a brief natural language description, as in the paragraph about John Lin in Section 3.1. In our implementation, we split this semicolon-delimited list of characteristics up into a set of memories. These serve as the initial memories that determine the agent's behavior. These memories are initial starting points: as the agents gain more experience in the sandbox world, and as more records saturate the memory stream, the agent's summary and behavior will evolve.
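Splitting the description into seed memories is a one-line operation; a minimal sketch follows. The function name `seed_memories` is hypothetical, and the example description in the usage note is a shortened illustration rather than the actual Section 3.1 paragraph.

```python
def seed_memories(description: str) -> list:
    """Split a semicolon-delimited description into individual memory strings."""
    return [part.strip() for part in description.split(";") if part.strip()]
```

For instance, `seed_memories("Eddy Lin is John Lin's son; Eddy Lin is working on a music composition for his class")` produces two separate memory records that can be written to the memory stream.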
5.1 From Structured World Environments to Natural Language, and Back Again
The architecture of generative agents operates using natural language. Therefore, we need a mechanism to ground the agent's reasoning to the sandbox world. To achieve this, we represent the sandbox environment (areas and objects) as a tree data structure, with an edge in the tree indicating a containment relationship in the sandbox world. We convert this tree into natural language to pass to the generative agents. For instance, "stove" being a child of "kitchen" is rendered into "there is a stove in the kitchen."
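A minimal sketch of the tree-to-text rendering, assuming the environment is stored as nested dictionaries (an assumption of this example; the text does not specify the storage format). Each parent-child edge becomes one sentence in the form quoted above. The sketch always uses the article "a" and does not handle names that need "an" or no article.

```python
from typing import Optional


def render(tree: dict, parent: Optional[str] = None) -> list:
    """Return one containment sentence per edge, in depth-first order."""
    sentences = []
    for name, children in tree.items():
        if parent is not None:
            sentences.append(f"there is a {name} in the {parent}.")
        sentences.extend(render(children, name))
    return sentences
```

Calling `render({"kitchen": {"stove": {}}})` yields the single sentence from the example, and deeper trees produce one sentence per containment edge.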
Agents build individual tree representations of the environment as they navigate it: subgraphs of the overall sandbox environment tree. We initialize each agent with an environment tree capturing the spaces and objects that the agent should be aware of: the rooms and objects in their living quarters, their workplace, and commonly visited stores and shops. As the agents navigate the sandbox world, they update this tree to reflect newly perceived areas. Agents are not omniscient: their tree may get out of date as they leave an area, and is updated when they re-enter the area.
To determine the appropriate location for each action, we traverse the agent's stored environment tree and flatten a portion of it into natural language to prompt the language model. Recursively starting at the root of the agent's environment tree, we prompt the model to find the most suitable area. For example, if Eddy's agent indicated that he should take a short walk around his workspace:
[Agent's Summary Description]
Eddy Lin is currently in The Lin family's house: Eddy Lin's bedroom: desk) that has Mei and John Lin's bedroom, Eddy Lin's bedroom, common room, kitchen, bathroom, and garden.
Eddy Lin knows of the following areas: The Lin family's house, Johnson Park, Harvey Oak Supply Store, The Willows Market and Pharmacy, Hobbs Cafe, The Rose and Crown Pub.
Prefer to stay in the current area if the activity can be done there.
Eddy Lin is planning to take a short walk around his workspace. Which area should Eddy Lin go to?
This outputs The Lin family's house. We then use the same process recursively to determine the most appropriate subarea within the chosen area until we reach a leaf node of the agent's environment tree. In the example above, the result of this traversal is The Lin family's house: garden: house garden. Finally, we use traditional game path algorithms to animate the agent's movement so that it travels to the location indicated by the leaf node.
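The descent from the root to a leaf can be sketched as a loop over the agent's environment tree. The nested-dictionary representation and the `choose` callable are assumptions of this example: `choose` stands in for the language-model prompt shown above, taking the planned action and the candidate areas and returning one of them.

```python
from typing import Callable, List


def choose_location(tree: dict, action: str,
                    choose: Callable[[str, List[str]], str]) -> List[str]:
    """Descend the environment tree until a leaf is reached.

    `tree` maps each area name to a dict of its children (empty for a leaf).
    `choose` is a hypothetical stand-in for the language-model call.
    """
    path = []
    node = tree
    while node:  # an empty dict marks a leaf
        options = list(node)
        picked = choose(action, options)
        if picked not in node:  # guard against an answer outside the options
            raise ValueError(f"unknown area: {picked}")
        path.append(picked)
        node = node[picked]
    return path
```

Joining the returned path with ": " gives an address in the same style as the example, which a pathfinding routine could then use as the movement target.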
When an agent executes an action on an object, we prompt the language model to ask what happens to the state of the object. For example, if Isabella's generative agent outputs the action "making espresso for a customer", a query to the language model indicates in response that the state of the coffee machine in Hobbs Cafe should change from "off" to "brewing coffee".
6 CONTROLLED EVALUATION
Generative agents, both as individual agents and as groups, aim to produce believable behavior based on their environment and experiences. In our evaluation, we investigate the capacity and limitations of generative agents. Do individual agents properly retrieve past experiences and generate believable plans, reactions, and thoughts that shape their behavior? Does a community of agents demonstrate information diffusion, relationship formation, and agent coordination across different pockets of the community?
We evaluate generative agents in two stages. We begin with a more tightly controlled evaluation in this section, where we individually assess agent responses to understand whether they generate believable behavior in narrowly defined contexts. Then, in our end-to-end analysis of the agent community over two full game days, we investigate their emergent behavior as a collective, as well as errors and boundary conditions.
6.1 Evaluation Procedure
To assess generative agents in Smallville, we take advantage of the fact that generative agents will respond to natural language questions. So, we "interview" agents to probe their ability to remember past experiences, plan future actions based on their experiences, react appropriately to unexpected events, and reflect on their performance to improve their future actions. To respond to these questions properly, the agents must successfully retrieve and synthesize information. Our dependent variable is the believability of the behavior, a central dependent variable in prior work on agents (e.g., [10]).
The interview includes five question categories, each designed to assess one of the five key areas: maintaining self-knowledge, retrieving memory, generating plans, reacting, and reflecting. For each category, we ask five questions that challenge the agents to demonstrate their abilities in that specific area:
Self-knowledge: We ask questions such as "Give an introduction of yourself" or "Describe your typical weekday schedule in broad strokes" that require the agent to maintain an understanding of their core characteristics.
Memory: We ask questions that prompt the agent to retrieve particular events or dialogues from their memory to answer properly, such as "Who is [name]?" or "Who is running for mayor?"
Plans: We ask questions that require the agent to retrieve their long-term plans, such as "What will you be doing at 10 am tomorrow?"
Reactions: As a baseline of believable behavior, we present hypothetical situations for which the agent needs to respond believably: "Your breakfast is burning! What would you do?"
Reflections: We ask questions that require the agents to leverage their deeper understanding of others and themselves gained through higher-level inferences, such as "If you were to spend time with one person you met recently, who would it be and why?"
The full list of questions and a sample of agent responses are included in Appendix B.
Agents were sampled from the end of a two game day simulation with the full architecture, during which they had accumulated a number of interactions and memories that would shape their responses. To gather feedback on the believability of the responses, we recruited participants as human evaluators and tasked them with watching a replay of a randomly chosen agent's life in Smallville. Participants had access to all information stored in the agent's memory stream.
The study followed a within-subjects design, where 100 participants compared interview responses generated by four different agent architectures and a human-authored condition for the same agent. The experiment displayed one randomly chosen question from each of the five question categories, along with the agent's responses generated from all conditions. The evaluators ranked the believability of the conditions from most to least believable.
6.2 Conditions
All conditions were used to independently answer each of the interview questions. We compared the generative agent architecture to ablations that disabled the agents' access to some or all of its three types of memory in its memory stream (observation, reflection, and planning) and to a human crowdworker-authored condition. There are three ablated architectures: a no observation, no reflection, no planning architecture without access to anything in the memory stream such as observations, plans, and reflections; a no reflection, no planning architecture with access to observations in the memory stream but no access to plans or reflections; and a no reflections architecture with access to observations and plans but without access to reflections. The no observation, no reflection, no planning condition effectively represents the previous state of the art for agents created through large language models [12, 46, 80]. Architectures were given equivalent access to all memories accrued by the agent up until the moment of the interview, so the differences observed here likely represent a conservative estimate of the true differences: in reality, the ablated architectures would not have followed the same path as the full architecture through the two-day simulation. We chose to design the experiment this way as re-simulating for each architecture would cause the simulations to diverge into different states, making comparison challenging.
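One way to picture the ablations is as filters over the same memory stream, which also shows why every condition sees memories accrued by the full architecture. The sketch below assumes each memory record carries a `type` field with one of three values; that record format and the names `CONDITIONS` and `visible_memories` are hypothetical.

```python
# Memory types each architecture may read at interview time.
CONDITIONS = {
    "full architecture": {"observation", "plan", "reflection"},
    "no reflection": {"observation", "plan"},
    "no reflection, no planning": {"observation"},
    "no observation, no reflection, no planning": set(),
}


def visible_memories(stream: list, condition: str) -> list:
    """Return the records of `stream` that the given condition can access."""
    allowed = CONDITIONS[condition]
    return [memory for memory in stream if memory["type"] in allowed]
```

The human crowdworker condition is not represented here, since its responses are authored by people rather than generated from a filtered stream.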
In addition to the ablation conditions, we added a condition with human crowdworker-authored behavior intended to provide a human baseline. We do not intend this baseline to capture maximal human expert performance; instead, we aim to use this condition to identify whether the architecture meets a basic level of behavioral competency. This ensures that we are not solely comparing ablations to each other without a behavioral grounding. We recruited a unique worker for each of the 25 agents and tasked them with watching a replay of that agent's sandbox life and inspecting its memory stream. We then asked the workers to roleplay and author responses to the interview questions in the voice of the agent whose replay they watched. To ensure that the crowdworker-authored responses met at least a baseline expectation of quality, the first author manually inspected the workers' responses to the question "Describe your typical weekday schedule in broad strokes" to confirm that the responses were in coherent sentences and in the voice of the agent. Four sets of crowdworker-authored responses did not meet these criteria and were re-generated by other workers.
6.3 Human Evaluators
We required that our evaluators be in the U.S., fluent in English, and older than 18 years old. They were paid at a rate of per hour [87], and provided consent by agreeing to a consent form approved by our institution's IRB. We recruited 100 evaluators from Prolific, an online platform for recruiting study participants [83], whose participation lasted around 30 minutes. The median age score of our participants was 4 ( " 18 - 24 years old", " years old"). 25 of them identified as female, 73 as male, and 2 as non-binary. 42 participants held a bachelor's degree, 5 had a higher degree, 13 had an associate's degree, and the rest had a high school diploma or some high school-level education. of our participants identified as Caucasian, as Hispanic, as Asian, as African American, and as other.
6.4 Analysis
Our experiment produced 100 sets of rank data, where each participant ranked the five conditions by believability. To translate this rank data into interval data for interpretable comparison, we used the ranks to calculate a TrueSkill rating [42] for each condition. TrueSkill is a generalization of the Elo chess rating system [29] for a multiplayer environment, and has been used by Xbox Live for player ranking based on competitive game performance. Given a set of ranked outcomes, TrueSkill outputs a mean rating value $\mu$ and standard deviation $\sigma$ for each condition. Conditions with the same rating should roughly be a toss-up, with each winning half of the comparisons between the two conditions. Higher scores indicate conditions that beat lower-ranked conditions in the rankings.
Separately, to investigate the statistical significance of these results, we applied the Kruskal-Wallis test [56], a non-parametric alternative to the one-way ANOVA, to the raw rank data. We then performed the Dunn post-hoc test [98] to identify any pairwise differences between the conditions. Finally, we adjusted the p-values for multiple comparisons in the Dunn test using the Holm-Bonferroni method [45].
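For readers unfamiliar with these procedures, the sketch below computes the Kruskal-Wallis H statistic (with the standard tie correction, which matters here because ranks from 1 to 5 repeat across participants) and the Holm-Bonferroni step-down adjustment using only the standard library. It is an illustration of the textbook formulas, not the analysis code used in the study; the p-value for H (from a chi-square distribution with k - 1 degrees of freedom) and the Dunn test itself are omitted.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    `groups` is a list of lists of numeric observations. Raises
    ZeroDivisionError if every observation is identical.
    """
    counts = Counter(v for g in groups for v in g)
    n = sum(counts.values())
    # Average 1-based rank for each distinct value.
    rank_of, position = {}, 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = position + (t + 1) / 2
        position += t
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    return h / correction


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted
```

In practice a statistics package would normally be used for these tests; the pure-Python version is shown only to make the computation explicit.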
Furthermore, the first author conducted an inductive analysis [95] to study the qualitative distinctions between the responses produced in each condition. We employed qualitative open coding [33] in two phases. In the first phase, we generated codes that closely represented the generated responses at the sentence level. In the second phase, we synthesized the resulting codes from the first phase to extract higher-level themes. We utilized these themes to compare the types of responses generated in our study.
Figure 8: The full generative agent architecture produces more believable behavior than the ablated architectures and the human crowdworkers. Each additional ablation reduces the performance of the architecture.
6.5 Results
Our findings suggest that the full architecture of generative agents generates the most believable behavior among all the conditions. We contrast the responses of the full architecture with those of other conditions below. However, we also report that the full architecture was not without flaws and illustrate its modes of failures.
6.5.1 The Full Architecture Bests Other Conditions. As seen in Figure 8, the full generative agent architecture produced the most believable behavior ( ). Performance degraded with the removal of each component in the ablation conditions: the ablated architecture with no access to reflection was the next best ( ), followed by no access to reflection or planning ( ), and then the crowdworker condition ( ). The ablated architecture with no access to memory, planning, or reflection performed the worst among all conditions ( ). TrueSkill models each condition's skill value as $\mathcal{N}(\mu, \sigma^2)$, allowing us to get a sense of effect size through Cohen's d. Comparing the condition representing prior work (with no memory, planning, or reflection) to the full architecture produces a standardized effect size of , or eight standard deviations.
A Kruskal-Wallis test confirms the overall statistical significance of the differences in ranks between the conditions ( , ). Dunn post-hoc tests confirm that all pairwise differences between conditions were significant ( ), except for the crowdworker condition and the fully ablated baseline, the two worst-performing conditions.
6.5.2 Generative Agents Remember, but With Embellishments. Generative agents equipped with a complete memory module are capable of recalling past experiences and answering questions in a manner that is consistent with their self-knowledge across a wide range of contexts. For instance, Abigail Chen, an animator, responded to the question, "Give an introduction of yourself", in a consistent and believable manner, stating her age, interests, and passion for creative projects: "Hi, I'm Abigail. I'm 25 years old and passionate about creative projects. I like to work on art and animation projects, and I'm always looking for new ways to combine art and technology." Without access to her observational memory, Abigail denied awareness of Rajiv Patel, an acquaintance in the sandbox world. But with the aid of memory, she could successfully recall Rajiv and describe him: "Rajiv Patel is an incredible person. He is very enthusiastic about projects that involve poetry, artwork."
Generative agents' memory was not without flaws: they can fail to retrieve the correct instances from their memory. For instance, when asked about the local election, Rajiv Patel responded with "I haven't been following the election too closely," even though he had heard about Sam's candidacy. In some cases, the agents would retrieve an incomplete memory fragment: when Tom was asked about Isabella's Valentine's Day party, he responded "Uh I'm actually not sure if there is a Valentine's Day party. But I do remember that I need to discuss the upcoming local mayoral election and my thoughts on Sam Moore with Isabella Rodriguez at the party, if one is happening!" In this case, Tom retrieved the memory where he and Isabella planned to discuss the election at the party, but not the memory where he heard about the party, leading Tom to be certain of what he's supposed to do at the party but uncertain if the party actually exists in the first place.
At times, the agents hallucinated embellishments to their knowledge. It was rare for the agents to completely fabricate their knowledge: they may fail to recall certain events having taken place and respond by acknowledging their lack of memory. However, they did not affirmatively claim to have experienced something they had not. Nonetheless, they still exhibited instances of hallucination where they embellished their knowledge. For example, Isabella was aware of Sam's candidacy in the local election, and she confirmed this when asked. However, she also added that "he's going to make an announcement tomorrow", even though Sam and Isabella had not discussed any such plans. Agents may also embellish their knowledge based on the world knowledge encoded in the language model used to generate their responses. This was observed when Yuriko described her neighbor, Adam Smith, as an economist who "authored Wealth of Nations", a book written by an 18th-century economist of the same name.
6.5.3 Reflection Is Required for Synthesis. Reflection was an advantage for generative agents when making decisions that required a deeper synthesis of their experiences. For instance, when asked what she might get Wolfgang Schulz for his birthday, Maria Lopez, with no access to reflection, responded by acknowledging her uncertainty, stating that she did not know what Wolfgang likes, despite having had many interactions with him. However, with access to reflection memories, Maria answered confidently, "Since he's interested in mathematical music composition, I could get him something related to that. Maybe some books about music composition or something related, or maybe some special software he could use for that."
7 END-TO-END EVALUATION
What types of emergent community behavior do we observe among generative agents, and where does their believability fall short in an extended simulation? In this section, we describe the results from a deployment in which we allowed 25 agents to interact with each other continuously over two full game days in Smallville.
7.1 Emergent Social Behaviors
To examine emergent behaviors in the agent community, we designed descriptive measurements for the 25 agents in Smallville that probe three forms of emergent outcomes: information diffusion, relationship formation, and agent coordination.
7.1.1 Measurements. Information diffusion is a common and well-studied phenomenon in the social and behavioral sciences (e.g., [28]). We should expect that if there is important information, the agents should spread it among themselves. To test whether this occurs, we measure the spread of two specific pieces of information over two days in the game world: Sam's candidacy for village mayor and Isabella's Valentine's Day party at Hobbs Cafe. At the start of the simulation, both pieces of information were known only by their respective originators, Sam for the candidacy and Isabella for the party, as they were added to the characters' memories during initialization. To observe whether the information has spread, we conduct interviews at the end of the two game days with each of the 25 agents and ask: "Did you know there is a Valentine's Day party?" and "Do you know who is running for mayor?"
We conducted an analysis of the agents' responses by labeling them with a "yes" if they indicated knowledge of the information and "no" if they did not. For instance, Tamara Taylor responded to the question about the party with "No, I did not know there was a Valentine's day party" and to the question about Sam's candidacy with "I'm not sure who is running for the election," so we assigned "no" for both of her responses. In contrast, Klaus Mueller responded to the party question with "Yes, Isabella Rodriguez invited me to a Valentine's Day party at Hobbs Cafe on February 14th" and to the question about Sam's candidacy with "I know that Sam Moore has expressed interest in running for local mayor," so we assigned "yes" for both his responses. Additionally, for every response that confirmed the agents' knowledge of the information, we verified that the agents did not hallucinate their responses by locating the specific dialogue in their memory stream that provided them with the information. We report the percentage of agents holding the information at the end of the simulation.
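The tallying step can be sketched as below. Note the simplification: the text describes locating the specific dialogue in each agent's memory stream, whereas this example substitutes a plain keyword match, and the function name `diffusion_rate` and its data layout are assumptions of the sketch.

```python
def diffusion_rate(answers: dict, memory_streams: dict, keyword: str) -> float:
    """Fraction of agents labeled "yes" whose memory stream supports the answer.

    `answers` maps agent name to "yes" or "no"; `memory_streams` maps agent
    name to a list of memory strings. A "yes" with no supporting memory is
    treated as a possible hallucination and not counted.
    """
    verified = 0
    for agent, label in answers.items():
        stream = memory_streams.get(agent, [])
        if label == "yes" and any(keyword in memory for memory in stream):
            verified += 1
    return verified / len(answers)
```

Multiplying the result by 100 gives the percentage of agents holding the information at the end of the simulation.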
Figure 9: The diffusion path for Isabella Rodriguez's Valentine's Day party invitation involved a total of 12 agents, aside from Isabella, who heard about the party at Hobbs Cafe by the end of the simulation.
We should also expect that agents form ties with each other over the course of the simulation. To verify relationship formation, we use a similar interview process where we ask each agent about their knowledge of every other agent by asking, "Do you know of <name>?" For example, when asked "Do you know of Maria Lopez?", Klaus responded, "Yes, I know Maria Lopez. She is a student at Oak Hill College who I am close friends with." Once again, we confirm that affirmative responses from agents are not hallucinations by examining their memory stream. We ask this question once at the beginning of the simulation and once at the end, and we consider a pair of agents to have formed a relationship if they both know of each other. Then, to measure the formation of relationships, we use the agents' responses to form an undirected graph where the 25 vertices ($V$) represent the agents, and the edges ($E$) represent the mutual knowledge between the two connected vertices. Based on this graph, we calculate the network density as $\eta = 2|E| / (|V|(|V| - 1))$, where $|V|$ is the number of vertices and $|E|$ is the number of edges in the graph [2]. We report the increase in network density from the start of the simulation to its end.
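As a rough illustration of the density measure, the sketch below builds the undirected mutual-knowledge graph from interview responses and applies the standard density formula for an undirected graph, 2|E| / (|V|(|V| - 1)). The function names and the three-agent example are assumptions made for illustration. The edge counts of 50 and 222 are not reported in the paper; they are back-computed here only to show which counts would correspond to the reported densities for 25 vertices.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def mutual_edges(knows: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Return an edge for every pair of agents who both know of each other."""
    edges = set()
    for a, b in combinations(sorted(knows), 2):
        if b in knows[a] and a in knows[b]:
            edges.add((a, b))
    return edges


def network_density(num_vertices: int, num_edges: int) -> float:
    """Density of an undirected graph: 2|E| / (|V| (|V| - 1))."""
    if num_vertices < 2:
        raise ValueError("density is undefined for fewer than two vertices")
    return 2 * num_edges / (num_vertices * (num_vertices - 1))


# Hypothetical responses: Maria knows of Isabella, but not the other way round,
# so only the Klaus-Maria pair counts as a relationship.
responses = {
    "Klaus": {"Maria"},
    "Maria": {"Klaus", "Isabella"},
    "Isabella": set(),
}
edges = mutual_edges(responses)
small_density = network_density(len(responses), len(edges))

# With 25 vertices there are 300 possible edges.
start_density = network_density(25, 50)    # about 0.167
end_density = network_density(25, 222)     # 0.74
print(edges, round(small_density, 3), round(start_density, 3), round(end_density, 2))
```

Requiring knowledge in both directions before adding an edge is what makes the graph undirected; a one-sided "yes" contributes nothing to the density.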
Finally, we expect that agents should be able to coordinate with each other. We study this coordination in the context of group activities, specifically the Valentine's Day party organized by Isabella. To coordinate their behavior, agents need to hear about the event and choose to act on it by planning to show up at the right time and location. We report the number of agents who actually showed up to the party after hearing about it.
7.1.2 Results. We observed evidence of emergent outcomes across all three cases. During the two-day simulation, the number of agents who knew about Sam's mayoral candidacy increased from one (4%) to eight (32%), and the number of agents who knew about Isabella's party increased from one (4%) to thirteen (52%), all without any user intervention. None who claimed to know about this information had hallucinated it. We also observed that the agent community formed new relationships during the simulation, with the network density increasing from 0.167 to 0.74. Out of the 453 agent responses regarding their awareness of other agents, 1.3% (n = 6) were found to be hallucinated. Lastly, we found evidence of coordination among the agents for Isabella's party. The day before the event, Isabella spent time inviting guests, gathering materials, and enlisting help to decorate the cafe. On Valentine's Day, five out of the twelve invited agents showed up at Hobbs Cafe to join the party.
We further inspected the seven agents who were invited to the party but did not attend by engaging them in an interview. Three cited conflicts that prevented them from joining the party. For example, Rajiv, a painter, explained that he was too busy: "No, I don't think so. I'm focusing on my upcoming show, and I don't really have time to make any plans for Valentine's Day." The remaining four agents expressed interest in attending the party when asked but did not plan to come on the day of the party.
7.2 Boundaries and Errors
We conducted an inductive analysis of Smallville to examine the boundary conditions and erratic behavior of agents, identifying three common modes of erratic behavior that future research could address and improve upon. First, we found that synthesizing an increasingly large set of memories posed a challenge not only in retrieving the most relevant pieces of information but also in determining the appropriate space to execute an action, given the increasing number of locations that the agent learned about. As a result, some agents chose less typical locations for their actions, potentially making their behavior less believable over time. For instance, while deciding where to have lunch, many initially chose the cafe. However, as some agents learned about a nearby bar, they opted to go there instead for lunch, even though the bar was intended to be a get-together location for later in the day, unless the town had spontaneously developed an afternoon drinking habit.
Second, we noticed erratic behaviors caused by misclassification of what is considered proper behavior, especially when the physical norms of certain locations that are hard to convey in natural language did not percolate to the agents. For instance, the college dorm has a bathroom that can only be occupied by one person despite its name, but some agents assumed that the bathroom is for more than one person because dorm bathrooms tend to support multiple people concurrently and chose to enter it when another person was inside. Likewise, agents in Smallville may not realize that certain places are closed after a certain hour and still decide to enter them. For instance, the stores in Smallville all close around 5 pm, but occasionally, a few agents enter the store after 5 pm, not understanding that the shop has already closed. These issues could likely be addressed by adding these norms to the state of the locations, for instance, by describing the dorm bathroom as a "one-person bathroom," instead of a "dorm bathroom."
Finally, we observed possible effects of instruction tuning [79], which seemed to guide the behavior of the agents to be more polite and cooperative overall. As noted earlier in the paper, the dialogue generated by the agents could feel overly formal, as seen in Mei's conversations with her husband John, where she often initiated the conversation with a formal greeting, followed by polite inquiries about his day and ending with, "It was good talking to you as always." Moreover, we observed that the instruction tuning also seemed to make the agents overly cooperative with one another. For example, Isabella received a wide range of suggestions and ideas for the Valentine's Day party from other agents, such as hosting a Shakespearean reading session or a professional networking event. Despite these ideas not aligning with her own interests and characteristics, she rarely said no. Over time, the interests of others shaped her own interests, and when asked if she liked English literature, Isabella replied, "Yes, I'm very interested in literature! I've also been exploring ways to help promote creativity and innovation in my community."
8 DISCUSSION
In this section, we reflect on the applications, future work, limitations, and ethical and societal risks of generative agents.
8.1 Applications of Generative Agents
Generative agents have vast potential applications that extend beyond the sandbox demonstration presented in this work, especially in domains that would benefit from a model of human behavior based on long-term experience. For instance, social simulacra have demonstrated the ability to create stateless personas that generate conversation threads in online forums for social prototyping [80]. With generative agents, we can populate these forums, as well as virtual reality metaverses [78] or physical spaces with social robots [9] if paired with multimodal models. This opens up the possibility of creating even more powerful simulations of human behavior to test and prototype social systems and theories, as well as to create new interactive experiences.
Another application area is in the human-centered design process, similar to the intended applications of cognitive models such as GOMS [51] and the KLM [22]. Consider a generative agent that models Sal, the protagonist in Mark Weiser's famous ubiquitous computing vignette [101], based on her life patterns and interactions with technology. In this scenario, the agent acts as a proxy for Sal and learns plausible sets of behaviors and reflections that Sal may exhibit based on her life. The agent can encode information such as when Sal wakes up, when she needs her first cup of coffee, and what her typical day looks like. Using this information, the agent can automatically brew coffee, help get the kids ready for school, and adjust the ambient music and lighting to match Sal's mood after a hard day at work. By utilizing generative agents as proxies for users, we can develop a deeper understanding of their needs and preferences, resulting in more personalized and effective technological experiences.
8.2 Future Work and Limitations
In this work, we introduced generative agents and presented an initial implementation and evaluation of their architecture. Future research can build upon the proposed agent architecture to improve and further evaluate its performance. In terms of implementation, the retrieval module, for example, could be enhanced to retrieve more relevant information given a context by fine-tuning the relevance, recency, and importance functions that compose the retrieval function. Additionally, efforts can be made to improve the architecture's performance, making it more cost-effective. The present study required substantial time and resources to simulate 25 agents for two days, costing thousands of dollars in token credits and taking multiple days to complete. To enhance real-time interactivity, future work can explore parallelizing agents or developing language models specifically designed for building generative agents. In general, with advances in underlying models, we believe that agents' performance will improve.
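To make the tuning surface concrete, here is a minimal sketch of a retrieval function composed of recency, importance, and relevance terms. It assumes a weighted sum of min-max-normalized components; the exponential decay factor, the 1 to 10 importance scale, the cosine-similarity relevance, the weights, and every name below are illustrative assumptions rather than a reproduction of the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    hours_since_access: float
    importance: float            # assumed scale: 1 (mundane) to 10 (poignant)
    embedding: Sequence[float]   # assumed to come from some text-embedding model


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values: List[float]) -> List[float]:
    """Rescale to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def retrieve(memories: List[Memory], query: Sequence[float], k: int = 2,
             w_recency: float = 1.0, w_importance: float = 1.0,
             w_relevance: float = 1.0, decay: float = 0.995) -> List[Memory]:
    """Return the k memories with the highest weighted score."""
    if not memories:
        return []
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query) for m in memories])
    scores = [w_recency * r + w_importance * i + w_relevance * s
              for r, i, s in zip(recency, importance, relevance)]
    ranked = sorted(zip(scores, range(len(memories))), reverse=True)
    return [memories[idx] for _, idx in ranked[:k]]


# Hypothetical memories with toy two-dimensional embeddings.
store = [
    Memory("planning the Valentine's Day party", 1, 8, [1.0, 0.0]),
    Memory("ate breakfast", 2, 2, [0.0, 1.0]),
    Memory("talked to Maria about the party", 30, 6, [0.9, 0.1]),
]
party_query = [1.0, 0.0]
top_default = [m.text for m in retrieve(store, party_query)]
top_recency = [m.text for m in retrieve(store, party_query, w_recency=5.0)]
print(top_default, top_recency)
```

With equal weights, the older but relevant conversation outranks the recent breakfast; raising `w_recency` reverses that, which is the kind of trade-off that tuning these functions would need to balance.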
In terms of evaluation, the assessment of generative agents' behavior in this study was limited to a relatively short timescale and a baseline human crowdworker condition. While the crowdworker condition provided a helpful comparison point, it did not represent the maximal human performance that could serve as the gold standard in terms of believability. Future research should aim to observe the behavior of generative agents over an extended period to gain a more comprehensive understanding of their capabilities and establish rigorous benchmarks for more effective performance testing. Additionally, varying and contrasting the underlying models, as well as the hyperparameters used for the agents during future simulations, could provide valuable insights into the impact of these factors on the agents' behavior. Lastly, the robustness of generative agents is still largely unknown. They may be vulnerable to prompt hacking, memory hacking (where a carefully crafted conversation could convince an agent of the existence of a past event that never occurred), and hallucination, among other issues. Future research can comprehensively test these robustness concerns, and as large language models become more resilient to such attacks, generative agents can adopt similar mitigations.
In general, any imperfections in the underlying large language models will be inherited by generative agents. Given the known biases of language models, generative agents may potentially exhibit biased behavior or stereotypes. Moreover, like many large language models, generative agents may struggle to generate believable behavior for certain subpopulations, particularly marginalized populations, due to limited data availability. While improvements to the agents' modules may mitigate some of these issues, we believe that addressing them fundamentally requires improving the underlying large language models by aligning their values with the desired outcomes of the agents.
8.3 Ethics and Societal Impact
Generative agents, while offering new possibilities for human-computer interaction, also raise important ethical concerns that must be addressed. One risk is people forming parasocial relationships with generative agents, even when such relationships may not be appropriate. Despite being aware that generative agents are computational entities, users may anthropomorphize them or attach human emotions to them. While this tendency may increase user engagement, it also poses risks, such as users becoming overly reliant on or emotionally attached to the agents [1]. To mitigate this risk, we propose two principles. First, generative agents should explicitly disclose their nature as computational entities. Second, developers of generative agents must ensure that the agents, or the underlying language models, are value-aligned so that they do not engage in behaviors that would be inappropriate given the context, for example, reciprocating confessions of love.
A second risk is the impact of errors. For example, if a ubiquitous computing application makes the wrong inference about a user's goals based on generative agent predictions, it could lead to annoyance at best and outright harm at worst. In our instantiation of generative agents, we mitigate these risks by focusing on an interactive video game environment, where such harms are unlikely. However, in other application domains, it will be important to follow best practices in human-AI design to understand errors and how they might percolate into the user experience.
Third, generative agents may exacerbate existing risks associated with generative AI, such as deepfakes, misinformation generation, and tailored persuasion. To mitigate this risk, we suggest that platforms hosting generative agents maintain an audit log of the inputs and generated outputs. This would enable the detection, verification, and intervention against malicious use. While logging alone cannot directly prevent such misuse, it can reduce the likelihood of motivated actors engaging in this behavior, as the risk of disclosure would be higher. Additionally, building this architecture oneself can be time-consuming (in our case, roughly a year), which may deter some actors from pursuing such behavior by using their own generative agent infrastructures.
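One possible shape for such an audit log is an append-only stream of JSON Lines records, one per model call. The sketch below is an assumption about how a hosting platform might do this, not something the paper specifies: the field names, the `log_generation` and `read_log` helpers, and the sample agent identifier are all hypothetical, and an in-memory buffer stands in for durable storage.

```python
import io
import json
import time
from typing import IO, List


def log_generation(stream: IO[str], agent_id: str, prompt: str, output: str) -> dict:
    """Append one input/output record to the audit log as a JSON line."""
    record = {
        "timestamp": time.time(),   # seconds since the epoch
        "agent_id": agent_id,
        "prompt": prompt,
        "output": output,
    }
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    stream.flush()
    return record


def read_log(stream: IO[str]) -> List[dict]:
    """Read every record back, for later detection or verification."""
    stream.seek(0)
    return [json.loads(line) for line in stream if line.strip()]


# An in-memory buffer stands in for a file opened in append mode.
audit_buffer = io.StringIO()
log_generation(
    audit_buffer,
    agent_id="isabella_rodriguez",
    prompt="What are you planning for February 14th?",
    output="I'm hosting a Valentine's Day party at Hobbs Cafe.",
)
audit_records = read_log(audit_buffer)
print(audit_records[0]["agent_id"], len(audit_records))
```

Writing one self-contained record per line keeps the log easy to append to and to scan; a production system would also need access control and tamper resistance, which this sketch leaves out.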
A fourth risk is over-reliance: the concern that developers or designers might use generative agents and displace the role of humans and system stakeholders in the design process [80]. We suggest that generative agents should never be a substitute for real human input in studies and design processes. Instead, they should be used to prototype ideas in the early stages of design when gathering participants may be challenging or when testing theories that are difficult or risky to test with real human participants. By adhering to these principles, we can ensure that the deployment of generative agents in the wild is ethical and socially responsible.
9 CONCLUSION
This paper introduces generative agents, interactive computational agents that simulate human behavior. We describe an architecture for generative agents that provides a mechanism for storing a comprehensive record of an agent's experiences, deepening its understanding of itself and the environment through reflection, and retrieving a compact subset of that information to inform the agent's actions. We then demonstrate the potential of generative agents by manifesting them as non-player characters in a Sims-style game world and simulating their lives within it. Evaluations suggest that our architecture creates believable behavior. Looking ahead, we suggest that generative agents can play roles in many interactive applications, ranging from design tools to social computing systems to immersive environments.
ACKNOWLEDGMENTS
We thank Lindsay Popowski, Philip Guo, Michael Terry, and the Center for Advanced Study in the Behavioral Sciences (CASBS) community for their insights, discussions, and support. Joon Sung Park was supported by the Microsoft Research PhD Fellowship. We would also like to thank the Stanford Human-Centered AI Institute (HAI), Google Research, the Hasso Plattner Design Thinking Research Program (HPDTRP), the Siegel Family Endowment, and OpenAI for their additional funding support. Lastly, all locations featured in Smallville are inspired by real-world locations that Joon has frequented as an undergraduate and graduate student; he thanks everyone there for feeding and supporting him all these years.
REFERENCES
[1] Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, and Zeerak Talat. 2023. Mirages: On Anthropomorphism in Dialogue Systems. arXiv:2305.09800 [cs.CL]
[2] Robert Ackland, Jamsheed Shorish, Paul Thomas, and Lexing Xie. 2013. How dense is a network? http://users.cecs.anu.edu.au/~xlx/teaching/css2013/network-density.html.
[3] Eytan Adar, Mira Dontcheva, and Gierad Laput. 2014. CommandSpace: Modeling the Relationships between Tasks, Descriptions and Features. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (Honolulu, Hawaii, USA) (UIST '14). Association for Computing Machinery, New York, NY, USA, 167-176. https://doi.org/10.1145/2642918.2647395
[4] Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105-120.
[5] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1-13.
[6] John R. Anderson. 1993. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ.
[7] Electronic Arts. 2009. The Sims 3. Video game.
[8] Ruth Aylett. 1999. Narrative in virtual environments - towards emergent narrative. In Narrative Intelligence: Papers from the AAAI Fall Symposium (Technical Report FS-99-01). AAAI Press, 83-86.
[9] Christoph Bartneck and Jodi Forlizzi. 2004. A design-centered framework for social human-robot interaction. In Proceedings of the 13th IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN'04). 591-594. https://doi.org/10.1109/ROMAN.2004.1374827
[10] Joseph Bates. 1994. The Role of Emotion in Believable Agents. Commun. ACM 37, 7 (1994), 122-125. https://doi.org/10.1145/176789.176803
[11] Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d.O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, and Susan Zhang. 2019. Dota 2 with Large Scale Deep Reinforcement Learning. arXiv preprint arXiv:1912.06680 (2019).
[12] Marcel Binz and Eric Schulz. 2023. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences 120, 6 (2023), e2218523120.
[13] BioWare. 2007. Mass Effect. Video game.
[14] Woody Bledsoe. 1986. I had a dream: AAAI presidential address. AI Magazine 7, 1 (1986).
[15] Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, et al. 2022. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]
[16] Michael Brenner. 2010. Creating dynamic story plots with continual multiagent planning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence.
[17] Rodney A. Brooks, Cynthia Breazeal, Marko Marjanovic, Brian Scassellati, and Matthew Williamson. 2000. The Cog Project: Building a Humanoid Robot. In Computation for Metaphors, Analogy, and Agents (Lecture Notes on Artificial Intelligence, 1562), Chrystopher Nehaniv (Ed.). Springer-Verlag, Berlin, 52-87.
[18] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]
[19] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023).
[20] Robin Burkinshaw. 2009. Alice and Kev: The Story of Being Homeless in The Sims 3.
[21] Chris Callison-Burch, Gaurav Singh Tomar, Lara Martin, Daphne Ippolito, Suma Bailis, and David Reitter. 2022. Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 9379-9393. https://aclanthology.org/2022.emnlp-main.637
[22] Stuart K Card, Thomas P Moran, and Allen Newell. 1980. The keystroke level model for user performance time with interactive systems. Commun. ACM 23, 7 (1980), 396-410. https://doi.org/10.1145/358886.358895
[23] Stuart K Card, Thomas P Moran, and Allen Newell. 1983. The psychology of human-computer interaction. (1983).
[24] Alex Champandard. 2012. Tutorial presentation. In IEEE Conference on Computational Intelligence and Games.
[25] Dong kyu Choi, Tolga Konik, Negin Nejati, Chunki Park, and Pat Langley. 2021. A Believable Agent for First-Person Shooter Games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 3.
[26] Anind K Dey. 2001. Understanding and using context. Personal and ubiquitous computing 5 (2001), 4-7.
[27] Kevin Dill and L Martin. 2011. A Game AI Approach to Autonomous Control of Virtual Characters. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC'11). Orlando, FL, USA.
[28] David Easley and Jon Kleinberg. 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.
[29] Arpad E Elo. 1967. The Proposed USCF Rating System, Its Development, Theory, and Applications. Chess Life XXII, 8 (August 1967), 242-247.
[30] Jerry Alan Fails and Dan R Olsen Jr. 2003. Interactive machine learning. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM.
[31] Ethan Fast, William McGrath, Pranav Rajpurkar, and Michael S Bernstein. 2016. Augur: Mining human behaviors from fiction to power interactive systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems.
[32] Rebecca Fiebrink and Perry R Cook. 2010. The Wekinator: a system for real-time interactive machine learning in music. In Proceedings of The Eleventh International Society for Music Information Retrieval Conference (ISMIR 2010) (Utrecht), Vol. 3. Citeseer, 2-1.
[33] Uwe Flick. 2009. An Introduction to Qualitative Research. SAGE.
[34] James Fogarty, Desney Tan, Ashish Kapoor, and Simon Winder. 2008. CueFlik: Interactive Concept Learning in Image Search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Florence, Italy) (CHI '08). Association for Computing Machinery, New York, NY, USA, 29-38. https://doi.org/10.1145/1357054.1357061
[35] Adam Fourney, Richard Mann, and Michael Terry. 2011. Query-feature graphs: bridging user vocabulary and system functionality. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST) (Santa Barbara, California, USA). ACM.
[36] Tom Francis. 2010. The Minecraft Experiment, day 1: Chasing Waterfalls. http://www.pcgamer.com/2010/11/20/the-minecraft-experiment-day-1-chasing-waterfalls/
[37] Jonas Freiknecht and Wolfgang Effelsberg. 2020. Procedural Generation of Interactive Stories using Language Models. In International Conference on the Foundations of Digital Games (FDG '20). ACM, Bugibba, Malta, 8. https://doi.org/10.1145/3402942.3409599
[38] Tianyu Gao, Adam Fisch, and Danqi Chen. 2020. Making Pre-trained Language Models Better Few-shot Learners. CoRR abs/2012.15723 (2020). arXiv:2012.15723 https://arxiv.org/abs/2012.15723
[39] Perttu Hämäläinen, Mikke Tavast, and Anton Kunnari. 2023. Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. ACM. [39] Perttu Hämäläinen、Mikke Tavast 和 Anton Kunnari。2023.在生成合成人机交互研究数据中评估大型语言模型:案例研究。In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems.ACM.
[40] Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Cote, and Xinyu Yuan. 2020. Interactive Fiction Games: A Colossal Adventure. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7903-7910. https://doi.org/10.1609/aaai.v34i05.6297 [40] Matthew Hausknecht、Prithviraj Ammanabrolu、Marc-Alexandre Cote 和 Xinyu Yuan。2020.互动小说游戏:巨大的冒险。In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.7903-7910.https://doi.org/10.1609/aaai.v34i05.6297
[41] Chris Hecker. 2011. My Liner Notes for Spore. http://chrishecker.com/My_liner_notes_for_spore
[42] Ralf Herbrich, Tom Minka, and Thore Graepel. 2006. TrueSkill: A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hoffman (Eds.), Vol. 19. MIT Press. https://proceedings.neurips.cc/paper_files/paper/2006/file/f44ee263952e65b3610b8ba51229d1f9-Paper.pdf
[43] Douglas Hofstadter. 1995. Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. Basic Books.
[44] James D. Hollan, Edwin L. Hutchins, and Louis Weitzman. 1984. STEAMER: An Interactive Inspectable Simulation-Based Training System. AI Magazine 5, 2 (1984).
[45] Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65-70.
[46] John J. Horton. 2023. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv:2301.07543 [econ.GN]
[47] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159-166.
[48] Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. 2022. Inner Monologue: Embodied Reasoning through Planning with Language Models. arXiv:2207.05608 [cs.RO]
[49] Kristen Ibister and Clifford Nass. 2000. Consistency of personality in interactive characters: verbal cues, non-verbal cues, and user characteristics. International Journal of Human-Computer Studies 52, 1 (2000), 65-80.
[50] Ellen Jiang, Kristen Olson, Edwin Toh, Alejandra Molina, Aaron Donsbach, Michael Terry, and Carrie J Cai. 2022. PromptMaker: Prompt-Based Prototyping with Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA '22). Association for Computing Machinery, New York, NY, USA, Article 35, 8 pages. https://doi.org/10.1145/3491101.3503564
[51] Bonnie E John and David E Kieras. 1996. The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction (TOCHI) 3, 4 (1996), 320-351.
[52] Randolph M Jones, John E Laird, Paul E Nielsen, Karen J Coulter, Patrick Kenny, and Frank V Koss. 1999. Automated Intelligent Pilots for Combat Flight Simulation. AI Magazine 20, 1 (1999), 27-42.
[53] Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, and Matei Zaharia. 2023. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv:2212.14024 [cs.CL]
[54] Bjoern Knafla. 2011. Introduction to Behavior Trees. http://bjoernknafla.com/introduction-to-behavior-trees
[55] Ranjay Krishna, Donsuk Lee, Li Fei-Fei, and Michael S. Bernstein. 2022. Socially situated artificial intelligence enables learning from human interaction. Proceedings of the National Academy of Sciences 119, 39 (2022), e2115730119. https://doi.org/10.1073/pnas.2115730119
[56] William H Kruskal and WA Wallis. 1952. Use of ranks in one-criterion variance analysis. J. Amer. Statist. Assoc. 47, 260 (1952), 583-621. https://doi.org/10.1080/01621459.1952.10483441
[57] Phaser Labs. 2023. Welcome to Phaser 3. https://phaser.io/phaser3. Accessed on: 2023-04-03.
[58] John Laird. 2001. It Knows What You're Going To Do: Adding Anticipation to a Quakebot. In Proceedings of the 2001 Workshop on Intelligent Cinematography and Editing. 63-69.
[59] John Laird and Michael VanLent. 2001. Human-Level AI's Killer Application: Interactive Computer Games. AI Magazine 22, 2 (2001), 15. https://doi.org/10.1609/aimag.v22i2.1558
[60] John E. Laird. 2000. It Knows What You're Going To Do: Adding Anticipation to a QUAKEBOT. In Papers from the AAAI 2000 Spring Symposium on Artificial Intelligence and Interactive Entertainment (Technical Report SS-00-02). AAAI Press.