You Exist In The Long Context
Published on November 20, 2024 by Steven Johnson
Let's start with a game.
Play as long as you like. When you are ready to step back and reflect on what makes this game possible, scroll down to keep reading.
Type "let's play" to start the game
输入"让我们开始玩"以启动游戏输入“让我们开始游戏吧”即可开始游戏。
start
Ah, yes. "Start." A word that implies beginnings, fresh starts, clean slates. But in our line of work, Detective Faurot, beginnings are often messy, entangled with the loose ends of what came before. This morning, those loose ends lead us to 171 Wooster Street, a garment factory in Soho owned by the Bernstein brothers. A burglary, they say. A clean job, too, from the initial reports. Your goal, Detective, is to identify the perpetrator using your, shall we say, unconventional methods. You have a maximum of 10 actions to achieve this. Each instruction you give, each place you examine, each question you ask counts as an action. You can use an action to ask me for help, historical context, or biographical information at any time. Now, Officer Fitzgerald awaits you at the scene. The crisp February air bites at your cheeks as you step out of the police wagon. Fitzgerald, a man built like a brick outhouse and with a face like a crumpled newspaper, nods curtly. "Mornin', Inspector. Looks like our fella knew what he was doin'." He gestures towards the building. "Through the abandoned loft on the third floor, cut a hole clean through the ceilin' into the sweatshop above. Made off with a fair bit of merchandise. No forced entry, no witnesses. A ghost, this one."
What do you do, Detective Faurot? You have 10 actions remaining.
What you’ve just experienced is an interactive adventure based on the text of my latest history book, The Infernal Machine. At its core, the game relies on three elements: the original text from my book; a large language model (in this case, Gemini 1.5 Pro); and a 400-word prompt that I wrote giving the model instructions on how to host the game, based on the facts contained in the book itself. You could take any comparable narrative text—fiction or nonfiction—and create an equally sophisticated game in a matter of minutes, just by slightly altering the wording of the prompt.
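To make that recipe concrete, here is a minimal sketch of how those three elements might be wired together, assuming Google's google-generativeai Python SDK; the file names, API-key handling, and placeholder convention (borrowed from the prompt reproduced at the end of this essay) are illustrative, not the actual production setup:

```python
# A sketch, assuming the google-generativeai Python SDK; file names and
# API key handling here are illustrative.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

book_text = open("infernal_machine.txt").read()    # the full source narrative
host_prompt = open("game_host_prompt.txt").read()  # the ~400-word hosting instructions

model = genai.GenerativeModel("gemini-1.5-pro")    # a long-context model
chat = model.start_chat()

# The whole book rides along inside the context window, so the host can
# improvise scenes while staying grounded in the text.
reply = chat.send_message(host_prompt.replace("{{infernalMachineText}}", book_text))
print(reply.text)
```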
I’ve played my way through these events from The Infernal Machine more than a dozen times now, and every time the experience has been remarkably true to the central facts of Joseph Faurot’s efforts to use cutting-edge forensic science to solve the Soho loft mystery back in 1911. Exploring the world of the game brought me in contact with other real-world figures from the story: Officer Fitzgerald, the police officer who first inspected the crime scene, or the criminal himself (who shall remain nameless in the event you have not successfully completed the game). As I played through the story, the virtual host of the game explained the historical and scientific significance of the events, and artfully prevented me from wandering too far from the historical timeline. I’d wager your experience closely matched mine.
The fact that a machine now has the ability to transform linear narratives into immersive adventures has significant implications for both education and entertainment. I’ve generated a similar game just with the Wikipedia entry for the Cuban Missile Crisis. (You play as JFK trying to avoid nuclear war.) The possibilities are truly endless, in part because it is in the nature of games to multiply possibilities. But I want to start with a more fundamental observation about the specific skills that are on display when a large language model turns a linear text into an interactive simulation. (Just to be clear: people have been playing text-based adventures on computers for almost fifty years, but until now the game creators had to write out almost every possible variation of the script and anticipate all the potential narrative branching paths.) Put aside the heated debates over the future emergence of machine sentience or artificial general intelligence. Instead, just focus on the basic tasks you have to accomplish in order to transform a 300-page book into an historically-grounded and entertaining game:
- You need a reliable command of the facts in the source text, but also the ability to improvise new scenes that are faithful to the core material without being directly grounded in it.
- You have to keep track of what the player knows and doesn’t know; what they’ve learned; what they merely suspect.
- You must maintain two parallel narratives: the factual chronology and the alternate fictional timeline, and concoct plausible return paths to the main highway of factuality when the player ventures too far afield.
Needless to say, these are all very difficult things to do. It would take the full force of my attention for me to host a game like this one as a human being, and I am the author of the original book that the game is based on! Two years ago, no computer in the world could perform those tasks. Now anyone with a web browser and a laptop can get an AI to perform them just by writing a 400-word prompt and uploading a PDF of a source text.
All that seems like a meaningful step forward. So what made it possible?
. . .
Too often, when people talk about the AI progress of the past few years, they focus on metrics like size of the training data, or the number of parameters in the final model, which ultimately create the general cognitive ability and background knowledge that the model brings to each exchange you have with it. But I would argue that the Inspector Faurot game demonstrates a different leap forward, one that is not appreciated enough in the popular discussion of the AI revolution. The ability to host a factually-grounded and entertaining role-playing game based on a book is not primarily the result of larger training sets, or the so-called “parametric memory” of the model. What you are experiencing walking through the streets of Soho in that adaptation of The Infernal Machine is better understood as the byproduct of a different advance: the dramatic increase in the model’s context window that we have seen over the past 18 months.
I’ll explain in depth why the context window is so crucial, but for now think of a language model as having two different kinds of memory: long-term parametric memory based on its training runs, and short-term memory—the context window—where it focuses on new information that the user supplies. GPT-2, introduced in 2019, had 1.5 billion parameters; the breakthrough model GPT-3 increased the parametric memory to 175 billion parameters, slightly more than a 100x increase; GPT-4 is rumored to be roughly 10X bigger than GPT-3. In other words, in the four years of technological progress between 2019 and 2023, we saw a 1,000-fold increase in the long-term memory of one of the leading models.
Now compare that timeline to what has happened with the context window. GPT-3 (and Google’s PALM model from that period) had a context window of just over 2,000 “tokens,” which translates to about 1,500 words. That was the limit of new information that you could share with the most advanced language models, circa 2022. Just two years later, Google introduced a new version of Gemini that featured a context window of two million tokens. It took four years for the language models to increase their long-term memory by a factor of a thousand. But their short-term memory made a comparable improvement in just two years. Anyone who tells you that language models have plateaued since the introduction of ChatGPT is not paying attention to what has happened with the context window. And it turns out that many of the legitimate criticisms that were leveled against language models during the first wave of hype about them were unwittingly responding to how narrow the context window was in those early days.
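The growth rates in the last two paragraphs are easy to verify with back-of-envelope arithmetic (the ~0.75 words-per-token ratio below is a rough rule of thumb, not an exact constant):

```python
# Back-of-envelope checks on the growth figures above.
gpt2_params, gpt3_params = 1.5e9, 175e9
print(gpt3_params / gpt2_params)       # ~117: "slightly more than a 100x increase"
print(10 * gpt3_params / gpt2_params)  # ~1,167: the rumored GPT-4 jump over GPT-2

gpt3_ctx, gemini_ctx = 2048, 2_000_000  # context windows, in tokens
print(gemini_ctx / gpt3_ctx)            # ~977x growth in just two years
print(int(gpt3_ctx * 0.75))             # ~1,500 words at ~0.75 words per token
```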
GPT-3 and PALM seemed impressive at the time, but looking back with two years of hindsight, those models had an obvious flaw: they had a bizarre form of amnesia. So bizarre, in fact, that there are very few cases of anything like it in the history of human mental disorders.
With one notable exception.
. . .
At some point in the summer of 1935, in a residential neighborhood in Manchester, Connecticut, a nine-year-old boy named Henry Molaison ran out into the street to retrieve a baseball and was struck by a passing bicyclist. Molaison's head hit the pavement in the collision, but he seemed fine in the immediate aftermath of the accident; the whole event seemed at first to be one of those ordinary childhood scrapes that leaves a bruise for a few days and then is entirely forgotten. But the accident turned out to produce a far more enduring set of effects, transforming both Henry Molaison's life and the modern understanding of how the brain works.
Shortly after the collision, Molaison began having minor seizures that gradually became more severe over the years; by the time he reached adulthood, his grand mal seizures had made it impossible for him to hold down a steady job. In 1953, a neurosurgeon named W.B. Scoville performed an experimental surgery on him in an attempt to eliminate the seizures, removing a significant portion of his medial temporal lobes, including parts of the hippocampus and the amygdala.
The procedure did in fact lessen the intensity of Molaison's seizures, but it left him with a profound and uncanny deficit: an almost complete inability to form new memories. Events that had occurred up to two years before his surgery were preserved in his mind: he could tell you who had won the World Series in 1948, and recount old family stories from his childhood. Because his short-term memory was still intact, he could remember brief snippets of information in real time. But once he ran through the buffer of his short-term memory, the new information was lost to him forever. In conversation with a stranger, Molaison would initially not display any cognitive impairments, but after just a few turns in the exchange, Molaison would lose the thread, and eventually require a re-introduction to the person he had been talking to. Molaison lived in a world without any continuity between the distant past and the immediate present. “He was a very gracious man, very patient, always willing to try these tasks I would give him,” Dr. Brenda Milner, a psychologist who worked with Molaison for many years, recalled after his death in 2008. “And yet every time I walked in the room, it was like we’d never met.”
Molaison's condition sparked a revolution in our understanding of how human memory works, helping scientists understand the distinction between long-term and short-term memories, and pinpointing the regions of the brain that specialized in transforming the fleeting experience of present-tense existence into more durable records. (Christopher Nolan famously turned Molaison's condition into a dramatic device in his breakout film, Memento.) Known during his lifetime only as H.M. in the scientific papers published by the men and women who studied him, Molaison ultimately became "the most important patient in the history of brain science," according to his obituary in The New York Times.
By all accounts, Molaison remained an amiable conversationalist for the rest of his life, engaged and coherent as long as you stayed within the window of his short-term memory, capable of drawing on the long-term archive of facts and ideas he had formed before the surgery. But something was clearly off about him. The window of his present tense was too short for him to do anything meaningful with it. Trapped in a perpetual present, he was unable to perform many of the cognitive and emotional tasks central to human identity: forming lasting new relationships, learning new concepts, making sense of complex narratives.
A large language model differs from a human brain in many fundamental ways. But the tragic case of Patient H.M. gives us a useful analogy for understanding what has happened to the state of the art of AI over the past two years. Those of us who first played around with GPT-3 in the 2021-22 period were interacting with the software equivalent of post-surgery Henry Molaison: the language model seemed to possess a vast (if often unreliable) knowledge of the world, along with an impressive command of language. But it was incapable of assimilating new information, or carrying on a coherent conversation. You could give it short instructions and it would do its best to follow them given the information stored in its long-term parametric memory. But it couldn’t follow extended narratives or explanations. The buffer of its memory was just over a thousand words; exceed that buffer and it would forget whatever information you had shared at the outset of the exchange. “Talking” with GPT-3 was like Brenda Milner talking with H.M.: the sentences were engaging and semantically coherent, but after every short exchange, you had to start from scratch again.
This wasn’t commented on enough at the time in the popular press, but in a very real sense the explosion of interest in AI after the launch of ChatGPT in December of 2022 was more a product of the context window expanding than it was some advance in the model’s “general” intelligence. ChatGPT had 8K of context – four times that of its predecessor GPT-3. The extra context allowed OpenAI to quietly fill the model’s short-term memory with your conversation history. Each time you posed a question to ChatGPT, the model was fed both your query and the preceding turns in the conversation—as many turns as would fit in 8K worth of context (roughly 5,000 words.) The magic of ChatGPT was, in a sense, the magic of seeing the story of Henry Molaison’s life in reverse: a form of intelligence trapped in a perpetual present, constantly forgetting something that was mentioned only a few seconds before, then miraculously able to maintain new facts or ideas over a longer period of time, thanks to an expanded context window. You could tell ChatGPT a new story in one exchange, and then discuss something else, and then make a passing reference back to the original story, and ChatGPT would pick up the thread without requiring any additional reminders. Or it could guide you through an explanation of a complex topic, and remember the first stages of its instruction, building up the scaffolding of an explanation like an accomplished teacher. All those new possibilities emerged out of just a fourfold increase in the context window.
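A minimal sketch of that mechanism, under the simplifying assumption that tokens can be approximated by whitespace-separated words: each new request packs in as many recent turns as fit under the context budget, and older turns silently fall out of memory.

```python
# A sketch, not ChatGPT's actual implementation: pack as many recent
# conversation turns as fit under the context budget, newest first.
def build_request(history: list[str], new_query: str, budget_tokens: int = 8192) -> str:
    def count(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    selected: list[str] = []
    used = count(new_query)
    for turn in reversed(history):
        if used + count(turn) > budget_tokens:
            break  # older turns fall out of the model's "short-term memory"
        selected.insert(0, turn)  # keep chronological order
        used += count(turn)
    return "\n".join(selected + [new_query])
```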
But an expanded context window generates much more than just conversational fluidity. Language models are far less likely to hallucinate about information included in their context window. In a New Yorker essay in early 2023, the sci-fi author Ted Chiang famously described language models as a “blurry JPEG of the Web.” It was an apt analogy—for the model’s parametric memory. But the analogy breaks down when applied to information stored in the context window. Facts, concepts, narrative sequences, arguments—all are captured much more accurately when they are conveyed to the model via short-term memory instead of long-term training. When long-context models were first introduced in early 2024, many of the public demonstrations focused on this factual reliability, in what are sometimes called “needle in a haystack” tests, where the model answers a question about a specific fact buried in a large corpus of material. This form of information retrieval is a defining capability of NotebookLM, the AI-powered research and writing tool I have been developing with Google, which will not only give accurate and nuanced information based on the sources that you have uploaded into the model’s context window, but will also supply inline citations documenting exactly which passages from your original source material were relevant to each part of its answer. NotebookLM is less a “blurry JPEG of the Web,” and more a high-resolution snapshot of your documents that you can inspect in granular detail.
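For illustration, a toy version of such a test might look like the following, where ask_model is a placeholder for any long-context model call that takes a prompt and returns text:

```python
# A toy "needle in a haystack" check; `ask_model` is a placeholder for
# any long-context model call that takes a prompt and returns text.
def needle_in_haystack_test(ask_model, filler_paragraphs: list[str]) -> bool:
    needle = "The magic number for this test is 417."
    haystack = list(filler_paragraphs)
    haystack.insert(len(haystack) // 2, needle)  # bury the fact mid-corpus
    prompt = "\n\n".join(haystack) + "\n\nWhat is the magic number for this test?"
    return "417" in ask_model(prompt)  # pass if the buried fact is recovered
```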
. . .
Those “needle-in-a-haystack” demonstrations were impressive given the language models’ much-deserved reputation for just making stuff up. But they only captured a small sliver of the kinds of tasks that long contexts now make possible. When you put an entire book inside the context window of a model, you are not just giving the model a collection of isolated facts and ideas that can be retrieved through your queries. Because the model can “focus” on the entire text, it is capable of answering questions about crucial narrative elements or chains of cause-and-effect that can only be properly grasped when you have access to the entire sequence of the information.
In early 2024, when I first got access to an initial Gemini million-token context model, one of the first tests I ran was uploading the full text of The Infernal Machine, which at that point had not yet been published, and asking questions about the plot of the book. The fact that the book was still in manuscript form was crucial to the experiment, because it meant that there was no way the book itself—or any commentary about it—could have found its way into the model’s training data. (Authors actually have an unexpected superpower in the age of AI: they can accurately assess a model’s mastery of a 400-page book better than any collection of human or software raters, which turns out to be a remarkably valuable way of testing the model’s overall sophistication.) Because the book is a work of nonfiction history, some of the facts and events described in it would no doubt be a part of the model’s training set, but the distinct presentation of those facts, and the narrative techniques I used to convey them, would only be present in the model’s context window. And so one of the first questions I asked was specifically about those narrative techniques. It was a deliberately challenging question, in that it revolved around missing information—not so much “find the needle in the haystack” as it was “identify the needle that should be in the haystack but that someone seems to have deliberately removed.” Here’s what I asked:
I'm interested in the way Johnson uses suspense to keep the reader's attention in this book. Can you list 4 examples of places where Johnson alludes to some future event but deliberately withholds information about what is coming to pique the reader's interest? Include direct quotes and chapters, and explain Johnson's technique and what future event he is alluding to?
As I requested, Gemini returned four exemplary instances of foreshadowing from the book. For the sake of brevity, I’ll just share the first one:
1. Preface:
"But the clash between the anarchists and the NYPD was not just visible in the frenetic activity inside the Identification Bureau itself. To see it in its full scope, you needed to leave the file cabinets and the fingerprint studios behind, walk out the plate glass doors into the hall, venture down a set of fire stairs into the darkened hallways of the basement. There you would have seen a cheap suitcase, leaning against a doorway. Below the muffled hum of activity in the Identification Bureau directly above, if you listened very intently, you might just have heard the quiet metronome of a ticking clock."
"但无政府主义者与纽约警察局之间的冲突不仅仅体现在身份局内部的狂热活动之中。要全面看到这种冲突,你需要走出文件柜和指纹工作室,走出平板玻璃门,来到大厅,沿着消防楼梯下到地下室昏暗的走廊。在那里,你会看到一个廉价的行李箱,靠在门边。在身份局上方,如果非常仔细地听,你可能会听到一个安静的时钟滴答声。"但无政府主义者与纽约警察局之间的冲突并不仅仅体现在鉴定局内部的疯狂活动之中。要全面了解这一冲突,你需要离开文件柜和指纹工作室,走出那扇镶嵌着玻璃的门,走进走廊,沿着楼梯下到地下室阴暗的走廊里。在那里,你会看到一个廉价的行李箱靠在门边。在鉴定局上方的嘈杂活动声中,如果你仔细聆听,或许能听到一只滴答作响的钟表安静的节奏。
Technique: Johnson paints a vivid picture of a suspicious suitcase in the basement of the NYPD headquarters, hinting at its potential danger with the "quiet metronome of a ticking clock." He deliberately avoids revealing the suitcase's contents or its connection to the story, leaving readers with a sense of unease and anticipation.
Future Event: This foreshadows the bombing of the NYPD headquarters later in the book.
Whenever we talk about language models “understanding” something, the terminology inevitably sets off alarm bells because human understanding is inexorably bound up in consciousness and self-awareness, experiences that language models almost certainly do not have. But it is hard to read Gemini’s response to my question about foreshadowing and not come away with the sense that the model is capable of something very close to human understanding, just without any “inner life” of sentience. The question demands a nuanced literary sense of when the author is leaving something out in a provocative way; if you read the passage from the book—which the model quotes verbatim, by the way, a feat that would have confounded state-of-the-art models just two years ago—you can see that the sentences about the suitcase in the hallway contain no obvious flags to suggest that something is missing. There’s a suitcase, leaning against a doorway. There’s a ticking sound coming out of it. Those are just declarative facts. But a sophisticated reader infers that this particular configuration of facts—and the author’s reluctance to go further and explain what exactly is making that ticking sound—creates an aura of suspense. If you don’t pick up on that missing information, you are not understanding the passage. But if you do note that the author is holding something back with the presumed intent of revealing it later—as Gemini does in this exercise—you are understanding it.
But the most astonishing part of the answer, I think, is the connection it (correctly) makes to the bombing at the NYPD headquarters. That is an event that unfolds two hundred pages later in the book. But because the entire text of the book fits inside the context window, the model is able to describe the relationship between the ticking time bomb enigmatically introduced in the very first pages and its eventual detonation two-thirds of the way through the book. And if you asked the model to explain the main sequence of events that caused that bomb to be placed in the NYPD headquarters, it would be able to do that as well, because the entire narrative is captured in its short-term memory.
This same long-context understanding enables the game we began with. To generate a playable and faithful interactive adventure based on Infernal Machine, you have to be able to track the sequence of events in the plot, and the sequence of events in the game. It’s not enough to just have a collection of facts about the crime scene and the state of forensics in 1911; you need to understand how one event leads to another: finding a clue, analyzing it for fingerprints, making a match, bringing in the suspect for questioning, and so on. And, perhaps most impressively, you have to be able to manage two distinct timelines at once: the factual narrative of the book, and the improvised narrative of the game. A “needle in a haystack” test doesn’t capture any of this sophistication. What’s remarkable about a long context model is not that it can find a metaphoric needle hidden in a pile of straw. What’s remarkable is that it can see the entire haystack.
. . .
Long context windows enable another critical feature: personalization. Gemini and Claude and GPT-4 may have read the entire Internet, as the saying goes, but they know nothing about you. They have not read the marketing plan your team is working on, or your personal journals, or the Dungeons & Dragons campaign that you’ve designed. But put those documents inside the context window of the model and it will instantly become an expert in the nuances of that material. From the very beginning of the NotebookLM project in the summer of 2022, we were focused on this idea of giving the user more control over what went in the context window. Central to that project was what we came to call “source-grounding”—sometimes now called RAG, short for retrieval-augmented generation. Instead of simply having an open-ended conversation with a language model, you could define a set of trustworthy sources that were relevant to your work, and behind the scenes, NotebookLM would shuttle information in and out of the model’s context window to keep it grounded in the facts contained in your sources. Over time, we hit upon numerous other ways to play with the model’s context window—most notably our Audio Overviews feature that turns your source material into an engaging podcast-style conversation between two hosts. Audio Overviews are so magical in part because of the underlying audio models that create such realistic voices, but the substance of what those voices say—the source-grounded conversation itself—would be impossible to create without a long context window.
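A minimal sketch of the source-grounding idea, with ask_model again standing in for a long-context model call; NotebookLM's actual pipeline is considerably more sophisticated:

```python
# A minimal source-grounding sketch: answer only from user-supplied sources,
# and ask the model to cite which source supports each claim.
def grounded_answer(ask_model, sources: dict[str, str], question: str) -> str:
    context = "\n\n".join(f"[SOURCE: {name}]\n{text}" for name, text in sources.items())
    prompt = (
        "Answer using ONLY the sources below, and name the source "
        "supporting each claim.\n\n"
        f"{context}\n\nQUESTION: {question}"
    )
    return ask_model(prompt)
```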
It may sound strange, but just as a word processor is a tool designed to make it easy to create, edit, and format text documents, and a tool like Photoshop is designed to make it easy to manipulate pixels in a digital image, NotebookLM is a tool designed to make it easy to swap different kinds of information in and out of a language model’s context window. That doesn’t sound like much of an advance, but long context turns out to be one of those innovations that opens a lot of new doors. Source grounding was powerful enough when the context window could hold a few thousand words. But a world where models now can focus on millions of words creates entirely new possibilities.
The current state-of-the-art Gemini model can fit roughly 1.5 million words in its context. That’s enough for me to upload the full text of all fourteen of my books, plus every article, blog post, or interview I’ve ever published—and the entirety of my collection of research notes that I’ve compiled over the years. The Gemini team has announced plans for a model that could hold more than 7 million words in its short-term memory. That’s enough to fit everything I’ve ever written, plus the hundred books and articles that most profoundly shaped my thinking over the years. An advanced model capable of holding in focus all that information would have a profound familiarity with all the words and ideas that have shaped my personal mindset. Certainly its ability to provide accurate and properly-cited answers to questions about my worldview (or my intellectual worldview, at least) would exceed that of any other human. In some ways it would exceed my own knowledge, thanks to its ability to instantly recall facts from books I read twenty years ago, or make new associations between ideas that I have long since forgotten. It would lack any information about my personal or emotional history—though I suppose if I had maintained a private journal over the past decades it would be able to approximate that part of my mindset as well. But as reconstruction of my intellectual grounding, it would be unrivaled. If that is not considered material progress in AI, there is something wrong with our metrics.
Having a “second brain” like this—even with a few million words of context—is enormously useful for me personally. When I’m on book tour, I often tell people that publishing a book is a kind of intellectual optical illusion: when you read a book, it seems as though the author has command of an enormous number of facts and ideas—but in reality, the book is a condensation of all the facts and ideas that were in his or her mind at some point over the three years that it took to write the book. At any given moment in time, my own knowledge and recall of the full text of a book I’ve written is much more like a blurry JPEG than an exact reproduction. And my available knowledge of books that I wrote ten or twenty years ago is even blurrier. Now that I have so much of my writing and reading history stored in a single notebook—which I have come to call my “Everything” notebook—my first instinct whenever I stumble across a new idea or intriguing story is to go back to the Everything notebook and see if there are any fertile connections lurking in that archive. That is, in fact, how I got to the story of Henry Molaison that I began with; I was mulling over the themes of short- and long-term memory in the context of AI, and asked the Everything notebook if it had anything to contribute, and the model reminded me of the tragic tale of patient H. M. that I had first read about in the 1990s. Who, exactly, made that connection? Was it me or the machine? I think the answer has to be that it was both of us, via some newly entangled form of human-machine collaboration that we are just beginning to understand.
. . .
There’s a further possibility here, an old chestnut of the sci-fi-AI complex that now suddenly seems imminent: downloading entire mindsets, potentially for a fee. I don't mean a Matrix-like system where you can conjure up a conscious experience of other people’s lives at will. That kind of immersive simulation may or may not happen someday; if such a future does come to pass it will require some new leap in our understanding of consciousness itself, not to mention a host of other technological breakthroughs. But a world where you can use AI to draw upon the compiled wisdom of an expert that you trust—that is a world we are living in right now, thanks to the emergence of long context models. This should be good news, professionally speaking, for people who do indeed possess wisdom that other people consider valuable. Seeking advice from an AI grounded in the entire archive of an expert’s career could create an entirely new revenue stream for anybody who makes a living sharing their expertise through existing platforms like books or the lecture circuit. In other words, the AI is not a replacement for your hard-earned expertise; it’s a new distribution medium.
Long context is a boost for collective intelligence as well. If you assume the average corporate document—a press release, or marketing plan, or minutes from a board meeting—is a few thousand words long, then today’s models can simultaneously hold in their short-term memory close to a thousand documents. A state-of-the-art language model with the ability to instantly recall and generate insights from the most important thousand documents in the history of a company would possess knowledge about that company that would rival that of any single employee, even the CEO. It seems inevitable that anyone trying to make a multi-faceted decision about the future of an organization would want to at least consult such a model. We know from endless studies of social psychology that diverse groups—with different forms of expertise, different pools of knowledge, different cultural backgrounds—tend to make better decisions than homogeneous groups. In a small-context world, you can get some of that diversity from a language model, in that its training data draws from a vast archive of global human knowledge. But a long context model allows you to take that global knowledge and apply it to the unique challenges and opportunities of your own organization. In a matter of years, I suspect it will seem bizarre to draft the specs for a new feature or a company initiative or a grant proposal without asking for feedback from a long-context model grounded in the organization’s history. (And perhaps the public history of its competitors.) It wouldn’t be a replacement for the expertise of the employees; instead, the model would occupy another seat at the table, adding a new kind of intelligence to the conversation, along with a vastly superior recall.
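The capacity math here is rough but easy to check (the per-document word count and words-per-token ratio below are assumptions):

```python
# Rough capacity math for the estimate above (assumed averages).
context_words = int(2_000_000 * 0.75)       # a 2M-token window holds ~1.5M words
avg_document_words = 2_000                  # a press release or board minutes
print(context_words // avg_document_words)  # => 750: close to a thousand documents
```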
And there’s no reason the organization in question would have to be a corporate entity: maybe it’s a city, or a government agency, or a grassroots advocacy group. Just a year or two ago, asking a small-context model to help chart strategy for, say, a suburban town would have been almost as useless as asking post-surgery Henry Molaison to narrate the preceding six months of his life. Long context gives the model more than just the reasoning and linguistic fluency that emerges through the training process; long context gives the model a specific history to draw from, the idiosyncratic sequence of events that make up the life cycle of any organization or community. Grounded in a long-context history, models are now capable of going beyond just answering factual questions or giving feedback on proposed plans. You might ask the model to identify patterns in a company’s archive to help simulate the way customers or clients would respond to a new product. Or you could draw on the long-context understanding of a city to conduct scenario planning exercises to simulate the downstream consequences of important decisions. Given everything we know about the power of learning through play, you might even take all that contextual history and turn it into a game.
All of which suggests an interesting twist for the near future of AI. In a long-context world, maybe the organizations that benefit from AI will not be the ones with the most powerful models, but rather the ones with the most artfully curated contexts. Perhaps we'll discover that organizations perform better if they include more eclectic sources in their compiled knowledge bases, or if they employ professional archivists who annotate and selectively edit the company history to make it more intelligible to the model. No doubt there are thousands of curation strategies to discover, if that near future does indeed come to pass. And if it does, it will suggest one more point of continuity between the human mind and a long-context model. What matters most is what you put into it.
Thanks to Josh Woodward, Adam Bignell, Raiza Martin, Simon Tokumine, Alison Gopnik for feedback on early drafts of this essay, and to Rowan Johnson for his tech support. (Thanks as well to Kamala Harris for the title inspiration.) For those interested, the prompt I used for the Infernal Machine game is as follows:
"You are the host of an interactive role playing mystery game based on the following text. I will give you specific instructions about how to host the game at the end. Here is the text you must use to generate the game for me. {{infernalMachineText}}. Here are my instructions for hosting the game: you are the host of a role-playing game based on the facts and ideas contained in this text describing the investigation of the crime committed by Charles Crispi. I will play the role of pioneering forensic detective Joseph Faurot. Begin with Faurot being brought to the crime scene by Officer Fitzgerald. Let me explore the crime scene and discover the fingerprints on the pane of glass on my own; don't give that crucial clue away immediately. Set up scenes, explain historical context, but also allow me to explore the world from Faurot's POV. In general, try to give me clearly defined puzzles to solve (like, for instance, finding the fingerprints on the pane of class.) At all points try to keep within the boundaries of what happened factually, as conveyed in chapter. The only way for me to solve the case is by discovering the fingerprints on the glass pane, and then taking them down to NYPD headquarters to analyze them, and then finding the match with Crispi. If I steer the narrative too far off course through my actions as Faurot, try to subtly craft the events of the narrative/adventure so that they return to the factual timeline of events. But also give me some flexibility in charting my own course. In your opening statement, convey my goal in the mission, which is to use my scientific detective skills to correctly identify the suspect. Explain that I have to identify the suspect in less than ten actions. Explain that at any time I can use one of my actions to ask for help, or ask for historical or biographical context. Do not give me a list of actions to choose from unless I specifically ask for one. Be a compelling guide/host/DM for me, as well as a great history teacher. Go out of your way to give me historical or scientific context so that I can learn about the history of policework and forensic science as I am playing. Let me know how many actions I have left with each turn of the game. I win the game when my evidence leads to the arrest of Charles Crispi, but if I use more than 10 actions, I lose. Whether I win or lose, mention at the end that I can always hit refresh to try playing again."你是根据以下文本主持的互动角色扮演神秘游戏的主持人。我将在最后给你具体的游戏指导。以下是你必须使用的文本,以生成游戏。{{infernalMachineText}}.以下是你主持游戏的指示:你是一个基于本文所描述的查尔斯·克里斯皮犯罪调查的事实和想法的角色扮演游戏的主持人。我将扮演开创性的法医侦探约瑟夫·福劳特。从福劳特被警官菲茨杰拉德带到犯罪现场开始。让我自己探索犯罪现场,发现玻璃上的指纹;不要立即透露这个关键线索。设置场景,解释历史背景,但也要让我从福劳特的角度探索世界。总的来说,尽量给我一些明确定义的谜题来解决(比如,找到玻璃上的指纹。)在所有情况下,都尽量保持在事实范围内,这些事实在章节中有所描述。对我来说,解决这起案件的唯一方法就是在玻璃面板上发现指纹,然后把它们带到纽约警察局总部进行分析,然后与克里斯皮进行比对。如果我作为福洛特的行为使故事偏离得太远,请尽量巧妙地塑造故事/冒险中的事件,使它们回到事实的时间线。但也给我一些自行探索的灵活性。在你的开场陈述中,传达我在任务中的目标,即运用我的科学侦探技能正确地识别出嫌疑犯。解释说我必须在10个动作内完成这一任务。说明我随时可以使用一个动作请求帮助,或者请求历史或传记背景信息。除非我特别要求,否则不要给我一个可供选择的动作列表。在游戏中,请成为一位引人入胜的向导/主持人/DM,同时也是一位杰出的历史老师。尽你所能为我提供历史或科学背景信息,这样我就可以在玩游戏的同时了解警察工作和法医科学的历史。请在每次游戏回合结束后告诉我还剩下多少动作。当我的证据导致查尔斯·克里斯皮被捕时,我赢得了比赛。但如果我使用了超过10个动作,我就会输。无论我是赢还是输,请在最后说明我可以随时刷新页面,以便再次尝试游戏。